1
Song Y, Liu Z, Li G, Xie J, Wu Q, Zeng D, Xu L, Zhang T, Wang J. EMS: A Large-Scale Eye Movement Dataset, Benchmark, and New Model for Schizophrenia Recognition. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9451-9462. [PMID: 39178070] [DOI: 10.1109/tnnls.2024.3441928]
Abstract
Schizophrenia (SZ) is a common and disabling mental illness, and most patients experience cognitive deficits. Eye-tracking technology has been increasingly used to characterize cognitive deficits because of its reasonable time and economic costs. However, there is no large-scale, publicly available eye movement dataset and benchmark for SZ recognition. To address these issues, we release a large-scale Eye Movement dataset for SZ recognition (EMS), which consists of eye movement data from 104 patients with SZ and 104 healthy controls (HCs) collected with a free-viewing paradigm over 100 stimuli. We also conduct the first comprehensive benchmark, which has long been absent in this field, comparing 13 related psychosis recognition methods using six metrics. In addition, we propose a novel mean-shift-based network (MSNet) for eye movement-based SZ recognition, which combines the mean shift algorithm with convolution to extract a cluster center as the subject feature. In MSNet, a stimulus feature branch (SFB) first enhances each stimulus feature with similar information from all stimulus features, and a cluster center branch (CCB) then generates the cluster center as the subject feature and updates it with the mean shift vector. The performance of our MSNet is superior to prior contenders, so it can act as a powerful baseline to advance subsequent studies. To pave the way for research in this field, the EMS dataset, the benchmark results, and the code of MSNet are publicly available at https://github.com/YingjieSong1/EMS.
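For readers unfamiliar with the mean shift step that MSNet builds on, the following minimal NumPy sketch shows how per-stimulus features could be aggregated into a cluster center that serves as a subject-level feature; the Gaussian kernel, bandwidth, and iteration count are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mean_shift_center(stimulus_feats, bandwidth=1.0, n_iters=10):
    """stimulus_feats: (num_stimuli, dim) array of per-stimulus features for one subject."""
    center = stimulus_feats.mean(axis=0)               # initialize at the mean
    for _ in range(n_iters):
        diff = stimulus_feats - center                  # (N, dim) offsets to the current center
        w = np.exp(-(diff ** 2).sum(axis=1) / (2 * bandwidth ** 2))  # Gaussian kernel weights
        center = (w[:, None] * stimulus_feats).sum(axis=0) / (w.sum() + 1e-8)  # mean-shift step
    return center                                       # subject-level feature

subject_feats = np.random.randn(100, 64)                # 100 stimuli, 64-D features
print(mean_shift_center(subject_feats).shape)           # (64,)
```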
2
Chen Y, Xiao Z, Pan Y, Zhao L, Dai H, Wu Z, Li C, Zhang T, Li C, Zhu D, Liu T, Jiang X. Mask-Guided Vision Transformer for Few-Shot Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9636-9647. [PMID: 38976473] [DOI: 10.1109/tnnls.2024.3418527]
Abstract
Learning with little data is challenging but often inevitable in application scenarios where labeled data are limited and costly. Recently, few-shot learning (FSL) has gained increasing attention because it generalizes prior knowledge to new tasks that contain only a few samples. However, for data-intensive models such as the vision transformer (ViT), current fine-tuning-based FSL approaches are inefficient in knowledge generalization and thus degrade downstream task performance. In this article, we propose a novel mask-guided ViT (MG-ViT) to achieve effective and efficient FSL on the ViT model. The key idea is to apply a mask on image patches to screen out the task-irrelevant ones and to guide the ViT to focus on task-relevant and discriminative patches during FSL. Particularly, MG-ViT only introduces an additional mask operation and a residual connection, enabling the inheritance of parameters from a pretrained ViT without any other cost. To optimally select representative few-shot samples, we also include an active learning-based sample selection method to further improve the generalizability of MG-ViT-based FSL. We evaluate the proposed MG-ViT on classification, object detection, and segmentation tasks using gradient-weighted class activation mapping (Grad-CAM) to generate masks. The experimental results show that the MG-ViT model significantly improves performance and efficiency compared with general fine-tuning-based ViT and ResNet models, providing novel insights and a concrete approach toward generalizing data-intensive and large-scale deep learning models for FSL.
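To make the patch-masking idea concrete, here is a simplified NumPy sketch that pools a Grad-CAM-style saliency map onto the ViT patch grid and screens out the least salient patches; the 16-pixel patch size and 25% keep ratio are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def patch_mask_from_saliency(saliency, patch=16, keep_ratio=0.25):
    """saliency: (H, W) Grad-CAM-style map; returns a boolean mask over the ViT patch grid."""
    H, W = saliency.shape
    grid = saliency.reshape(H // patch, patch, W // patch, patch).mean(axis=(1, 3))
    flat = grid.ravel()
    k = max(1, int(keep_ratio * flat.size))
    keep = np.zeros_like(flat, dtype=bool)
    keep[np.argsort(flat)[-k:]] = True                  # keep only the top-k salient patches
    return keep.reshape(grid.shape)

cam = np.random.rand(224, 224)                          # stand-in for a Grad-CAM map
mask = patch_mask_from_saliency(cam)                    # (14, 14) boolean patch mask
print(mask.shape, int(mask.sum()), "patches kept")
```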
3
Zhuge Y, Gu H, Zhang L, Qi J, Lu H. Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9084-9097. [PMID: 38976474] [DOI: 10.1109/tnnls.2024.3418980]
Abstract
In this article, we address the challenges of unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects within a unified framework. MTNet is devised by effectively merging appearance and motion features during feature extraction within the encoders, promoting a more complementary representation. To capture the intricate long-range contextual dynamics and information embedded within videos, a temporal transformer module is introduced, facilitating efficacious interframe interactions throughout a video clip. Furthermore, we employ a cascade of decoders across all feature levels to optimally exploit the derived features, aiming to generate increasingly precise segmentation masks. As a result, MTNet provides a strong and compact framework that explores both temporal and cross-modality knowledge to robustly and efficiently localize and track the primary object in various challenging scenarios. Extensive experiments across diverse benchmarks conclusively show that our method not only attains state-of-the-art performance in UVOS but also delivers competitive results in video salient object detection (VSOD). These findings highlight the method's robust versatility and its adeptness in adapting to a range of segmentation tasks. The source code is available at https://github.com/hy0523/MTNet.
4
Qian Y, Xiao Z, Deng Z. Fine-grained crop pest classification based on multi-scale feature fusion and mixed attention mechanisms. Frontiers in Plant Science 2025; 16:1500571. [PMID: 40247936] [PMCID: PMC12003288] [DOI: 10.3389/fpls.2025.1500571]
Abstract
Pests are a major cause of crop loss globally, and accurate pest identification is crucial for effective prevention and control strategies. This paper proposes a novel deep-learning architecture for crop pest classification, addressing the limitations of existing methods that struggle with fine pest details and background interference. The proposed model is designed to balance fine-grained feature extraction with deep semantic understanding, utilizing a parallel structure composed of two main components: the Feature Fusion Module (FFM) and the Mixed Attention Module (MAM). FFM focuses on extracting key fine-grained features and fusing them across multiple scales, while MAM leverages an attention mechanism to model long-range dependencies within the channel domain, further enhancing feature representation. Additionally, a Transformer block is integrated to overcome the limitations of traditional convolutional approaches in capturing global contextual information. The proposed architecture is evaluated on three benchmark datasets (IP102, D0, and Li), demonstrating its superior performance over state-of-the-art methods. The model achieves accuracies of 75.74% on IP102, 99.82% on D0, and 98.77% on Li, highlighting its robustness and effectiveness in complex crop pest recognition tasks. These results indicate that the proposed method excels in multi-scale feature fusion and long-range dependency modeling, offering a new competitive approach to pest classification in agricultural settings.
Affiliation(s)
- Zhiyong Xiao
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
5
Bai X, Yang M, Chen B, Zhou F. REMI: Few-Shot ISAR Target Classification Via Robust Embedding and Manifold Inference. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6000-6013. [PMID: 38683708] [DOI: 10.1109/tnnls.2024.3391330]
Abstract
Unknown image deformation and few-shot issues have posed significant challenges to inverse synthetic aperture radar (ISAR) target classification. To achieve robust feature representation and precise correlation modeling, this article proposes a novel two-stage few-shot ISAR classification network, dubbed as robust embedding and manifold inference (REMI). In the robust embedding stage, a multihead spatial transformation network (MH-STN) is designed to adjust unknown image deformations from multiple perspectives. Then, the grouped embedding network (GEN) integrates and compresses diverse information by grouped feature extraction, intermediate feature fusion, and global feature embedding. In the manifold inference stage, a masked Gaussian graph attention network (MG-GAT) is devised to capture the irregular manifold of samples in the embedding space. In particular, the node features are described by Gaussian distributions, with interactions guided by the masked attention mechanism. Experimental results on two ISAR datasets demonstrate that REMI significantly improves the performance of few-shot classification and exhibits robustness in various scenarios.
6
Ho QH, Nguyen TNQ, Tran TT, Pham VT. LiteMamba-Bound: A lightweight Mamba-based model with boundary-aware and normalized active contour loss for skin lesion segmentation. Methods 2025; 235:10-25. [PMID: 39864606] [DOI: 10.1016/j.ymeth.2025.01.008]
Abstract
In the field of medical science, skin segmentation has gained significant importance, particularly in dermatology and skin cancer research. This domain demands high precision in distinguishing critical regions (such as lesions or moles) from healthy skin in medical images. With growing technological advancements, deep learning models have emerged as indispensable tools in addressing these challenges. One of the state-of-the-art modules revealed in recent years, the 2D Selective Scan (SS2D), based on state-space models that have already seen great success in natural language processing, has been increasingly adopted and is gradually replacing Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Leveraging the strength of this module, this paper introduces LiteMamba-Bound, a lightweight model with approximately 957K parameters, designed for skin image segmentation tasks. Notably, the Channel Attention Dual Mamba (CAD-Mamba) block is proposed within both the encoder and decoder alongside the Mix Convolution with Simple Attention bottleneck block to emphasize key features. Additionally, we propose the Reverse Attention Boundary Module to highlight challenging boundary features. Also, the Normalized Active Contour loss function presented in this paper significantly improves the model's performance compared to other loss functions. To validate performance, we conducted tests on two skin image datasets, ISIC2018 and PH2, with results consistently showing superior performance compared to other models. Our code will be made publicly available at: https://github.com/kwanghwi242/A-new-segmentation-model.
Affiliation(s)
- Quang-Huy Ho
- School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Viet Nam
- Thi-Nhu-Quynh Nguyen
- School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Viet Nam
- Thi-Thao Tran
- School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Viet Nam
- Van-Truong Pham
- School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Viet Nam
7
Shi Y, Sun M, Wang Y, Ma J, Chen Z. EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention. IEEE Transactions on Cybernetics 2025; 55:1288-1300. [PMID: 40031751] [DOI: 10.1109/tcyb.2025.3532282]
Abstract
Owing to advancements in deep learning technology, vision transformers (ViTs) have demonstrated impressive performance in various computer vision tasks. Nonetheless, ViTs still face some challenges, such as high computational complexity and the absence of desirable inductive biases. To alleviate these issues, the potential advantages of combining eagle vision with ViTs are explored. A bi-fovea visual interaction (BFVI) structure inspired by the unique physiological and visual characteristics of eagle eyes is introduced. Based on this structural design approach, a novel bi-fovea self-attention (BFSA) mechanism and bi-fovea feedforward network (BFFN) are proposed. These components are employed to mimic the hierarchical and parallel information processing scheme of the biological visual cortex, thereby enabling networks to learn the feature representations of the targets in a coarse-to-fine manner. Furthermore, a bionic eagle vision (BEV) block is designed as the basic building unit based on the BFSA mechanism and the BFFN. By stacking the BEV blocks, a unified and efficient family of pyramid backbone networks called eagle ViTs (EViTs) is developed. Experimental results indicate that the EViTs exhibit highly competitive performance in various computer vision tasks, demonstrating their potential as backbone networks. In terms of computational efficiency and scalability, EViTs show significant advantages compared with other counterparts. The developed code is available at https://github.com/nkusyl/EViT.
8
Liang P, Jiang J, Liu X, Ma J. Image Deblurring by Exploring In-Depth Properties of Transformer. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4652-4663. [PMID: 38381646] [DOI: 10.1109/tnnls.2024.3359810]
Abstract
Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, improving the perceptual quality and the quantitative scores of a recovered image at the same time remains a challenging problem. In this study, drawing inspiration from research on transformer properties, we introduce pretrained transformers to address this problem. In particular, we leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing performance measured by quantitative metrics. The pretrained transformer can capture the global topological relations (i.e., self-similarity) of an image, and we observe that the captured topological relationships of a sharp image change when blur occurs. By comparing the transformer features between the recovered image and the target one, the pretrained transformer provides high-resolution blur-sensitive semantic information, which is critical in measuring the sharpness of the deblurred image. On the basis of these advantages, we present two types of novel perceptual losses to guide image deblurring. One regards the features as vectors and computes the discrepancy between representations extracted from the recovered image and the target one in Euclidean space. The other considers the features extracted from an image as a distribution and compares the distribution discrepancy between the recovered image and the target one. We demonstrate the effectiveness of transformer properties in improving perceptual quality while not sacrificing quantitative scores such as the peak signal-to-noise ratio (PSNR) compared with the most competitive models, such as Uformer, Restormer, and NAFNet, on defocus deblurring and motion deblurring tasks. The code is available at https://github.com/erfect2020/TransformerPerceptualLoss.
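As an illustration of the two loss types described above, the following PyTorch sketch implements a vector-space (Euclidean) variant and a distribution-style variant on top of a generic frozen feature extractor; the dummy extractor, token shapes, and the Gram-matrix form of the distribution comparison are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def vector_perceptual_loss(extractor, recovered, target):
    """Euclidean discrepancy between token features of the recovered and target images."""
    with torch.no_grad():
        f_t = extractor(target)                         # (B, N, C) target tokens, frozen
    f_r = extractor(recovered)                          # (B, N, C) recovered tokens
    return F.mse_loss(f_r, f_t)

def distribution_perceptual_loss(extractor, recovered, target):
    """Compare second-order token statistics (Gram matrices) instead of raw feature vectors."""
    with torch.no_grad():
        f_t = extractor(target)
    f_r = extractor(recovered)
    gram = lambda f: torch.einsum("bnc,bnd->bcd", f, f) / f.shape[1]
    return F.mse_loss(gram(f_r), gram(f_t))

# Dummy stand-in for a frozen pretrained ViT feature extractor (pixels treated as "tokens").
extractor = lambda x: x.flatten(2).transpose(1, 2)
recovered, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
print(vector_perceptual_loss(extractor, recovered, target),
      distribution_perceptual_loss(extractor, recovered, target))
```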
9
Xie T, Dai K, Jiang Z, Li R, Mao S, Wang K, Zhao L. ViT-MVT: A Unified Vision Transformer Network for Multiple Vision Tasks. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3027-3041. [PMID: 38127606] [DOI: 10.1109/tnnls.2023.3342141]
Abstract
In this work, we seek to learn multiple mainstream vision tasks concurrently using a unified network, which is storage-efficient because numerous networks with task-shared parameters can be implanted into a single consolidated network. Our framework, vision transformer (ViT)-MVT, built on a plain and nonhierarchical ViT, incorporates numerous visual tasks into a modest supernet and optimizes them jointly across various dataset domains. For the design of ViT-MVT, we augment the ViT with a multihead self-attention (MHSE) to offer complementary cues in the channel and spatial dimensions, as well as a local perception unit (LPU) and locality feed-forward network (locality FFN) for information exchange in the local region, thus endowing ViT-MVT with the ability to effectively optimize multiple tasks. Besides, we construct a search space comprising potential architectures with a broad spectrum of model sizes to offer various optimal candidates for diverse tasks. After that, we design a layer-adaptive sharing technique that automatically determines whether each layer of the transformer block is shared or not for all tasks, enabling ViT-MVT to obtain task-shared parameters for a reduction of storage and task-specific parameters to learn task-related features, thereby boosting performance. Finally, we introduce a joint-task evolutionary search algorithm to discover an optimal backbone for all tasks under a total model size constraint, which challenges the conventional wisdom that visual tasks are typically supplied with backbone networks developed for image classification. Extensive experiments reveal that ViT-MVT delivers exceptional performance on multiple visual tasks over state-of-the-art methods while requiring considerably lower total storage costs. We further demonstrate that once ViT-MVT has been trained, it is capable of incremental learning when generalized to new tasks while retaining identical performance for trained tasks. The code is available at https://github.com/XT-1997/vitmvt.
10
Trigka M, Dritsas E. A Comprehensive Survey of Machine Learning Techniques and Models for Object Detection. Sensors (Basel, Switzerland) 2025; 25:214. [PMID: 39797004] [PMCID: PMC11723456] [DOI: 10.3390/s25010214]
Abstract
Object detection is a pivotal research domain within computer vision, with applications spanning from autonomous vehicles to medical diagnostics. This comprehensive survey presents an in-depth analysis of the evolution and significant advancements in object detection, emphasizing the critical role of machine learning (ML) and deep learning (DL) techniques. We explore a wide spectrum of methodologies, ranging from traditional approaches to the latest DL models, thoroughly evaluating their performance, strengths, and limitations. Additionally, the survey delves into various metrics for assessing model effectiveness, including precision, recall, and intersection over union (IoU), while addressing ongoing challenges in the field, such as managing occlusions, varying object scales, and improving real-time processing capabilities. Furthermore, we critically examine recent breakthroughs, including advanced architectures like Transformers, and discuss challenges and future research directions aimed at overcoming existing barriers. By synthesizing current advancements, this survey provides valuable insights for enhancing the robustness, accuracy, and efficiency of object detection systems across diverse and challenging applications.
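Since intersection over union (IoU) is central to the evaluation protocols surveyed here, the short snippet below shows the standard IoU computation for two axis-aligned boxes, purely to make the metric concrete.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143
```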
Affiliation(s)
- Elias Dritsas
- Industrial Systems Institute, Athena Research and Innovation Center, 26504 Patras, Greece
11
Chen D, Wu Z, Liu F, Yang Z, Zheng S, Tan Y, Zhou E. ProtoCLIP: Prototypical Contrastive Language Image Pretraining. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:610-624. [PMID: 38048244] [DOI: 10.1109/tnnls.2023.3335859]
Abstract
Contrastive language image pretraining (CLIP) has received widespread attention since its learned representations can be transferred well to various downstream tasks. During the training of the CLIP model, the InfoNCE objective aligns positive image-text pairs and separates negative ones. We show an underlying representation grouping effect during this process: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerged within-modal anchors. Based on this understanding, in this article, prototypical contrastive language image pretraining (ProtoCLIP) is introduced to enhance such grouping by boosting its efficiency and increasing its robustness against the modality gap. Specifically, ProtoCLIP sets up prototype-level discrimination between image and text spaces, which efficiently transfers higher-level structural knowledge. Furthermore, prototypical back translation (PBT) is proposed to decouple representation grouping from representation alignment, resulting in effective learning of meaningful representations under a large modality gap. PBT also enables us to introduce additional external teachers with richer prior language knowledge. ProtoCLIP is trained with an online episodic training strategy, which means it can be scaled up to unlimited amounts of data. We trained ProtoCLIP on Conceptual Captions (CC) and achieved a +5.81% ImageNet linear probing improvement and a +2.01% ImageNet zero-shot classification improvement. On the larger YFCC-15M dataset, ProtoCLIP matches the performance of CLIP with 33% of the training time.
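For context, the sketch below shows the symmetric InfoNCE objective whose grouping effect the article analyzes (a CLIP-style loss in PyTorch); ProtoCLIP's prototype-level discrimination and back translation are not reproduced here, and the 0.07 temperature is the commonly used default, assumed for illustration.

```python
import torch
import torch.nn.functional as F

def clip_info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: positives on the diagonal, all other pairs are negatives."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature        # (B, B) similarity matrix
    labels = torch.arange(img_emb.size(0))              # matched pairs share an index
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

print(clip_info_nce(torch.randn(8, 512), torch.randn(8, 512)))
```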
12
Wang D, Wu C, Bai Y, Li Y, Shang C, Shen Q. A Multitask Network for Joint Multispectral Pansharpening on Diverse Satellite Data. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17635-17649. [PMID: 37672369] [DOI: 10.1109/tnnls.2023.3306896]
Abstract
Despite the rapid advance in multispectral (MS) pansharpening, existing convolutional neural network (CNN)-based methods require training on separate CNNs for different satellite datasets. However, such a single-task learning (STL) paradigm often leads to overlooking any underlying correlations between datasets. Aiming at this challenging problem, a multitask network (MTNet) is presented to accomplish joint MS pansharpening in a unified framework for images acquired by different satellites. Particularly, the pansharpening process of each satellite is treated as a specific task, while MTNet simultaneously learns from all data obtained from these satellites following the multitask learning (MTL) paradigm. MTNet shares the generic knowledge between datasets via task-agnostic subnetwork (TASNet), utilizing task-specific subnetworks (TSSNets) to facilitate the adaptation of such knowledge to a certain satellite. To tackle the limitation of the local connectivity property of the CNN, TASNet incorporates Transformer modules to derive global information. In addition, band-aware dynamic convolutions (BDConvs) are proposed that can accommodate various ground scenes and bands by adjusting their respective receptive field (RF) size. Systematic experimental results over different datasets demonstrate that the proposed approach outperforms the existing state-of-the-art (SOTA) techniques.
13
Hsu BWY, Tseng VS. LightDPH: Lightweight Dual-Projection-Head Hierarchical Contrastive Learning for Skin Lesion Classification. Journal of Healthcare Informatics Research 2024; 8:619-639. [PMID: 39463858] [PMCID: PMC11499555] [DOI: 10.1007/s41666-024-00174-5]
Abstract
Effective skin cancer detection is crucial for early intervention and improved treatment outcomes. Previous studies have primarily focused on enhancing the performance of skin lesion classification models. However, there is a growing need to consider the practical requirements of real-world scenarios, such as portable applications that require lightweight models embedded in devices. Therefore, this study aims to propose a novel method that can address the major-type misclassification problem with a lightweight model. This study proposes an innovative Lightweight Dual Projection-Head Hierarchical contrastive learning (LightDPH) method. We introduce a dual projection-head mechanism to a contrastive learning framework. This mechanism is utilized to train a model with our proposed multi-level contrastive loss (MultiCon Loss), which can effectively learn hierarchical information from samples. Meanwhile, we present a distance-based weight (DBW) function to adjust losses based on hierarchical levels. This unique combination of MultiCon Loss and DBW function in LightDPH tackles the problem of major-type misclassification with lightweight models and enhances the model's sensitivity in skin lesion classification. The experimental results demonstrate that LightDPH significantly reduces the number of parameters by 52.6% and computational complexity by 29.9% in GFLOPs while maintaining high classification performance comparable to state-of-the-art methods. This study also presented a novel evaluation metric, model efficiency score (MES), to evaluate the cost-effectiveness of models with scaling and classification performance. The proposed LightDPH effectively mitigates major-type misclassification and works in a resource-efficient manner, making it highly suitable for clinical applications in resource-constrained environments. To the best of our knowledge, this is the first work that develops an effective lightweight hierarchical classification model for skin lesion detection.
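One plausible reading of the dual-projection-head setup is sketched below in PyTorch: a shared feature vector feeds two projection heads (fine-grained and super-class levels), each level contributes a supervised contrastive term, and the terms are combined with a distance-based weight. The exponential weight, head sizes, and the generic contrastive form are illustrative assumptions, not the paper's exact MultiCon Loss or DBW definitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualProjectionHead(nn.Module):
    """Shared backbone features feed two heads, one per hierarchy level."""
    def __init__(self, feat_dim=512, proj_dim=128):
        super().__init__()
        self.head_fine = nn.Linear(feat_dim, proj_dim)    # fine-grained lesion level
        self.head_super = nn.Linear(feat_dim, proj_dim)   # super-class level (e.g. major type)

    def forward(self, feats):
        return (F.normalize(self.head_fine(feats), dim=-1),
                F.normalize(self.head_super(feats), dim=-1))

def sup_con(z, labels, t=0.1):
    """Minimal supervised contrastive loss over one hierarchy level."""
    sim = z @ z.t() / t
    self_mask = torch.eye(len(labels), dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))       # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -log_prob[pos].mean() if pos.any() else z.sum() * 0.0

def multicon_loss(z_levels, y_levels):
    """Per-level contrastive terms combined with a distance-based weight w(d) = exp(-d)."""
    return sum(torch.exp(torch.tensor(-float(d))) * sup_con(z, y)
               for d, (z, y) in enumerate(zip(z_levels, y_levels)))

heads = DualProjectionHead()
feats = torch.randn(16, 512)                              # stand-in for backbone features
z_fine, z_super = heads(feats)
fine_y = torch.randint(0, 7, (16,))                       # e.g. seven lesion types
super_y = torch.randint(0, 2, (16,))                      # e.g. two major types
print(multicon_loss([z_fine, z_super], [fine_y, super_y]))
```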
Affiliation(s)
- Benny Wei-Yun Hsu
- Institute of Computer Science and Engineering, National Yang Ming Chiao Tung University, No. 1001, Daxue Rd., Hsinchu City, 300093 Taiwan Republic of China
- Vincent S. Tseng
- Institute of Computer Science and Engineering, National Yang Ming Chiao Tung University, No. 1001, Daxue Rd., Hsinchu City, 300093 Taiwan Republic of China
- Department of Computer Science, National Yang Ming Chiao Tung University, No. 1001, Daxue Rd., Hsinchu City, 300093 Taiwan Republic of China
14
Comes MC, Fanizzi A, Bove S, Boldrini L, Latorre A, Guven DC, Iacovelli S, Talienti T, Rizzo A, Zito FA, Massafra R. Monitoring Over Time of Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer Patients Through an Ensemble Vision Transformers-Based Model. Cancer Med 2024; 13:e70482. [PMID: 39692281] [DOI: 10.1002/cam4.70482]
Abstract
BACKGROUND Morphological and vascular characteristics of breast cancer can change during neoadjuvant chemotherapy (NAC). Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) exams acquired pre- and mid-treatment quantitatively capture information about tumor heterogeneity and are potential early indicators of pathological complete response (pCR) to NAC in breast cancer. AIMS This study aimed to develop an ensemble deep learning-based model, exploiting a Vision Transformer (ViT) architecture, which merges features automatically extracted from five segmented slices of both pre- and mid-treatment exams containing the maximum tumor area, to predict and monitor pCR to NAC. MATERIALS AND METHODS Imaging data analyzed in this study referred to a cohort of 86 breast cancer patients, randomly split into training and test sets at a ratio of 8:2, who underwent NAC and for whom the pCR status was available (37.2% of patients achieved pCR). We further validated our model using a subset of 20 patients selected from the publicly available I-SPY2 trial dataset (independent test). RESULTS The performance of the proposed model was assessed using standard evaluation metrics, and promising results were achieved: an area under the curve (AUC) of 91.4%, an accuracy of 82.4%, a specificity of 80.0%, a sensitivity of 85.7%, a precision of 75.0%, an F-score of 80.0%, and a G-mean of 82.8%. The results obtained from the independent test show an AUC of 81.3%, an accuracy of 80.0%, a specificity of 76.9%, a sensitivity of 85.0%, a precision of 66.7%, an F-score of 75.0%, and a G-mean of 81.2%. DISCUSSION As far as we know, our research is the first proposal using ViTs on DCE-MRI exams to monitor pCR over time during NAC. CONCLUSION Finally, the changes in DCE-MRI at pre- and mid-treatment could affect the accuracy of pCR prediction to NAC.
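The reported metrics can be reproduced from predictions with scikit-learn as in the toy example below; the labels, probabilities, and 0.5 decision threshold are placeholders for illustration only.

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])          # 1 = pCR achieved (placeholder)
y_prob = np.array([0.8, 0.3, 0.4, 0.7, 0.2, 0.9, 0.6, 0.1, 0.65, 0.35])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = recall_score(y_true, y_pred)                  # TP / (TP + FN)
specificity = tn / (tn + fp)
g_mean = np.sqrt(sensitivity * specificity)

print(f"AUC={roc_auc_score(y_true, y_prob):.3f} "
      f"acc={accuracy_score(y_true, y_pred):.3f} "
      f"sens={sensitivity:.3f} spec={specificity:.3f} "
      f"prec={precision_score(y_true, y_pred):.3f} "
      f"F1={f1_score(y_true, y_pred):.3f} G-mean={g_mean:.3f}")
```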
Affiliation(s)
- Maria Colomba Comes
- Laboratorio di Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Bari, Italy
- Annarita Fanizzi
- Laboratorio di Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Bari, Italy
- Samantha Bove
- Laboratorio di Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Bari, Italy
- Luca Boldrini
- Unità Operativa Complessa di Radioterapia Oncologica, Fondazione Policlinico Universitario Agostino Gemelli I.R.C.C.S, Rome, Italy
- Agnese Latorre
- Unità Operativa Complessa di Oncologia Medica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II" Bari, Bari, Italy
- Deniz Can Guven
- Department of Medical Oncology, Hacettepe University, Cancer Institute, Ankara, Turkey
- Serena Iacovelli
- Trial Office, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II" Bari, Bari, Italy
- Tiziana Talienti
- Unità Operativa Complessa di Oncologia Medica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II" Bari, Bari, Italy
- Alessandro Rizzo
- Struttura Semplice Dipartimentale di Oncologia Medica per la Presa in Carico Globale del Paziente Oncologico "Don Tonino Bello", I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Bari, Italy
- Francesco Alfredo Zito
- Unità Operativa Complessa di Anatomia Patologica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Bari, Italy
- Raffaella Massafra
- Laboratorio di Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Bari, Italy
15
Dalva Y, Pehlivan H, Altindis SF, Dundar A. Benchmarking the Robustness of Instance Segmentation Models. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17021-17035. [PMID: 37721888] [DOI: 10.1109/tnnls.2023.3310985]
Abstract
This article presents a comprehensive evaluation of instance segmentation models with respect to real-world image corruptions as well as out-of-domain image collections, e.g., images captured by a different set-up than the training dataset. The out-of-domain image evaluation shows the generalization capability of models, an essential aspect of real-world applications, and an extensively studied topic of domain adaptation. These presented robustness and generalization evaluations are important when designing instance segmentation models for real-world applications and picking an off-the-shelf pretrained model to directly use for the task at hand. Specifically, this benchmark study includes state-of-the-art network architectures, network backbones, normalization layers, models trained starting from scratch versus pretrained networks, and the effect of multitask training on robustness and generalization. Through this study, we gain several insights. For example, we find that group normalization (GN) enhances the robustness of networks across corruptions where the image contents stay the same but corruptions are added on top. On the other hand, batch normalization (BN) improves the generalization of the models across different datasets where statistics of image features change. We also find that single-stage detectors do not generalize well to larger image resolutions than their training size. On the other hand, multistage detectors can easily be used on images of different sizes. We hope that our comprehensive study will motivate the development of more robust and reliable instance segmentation models.
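The normalization swap at the center of these findings can be set up in torchvision as sketched below, building the same ResNet-50 once with its default BatchNorm layers and once with GroupNorm (32 groups, a common choice) injected through the norm_layer hook; this shows only the swap, not the benchmark protocol.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

bn_model = resnet50(weights=None)                                   # default BatchNorm layers
gn_model = resnet50(weights=None,
                    norm_layer=lambda ch: nn.GroupNorm(32, ch))     # GroupNorm variant

x = torch.rand(2, 3, 224, 224)
print(bn_model(x).shape, gn_model(x).shape)                         # both torch.Size([2, 1000])
```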
16
Xiang W, Xiong Z, Chen H, Xiong J, Zhang W, Fu Z, Zheng M, Liu B, Shi Q. FAPM: functional annotation of proteins using multimodal models beyond structural modeling. Bioinformatics 2024; 40:btae680. [PMID: 39540736] [PMCID: PMC11630832] [DOI: 10.1093/bioinformatics/btae680]
Abstract
MOTIVATION Assigning accurate property labels to proteins, like functional terms and catalytic activity, is challenging, especially for proteins without homologs and "tail labels" with few known examples. Previous methods mainly focused on protein sequence features, overlooking the semantic meaning of protein labels. RESULTS We introduce functional annotation of proteins using multimodal models (FAPM), a contrastive multimodal model that links natural language with protein sequence language. This model combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM excels in understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and in-house experimentally annotated phage proteins, which often have few known homologs. Additionally, FAPM's flexibility allows it to incorporate extra text prompts, like taxonomy information, enhancing both its predictive performance and explainability. This novel approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation. AVAILABILITY AND IMPLEMENTATION The online demo is at: https://huggingface.co/spaces/wenkai/FAPM_demo.
Affiliation(s)
- Wenkai Xiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Lingang Laboratory, Shanghai 200031, China
- Huan Chen
- BioBank, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
- Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Lingang Laboratory, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Bing Liu
- BioBank, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
- Qian Shi
- Lingang Laboratory, Shanghai 200031, China
17
Li Z, Zhang J, Wei S, Gao Y, Cao C, Wu Z. TPAFNet: Transformer-Driven Pyramid Attention Fusion Network for 3D Medical Image Segmentation. IEEE J Biomed Health Inform 2024; 28:6803-6814. [PMID: 39283776] [DOI: 10.1109/jbhi.2024.3460745]
Abstract
The field of 3D medical image segmentation is witnessing a growing trend in the utilization of combined networks that integrate convolutional neural networks and transformers. Nevertheless, prevailing hybrid networks are confronted with limitations in their straightforward serial or parallel combination methods and lack an effective mechanism to fuse channel and spatial feature attention. To address these limitations, we present a robust multi-scale 3D medical image segmentation network, the Transformer-Driven Pyramid Attention Fusion Network, which is denoted as TPAFNet, leveraging a hybrid structure of CNN and transformer. Within this framework, we exploit the characteristics of atrous convolution to extract multi-scale information effectively, thereby enhancing the encoding results of the transformer. Furthermore, we introduce the TPAF block in the encoder to seamlessly fuse channel and spatial feature attention from multi-scale feature inputs. In contrast to conventional skip connections that simply concatenate or add features, our decoder is enriched with a TPAF connection, elevating the integration of feature attention between low-level and high-level features. Additionally, we propose a low-level encoding shortcut from the original input to the decoder output, preserving more original image features and contributing to enhanced results. Finally, the deep supervision is implemented using a novel CNN-based voxel-wise classifier to facilitate better network convergence. Experimental results demonstrate that TPAFNet significantly outperforms other state-of-the-art networks on two public datasets, indicating that our research can effectively improve the accuracy of medical image segmentation, thereby assisting doctors in making more precise diagnoses.
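As background for the multi-scale design, the following PyTorch sketch shows the generic atrous (dilated) convolution pattern the abstract refers to: parallel branches with different dilation rates whose outputs are concatenated and fused. It is written in 2D for brevity (a 3D version would use Conv3d), with illustrative dilation rates and channel sizes rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class AtrousMultiScale(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates)
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # same spatial size per branch
        return self.fuse(torch.cat(feats, dim=1))         # fuse multi-scale responses

block = AtrousMultiScale(32, 64)
print(block(torch.rand(1, 32, 40, 40)).shape)             # torch.Size([1, 64, 40, 40])
```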
18
Wan X, Ju J, Tang J, Lin M, Rao N, Chen D, Liu T, Li J, Bian F, Xiong N. MPCTrans: Multi-Perspective Cue-Aware Joint Relationship Representation for 3D Hand Pose Estimation via Swin Transformer. Sensors (Basel, Switzerland) 2024; 24:7029. [PMID: 39517945] [PMCID: PMC11548048] [DOI: 10.3390/s24217029]
Abstract
The objective of 3D hand pose estimation (HPE) based on depth images is to accurately locate and predict keypoints of the hand. However, this task remains challenging because of the variations in hand appearance from different viewpoints and severe occlusions. To effectively address these challenges, this study introduces a novel approach, called the multi-perspective cue-aware joint relationship representation for 3D HPE via the Swin Transformer (MPCTrans, for short). This approach is designed to learn multi-perspective cues and essential information from hand depth images. To achieve this goal, three novel modules are proposed to utilize features from multiple virtual views of the hand, namely, the adaptive virtual multi-viewpoint (AVM), hierarchy feature estimation (HFE), and virtual viewpoint evaluation (VVE) modules. The AVM module adaptively adjusts the angles of the virtual viewpoint and learns the ideal virtual viewpoint to generate informative multiple virtual views. The HFE module estimates hand keypoints through hierarchical feature extraction. The VVE module evaluates virtual viewpoints by using chained high-level functions from the HFE module. Transformer is used as a backbone to extract the long-range semantic joint relationships in hand depth images. Extensive experiments demonstrate that the MPCTrans model achieves state-of-the-art performance on four challenging benchmark datasets.
Affiliation(s)
- Xiangan Wan
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Jianping Ju
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Jianying Tang
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Mingyu Lin
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Ning Rao
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Deng Chen
- Hubei Province Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430079, China;
- Tingting Liu
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Jing Li
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Fan Bian
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
- Nicholas Xiong
- School of Computer Science and Technology, Hubei Business College, Wuhan 430079, China; (X.W.); (J.T.); (N.R.); (T.L.); (J.L.); (F.B.); (N.X.)
19
Zhao T, Wu H, Leng D, Yao E, Gu S, Yao M, Zhang Q, Wang T, Wu D, Xie L. An artificial intelligence grading system of apical periodontitis in cone-beam computed tomography data. Dentomaxillofac Radiol 2024; 53:447-458. [PMID: 38960866] [DOI: 10.1093/dmfr/twae029]
Abstract
OBJECTIVES In order to assist junior doctors in better diagnosing apical periodontitis (AP), an artificial intelligence AP grading system was developed based on deep learning (DL) and its reliability and accuracy were evaluated. METHODS One hundred and twenty cone-beam computed tomography (CBCT) images were selected to construct a classification dataset with four categories, divided according to the CBCT periapical index (CBCTPAI): normal periapical tissue, CBCTPAI 1-2, CBCTPAI 3-5, and young permanent teeth. Three classic algorithms (ResNet50/101/152) as well as one self-developed algorithm (PAINet) were compared with each other. PAINet was also compared with two recent Transformer-based models and three attention models. Their performance was evaluated by accuracy, precision, recall, balanced F score (F1-score), and the area under the macro-average receiver operating curve (AUC). Reliability was evaluated by Cohen's kappa to compare the consistency of model-predicted labels with expert opinions. RESULTS PAINet performed best among the four algorithms. The accuracy, precision, recall, F1-score, and AUC on the test set were 0.9333, 0.9415, 0.9333, 0.9336, and 0.9972, respectively. Cohen's kappa was 0.911, which represents almost perfect consistency. CONCLUSIONS PAINet can accurately distinguish among normal periapical tissues, CBCTPAI 1-2, CBCTPAI 3-5, and young permanent teeth. Its results were highly consistent with expert opinions. It can help junior doctors diagnose and score AP, reducing their burden. It can also be promoted in areas where experts are lacking to provide professional diagnostic opinions.
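The agreement and classification metrics listed above can be computed with scikit-learn as in the toy four-class example below; the labels and probabilities are placeholders, not data from the study.

```python
import numpy as np
from sklearn.metrics import (cohen_kappa_score, accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, size=40)                       # stand-in expert labels, 4 categories
y_prob = rng.dirichlet(np.ones(4), size=40)                # stand-in model class probabilities
y_pred = y_prob.argmax(axis=1)                             # model labels

print("kappa :", cohen_kappa_score(y_true, y_pred))        # consistency with expert labels
print("acc   :", accuracy_score(y_true, y_pred))
print("prec  :", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall:", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1    :", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("AUC   :", roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"))
```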
Affiliation(s)
- Tianyin Zhao
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Huili Wu
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Diya Leng
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Enhui Yao
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Shuyun Gu
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Minhui Yao
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Qinyu Zhang
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Tong Wang
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Daming Wu
- Department of Endodontics, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
- Lizhe Xie
- Department of Oral & Maxillofacial Imaging, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing, 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing, 210029, China
20
Huang F, Qiu A. Ensemble Vision Transformer for Dementia Diagnosis. IEEE J Biomed Health Inform 2024; 28:5551-5561. [PMID: 38889030] [DOI: 10.1109/jbhi.2024.3412812]
Abstract
In recent years, deep learning has gained momentum in computer-aided Alzheimer's Disease (AD) diagnosis. This study introduces a novel approach, the Monte Carlo Ensemble Vision Transformer (MC-ViT), which develops an ensemble approach with a vision transformer (ViT). Instead of using traditional ensemble methods that deploy multiple learners, our approach employs a single vision transformer learner. By harnessing Monte Carlo sampling, this method produces a broad spectrum of classification decisions, enhancing the MC-ViT performance. This novel technique adeptly overcomes the limitation of 3D patch convolutional neural networks that characterize only part of the whole brain anatomy, paving the way for a neural network adept at discerning 3D inter-feature correlations. Evaluations using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset with 7199 scans and the Open Access Series of Imaging Studies-3 (OASIS-3) dataset with 1992 scans showcased its performance. With minimal preprocessing, our approach achieved an impressive 90% accuracy in AD classification, surpassing both 2D-slice CNNs and 3D CNNs.
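One common way to realize Monte Carlo sampling with a single learner is to keep dropout active at inference and average several stochastic forward passes, as in the PyTorch sketch below; the tiny MLP stands in for the vision transformer, and this MC-dropout reading is an assumption, so the paper's exact sampling scheme may differ.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Dropout(p=0.2),
                    nn.Linear(32, 2))              # toy 2-class head (stand-in for the ViT)

def mc_predict(model, x, n_samples=20):
    model.train()                                  # keep dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0)                       # averaged "ensemble" prediction

x = torch.rand(4, 64)                              # stand-in for extracted scan features
print(mc_predict(net, x))
```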
21
Liu Y, Zhang L, Wei Z, Wang T, Yang X, Tian J, Hui H. Transformer for low concentration image denoising in magnetic particle imaging. Phys Med Biol 2024; 69:175014. [PMID: 39137818] [DOI: 10.1088/1361-6560/ad6ede]
Abstract
Objective. Magnetic particle imaging (MPI) is an emerging tracer-based in vivo imaging technology. The use of MPI at low superparamagnetic iron oxide nanoparticle concentrations has the potential to be a promising area of clinical application due to the inherent safety for humans. However, low tracer concentrations reduce the signal-to-noise ratio of the magnetization signal, leading to severe noise artifacts in the reconstructed MPI images. Hardware improvements have high complexity, while traditional methods lack robustness to different noise levels, making it difficult to improve the quality of low concentration MPI images. Approach. Here, we propose a novel deep learning method for MPI image denoising and quality enhancement based on a sparse lightweight transformer model. The proposed residual-local transformer structure reduces model complexity to avoid overfitting, and an information retention block facilitates feature extraction for image details. Besides, we design a noisy concentration dataset to train our model. Then, we evaluate our method with both simulated and real MPI image data. Main results. Simulation experiment results show that our method achieves the best performance compared with existing deep learning methods for MPI image denoising. More importantly, our method performs effectively on real MPI images of samples with an Fe concentration down to 67 μg Fe ml-1. Significance. Our method provides great potential for obtaining high quality MPI images at low concentrations.
Affiliation(s)
- Yuanduo Liu
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, People's Republic of China
- Beijing Key Laboratory of Molecular Imaging, Beijing 100190, People's Republic of China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Liwen Zhang
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, People's Republic of China
- Beijing Key Laboratory of Molecular Imaging, Beijing 100190, People's Republic of China
- Zechen Wei
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, People's Republic of China
- Beijing Key Laboratory of Molecular Imaging, Beijing 100190, People's Republic of China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Tan Wang
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, People's Republic of China
- Beijing Key Laboratory of Molecular Imaging, Beijing 100190, People's Republic of China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Xin Yang
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, People's Republic of China
- Beijing Key Laboratory of Molecular Imaging, Beijing 100190, People's Republic of China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Jie Tian
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, People's Republic of China
- Beijing Key Laboratory of Molecular Imaging, Beijing 100190, People's Republic of China
- Key Laboratory of Big Data-Based Precision Medicine (Beihang University), Ministry of Industry and Information Technology, Beijing 100191, People's Republic of China
- School of Engineering Medicine & School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, People's Republic of China
- National Key Laboratory of Kidney Diseases, Beijing 100853, People's Republic of China
- Hui Hui
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, People's Republic of China
- Beijing Key Laboratory of Molecular Imaging, Beijing 100190, People's Republic of China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- National Key Laboratory of Kidney Diseases, Beijing 100853, People's Republic of China
22
Lin J, Zhang X, Qin Y, Yang S, Wen X, Cernava T, Migheli Q, Chen X. Local and Global Feature-Aware Dual-Branch Networks for Plant Disease Recognition. Plant Phenomics (Washington, D.C.) 2024; 6:0208. [PMID: 39130161] [PMCID: PMC11315374] [DOI: 10.34133/plantphenomics.0208]
Abstract
Accurate identification of plant diseases is important for ensuring the safety of agricultural production. Convolutional neural networks (CNNs) and visual transformers (VTs) can extract effective representations of images and have been widely used for the intelligent recognition of plant disease images. However, CNNs have excellent local perception with poor global perception, and VTs have excellent global perception with poor local perception. This makes it difficult to further improve the performance of both CNNs and VTs on plant disease recognition tasks. In this paper, we propose a local and global feature-aware dual-branch network, named LGNet, for the identification of plant diseases. More specifically, we first design a dual-branch structure based on CNNs and VTs to extract the local and global features. Then, an adaptive feature fusion (AFF) module is designed to fuse the local and global features, thus driving the model to dynamically perceive the weights of different features. Finally, we design a hierarchical mixed-scale unit-guided feature fusion (HMUFF) module to mine the key information in the features at different levels and fuse the differentiated information among them, thereby enhancing the model's multiscale perception capability. Subsequently, extensive experiments were conducted on the AI Challenger 2018 dataset and the self-collected corn disease (SCD) dataset. The experimental results demonstrate that our proposed LGNet achieves state-of-the-art recognition performance on both the AI Challenger 2018 dataset and the SCD dataset, with accuracies of 88.74% and 99.08%, respectively.
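As an illustration of the dual-branch fusion idea described in this abstract, the sketch below combines a CNN branch's local feature map and a transformer branch's global feature map with input-dependent weights. The gating design is our own assumption for illustration and is not the released AFF or HMUFF code.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureFusion(nn.Module):
    """Fuse local (CNN) and global (ViT-style) feature maps with learned per-sample weights."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                # (B, 2C, 1, 1) global statistics
            nn.Conv2d(2 * channels, channels // 2, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 2, 1),         # one logit per branch
        )

    def forward(self, local_feat, global_feat):     # both: (B, C, H, W)
        w = torch.softmax(self.gate(torch.cat([local_feat, global_feat], dim=1)), dim=1)
        return w[:, 0:1] * local_feat + w[:, 1:2] * global_feat

if __name__ == "__main__":
    aff = AdaptiveFeatureFusion(channels=64)
    local_feat = torch.randn(2, 64, 14, 14)         # e.g. CNN branch output
    global_feat = torch.randn(2, 64, 14, 14)        # e.g. transformer branch output as a map
    print(aff(local_feat, global_feat).shape)       # torch.Size([2, 64, 14, 14])
```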
Collapse
Affiliation(s)
- Jianwu Lin
- Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
- Guizhou-Europe Environmental Biotechnology and Agricultural Informatics Oversea Innovation Center in Guizhou University, Guizhou Provincial Science and Technology Department, Guiyang 550025, China
| | - Xin Zhang
- Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
- Guizhou-Europe Environmental Biotechnology and Agricultural Informatics Oversea Innovation Center in Guizhou University, Guizhou Provincial Science and Technology Department, Guiyang 550025, China
| | - Yongbin Qin
- Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
| | - Shengxian Yang
- College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
- Guizhou-Europe Environmental Biotechnology and Agricultural Informatics Oversea Innovation Center in Guizhou University, Guizhou Provincial Science and Technology Department, Guiyang 550025, China
| | - Xingtian Wen
- College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
| | - Tomislav Cernava
- School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton S017 1BJ, UK
| | - Quirico Migheli
- Dipartimento di Agraria and NRD—Nucleo di Ricerca sulla Desertificazione, Università degli Studi di Sassari, Sassari, Italy
| | - Xiaoyulong Chen
- Guizhou-Europe Environmental Biotechnology and Agricultural Informatics Oversea Innovation Center in Guizhou University, Guizhou Provincial Science and Technology Department, Guiyang 550025, China
- College of Life Sciences, Guizhou University, Guiyang 550025, China
| |
Collapse
|
23
|
Liu Y, Wen Z, Wang Y, Zhong Y, Wang J, Hu Y, Zhou P, Guo S. Artificial intelligence in ischemic stroke images: current applications and future directions. Front Neurol 2024; 15:1418060. [PMID: 39050128 PMCID: PMC11266078 DOI: 10.3389/fneur.2024.1418060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2024] [Accepted: 06/27/2024] [Indexed: 07/27/2024] Open
Abstract
This paper reviews the current research progress in the application of Artificial Intelligence (AI) to ischemic stroke imaging, analyzes the main challenges, and explores future research directions. This study emphasizes the application of AI in areas such as automatic segmentation of infarct areas, detection of large vessel occlusion, prediction of stroke outcomes, assessment of hemorrhagic transformation risk, forecasting of recurrent ischemic stroke risk, and automatic grading of collateral circulation. The research indicates that Machine Learning (ML) and Deep Learning (DL) technologies have tremendous potential for improving diagnostic accuracy, accelerating disease identification, and predicting disease progression and treatment response. However, the clinical application of these technologies still faces challenges such as limited data volume, limited model interpretability, and the need for real-time monitoring and updating. Additionally, this paper discusses the prospects of applying large models, such as those based on the transformer architecture, in ischemic stroke imaging analysis, emphasizing the importance of establishing large public databases and the need for future research to focus on the interpretability of algorithms and the comprehensiveness of clinical decision support. Overall, AI has significant application value in the management of ischemic stroke; however, existing technological and practical challenges must be overcome before it can be widely adopted in clinical practice.
Collapse
Affiliation(s)
- Ying Liu
- School of Nursing, Southwest Medical University, Luzhou, China
- Department of Oncology, The Affiliated Hospital of Southwest Medical University, Luzhou, China
| | - Zhongjian Wen
- School of Nursing, Southwest Medical University, Luzhou, China
- Wound Healing Basic Research and Clinical Applications Key Laboratory of Luzhou, Southwest Medical University, Luzhou, China
| | - Yiren Wang
- School of Nursing, Southwest Medical University, Luzhou, China
- Wound Healing Basic Research and Clinical Applications Key Laboratory of Luzhou, Southwest Medical University, Luzhou, China
| | - Yuxin Zhong
- School of Nursing, Guizhou Medical University, Guiyang, China
| | - Jianxiong Wang
- Department of Rehabilitation, The Affiliated Hospital of Southwest Medical University, Luzhou, China
| | - Yiheng Hu
- Department of Medical Imaging, Southwest Medical University, Luzhou, China
| | - Ping Zhou
- Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou, China
| | - Shengmin Guo
- Nursing Department, The Affiliated Hospital of Southwest Medical University, Luzhou, China
| |
Collapse
|
24
|
Liu J, Zhang X, Luo Z. TransConv: Transformer Meets Contextual Convolution for Unsupervised Domain Adaptation. ENTROPY (BASEL, SWITZERLAND) 2024; 26:469. [PMID: 38920478 PMCID: PMC11202584 DOI: 10.3390/e26060469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/17/2024] [Revised: 05/26/2024] [Accepted: 05/26/2024] [Indexed: 06/27/2024]
Abstract
Unsupervised domain adaptation (UDA) aims to reapply a classifier trained on a labeled source domain to a related unlabeled target domain. Recent progress in this line has evolved with the advance of network architectures from convolutional neural networks (CNNs) to transformers or hybrids of both. However, this advance comes at the cost of high computational overheads or complex training processes. In this paper, we propose an efficient alternative hybrid architecture that marries transformers to contextual convolution (TransConv) to solve UDA tasks. Different from previous transformer-based UDA architectures, TransConv has two special aspects: (1) reviving the multilayer perceptron (MLP) of transformer encoders with Gaussian channel attention fusion for robustness, and (2) mixing contextual features into highly efficient dynamic convolutions for cross-domain interaction. As a result, TransConv can calibrate interdomain feature semantics from both the global features and the local ones. Experimental results on five benchmarks show that TransConv attains remarkable results with high efficiency compared to the existing UDA methods.
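To make the dynamic-convolution idea concrete, the sketch below generates per-sample depthwise kernels from a global context vector and applies them with a grouped convolution. This is a generic illustration under our own assumptions (kernel generator, depthwise form), not the TransConv implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextualDynamicConv(nn.Module):
    """Illustrative dynamic convolution: per-sample depthwise kernels from a context vector."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.kernel_gen = nn.Linear(channels, channels * kernel_size * kernel_size)

    def forward(self, x, context):                      # x: (B, C, H, W), context: (B, C)
        b, c, h, w = x.shape
        kernels = self.kernel_gen(context).view(b * c, 1, self.k, self.k)
        # a grouped conv applies each sample's own depthwise kernels in one call
        out = F.conv2d(x.reshape(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)

if __name__ == "__main__":
    x = torch.randn(4, 32, 28, 28)
    ctx = x.mean(dim=(2, 3))                            # global context pooled from the features
    print(ContextualDynamicConv(32)(x, ctx).shape)      # torch.Size([4, 32, 28, 28])
```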
Collapse
Affiliation(s)
- Junchi Liu
- School of Computer Science, National University of Defense Technology, Changsha 410073, China; (X.Z.); (Z.L.)
| | | | | |
Collapse
|
25
|
Trujillano F, Jimenez G, Manrique E, Kahamba NF, Okumu F, Apollinaire N, Carrasco-Escobar G, Barrett B, Fornace K. Using image segmentation models to analyse high-resolution earth observation data: new tools to monitor disease risks in changing environments. Int J Health Geogr 2024; 23:13. [PMID: 38764024 PMCID: PMC11102859 DOI: 10.1186/s12942-024-00371-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Accepted: 04/29/2024] [Indexed: 05/21/2024] Open
Abstract
BACKGROUND In the near future, the incidence of mosquito-borne diseases may expand to new sites due to changes in temperature and rainfall patterns caused by climate change. Therefore, there is a need to use recent technological advances to improve vector surveillance methodologies. Unoccupied Aerial Vehicles (UAVs), often called drones, have been used to collect high-resolution imagery to map detailed information on mosquito habitats and direct control measures to specific areas. Supervised classification approaches have been widely used to automatically detect vector habitats; however, the manual data labelling required for model training limits their use for rapid responses. Open-source foundation models such as the Meta AI Segment Anything Model (SAM) can facilitate the manual digitization of high-resolution images: this pre-trained model can assist in extracting features of interest from a diverse range of images. Here, we evaluated the performance of SAM through the Samgeo package, a Python-based wrapper for geospatial data, which has not previously been applied to analyse remote sensing images for epidemiological studies. RESULTS We tested the identification of two land cover classes of interest, water bodies and human settlements, using UAV imagery acquired across five malaria-endemic areas in Africa, South America, and Southeast Asia. We employed manually placed point prompts and text prompts associated with specific classes of interest to guide the image segmentation, and assessed performance in the different geographic contexts. An average Dice coefficient of 0.67 was obtained for building segmentation and 0.73 for water bodies using point prompts. With text prompts, the highest Dice coefficient reached 0.72 for buildings and 0.70 for water bodies. Nevertheless, performance depended closely on the object, the landscape characteristics, and the selected words, and therefore varied across settings. CONCLUSIONS Recent models such as SAM can potentially assist the manual digitization of imagery by vector control programs, quickly identifying key features when surveying an area of interest. However, accurate segmentation still requires user-provided manual prompts and corrections. Further evaluations are necessary, especially for applications in rural areas.
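For reference, the Dice coefficient used above to score the segmentations is straightforward to compute for binary masks; a minimal NumPy sketch (array names and the synthetic example are illustrative) is:

```python
import numpy as np

def dice_coefficient(pred_mask: np.ndarray, true_mask: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + true.sum() + eps))

if __name__ == "__main__":
    a = np.zeros((100, 100), dtype=np.uint8); a[20:60, 20:60] = 1   # e.g. a predicted building mask
    b = np.zeros((100, 100), dtype=np.uint8); b[30:70, 30:70] = 1   # e.g. a reference digitization
    print(round(dice_coefficient(a, b), 2))                         # 0.56 for this synthetic overlap
```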
Collapse
Affiliation(s)
- Fedra Trujillano
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, Scotland, UK.
- School of Geographical & Earth Sciences, University of Glasgow, Glasgow, Scotland, UK.
| | - Gabriel Jimenez
- Sorbonne Université, Institute du Cerveau - ICM, CNRS, Inria, AP-HP, Paris, Inserm, France
| | - Edgar Manrique
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, Scotland, UK
| | - Najat F Kahamba
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, Scotland, UK
- Environmental Health and Ecological Sciences Department, Ifakara Health Institute, P. O. Box 53, Ifakara, Tanzania
| | - Fredros Okumu
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, Scotland, UK
- Environmental Health and Ecological Sciences Department, Ifakara Health Institute, P. O. Box 53, Ifakara, Tanzania
| | - Nombre Apollinaire
- Centre National de Recherche et de Formation sur le Paludisme, Ouagadougou, Burkina Faso
| | - Gabriel Carrasco-Escobar
- Health Innovation Laboratory, Institute of Tropical Medicine "Alexander von Humboldt", Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Brian Barrett
- School of Geographical & Earth Sciences, University of Glasgow, Glasgow, Scotland, UK
| | - Kimberly Fornace
- School of Biodiversity, One Health & Veterinary Medicine, University of Glasgow, Glasgow, Scotland, UK
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
| |
Collapse
|
26
|
Jaramillo-Hernández JF, Julian V, Marco-Detchart C, Rincón JA. Application of Machine Vision Techniques in Low-Cost Devices to Improve Efficiency in Precision Farming. SENSORS (BASEL, SWITZERLAND) 2024; 24:937. [PMID: 38339654 PMCID: PMC10857338 DOI: 10.3390/s24030937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/28/2023] [Revised: 01/18/2024] [Accepted: 01/27/2024] [Indexed: 02/12/2024]
Abstract
In the context of recent technological advancements driven by distributed work and open-source resources, computer vision stands out as an innovative force, transforming how machines interact with and comprehend the visual world around us. This work conceives, designs, implements, and operates a computer vision and artificial intelligence method for object detection with integrated depth estimation. With applications ranging from autonomous fruit-harvesting systems to phenotyping tasks, the proposed Depth Object Detector (DOD) is trained and evaluated using the Microsoft Common Objects in Context dataset and the MinneApple dataset for object and fruit detection, respectively. The DOD is benchmarked against current state-of-the-art models. The results demonstrate the proposed method's efficiency for operation on embedded systems, with a favorable balance between accuracy and speed, making it well suited for real-time applications on edge devices in the context of the Internet of Things.
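As a hypothetical illustration of "object detection with integrated depth estimation", one simple post-processing step is to summarize a predicted depth map inside each detected bounding box. The function below is a sketch under our own assumptions (pixel-coordinate boxes and a dense metric depth map); it is not the DOD architecture itself.

```python
import numpy as np

def attach_depth_to_detections(boxes, depth_map):
    """For each box (x1, y1, x2, y2) in pixel coordinates, report the median depth inside it.
    The median is more robust than the mean to background pixels included in the box."""
    results = []
    for (x1, y1, x2, y2) in boxes:
        crop = depth_map[int(y1):int(y2), int(x1):int(x2)]
        results.append(float(np.median(crop)) if crop.size else float("nan"))
    return results

if __name__ == "__main__":
    depth = np.full((480, 640), 5.0)                     # synthetic depth map (metres)
    depth[100:200, 300:400] = 1.2                        # an apple-sized object close to the camera
    print(attach_depth_to_detections([(300, 100, 400, 200)], depth))   # [1.2]
```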
Collapse
Affiliation(s)
- Juan Felipe Jaramillo-Hernández
- Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València (UPV), Camí de Vera s/n, 46022 Valencia, Spain; (V.J.); (C.M.-D.)
- Valencian Graduate School and Research Network of Artificial Intelligence (VALGRAI), Universitat Politècnica de València, Camí de Vera s/n, 46022 Valencia, Spain
| | - Vicente Julian
- Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València (UPV), Camí de Vera s/n, 46022 Valencia, Spain; (V.J.); (C.M.-D.)
- Valencian Graduate School and Research Network of Artificial Intelligence (VALGRAI), Universitat Politècnica de València, Camí de Vera s/n, 46022 Valencia, Spain
| | - Cedric Marco-Detchart
- Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València (UPV), Camí de Vera s/n, 46022 Valencia, Spain; (V.J.); (C.M.-D.)
| | - Jaime Andrés Rincón
- Departamento de Digitalización, Escuela Politécnica Superior, Universidad de Burgos, 09006 Miranda de Ebro, Spain;
| |
Collapse
|
27
|
Wang S, Jiang F, Xu B. Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection. SENSORS (BASEL, SWITZERLAND) 2023; 23:8802. [PMID: 37960501 PMCID: PMC10650861 DOI: 10.3390/s23218802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Revised: 10/09/2023] [Accepted: 10/24/2023] [Indexed: 11/15/2023]
Abstract
Salient object detection (SOD), which identifies the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of a CNN-based network limits the performance of CNN-based methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD, in which the Swin Transformer is employed as a powerful feature extractor to capture the global context and an edge-guided cross-modal interaction module is proposed to effectively enhance and fuse features. In particular, we employed the Swin Transformer as the backbone to extract features from RGB images and depth maps. Then, we introduced an edge extraction module (EEM) to extract edge features and a depth enhancement module (DEM) to enhance depth features. Additionally, a cross-modal interaction module (CIM) was used to integrate cross-modal features from global and local contexts. Finally, we employed a cascaded decoder to refine the prediction map in a coarse-to-fine manner. Extensive experiments demonstrated that our SwinEGNet achieved the best performance on the LFSD, NLPR, DES, and NJU2K datasets and comparable performance on the STEREO dataset against 14 state-of-the-art methods. Our model also outperformed SwinNet while using only 88.4% of its parameters and 77.2% of its FLOPs. Our code will be publicly available.
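To make the idea of edge guidance concrete, the sketch below extracts a depth edge map with fixed Sobel kernels and uses it to reweight RGB features. This is an illustrative simplification under our own assumptions; the actual EEM, DEM, and CIM modules of SwinEGNet are learned and differ in design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelEdgeGate(nn.Module):
    """Illustrative edge-guided gating: depth edges modulate RGB feature maps."""
    def __init__(self):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", gx.view(1, 1, 3, 3))
        self.register_buffer("ky", gx.t().contiguous().view(1, 1, 3, 3))

    def forward(self, rgb_feat, depth):                  # rgb_feat: (B, C, H, W), depth: (B, 1, H, W)
        ex = F.conv2d(depth, self.kx, padding=1)
        ey = F.conv2d(depth, self.ky, padding=1)
        edge = torch.sqrt(ex ** 2 + ey ** 2 + 1e-6)      # depth edge magnitude
        gate = torch.sigmoid(edge)                       # emphasise object boundaries
        return rgb_feat * gate + rgb_feat                # residual edge-guided enhancement

if __name__ == "__main__":
    rgb_feat = torch.randn(1, 64, 56, 56)                # e.g. one backbone stage's output
    depth = torch.rand(1, 1, 56, 56)                     # depth map resized to the same stage
    print(SobelEdgeGate()(rgb_feat, depth).shape)        # torch.Size([1, 64, 56, 56])
```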
Collapse
Affiliation(s)
| | | | - Boqian Xu
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; (S.W.); (F.J.)
| |
Collapse
|