1. Yan Y, Yang T, Jiao C, Yang A, Miao J. IWNeXt: an image-wavelet domain ConvNeXt-based network for self-supervised multi-contrast MRI reconstruction. Phys Med Biol 2024; 69:085005. [PMID: 38479022; DOI: 10.1088/1361-6560/ad33b4]
Abstract
Objective. Multi-contrast magnetic resonance imaging (MC MRI) obtains more comprehensive anatomical information of the same scanned object but requires a longer acquisition time than single-contrast MRI. To accelerate MC MRI, recent studies collect only partial k-space data of one modality (the target contrast) and reconstruct the non-sampled measurements with a deep learning-based model, assisted by another fully sampled modality (the reference contrast). However, MC MRI reconstruction is mostly performed in the image domain with conventional CNN-based structures under full supervision. This ignores the prior information that reference contrast images carry in other sparse domains and requires fully sampled target contrast data. In addition, because of their limited receptive field, conventional CNN-based networks struggle to build high-quality non-local dependencies. Approach. In this paper, we propose an Image-Wavelet domain ConvNeXt-based network (IWNeXt) for self-supervised MC MRI reconstruction. First, INeXt and WNeXt, both based on ConvNeXt, reconstruct the undersampled target contrast data in the image domain and refine the initial reconstruction in the wavelet domain, respectively. To recover more tissue details in the refinement stage, reference contrast wavelet sub-bands are used as supplementary information for wavelet domain reconstruction. We then design a novel attention ConvNeXt block for feature extraction, which captures the non-local information of the MC image. Finally, a cross-domain consistency loss is designed for self-supervised learning: the frequency domain consistency loss deduces the non-sampled data, while the image and wavelet domain consistency losses retain more high-frequency information in the final reconstruction. Main results. Extensive experiments were conducted on the HCP and M4Raw datasets with different sampling trajectories. Compared with DuDoRNet, our model improves the peak signal-to-noise ratio by 1.651 dB. Significance. IWNeXt is a promising cross-domain method that can enhance the accuracy of MC MRI reconstruction and reduce reliance on fully sampled target contrast images.
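The self-supervised training described above relies on consistency between the reconstruction and the data that were actually acquired. As a rough illustration of the frequency-domain part of that idea (not the authors' implementation), the sketch below penalizes the difference between the k-space of the reconstructed image and the sampled k-space measurements only at the sampled locations; the tensor shapes and the L1 penalty are assumptions.

```python
# Hypothetical sketch of a frequency-domain (k-space) consistency term for
# self-supervised MRI reconstruction: the reconstruction is transformed back to
# k-space and compared only where measurements exist, so no fully sampled
# target image is needed. Shapes and the L1 penalty are illustrative.
import torch

def kspace_consistency_loss(recon_image: torch.Tensor,
                            sampled_kspace: torch.Tensor,
                            mask: torch.Tensor) -> torch.Tensor:
    """L1 distance between the FFT of the reconstruction and the acquired
    k-space samples, evaluated only where the sampling mask is 1."""
    pred_kspace = torch.fft.fft2(recon_image)      # complex k-space of the estimate
    diff = (pred_kspace - sampled_kspace) * mask   # keep sampled positions only
    return diff.abs().mean()

# Toy usage with a 2D complex image and a random Cartesian (line) mask.
x = torch.randn(1, 1, 64, 64, dtype=torch.complex64)
y = torch.fft.fft2(x)
m = (torch.rand(1, 1, 64, 1) > 0.5).float().expand(1, 1, 64, 64)
print(kspace_consistency_loss(x, y * m, m))
```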
Affiliation(s)
- Yanghui Yan
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, People's Republic of China
- Tiejun Yang
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, People's Republic of China
- Key Laboratory of Grain Information Processing and Control (HAUT), Ministry of Education, Zhengzhou, People's Republic of China
- Henan Key Laboratory of Grain Photoelectric Detection and Control (HAUT), Zhengzhou, Henan, People's Republic of China
- Chunxia Jiao
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, People's Republic of China
- Aolin Yang
- School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, People's Republic of China
- Jianyu Miao
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, People's Republic of China
2. Islam M, Zunair H, Mohammed N. CosSIF: Cosine similarity-based image filtering to overcome low inter-class variation in synthetic medical image datasets. Comput Biol Med 2024; 172:108317. [PMID: 38492455; DOI: 10.1016/j.compbiomed.2024.108317]
Abstract
Crafting effective deep learning models for medical image analysis is a complex task, particularly in cases where the medical image dataset lacks significant inter-class variation. This challenge is further aggravated when employing such datasets to generate synthetic images using generative adversarial networks (GANs), as the output of GANs heavily relies on the input data. In this research, we propose a novel filtering algorithm called Cosine Similarity-based Image Filtering (CosSIF). We leverage CosSIF to develop two distinct filtering methods: Filtering Before GAN Training (FBGT) and Filtering After GAN Training (FAGT). FBGT involves the removal of real images that exhibit similarities to images of other classes before utilizing them as the training dataset for a GAN. On the other hand, FAGT focuses on eliminating synthetic images with less discriminative features compared to real images used for training the GAN. The experimental results reveal that the utilization of either the FAGT or FBGT method reduces low inter-class variation in clinical image classification datasets and enables GANs to generate synthetic images with greater discriminative features. Moreover, modern transformer and convolutional-based models, trained with datasets that utilize these filtering methods, lead to less bias toward the majority class, more accurate predictions of samples in the minority class, and overall better generalization capabilities. Code and implementation details are available at: https://github.com/mominul-ssv/cossif.
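As a minimal illustration of the cosine-similarity filtering idea described above (not the authors' exact CosSIF code), the sketch below flattens images into vectors, computes pairwise cosine similarities between two classes, and discards images of one class that are too similar to any image of the other class; the threshold and the raw-pixel vectorization are illustrative assumptions.

```python
# Minimal sketch of cosine-similarity-based filtering between two classes.
import numpy as np

def filter_by_cosine_similarity(class_a: np.ndarray,
                                class_b: np.ndarray,
                                threshold: float = 0.95) -> np.ndarray:
    """Return the images of class_a whose maximum cosine similarity to any
    image of class_b stays below the threshold."""
    a = class_a.reshape(len(class_a), -1).astype(np.float64)
    b = class_b.reshape(len(class_b), -1).astype(np.float64)
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_n @ b_n.T                      # pairwise cosine similarities
    keep = sim.max(axis=1) < threshold     # drop images that resemble class_b
    return class_a[keep]

# Toy usage: 20 random 32x32 "images" per class.
rng = np.random.default_rng(0)
kept = filter_by_cosine_similarity(rng.random((20, 32, 32)), rng.random((20, 32, 32)))
print(kept.shape)
```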
Affiliation(s)
- Mominul Islam
- Department of Electrical and Computer Engineering, North South University, Bashundhara, Dhaka, Bangladesh.
- Hasib Zunair
- Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada.
- Nabeel Mohammed
- Department of Electrical and Computer Engineering, North South University, Bashundhara, Dhaka, Bangladesh.
3. Zhang L, Song W, Zhu T, Liu Y, Chen W, Cao Y. ConvNeXt-MHC: improving MHC-peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model. Brief Bioinform 2024; 25:bbae133. [PMID: 38561979; PMCID: PMC10985285; DOI: 10.1093/bib/bbae133]
Abstract
Peptide binding to major histocompatibility complex (MHC) proteins plays a critical role in T-cell recognition and the specificity of the immune response. Experimental validation of such peptides is extremely resource-intensive. As a result, accurate computational prediction of binding peptides is highly important, particularly in cancer immunotherapy applications such as the identification of neoantigens. In recent years, there has been a significant need to continually improve existing prediction methods to meet the demands of this field. We developed ConvNeXt-MHC, a method for predicting MHC-I peptide binding affinity. It introduces a degenerate encoding approach to enhance well-established pan-specific methods and integrates transfer learning and semi-supervised learning into the cutting-edge deep learning framework ConvNeXt. Comprehensive benchmark results demonstrate that ConvNeXt-MHC outperforms state-of-the-art methods in terms of accuracy. We expect that ConvNeXt-MHC will help foster new discoveries in the field of immunoinformatics. We have built a user-friendly website at http://www.combio-lezhang.online/predict/, where users can access our data and application.
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Wenkai Song
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Tinghao Zhu
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Nuclear Power Institute of China, Chengdu 610213, China
- Yang Liu
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
4. Zhang Z, Wen Y, Zhang X, Ma Q. CI-UNet: melding ConvNeXt and cross-dimensional attention for robust medical image segmentation. Biomed Eng Lett 2024; 14:341-353. [PMID: 38374903; PMCID: PMC10874369; DOI: 10.1007/s13534-023-00341-4]
Abstract
Deep learning-based methods have recently shown great promise in medical image segmentation tasks. However, CNN-based frameworks struggle to capture long-range spatial dependencies, whereas Transformers suffer from computational inefficiency and require substantial volumes of labeled data for effective training. To tackle these issues, this paper introduces CI-UNet, a novel architecture that uses ConvNeXt as its encoder, combining computational efficiency with strong feature extraction. Moreover, an advanced attention mechanism is proposed to capture intricate cross-dimensional interactions and global context. Extensive experiments on two segmentation datasets, BCSD and CT2USforKidneySeg, confirm the excellent performance of the proposed CI-UNet compared with other segmentation methods.
Affiliation(s)
- Zhuo Zhang
- School of Electronic and Information Engineering, Tiangong University, Tianjin, 300387 China
- Yihan Wen
- International School of Information Science and Engineering, Dalian University of Technology, Dalian, 116620 LiaoNing China
- Xiaochen Zhang
- Tianjin Cerebral Vascular and Neural Degenerative Disease Key Laboratory, Tianjin Huanhu Hospital, Tianjin, 300350 China
- Quanfeng Ma
- Tianjin Cerebral Vascular and Neural Degenerative Disease Key Laboratory, Tianjin Huanhu Hospital, Tianjin, 300350 China
5. Tagnamas J, Ramadan H, Yahyaouy A, Tairi H. Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images. Vis Comput Ind Biomed Art 2024; 7:2. [PMID: 38273164; PMCID: PMC10811315; DOI: 10.1186/s42492-024-00155-w]
Abstract
Accurate segmentation of breast ultrasound (BUS) images is crucial for the early diagnosis and treatment of breast cancer. Segmenting lesions in BUS images continues to pose significant challenges because convolutional neural networks (CNNs) have difficulty capturing long-range dependencies and global context, and methods relying solely on CNNs have struggled to address these issues. Recently, ConvNeXts have emerged as a promising CNN architecture, while transformers have demonstrated outstanding performance in diverse computer vision tasks, including the analysis of medical images. In this paper, we propose a novel breast lesion segmentation network, CS-Net, that combines the strengths of ConvNeXt and Swin Transformer to enhance the U-Net architecture. Our network operates on BUS images and performs segmentation end to end. To address the limitations of CNNs, we design a hybrid encoder that incorporates modified ConvNeXt convolutions and Swin Transformer blocks. Furthermore, to better capture spatial and channel attention in the feature maps, we incorporate a Coordinate Attention Module. We also design an Encoder-Decoder Features Fusion Module that fuses low-level features from the encoder with high-level semantic features from the decoder during image reconstruction. Experimental results demonstrate the superiority of our network over state-of-the-art image segmentation methods for BUS lesion segmentation.
Affiliation(s)
- Jaouad Tagnamas
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco.
- Hiba Ramadan
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco
- Ali Yahyaouy
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco
- Hamid Tairi
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco
6. Alharthi AG, Alzahrani SM. Multi-Slice Generation sMRI and fMRI for Autism Spectrum Disorder Diagnosis Using 3D-CNN and Vision Transformers. Brain Sci 2023; 13:1578. [PMID: 38002538; PMCID: PMC10670036; DOI: 10.3390/brainsci13111578]
Abstract
Researchers have explored various potential indicators of autism spectrum disorder (ASD), including changes in brain structure and activity, genetics, and immune system abnormalities, but no definitive indicator has been found yet. This study therefore investigates ASD indicators using two types of magnetic resonance images (MRI), structural (sMRI) and functional (fMRI), and addresses the issue of limited data availability. Transfer learning is a valuable technique when working with limited data, as it leverages knowledge gained from a model pre-trained on a domain with abundant data. This study applied four pre-trained vision models, namely ConvNeXt, MobileNet, Swin Transformer, and ViT, to the sMRI modality, and also investigated a 3D-CNN model with both sMRI and fMRI modalities. Our experiments involved different methods of generating data and extracting slices from raw 3D sMRI and 4D fMRI scans along the axial, coronal, and sagittal brain planes. To evaluate our methods, we used the standard NYU neuroimaging dataset from the ABIDE repository to classify ASD subjects against typical controls. The performance of our models was evaluated against several baselines, including studies that used VGG and ResNet transfer learning models. Our experimental results validate the effectiveness of the proposed multi-slice generation with 3D-CNN and transfer learning, achieving state-of-the-art results. In particular, the 50 middle fMRI slices with the 3D-CNN showed great promise for ASD classification, reaching a maximum accuracy of 0.8710 and an F1-score of 0.8261 when using the mean of the 4D images across the axial, coronal, and sagittal planes. Additionally, using all fMRI slices except those at the beginning and end of each brain view helped reduce irrelevant information and achieved an accuracy of 0.8387 and an F1-score of 0.7727. Lastly, transfer learning with the ConvNeXt model achieved higher results than the other pre-trained models when using the 50 middle sMRI slices along the axial, coronal, and sagittal planes.
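A hedged sketch of the kind of multi-slice generation described above: averaging a 4D fMRI scan over time and then taking the 50 middle slices of the resulting volume along a chosen anatomical plane. The axis convention, slice count, and time-averaging step are assumptions for illustration, not the authors' preprocessing code.

```python
# Illustrative middle-slice extraction from 3D/4D neuroimaging volumes.
import numpy as np

def middle_slices(volume: np.ndarray, axis: int = 2, n: int = 50) -> np.ndarray:
    """Return the n slices centred on the middle of `volume` along `axis`."""
    size = volume.shape[axis]
    start = max((size - n) // 2, 0)
    idx = range(start, min(start + n, size))
    return np.take(volume, idx, axis=axis)

def fmri_to_slices(fmri_4d: np.ndarray, axis: int = 2, n: int = 50) -> np.ndarray:
    """Average an (x, y, z, t) fMRI scan over time, then extract middle slices."""
    mean_volume = fmri_4d.mean(axis=-1)
    return middle_slices(mean_volume, axis=axis, n=n)

# Toy usage: a random 64x64x64x100 "scan" yields 50 axial slices.
print(fmri_to_slices(np.random.rand(64, 64, 64, 100)).shape)  # (64, 64, 50)
```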
Affiliation(s)
- Salha M. Alzahrani
- Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia
7. Sui D, Liu W, Zhang Y, Li Y, Luo G, Wang K, Guo M. ColonNet: A novel polyp segmentation framework based on LK-RFB and GPPD. Comput Biol Med 2023; 166:107541. [PMID: 37804779; DOI: 10.1016/j.compbiomed.2023.107541]
Abstract
Colorectal cancer (CRC) is the most prevalent malignant tumor of the digestive system and a formidable global health challenge, ranking as the fourth leading cause of cancer-related deaths worldwide. Despite considerable advances in understanding and treating CRC, tumor recurrence and metastasis remain major causes of the high morbidity and mortality seen during treatment. Colonoscopy is currently the predominant method for CRC screening, and artificial intelligence has emerged as a promising tool for aiding polyp diagnosis. Unfortunately, most segmentation methods suffer from limited accuracy and poor generalization across datasets, and slow processing and analysis speed in particular has become a major obstacle. In this study, we propose a fast and efficient polyp segmentation framework based on the Large-Kernel Receptive Field Block (LK-RFB) and the Global Parallel Partial Decoder (GPPD). The proposed ColonNet has been extensively tested and proven effective, achieving a Dice coefficient of over 0.910 and more than 102 FPS on the CVC-300 dataset. Compared with state-of-the-art (SOTA) methods, ColonNet achieves the best or comparable performance on five publicly available datasets while maintaining the highest FPS (over 102), establishing a new SOTA. The code will be released at: https://github.com/SPECTRELWF/ColonNet.
Affiliation(s)
- Dong Sui
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.
- Weifeng Liu
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Yue Zhang
- College of Computer Science and Technology, Harbin Engineering University, Harbin, China.
- Yang Li
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Gongning Luo
- Perceptual Computing Research Center, Harbin Institute of Technology, Harbin, China
- Kuanquan Wang
- Perceptual Computing Research Center, Harbin Institute of Technology, Harbin, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
8. Wang H, Wang K, Yan T, Zhou H, Cao E, Lu Y, Wang Y, Luo J, Pang Y. Endoscopic image classification algorithm based on Poolformer. Front Neurosci 2023; 17:1273686. [PMID: 37811325; PMCID: PMC10551176; DOI: 10.3389/fnins.2023.1273686]
Abstract
Image desmoking is a significant aspect of endoscopic image processing, as it mitigates visual field obstructions without the need for additional surgical interventions. However, current smoke removal techniques tend to apply comprehensive video enhancement to all frames, both smoke-free and smoke-affected, which not only escalates computational cost but also introduces potential noise when enhancing smoke-free images. In response to this challenge, this paper introduces an approach for classifying images that contain surgical smoke in endoscopic scenes. This classification provides crucial target-frame information for surgical smoke removal, improving the robustness and real-time processing capabilities of image-based desmoking methods. The proposed endoscopic smoke image classification algorithm, based on an improved Poolformer model, augments the model's capacity for endoscopic image feature extraction by transforming the Token Mixer within the encoder into a multi-branch structure akin to ConvNeXt, a pure convolutional neural network. Moreover, conversion to a single-path topology during the prediction phase increases processing speed. Experiments used an endoscopic dataset sourced from the Hamlyn Centre Laparoscopic/Endoscopic Video Dataset, augmented by Blender rendering. The dataset comprises 3,800 training images and 1,200 test images, with a 4:1 ratio of smoke-free to smoke-containing images. The outcomes affirm the superior performance of this approach across multiple metrics. Comparative assessments against existing models, such as mobilenet_v3, efficientnet_b7, and ViT-B/16, show that the proposed method excels in accuracy, sensitivity, and inference speed. Notably, when contrasted with the Poolformer_s12 network, the proposed method achieves a 2.3% improvement in accuracy and an 8.2% boost in sensitivity, while incurring a reduction of only 6.4 frames per second in processing speed, maintaining 87 frames per second. These results confirm the improved performance of the refined Poolformer model in endoscopic smoke image classification and present a lightweight yet effective solution for the automatic detection of smoke-containing images in endoscopy, balancing the accuracy and real-time requirements of endoscopic image analysis and offering valuable insights for a targeted desmoking process.
Affiliation(s)
- Huiqian Wang
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chongqing Xishan Science & Technology Co., Ltd., Chongqing, China
- Kun Wang
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Tian Yan
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Hekai Zhou
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Enling Cao
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Yi Lu
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Yuanfa Wang
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chongqing Xishan Science & Technology Co., Ltd., Chongqing, China
- Jiasai Luo
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Yu Pang
- Postdoctoral Research Station, Chongqing Key Laboratory of Photoelectronic Information Sensing and Transmitting Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
9. Alsharif B, Altaher AS, Altaher A, Ilyas M, Alalwany E. Deep Learning Technology to Recognize American Sign Language Alphabet. Sensors (Basel) 2023; 23:7970. [PMID: 37766026; PMCID: PMC10535774; DOI: 10.3390/s23187970]
Abstract
Historically, individuals with hearing impairments have faced neglect, lacking the necessary tools to facilitate effective communication. However, advancements in modern technology have paved the way for the development of various tools and software aimed at improving the quality of life for hearing-disabled individuals. This paper presents a comprehensive study employing five distinct deep learning models to recognize hand gestures for the American Sign Language (ASL) alphabet. The primary objective was to leverage contemporary technology to bridge the communication gap between hearing-impaired individuals and those without hearing impairment. The models utilized in this research, AlexNet, ConvNeXt, EfficientNet, ResNet-50, and Vision Transformer, were trained and tested on an extensive dataset comprising over 87,000 images of ASL alphabet hand gestures. Numerous experiments were conducted, involving modifications to the architectural design parameters of the models to obtain maximum recognition accuracy. The experimental results revealed that ResNet-50 achieved an exceptional accuracy of 99.98%, the highest among all models. EfficientNet attained an accuracy of 99.95%, ConvNeXt achieved 99.51%, AlexNet attained 99.50%, while Vision Transformer yielded the lowest accuracy of 88.59%.
Affiliation(s)
- Bader Alsharif
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
- Department of Computer Science and Engineering, College of Telecommunication and Information, Technical and Vocational Training Corporation (TVTC), Riyadh 11564, Saudi Arabia
- Ali Salem Altaher
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
- Ahmed Altaher
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
- Electronic Computer Center, Al-Nahrain University, Jadriya, Baghdad 64074, Iraq
- Mohammad Ilyas
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
- Easa Alalwany
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
- College of Computer Science and Engineering, Taibah University, Yanbu 46421, Saudi Arabia
10. Zheng Z, Yao H, Lin C, Huang K, Chen L, Shao Z, Zhou H, Zhao G. KD_ConvNeXt: knowledge distillation-based image classification of lung tumor surgical specimen sections. Front Genet 2023; 14:1254435. [PMID: 37790704; PMCID: PMC10544998; DOI: 10.3389/fgene.2023.1254435]
Abstract
Introduction: Lung cancer is currently among the most prevalent and lethal cancers worldwide in terms of both incidence and fatality rate. In clinical practice, identifying the specific subtype of a lung cancer is essential for diagnosing and treating lung lesions. Methods: This paper collects histopathological section images of lung tumor surgical specimens to construct a clinical dataset for researching and addressing the classification of specific lung tumor subtypes. We propose KD_ConvNeXt, a teacher-student network architecture based on a knowledge distillation mechanism, for classifying specific subtypes in lung tumor histopathological section images to assist clinical applications. The proposed approach enables the student network (ConvNeXt) to extract knowledge from the intermediate feature layers of the teacher network (Swin Transformer), improving the feature extraction and fitting capabilities of ConvNeXt. Meanwhile, the Swin Transformer provides soft labels containing information about the distribution of images across categories, making the model focus more on the information carried by classes with smaller sample sizes during training. Results: Extensive experiments on a clinical lung tumor image dataset show that KD_ConvNeXt achieves a classification accuracy of 85.64% and an F1-score of 0.7717, outperforming other advanced image classification methods.
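A minimal sketch of a distillation objective in the spirit described above: a cross-entropy term on hard labels, a KL term on temperature-softened teacher and student logits (the soft labels), and an L2 term matching intermediate features. The temperature, loss weights, and the assumption that the features already share a shape are illustrative choices, not the paper's exact recipe.

```python
# Hedged sketch of a teacher-student distillation loss (soft labels + features).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat,
                      labels, temperature=4.0, alpha=0.5, beta=0.1):
    # Hard-label cross-entropy on the student predictions.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label KL divergence against the teacher's softened distribution.
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  F.softmax(teacher_logits / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    # Intermediate-feature matching (assumes features already share a shape).
    feat = F.mse_loss(student_feat, teacher_feat)
    return ce + alpha * kd + beta * feat

# Toy usage with batch size 8 and 5 classes.
s_logits, t_logits = torch.randn(8, 5), torch.randn(8, 5)
s_feat, t_feat = torch.randn(8, 128), torch.randn(8, 128)
print(distillation_loss(s_logits, t_logits, s_feat, t_feat, torch.randint(0, 5, (8,))))
```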
Affiliation(s)
- Zhaoliang Zheng
- South China Normal University, Guangzhou, China
- Key Lab on Cloud Security and Assessment Technology of Guangzhou, Guangzhou, China
- SCNU & VeChina Joint Lab on BlockChain Technology and Application, Guangzhou, China
- Henian Yao
- The First School of Clinical Medicine, Guangdong Medical University, Zhanjiang, China
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Chengchuang Lin
- South China Normal University, Guangzhou, China
- Key Lab on Cloud Security and Assessment Technology of Guangzhou, Guangzhou, China
- SCNU & VeChina Joint Lab on BlockChain Technology and Application, Guangzhou, China
- Kaixin Huang
- South China Normal University, Guangzhou, China
- Key Lab on Cloud Security and Assessment Technology of Guangzhou, Guangzhou, China
- SCNU & VeChina Joint Lab on BlockChain Technology and Application, Guangzhou, China
- Luoxuan Chen
- South China Normal University, Guangzhou, China
- Key Lab on Cloud Security and Assessment Technology of Guangzhou, Guangzhou, China
- SCNU & VeChina Joint Lab on BlockChain Technology and Application, Guangzhou, China
- Ziling Shao
- Jinan University-University of Birmingham Joint Institute at Jinan University, Guangdong, China
- Haiyu Zhou
- The First School of Clinical Medicine, Guangdong Medical University, Zhanjiang, China
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Gansen Zhao
- South China Normal University, Guangzhou, China
- Key Lab on Cloud Security and Assessment Technology of Guangzhou, Guangzhou, China
- SCNU & VeChina Joint Lab on BlockChain Technology and Application, Guangzhou, China
11. Zhang H, Li Z, Wang W, Hu L, Xu J, Yuan M, Wang Z, Ren Y, Ye Y. Multi-supervised bidirectional fusion network for road-surface condition recognition. PeerJ Comput Sci 2023; 9:e1446. [PMID: 37705628; PMCID: PMC10495952; DOI: 10.7717/peerj-cs.1446]
Abstract
Rapid developments in automatic driving technology have given rise to new experiences for passengers. Safety is a main priority in automatic driving, and a strong familiarity with road-surface conditions during both day and night is essential to ensuring driving safety. Existing models for recognizing road-surface conditions lack the required robustness and generalization abilities, and most studies only validated their performance on daylight images. To address this problem, we propose a novel multi-supervised bidirectional fusion network (MBFN) model to detect weather-induced road-surface conditions on the path of automatic vehicles during both daytime and nighttime. We employ ConvNeXt to extract the basic features, which are further processed by a new bidirectional fusion module to create a fused feature. The basic and fused features are then concatenated to generate a refined feature with greater discriminative and generalization abilities. Finally, we design a multi-supervised loss function to train the MBFN model based on the extracted features. Experiments were conducted using two public datasets. The results clearly demonstrate that the MBFN model can classify diverse road-surface conditions, such as dry, wet, and snowy, with satisfactory accuracy and outperform state-of-the-art baseline models. Notably, the proposed model has multiple variants that also achieve competitive performance under different road conditions. The code for the MBFN model is shared at https://zenodo.org/badge/latestdoi/607014079.
Affiliation(s)
- Hongbin Zhang
- School of Software, East China JiaoTong University, Nanchang, China
- Zhijie Li
- School of Software, East China JiaoTong University, Nanchang, China
- Wengang Wang
- School of Software, East China JiaoTong University, Nanchang, China
- Lang Hu
- School of Software, East China JiaoTong University, Nanchang, China
- Jiayue Xu
- Business School, Changzhou University, Changzhou, China
- Meng Yuan
- School of Software, East China JiaoTong University, Nanchang, China
- Zelin Wang
- School of Information Science and Technology, Nantong University, Nantong, China
- Yafeng Ren
- School of Interpreting and Translation Studies, Guangdong University of Foreign Studies, Guangzhou, China
- Yiyuan Ye
- School of Information Engineering, East China Jiaotong University, Nanchang, China
12. He X, Wang Y, Poiesi F, Song W, Xu Q, Feng Z, Wan Y. Exploiting multi-granularity visual features for retinal layer segmentation in human eyes. Front Bioeng Biotechnol 2023; 11:1191803. [PMID: 37324431; PMCID: PMC10267414; DOI: 10.3389/fbioe.2023.1191803]
Abstract
Accurate segmentation of retinal layer boundaries can facilitate the detection of patients with early ophthalmic disease. Typical segmentation algorithms operate at low resolutions without fully exploiting multi-granularity visual features. Moreover, several related studies do not release their datasets, which are key for research on deep learning-based solutions. We propose a novel end-to-end retinal layer segmentation network based on ConvNeXt, which retains more feature-map details by using a new depth-efficient attention module and multi-scale structures. In addition, we provide a semantic segmentation dataset containing 206 retinal images of healthy human eyes (the NR206 dataset), which is easy to use as it does not require any additional transcoding. We experimentally show that our segmentation approach outperforms state-of-the-art approaches on this new dataset, achieving, on average, a Dice score of 91.3% and an mIoU of 84.4%. Moreover, our approach achieves state-of-the-art performance on a glaucoma dataset and a diabetic macular edema (DME) dataset, showing that our model is also suitable for other applications. We will make our source code and the NR206 dataset publicly available at https://github.com/Medical-Image-Analysis/Retinal-layer-segmentation.
Affiliation(s)
- Xiang He
- School of Mechanical Engineering, Shandong University, Jinan, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Weiye Song
- School of Mechanical Engineering, Shandong University, Jinan, China
- Quanqing Xu
- School of Mechanical Engineering, Shandong University, Jinan, China
- Zixuan Feng
- School of Mechanical Engineering, Shandong University, Jinan, China
- Yi Wan
- School of Mechanical Engineering, Shandong University, Jinan, China
13. Zhang H, Zhong X, Li G, Liu W, Liu J, Ji D, Li X, Wu J. BCU-Net: Bridging ConvNeXt and U-Net for medical image segmentation. Comput Biol Med 2023; 159:106960. [PMID: 37099973; DOI: 10.1016/j.compbiomed.2023.106960]
Abstract
Medical image segmentation enables doctors to observe lesion regions better and make accurate diagnostic decisions. Single-branch models such as U-Net have achieved great progress in this field. However, the complementary local and global pathological semantics of heterogeneous neural networks have not yet been fully explored. The class-imbalance problem remains a serious issue. To alleviate these two problems, we propose a novel model called BCU-Net, which leverages the advantages of ConvNeXt in global interaction and U-Net in local processing. We propose a new multilabel recall loss (MRL) module to relieve the class imbalance problem and facilitate deep-level fusion of local and global pathological semantics between the two heterogeneous branches. Extensive experiments were conducted on six medical image datasets including retinal vessel and polyp images. The qualitative and quantitative results demonstrate the superiority and generalizability of BCU-Net. In particular, BCU-Net can handle diverse medical images with diverse resolutions. It has a flexible structure owing to its plug-and-play characteristics, which promotes its practicality.
Affiliation(s)
- Hongbin Zhang
- School of Software, East China Jiaotong University, China.
- Xiang Zhong
- School of Software, East China Jiaotong University, China.
- Guangli Li
- School of Information Engineering, East China Jiaotong University, China.
- Wei Liu
- School of Software, East China Jiaotong University, China.
- Jiawei Liu
- School of Software, East China Jiaotong University, China.
- Donghong Ji
- School of Cyber Science and Engineering, Wuhan University, China.
- Xiong Li
- School of Software, East China Jiaotong University, China.
- Jianguo Wu
- The Second Affiliated Hospital of Nanchang University, China.
14. Encío L, Díaz C, del-Blanco CR, Jaureguizar F, García N. Visual Parking Occupancy Detection Using Extended Contextual Image Information via a Multi-Branch Output ConvNeXt Network. Sensors (Basel) 2023; 23:3329. [PMID: 36992039; PMCID: PMC10051634; DOI: 10.3390/s23063329]
Abstract
Along with society's development, transportation has become a key part of daily life, increasing the number of vehicles on the streets. Consequently, finding free parking slots in metropolitan areas can be dramatically challenging, increasing the chance of being involved in an accident, raising the carbon footprint, and negatively affecting the driver's health. Therefore, technological resources for parking management and real-time monitoring have become key players in this scenario to speed up the parking process in urban areas. This work proposes a new computer-vision-based system that detects vacant parking spaces in challenging situations using color imagery processed by a novel deep-learning algorithm. It is based on a multi-branch output neural network that maximizes the contextual image information to infer the occupancy of every parking space. Every output infers the occupancy of a specific parking slot using all of the input image information, unlike existing approaches, which only use a neighborhood around every slot. This makes the system very robust to changing illumination conditions, different camera perspectives, and mutual occlusions between parked cars. An extensive evaluation has been performed on several public datasets, showing that the proposed system outperforms existing approaches.
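A hedged sketch of the multi-branch output idea described above: a shared backbone processes the whole parking-lot image, and a separate binary head predicts the occupancy of each slot, so every head can draw on full-image context. The torchvision ConvNeXt backbone, the pooling, and the head design are illustrative assumptions.

```python
# Illustrative multi-branch-output occupancy classifier with a shared backbone.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiSlotOccupancyNet(nn.Module):
    def __init__(self, n_slots=10):
        super().__init__()
        backbone = models.convnext_tiny(weights=None)
        self.features = backbone.features               # shared feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        # One small binary head per parking slot (convnext_tiny ends at 768 channels).
        self.heads = nn.ModuleList(nn.Linear(768, 1) for _ in range(n_slots))

    def forward(self, x):
        feat = self.pool(self.features(x)).flatten(1)    # (batch, 768)
        return torch.cat([head(feat) for head in self.heads], dim=1)  # one logit per slot

print(MultiSlotOccupancyNet()(torch.randn(2, 3, 224, 224)).shape)  # (2, 10)
```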
15. Jin Y, Lu H, Zhu W, Huo W. Deep learning based classification of multi-label chest X-ray images via dual-weighted metric loss. Comput Biol Med 2023; 157:106683. [PMID: 36905869; DOI: 10.1016/j.compbiomed.2023.106683]
Abstract
Thoracic disease, like many other diseases, can lead to complications. Existing multi-label medical image learning problems typically include rich pathological information, such as images, attributes, and labels, which are crucial for supplementary clinical diagnosis. However, the majority of contemporary efforts focus exclusively on regression from input to binary labels, ignoring the relationship between visual features and the semantic vectors of labels. In addition, there is an imbalance in the amount of data between diseases, which frequently causes intelligent diagnostic systems to make erroneous disease predictions. Therefore, we aim to improve the accuracy of multi-label classification of chest X-ray images. The ChestX-ray14 images were utilized as the multi-label dataset for the experiments in this study. By fine-tuning the ConvNeXt network, we obtained visual vectors, which we combined with semantic vectors encoded by BioBERT, mapping the two forms of features into a common metric space in which the semantic vectors act as the prototype of each class. The metric relationship between images and labels is then considered at the image level and the disease category level, respectively, and a new dual-weighted metric loss function is proposed. Finally, the average AUC score achieved in the experiments reached 0.826, and our model outperformed the comparison models.
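A rough sketch of the common-metric-space idea described above (not the paper's exact model): label embeddings, standing in for BioBERT-encoded disease names, act as class prototypes, image features are projected into the same space, and the per-class scores are cosine similarities between the two. The projection layers and dimensions are assumptions.

```python
# Illustrative prototype-based scoring of image features against label embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeClassifier(nn.Module):
    def __init__(self, visual_dim=768, semantic_dim=768, metric_dim=256, n_classes=14):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, metric_dim)
        self.semantic_proj = nn.Linear(semantic_dim, metric_dim)
        # Frozen label embeddings standing in for BioBERT-encoded disease names.
        self.register_buffer("label_emb", torch.randn(n_classes, semantic_dim))

    def forward(self, visual_feat):
        img = F.normalize(self.visual_proj(visual_feat), dim=-1)
        proto = F.normalize(self.semantic_proj(self.label_emb), dim=-1)
        return img @ proto.t()   # (batch, n_classes) cosine similarities

# Toy usage: scores for a batch of 4 image feature vectors.
print(PrototypeClassifier()(torch.randn(4, 768)).shape)  # torch.Size([4, 14])
```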
Affiliation(s)
- Yufei Jin
- College of Information Engineering, China Jiliang University, Hangzhou, China.
- Huijuan Lu
- College of Information Engineering, China Jiliang University, Hangzhou, China.
- Wenjie Zhu
- College of Information Engineering, China Jiliang University, Hangzhou, China.
- Wanli Huo
- College of Information Engineering, China Jiliang University, Hangzhou, China.
16. Jiang L, Yuan B, Ma W, Wang Y. JujubeNet: A high-precision lightweight jujube surface defect classification network with an attention mechanism. Front Plant Sci 2023; 13:1108437. [PMID: 36743544; PMCID: PMC9889997; DOI: 10.3389/fpls.2022.1108437]
Abstract
Surface Defect Detection (SDD) is a significant research topic in the Industry 4.0 field. In real, complex industrial environments, SDD often faces many challenges, such as small differences between defect imaging and background, low contrast, large variations in defect scale, diverse defect types, and large amounts of noise in defect images. Jujubes are naturally growing plants, and the appearance of the same type of surface defect can vary greatly, so their inspection is more difficult than for industrial products manufactured according to a prescribed process. In this paper, a ConvNeXt-based high-precision lightweight classification network, JujubeNet, is presented to address the practical needs of Jujube Surface Defect (JSD) classification. In the proposed method, a Multi-branching module using Depthwise separable Convolution (MDC) is designed to extract richer feature information through multiple branches while substantially reducing the number of parameters in the model by using depthwise separable convolutions. In addition, the Convolutional Block Attention Module (CBAM) is introduced to make the model concentrate on different classes of JSD features. The proposed JujubeNet is compared with other mainstream networks in the actual production environment. The experimental results show that JujubeNet achieves 99.1% classification accuracy, significantly better than current mainstream classification models. Its FLOPs and parameter count are only 30.7% and 30.6% of those of ConvNeXt-Tiny, respectively, indicating that the model can classify JSD quickly and effectively and is of great practical value.
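A minimal sketch of a multi-branch block built from depthwise separable convolutions, in the spirit of the MDC module described above; the kernel sizes, branch count, normalization, and fusion step are illustrative assumptions rather than the JujubeNet design.

```python
# Illustrative multi-branch block using depthwise separable convolutions.
import torch
import torch.nn as nn

def depthwise_separable(channels, kernel_size):
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2,
                  groups=channels, bias=False),        # depthwise
        nn.Conv2d(channels, channels, 1, bias=False),  # pointwise
        nn.BatchNorm2d(channels),
        nn.GELU(),
    )

class MultiBranchDWBlock(nn.Module):
    def __init__(self, channels=64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(depthwise_separable(channels, k)
                                      for k in kernel_sizes)
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):
        # Concatenate the branch outputs, fuse back to the input width, add a residual.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1)) + x

print(MultiBranchDWBlock()(torch.randn(2, 64, 32, 32)).shape)  # (2, 64, 32, 32)
```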
Affiliation(s)
- Lingjie Jiang
- School of Electronic Information, Xijing University, Xi’an, China
- Shaanxi Key Laboratory of Integrated and Intelligent Navigation, The 20th Research Institute of China Electronics Technology Group Corporation, Xi’an, China
- Xi’an Key Laboratory of High Precision Industrial Intelligent Vision Measurement Technology, Xijing University, Xi’an, China
- Baoxi Yuan
- School of Electronic Information, Xijing University, Xi’an, China
- Shaanxi Key Laboratory of Integrated and Intelligent Navigation, The 20th Research Institute of China Electronics Technology Group Corporation, Xi’an, China
- Xi’an Key Laboratory of High Precision Industrial Intelligent Vision Measurement Technology, Xijing University, Xi’an, China
- Wenyun Ma
- Humanities Teaching Department, Gansu University of Chinese Medicine, Dingxi, China
- Yuqian Wang
- Graduate Office, Xijing University, Xi’an, China
17. Tian G, Wang Z, Wang C, Chen J, Liu G, Xu H, Lu Y, Han Z, Zhao Y, Li Z, Luo X, Peng L. A deep ensemble learning-based automated detection of COVID-19 using lung CT images and Vision Transformer and ConvNeXt. Front Microbiol 2022; 13:1024104. [PMID: 36406463; PMCID: PMC9672374; DOI: 10.3389/fmicb.2022.1024104]
Abstract
Since the outbreak of COVID-19, hundreds of millions of people have been infected, millions have died, and the daily life of countless people has been heavily affected. Accurately identifying patients and taking timely isolation measures are necessary to stop the spread of COVID-19. Besides the nucleic acid test, lung CT image detection is also a path to quickly identify COVID-19 patients. In this context, deep learning technology can help radiologists identify COVID-19 patients from CT images rapidly. In this paper, we propose a deep learning ensemble framework called VitCNX, which combines Vision Transformer and ConvNeXt for COVID-19 CT image identification. We compared VitCNX with EfficientNetV2, DenseNet, ResNet-50, and Swin Transformer, which are state-of-the-art deep learning models for image classification, and with the two individual models used in the ensemble (Vision Transformer and ConvNeXt), in binary and three-class classification experiments. In the binary classification experiment, VitCNX achieves the best recall of 0.9907, accuracy of 0.9821, F1-score of 0.9855, AUC of 0.9985, and AUPR of 0.9991, outperforming the other six models. Similarly, in the three-class experiment, VitCNX achieves the best precision of 0.9668, an accuracy of 0.9696, and an F1-score of 0.9631, further demonstrating its excellent image classification capability. We hope our proposed VitCNX model can contribute to the recognition of COVID-19 patients.
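A minimal sketch of a two-model ensemble along the lines described above: average the softmax probabilities of a Vision Transformer and a ConvNeXt for each CT image. The torchvision model builders and equal weighting are assumptions; the paper's ensemble strategy may differ.

```python
# Illustrative probability-averaging ensemble of a ViT and a ConvNeXt classifier.
import torch
import torchvision.models as models

vit = models.vit_b_16(weights=None, num_classes=3)
convnext = models.convnext_tiny(weights=None, num_classes=3)

@torch.no_grad()
def ensemble_predict(images: torch.Tensor) -> torch.Tensor:
    """Average the class probabilities of both backbones for a batch of images."""
    vit.eval(); convnext.eval()
    p1 = torch.softmax(vit(images), dim=1)
    p2 = torch.softmax(convnext(images), dim=1)
    return (p1 + p2) / 2

# Toy usage on a batch of two 224x224 RGB "CT slices".
print(ensemble_predict(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 3])
```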
Affiliation(s)
- Geng Tian
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geneis (Beijing) Co., Ltd., Beijing, China
- Ziwei Wang
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Chang Wang
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Jianhua Chen
- Hunan Storm Information Technology Co., Ltd., Changsha, China
- Guangyi Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- He Xu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yuankang Lu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Zhuoran Han
- High School Attached to Northeast Normal University, Changchun, China
- Yubo Zhao
- No. 2 Middle School of Shijiazhuang, Shijiazhuang, China
- Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Xueming Luo
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China