1. Kong F, Wang X, Xiang J, Yang S, Wang X, Yue M, Zhang J, Zhao J, Han X, Dong Y, Zhu B, Wang F, Liu Y. Federated attention consistent learning models for prostate cancer diagnosis and Gleason grading. Comput Struct Biotechnol J 2024; 23:1439-1449. [PMID: 38623561; PMCID: PMC11016961; DOI: 10.1016/j.csbj.2024.03.028] [Received: 01/14/2024; Revised: 03/29/2024; Accepted: 03/29/2024]
Abstract
Artificial intelligence (AI) holds significant promise in transforming medical imaging, enhancing diagnostics, and refining treatment strategies. However, the reliance on extensive multicenter datasets for training AI models poses challenges due to privacy concerns. Federated learning provides a solution by facilitating collaborative model training across multiple centers without sharing raw data. This study introduces a federated attention-consistent learning (FACL) framework to address challenges associated with large-scale pathological images and data heterogeneity. FACL enhances model generalization by maximizing attention consistency between local clients and the server model. To ensure privacy and validate robustness, we incorporated differential privacy by introducing noise during parameter transfer. We assessed the effectiveness of FACL in cancer diagnosis and Gleason grading tasks using 19,461 whole-slide images of prostate cancer from multiple centers. In the diagnosis task, FACL achieved an area under the curve (AUC) of 0.9718, outperforming seven centers with an average AUC of 0.9499 when categories are relatively balanced. For the Gleason grading task, FACL attained a Kappa score of 0.8463, surpassing the average Kappa score of 0.7379 from six centers. In conclusion, FACL offers a robust, accurate, and cost-effective AI training model for prostate cancer pathology while maintaining effective data safeguards.
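The attention-consistency idea at the core of FACL can be sketched as a loss term that penalizes disagreement between a local client's attention map and the server model's attention map for the same slide. The cosine-distance form and map shapes below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def attention_consistency_loss(client_attn, server_attn, eps=1e-8):
    """Cosine distance between flattened attention maps (assumed form).

    client_attn, server_attn: (H, W) attention maps from the local
    client model and the aggregated server model for the same input.
    Returns a scalar in [0, 2]; 0 means perfectly consistent maps.
    """
    a = client_attn.ravel()
    b = server_attn.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - cos

rng = np.random.default_rng(0)
attn = rng.random((16, 16))
identical = attention_consistency_loss(attn, attn)            # near 0
different = attention_consistency_loss(attn, rng.random((16, 16)))
```

Minimizing such a term during federated rounds would push clients toward attending to the same regions as the global model; the differential-privacy step the abstract mentions would additionally add noise to parameters before transfer.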
Affiliation(s)
- Fei Kong
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xiyue Wang
- College of Biomedical Engineering, Sichuan University, Chengdu, 610065, China
- Sen Yang
- AI Lab, Tencent, Shenzhen, 518057, China
- Xinran Wang
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, 050035, China
- Meng Yue
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, 050035, China
- Jun Zhang
- AI Lab, Tencent, Shenzhen, 518057, China
- Junhan Zhao
- Massachusetts General Hospital, Boston, MA, 02114, United States
- Harvard T.H. Chan School of Public Health, Boston, MA, 02115, United States
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, United States
- Xiao Han
- AI Lab, Tencent, Shenzhen, 518057, China
- Yuhan Dong
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Biyue Zhu
- Department of Pharmacy, Children's Hospital of Chongqing Medical University, Chongqing, 400014, China
- Fang Wang
- Department of Pathology, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, 264000, China
- Yueping Liu
- Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, 050035, China
2. Tian C, Xiao J, Zhang B, Zuo W, Zhang Y, Lin CW. A self-supervised network for image denoising and watermark removal. Neural Netw 2024; 174:106218. [PMID: 38518709; DOI: 10.1016/j.neunet.2024.106218] [Received: 04/04/2023; Revised: 10/18/2023; Accepted: 02/27/2024]
Abstract
In image watermark removal, popular methods depend on reference watermark-free images to remove watermarks in a supervised way. However, such reference images are difficult to obtain in the real world, and watermarked images often suffer from noise introduced when they are captured by digital devices. To resolve these issues, in this paper we present a self-supervised network for image denoising and watermark removal (SSNet). SSNet uses a parallel network trained in a self-supervised way to remove noise and watermarks. Specifically, each sub-network contains two sub-blocks. The upper sub-network uses its first sub-block to remove noise, following the noise-to-noise principle, and its second sub-block to remove watermarks according to the distributions of watermarks. To prevent the loss of important information, the lower sub-network simultaneously learns noise and watermarks in a self-supervised way. Moreover, the two sub-networks interact via attention to extract more complementary salient information. The proposed method does not depend on paired images to learn a blind denoising and watermark removal model, which is very meaningful for real applications. It is also more effective than popular image watermark removal methods on public datasets. Code is available at https://github.com/hellloxiaotian/SSNet.
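The noise-to-noise principle that the first sub-block relies on says that a regressor trained against noisy targets converges to the same mean as one trained against clean targets, so paired clean references are unnecessary. A toy demonstration of the principle (not the SSNet architecture):

```python
import numpy as np

rng = np.random.default_rng(42)
clean = 3.0                                        # true pixel value
noisy_targets = clean + rng.normal(0.0, 0.5, size=10000)

# The least-squares fit of a constant predictor to noisy targets is the
# sample mean, which approaches the clean value as samples accumulate,
# so training never needs to see a clean reference.
estimate = noisy_targets.mean()
```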
Affiliation(s)
- Chunwei Tian
- PAMI Research Group, University of Macau, 999078, Macao Special Administrative Region of China
- Jingyu Xiao
- School of Computer Science, Central South University, Changsha, 410083, China
- Bob Zhang
- PAMI Research Group, University of Macau, 999078, Macao Special Administrative Region of China
- Wangmeng Zuo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Yudong Zhang
- School of Computing and Mathematics, University of Leicester, Leicester, LE1 7RH, UK
- Chia-Wen Lin
- Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu 300, Taiwan
3. Li Q, Feng B, Tang X, Yu H, Song H. MuLAN: Multi-level attention-enhanced matching network for few-shot knowledge graph completion. Neural Netw 2024; 174:106222. [PMID: 38442490; DOI: 10.1016/j.neunet.2024.106222] [Received: 11/04/2023; Revised: 01/23/2024; Accepted: 02/27/2024]
Abstract
Recent years have witnessed increasing interest in the few-shot knowledge graph completion due to its potential to augment the coverage of few-shot relations in knowledge graphs. Existing methods often use the one-hop neighbors of the entity to enhance its embedding and match the query instance and support set at the instance level. However, such methods cannot handle inter-neighbor interaction, local entity matching and the varying significance of feature dimensions. To bridge this gap, we propose the Multi-Level Attention-enhanced matching Network (MuLAN) for few-shot knowledge graph completion. In MuLAN, a multi-head self-attention neighbor encoder is designed to capture the inter-neighbor interaction and learn the entity embeddings. Then, entity-level attention and instance-level attention are responsible for matching the query instance and support set from the local and global perspectives, respectively, while feature-level attention is utilized to calculate the weights of the feature dimensions. Furthermore, we design a consistency constraint to ensure the support instance embeddings are close to each other. Extensive experiments based on two well-known datasets (i.e., NELL-One and Wiki-One) demonstrate significant advantages of MuLAN over 11 state-of-the-art competitors. Compared to the best-performing baseline, MuLAN achieves 14.5% higher MRR and 13.3% higher Hits@K on average.
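The neighbor encoder's role, re-embedding an entity as an attention-weighted mixture of its one-hop neighbors, can be sketched in a single-head simplification (MuLAN's encoder is multi-head with learned projections; the parameter-free form below is an illustrative assumption):

```python
import numpy as np

def neighbor_attention(entity, neighbors):
    """Single-head simplification of an attention neighbor encoder.

    entity: (d,) embedding of the target entity, used as the query.
    neighbors: (n, d) embeddings of its one-hop neighbors (keys/values).
    Returns a (d,) neighbor-enhanced embedding.
    """
    d = entity.shape[0]
    scores = neighbors @ entity / np.sqrt(d)        # (n,) similarities
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over neighbors
    return weights @ neighbors                      # weighted sum of values

rng = np.random.default_rng(1)
e = rng.normal(size=8)
nbrs = rng.normal(size=(5, 8))
enhanced = neighbor_attention(e, nbrs)
```

The output is a convex combination of neighbor embeddings, so it stays inside their per-dimension range; entity-, instance-, and feature-level attention in MuLAN reuse the same softmax-weighting pattern at different granularities.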
Affiliation(s)
- Qianyu Li
- School of Software Engineering, South China University of Technology, Guangzhou, China
- Bozheng Feng
- School of Software Engineering, South China University of Technology, Guangzhou, China
- Xiaoli Tang
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Han Yu
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Hengjie Song
- School of Software Engineering, South China University of Technology, Guangzhou, China
4. Fu K, Li H, Shi X. CTF-former: A novel simplified multi-task learning strategy for simultaneous multivariate chaotic time series prediction. Neural Netw 2024; 174:106234. [PMID: 38521015; DOI: 10.1016/j.neunet.2024.106234] [Received: 08/17/2023; Revised: 02/22/2024; Accepted: 03/11/2024]
Abstract
Multivariate chaotic time series prediction is a challenging task, especially when multiple variables are predicted simultaneously. Multiple related prediction tasks typically require multiple models, but multiple models are difficult to keep synchronized, which makes immediate communication between predicted values challenging. Although multi-task learning can be applied to this problem, the principles for allocating shared and task-specific representations, and the layout options between them, are ambiguous. To address this issue, a novel simplified multi-task learning method is proposed for the precise simultaneous prediction of multiple chaotic time series. The proposed scheme consists of a cross-convolution operator designed to capture variable correlations and sequence correlations, and an attention module designed to capture the information embedded in the sequence structure. In the attention module, a non-linear transformation is implemented with convolution, whose local receptive field complements the global dependency modeling of the attention mechanism. In addition, an attention weight calculation is devised that accounts not only for the synergy of time- and frequency-domain features but also for the fusion of series and channel information. Notably, the scheme follows a purely simplified design principle of multi-task learning by reducing each task-specific network to a single neuron. The precision of the proposed solution and its potential for engineering applications were verified on the Lorenz system and a power-consumption dataset. Compared to the Gated Recurrent Unit, the mean absolute error of the proposed method was reduced by an average of 82.9% on the Lorenz system and 19.83% on power consumption.
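The stated design principle, shrinking each task-specific network to a single neuron on top of a shared representation, can be sketched as follows; the linear form, dimensions, and the three-task setup (e.g. the three Lorenz variables) are assumptions for illustration, not the CTF-former's exact layout:

```python
import numpy as np

rng = np.random.default_rng(7)

shared_repr = rng.normal(size=16)     # output of the shared trunk
n_tasks = 3                           # e.g. the 3 Lorenz state variables

# One neuron per task: a single weight vector and bias each, producing
# one scalar prediction per task from the same shared representation.
W = rng.normal(size=(n_tasks, 16))
b = rng.normal(size=n_tasks)

predictions = W @ shared_repr + b     # one prediction per task
```

Because all tasks read from one trunk in one forward pass, the predicted variables stay synchronized by construction, which is the motivation the abstract gives for the simplification.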
Affiliation(s)
- Ke Fu
- School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China
- He Li
- School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China
- Xiaotian Shi
- School of Mechanical Engineering & Automation, Northeastern University, Shenyang 110819, China
5. Deng R, Cui C, Remedios LW, Bao S, Womick RM, Chiron S, Li J, Roland JT, Lau KS, Liu Q, Wilson KT, Wang Y, Coburn LA, Landman BA, Huo Y. Cross-scale multi-instance learning for pathological image diagnosis. Med Image Anal 2024; 94:103124. [PMID: 38428271; PMCID: PMC11016375; DOI: 10.1016/j.media.2024.103124] [Received: 03/31/2023; Revised: 02/16/2024; Accepted: 02/26/2024]
Abstract
Analyzing high resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high resolution images by classifying bags of objects (i.e. sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20× magnification) of WSIs, disregarding the vital inter-scale information that is key to diagnoses by human pathologists. In this study, we propose a novel cross-scale MIL algorithm to explicitly aggregate inter-scale relationships into a single MIL network for pathological image diagnosis. The contribution of this paper is three-fold: (1) A novel cross-scale MIL (CS-MIL) algorithm that integrates the multi-scale information and the inter-scale relationships is proposed; (2) A toy dataset with scale-specific morphological features is created and released to examine and visualize differential cross-scale attention; (3) Superior performance on both in-house and public datasets is demonstrated by our simple cross-scale MIL strategy. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.
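The general shape of attention-based MIL pooling extended across scales can be sketched as below: patches are pooled into a per-scale bag embedding, then the scales themselves are pooled by a second attention step. Sharing one scoring vector across scales is a simplification of CS-MIL's learned cross-scale attention, so treat this as an assumed toy form:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_scale_mil(patch_feats_by_scale, w_attn):
    """Attention-based MIL pooling with cross-scale aggregation (sketch).

    patch_feats_by_scale: list of (n_i, d) patch-feature arrays, one
    per magnification (e.g. 5x, 10x, 20x crops of the same region).
    w_attn: (d,) scoring vector shared across scales (an assumption).
    Returns a single (d,) bag embedding for slide-level classification.
    """
    scale_embs = []
    for feats in patch_feats_by_scale:
        a = softmax(feats @ w_attn)        # attention over patches
        scale_embs.append(a @ feats)       # (d,) per-scale bag embedding
    scale_embs = np.stack(scale_embs)      # (num_scales, d)
    s = softmax(scale_embs @ w_attn)       # attention over scales
    return s @ scale_embs

rng = np.random.default_rng(3)
feats = [rng.normal(size=(n, 8)) for n in (20, 50, 100)]
bag = cross_scale_mil(feats, rng.normal(size=8))
```

Inspecting the per-scale attention weights `s` is what enables the differential cross-scale attention visualizations the paper's toy dataset is designed to probe.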
Affiliation(s)
- Can Cui
- Vanderbilt University, Nashville, TN 37215, USA
- R Michael Womick
- The University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Sophie Chiron
- Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jia Li
- Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Joseph T Roland
- Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Ken S Lau
- Vanderbilt University, Nashville, TN 37215, USA
- Qi Liu
- Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Keith T Wilson
- Vanderbilt University Medical Center, Nashville, TN 37232, USA; Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, USA
- Yaohong Wang
- Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Lori A Coburn
- Vanderbilt University Medical Center, Nashville, TN 37232, USA; Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, USA
- Bennett A Landman
- Vanderbilt University, Nashville, TN 37215, USA; Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Yuankai Huo
- Vanderbilt University, Nashville, TN 37215, USA
6. Bao LL, Zhang JS, Zhang CX. Spatial multi-attention conditional neural processes. Neural Netw 2024; 173:106201. [PMID: 38447305; DOI: 10.1016/j.neunet.2024.106201] [Received: 07/16/2023; Revised: 01/03/2024; Accepted: 02/20/2024]
Abstract
Spatial prediction tasks are challenging when observed samples are sparse and prediction samples are abundant. Gaussian processes (GPs) are commonly used in spatial prediction tasks and have the advantage of measuring the uncertainty of the interpolation result. However, as the sample size increases, GPs suffer from significant overhead. Standard neural networks (NNs) provide a powerful and scalable solution for modeling spatial data, but they often overfit small sample data. Based on conditional neural processes (CNPs), which combine the advantages of GPs and NNs, we propose a new framework called Spatial Multi-Attention Conditional Neural Processes (SMACNPs) for spatial small sample prediction tasks. SMACNPs are a modular model that can predict targets by employing different attention mechanisms to extract relevant information from different forms of sample data. The task representation is inferred by measuring the spatial correlation contained in different sample points and the relationship contained in attribute variables, respectively. The distribution of the target variable is predicted by GPs parameterized by NNs. SMACNPs allow us to obtain accurate predictions of the target value while quantifying the prediction uncertainty. Experiments on spatial prediction tasks on simulated and real-world datasets demonstrate that this framework flexibly incorporates spatial context and correlation into the model, achieving state-of-the-art results in spatial small sample prediction tasks in terms of both predictive performance and reliability. For example, on the California housing dataset, our method reduces MAE by 8% and MSE by 7% compared to the second-best method. In addition, a spatiotemporal prediction task to forecast traffic speed further confirms the effectiveness and generality of our method.
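The uncertainty quantification SMACNPs inherit from the CNP family amounts to emitting a predictive mean and variance per target point and scoring them with a Gaussian log-likelihood. A minimal sketch of that scoring objective (the standard CNP training loss, not the SMACNP network itself):

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of targets y under N(mu, sigma^2),
    the usual training objective for CNP-style predictive distributions."""
    var = sigma ** 2
    return 0.5 * np.mean(np.log(2 * np.pi * var) + (y - mu) ** 2 / var)

y = np.array([1.0, 2.0, 3.0])
# Same prediction error, different claimed uncertainty:
confident_wrong = gaussian_nll(y, mu=y + 1.0, sigma=np.full(3, 0.1))
honest_uncertain = gaussian_nll(y, mu=y + 1.0, sigma=np.full(3, 1.0))
```

The objective punishes an overconfident wrong prediction far more than an honestly uncertain one, which is what makes the reported predictive reliability trainable rather than an afterthought.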
Affiliation(s)
- Li-Li Bao
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an Shaanxi, 710049, China
- Jiang-She Zhang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an Shaanxi, 710049, China
- Chun-Xia Zhang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an Shaanxi, 710049, China
7. Zhang S, He C, Wan Z, Shi N, Wang B, Liu X, Hou D. Diagnosis of pulmonary tuberculosis with 3D neural network based on multi-scale attention mechanism. Med Biol Eng Comput 2024; 62:1589-1600. [PMID: 38319503; DOI: 10.1007/s11517-024-03022-1] [Received: 08/18/2023; Accepted: 01/03/2024]
Abstract
This paper presents a novel multi-scale attention residual network (MAResNet) for diagnosing patients with pulmonary tuberculosis (PTB) from computed tomography (CT) images. First, a three-dimensional (3D) network structure is applied in MAResNet based on the continuity and correlation of nodal features across different slices of CT images. Second, MAResNet incorporates residual modules and the Convolutional Block Attention Module (CBAM) to reuse the shallow features of CT images and focus on key features, enhancing the feature distinguishability of images. In addition, multi-scale inputs increase the global receptive field of the network, extract the location information of PTB, and capture the local details of nodules, enhancing the expression of both high-level and low-level semantic information in the network. The proposed MAResNet shows excellent results, with an overall accuracy of 94% in PTB classification. MAResNet based on 3D CT images can assist doctors in making more accurate diagnoses of PTB and alleviate the burden of manual screening. In the experiments, Grad-CAM, an extension of the class activation mapping (CAM) technique, was employed to analyze the model's output; it can identify lesions in important parts of the lungs and make the model's decisions transparent.
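CBAM, the attention module MAResNet incorporates, refines a feature map in two sequential stages: a channel-attention stage built from global average- and max-pooled descriptors, then a spatial-attention stage built from channel-wise pooling. The sketch below keeps that two-stage structure but replaces CBAM's learned shared MLP and 7x7 convolution with parameter-free identities, so it is an assumed simplification:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_sketch(feat):
    """Simplified CBAM: channel attention, then spatial attention.

    feat: (C, H, W) feature map. Real CBAM passes the pooled
    descriptors through a learned MLP / convolution; here they are
    combined directly to keep the sketch parameter-free.
    """
    # Channel attention from global average- and max-pooled descriptors.
    chan = sigmoid(feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2)))  # (C,)
    feat = feat * chan[:, None, None]
    # Spatial attention from channel-wise average- and max-pooling.
    spat = sigmoid(feat.mean(axis=0) + feat.max(axis=0))            # (H, W)
    return feat * spat[None, :, :]

rng = np.random.default_rng(5)
x = rng.normal(size=(4, 6, 6))
y = cbam_sketch(x)
```

Both stages only rescale activations by gates in (0, 1), so the output keeps the input's shape while down-weighting uninformative channels and locations.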
Affiliation(s)
- Shidong Zhang
- Key Laboratory of Digital Medical Engineering of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, 071002, China
- Cong He
- Key Laboratory of Digital Medical Engineering of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, 071002, China
- Zhenzhen Wan
- Key Laboratory of Digital Medical Engineering of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, 071002, China
- Ning Shi
- Key Laboratory of Digital Medical Engineering of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, 071002, China
- Bing Wang
- Department of Radiology, Beijing Chest Hospital, Capital Medical University, Beijing, 101149, China
- Xiuling Liu
- Key Laboratory of Digital Medical Engineering of Hebei Province, College of Electronic and Information Engineering, Hebei University, Baoding, 071002, China
- Dailun Hou
- Department of Radiology, Beijing Chest Hospital, Capital Medical University, Beijing, 101149, China
8. Nissar I, Alam S, Masood S, Kashif M. MOB-CBAM: A dual-channel attention-based deep learning generalizable model for breast cancer molecular subtypes prediction using mammograms. Comput Methods Programs Biomed 2024; 248:108121. [PMID: 38531147; DOI: 10.1016/j.cmpb.2024.108121] [Received: 08/23/2023; Revised: 02/15/2024; Accepted: 03/06/2024]
Abstract
BACKGROUND AND OBJECTIVE: Deep learning models have emerged as a significant tool for generating efficient solutions to complex problems, including cancer detection, as they can analyze large amounts of data with high efficiency and performance. Recent medical studies highlight the significance of molecular subtype detection in breast cancer, as different subtypes respond better to different therapies, aiding the development of personalized treatment plans. METHODS: In this work, we propose MOB-CBAM, a novel lightweight dual-channel attention-based deep learning model that combines a MobileNet-V3 backbone with a Convolutional Block Attention Module to make highly accurate and precise predictions about breast cancer. We evaluated the proposed model on the CMMD mammogram dataset. Nine distinct data subsets were created from the original dataset to perform coarse- and fine-grained predictions, enabling the model to identify masses, calcifications, benign and malignant tumors, and molecular subtypes of cancer, including Luminal A, Luminal B, HER-2 Positive, and Triple Negative. The pipeline incorporates several image pre-processing techniques, including filtering, enhancement, and normalization, to enhance the model's generalization ability. RESULTS: In coarse-grained classification (benign versus malignant tumors), the MOB-CBAM model produced exceptional results, with 99% accuracy, precision, recall, and F1-score values of 0.99, and an MCC of 0.98. In fine-grained classification, the model proved highly effective on the mass (benign/malignant) and calcification (benign/malignant) classification tasks, with an impressive accuracy of 98%. We also cross-validated the efficiency of the proposed architecture on two further datasets: on MIAS, an accuracy of 97% was reported for classifying benign, malignant, and normal images, while on CBIS-DDSM, an accuracy of 98% was achieved for mass (benign/malignant) and calcification (benign/malignant) classification. CONCLUSION: This study presents MOB-CBAM, a lightweight novel deep learning framework for breast cancer diagnosis and subtype prediction. The model's innovative incorporation of the CBAM enhances the precision of predictions. The extensive evaluation on the CMMD dataset and cross-validation on other datasets affirm the model's efficacy.
Affiliation(s)
- Iqra Nissar
- Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
- Shahzad Alam
- Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
- Sarfaraz Masood
- Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
- Mohammad Kashif
- Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
9. Ren J, An N, Zhang Y, Wang D, Sun Z, Lin C, Cui W, Wang W, Zhou Y, Zhang W, Hu Q, Zhang P, Hu D, Wang D, Liu H. SUGAR: Spherical ultrafast graph attention framework for cortical surface registration. Med Image Anal 2024; 94:103122. [PMID: 38428270; DOI: 10.1016/j.media.2024.103122] [Received: 07/04/2023; Revised: 01/25/2024; Accepted: 02/22/2024]
Abstract
Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration to enhance the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 min. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.
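SUGAR parameterizes deformations of the spherical cortical mesh with Euler angles. For the rigid case, this reduces to applying a rotation matrix to vertex coordinates on the unit sphere; the Z-Y-Z convention below is one common choice, assumed here for illustration (the non-rigid case applies per-vertex angles):

```python
import numpy as np

def euler_rotation(alpha, beta, gamma):
    """Rotation matrix from Z-Y-Z Euler angles (radians)."""
    def rz(t):
        return np.array([[np.cos(t), -np.sin(t), 0.0],
                         [np.sin(t),  np.cos(t), 0.0],
                         [0.0, 0.0, 1.0]])
    def ry(t):
        return np.array([[np.cos(t), 0.0, np.sin(t)],
                         [0.0, 1.0, 0.0],
                         [-np.sin(t), 0.0, np.cos(t)]])
    return rz(alpha) @ ry(beta) @ rz(gamma)

rng = np.random.default_rng(2)
verts = rng.normal(size=(100, 3))
verts /= np.linalg.norm(verts, axis=1, keepdims=True)   # points on S^2
rotated = verts @ euler_rotation(0.1, 0.2, 0.3).T       # rigid alignment
```

Because rotations preserve vector norms, deformed vertices stay exactly on the sphere, which is one reason an angle-based parameterization suits spherical registration better than free 3D displacements.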
Affiliation(s)
- Ning An
- Changping Laboratory, Beijing, China
- Cong Lin
- Changping Laboratory, Beijing, China
- Weigang Cui
- School of Engineering Medicine, Beihang University, Beijing, China
- Ying Zhou
- Changping Laboratory, Beijing, China
- Wei Zhang
- Changping Laboratory, Beijing, China; Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Qingyu Hu
- Changping Laboratory, Beijing, China
- Dan Hu
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA
- Danhong Wang
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA
- Hesheng Liu
- Changping Laboratory, Beijing, China; Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
10. Kim SH, Kim DY, Chun SW, Kim J, Woo J. Impartial feature selection using multi-agent reinforcement learning for adverse glycemic event prediction. Comput Biol Med 2024; 173:108257. [PMID: 38520922; DOI: 10.1016/j.compbiomed.2024.108257] [Received: 11/08/2023; Revised: 02/02/2024; Accepted: 03/06/2024]
Abstract
We developed an attention model to predict future adverse glycemic events 30 min in advance based on the observation of past glycemic values over a 35 min period. The proposed model effectively encodes insulin administration and meal intake time using Time2Vec (T2V) for glucose prediction. The proposed impartial feature selection algorithm is designed to distribute rewards proportionally according to agent contributions. Agent contributions are calculated by a step-by-step negation of updated agents. Thus, the proposed feature selection algorithm optimizes features from electronic medical records to improve performance. For evaluation, we collected continuous glucose monitoring data from 102 patients with type 2 diabetes admitted to Cheonan Hospital, Soonchunhyang University. Using our proposed model, we achieved F1-scores of 89.0%, 60.6%, and 89.8% for normoglycemia, hypoglycemia, and hyperglycemia, respectively.
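Time2Vec, used here to encode insulin-administration and meal times, maps a scalar time to a vector with one linear component and the rest periodic, letting the model represent both trends and recurring patterns. A minimal implementation of the published encoding (the dimension and the minutes-since-dose interpretation are illustrative):

```python
import numpy as np

def time2vec(tau, w, b):
    """Time2Vec encoding of a scalar time tau.

    t2v[0]  = w[0] * tau + b[0]            (linear, captures trend)
    t2v[i]  = sin(w[i] * tau + b[i]), i>0  (periodic, captures cycles)
    w, b: (k,) learnable frequency and phase parameters (random here).
    """
    v = w * tau + b
    v[1:] = np.sin(v[1:])
    return v

rng = np.random.default_rng(9)
w, b = rng.normal(size=8), rng.normal(size=8)
enc = time2vec(30.0, w, b)    # e.g. 30 minutes since an insulin dose
```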
Affiliation(s)
- Seo-Hee Kim
- Department of ICT Convergence, Soonchunhyang University, Asan, South Korea
- Dae-Yeon Kim
- Department of Internal Medicine, Soonchunhyang University Cheonan Hospital, Cheonan, South Korea
- Sung-Wan Chun
- Department of Internal Medicine, Soonchunhyang University Cheonan Hospital, Cheonan, South Korea
- Jaeyun Kim
- Department of AI and Big Data, Soonchunhyang University, Asan, South Korea
- Jiyoung Woo
- Department of AI and Big Data, Soonchunhyang University, Asan, South Korea
11. Gao M, Zhang D, Chen Y, Zhang Y, Wang Z, Wang X, Li S, Guo Y, Webb GI, Nguyen ATN, May L, Song J. GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction. Comput Biol Med 2024; 173:108339. [PMID: 38547658; DOI: 10.1016/j.compbiomed.2024.108339] [Received: 11/19/2023; Revised: 03/05/2024; Accepted: 03/17/2024]
Abstract
The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification fail to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and, as such, is capable of predicting DTIs across molecules with exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI outperforms five state-of-the-art baselines for out-of-molecule DTI prediction (GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI) and is on a par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.
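As a rough illustration of the node spatial encoding idea behind Graphormer-style models, the sketch below biases self-attention logits by a table indexed on shortest-path distance between nodes. This is a minimal sketch, not the authors' implementation: the bias table is a stand-in for the learned encodings, and centrality and edge encodings are omitted.

```python
import numpy as np

def shortest_path_distances(adj):
    """All-pairs shortest-path distances (BFS) on an unweighted graph."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        frontier, d = [s], 0
        while frontier:
            d += 1
            nxt = []
            for u in frontier:
                for v in np.nonzero(adj[u])[0]:
                    if dist[s, v] == np.inf:
                        dist[s, v] = d
                        nxt.append(v)
            frontier = nxt
    return dist

def attention_with_spatial_bias(x, adj, bias_table):
    """Single-head self-attention whose logits are offset by a bias indexed
    by shortest-path distance (Graphormer-style spatial encoding).
    x: (N, D) node features; adj: (N, N) adjacency; bias_table: (K,)."""
    n, dim = x.shape
    spd = shortest_path_distances(adj)
    # clip distances into the table range; unreachable pairs use the last slot
    idx = np.clip(np.where(np.isinf(spd), len(bias_table) - 1, spd),
                  0, len(bias_table) - 1).astype(int)
    logits = x @ x.T / np.sqrt(dim) + bias_table[idx]
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x
```

The spatial bias lets every node attend to every other node while still telling the model how far apart they sit in the molecular graph.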
Affiliation(s)
- Mengmeng Gao, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Daokun Zhang, Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia
- Yi Chen, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Yiwen Zhang, Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
- Zhikang Wang, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Xiaoyu Wang, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Shanshan Li, Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
- Yuming Guo, Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
- Geoffrey I Webb, Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia
- Anh T N Nguyen, Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
- Lauren May, Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
- Jiangning Song, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
12
Chen Y, Zhan W, Jiang Y, Zhu D, Xu X, Hao Z, Li J, Guo J. A feature refinement and adaptive generative adversarial network for thermal infrared image colorization. Neural Netw 2024; 173:106184. [PMID: 38387204] [DOI: 10.1016/j.neunet.2024.106184]
Abstract
Colorizing thermal infrared images poses a significant challenge, as current methods struggle with issues such as unrealistic color saturation and limited texture. To address these challenges, we propose the Feature Refinement and Adaptive Generative Adversarial Network (FRAGAN). Our approach enhances the detail, semantic, and contextual capabilities of image colorization through multi-level interactions that integrate the detailed information lost during the encoding stage with the semantic information from the decoding stage. Additionally, we introduce the Residual Feature Refinement Module (RFRM) to improve both the accuracy and generalization ability of the model, thereby elevating the quality of colorization results. The Feature Adaptation Module (FAM) is employed to mitigate sub-region information loss during downsampling. Furthermore, we introduce the Trinity Attention Module (TAM) to accurately capture the spatial and channel-wise interaction features of local semantic information. Extensive experimentation on the KAIST and FLIR datasets demonstrates the superiority of the proposed FRAGAN, surpassing both the performance metrics and visual quality of current state-of-the-art methods. The colorized images generated by FRAGAN exhibit enhanced clarity and realism. Our code and models are available at GitHub.
Affiliation(s)
- Yu Chen, Changchun University of Science and Technology National Demonstration Center for Experimental Electrical, Changchun, Jilin, 130022, China
- Weida Zhan, Changchun University of Science and Technology National Demonstration Center for Experimental Electrical, Changchun, Jilin, 130022, China
- Yichun Jiang, Changchun University of Science and Technology National Demonstration Center for Experimental Electrical, Changchun, Jilin, 130022, China
- Depeng Zhu, Changchun University of Science and Technology National Demonstration Center for Experimental Electrical, Changchun, Jilin, 130022, China
- Xiaoyu Xu, Changchun University of Science and Technology National Demonstration Center for Experimental Electrical, Changchun, Jilin, 130022, China
- Ziqiang Hao, Changchun University of Science and Technology National Demonstration Center for Experimental Electrical, Changchun, Jilin, 130022, China
- Jin Li, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, 100191, China
- Jinxin Guo, Changchun University of Science and Technology National Demonstration Center for Experimental Electrical, Changchun, Jilin, 130022, China
13
Ma W, Chen H, Zhang W, Huang H, Wu J, Peng X, Sun Q. DSYOLO-trash: An attention mechanism-integrated and object tracking algorithm for solid waste detection. Waste Manag 2024; 178:46-56. [PMID: 38377768] [DOI: 10.1016/j.wasman.2024.02.014]
Abstract
In a global context, the production of urban solid waste varies significantly with changes in living standards. This trend exhibits diversity across countries and regions, reflecting shifts in lifestyles as well as varying needs and challenges in waste management strategies. However, current standards of waste recycling are too complex for the general public to follow. In this study, we propose a model called DSYOLO-Trash to identify solid waste by integrating two attention mechanisms, the convolutional block attention module (CBAM) and Contextual Transformer Networks (CotNet), which significantly enhance its ability to mine channel-related and spatial attention features while optimizing the learning process. We apply the deep simple online and realtime tracking (DeepSORT) object tracking algorithm to solid waste detection for the first time in the literature to enable real-time identification and tracking of waste. We also develop a multi-label dataset of mixed solid waste, called MMTrash, to realistically simulate actual scenarios of waste classification. Our proposed DSYOLO-Trash delivered superior performance to classical detection algorithms on both the MMTrash and TrashNet datasets. Our system combines the improved you only look once (YOLO) algorithm with DeepSORT, using industrial cameras and PLC-controlled robotic arms to intelligently sort waste. This work constitutes an important contribution to intelligent waste management and the sustainable development of cities.
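The CBAM referred to above is the standard convolutional block attention design: channel attention from average- and max-pooled descriptors passed through a shared MLP, followed by spatial attention from pooled channel maps. A minimal NumPy sketch (the weight shapes and kernel size are illustrative assumptions; the paper's integration into the YOLO backbone is not reproduced):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """CBAM channel attention: shared bottleneck MLP over average- and
    max-pooled channel descriptors, summed and squashed. x: (C, H, W)."""
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return x * att[:, None, None]

def spatial_attention(x, kernel):
    """CBAM spatial attention: channel-wise average and max maps convolved
    with a single k x k kernel (2 input channels) into one attention map."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(out)[None, :, :]

def cbam(x, w1, w2, kernel):
    """Apply channel attention then spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```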
Affiliation(s)
- Wanqi Ma, School of Business, Jiangnan University, Wuxi 214122, PR China; Research Institute of National Security and Green Development, Jiangnan University, Wuxi 214122, PR China
- Hong Chen, School of Business, Jiangnan University, Wuxi 214122, PR China; Research Institute of National Security and Green Development, Jiangnan University, Wuxi 214122, PR China
- Wenkang Zhang, State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, PR China
- Han Huang, School of Economics and Management, China University of Mining and Technology, Xuzhou 221116, PR China
- Jian Wu, School of Business, Jiangnan University, Wuxi 214122, PR China; Research Institute of National Security and Green Development, Jiangnan University, Wuxi 214122, PR China
- Xu Peng, School of Business, Jiangnan University, Wuxi 214122, PR China; Research Institute of National Security and Green Development, Jiangnan University, Wuxi 214122, PR China
- Qingqing Sun, School of Economics and Management, China University of Mining and Technology, Xuzhou 221116, PR China
14
Liang W, Muhammad Rehan Afzal H, Qiao Y, Fan A, Wang F, Hu Y, Yang P. Estimation of electrical muscle activity during gait using inertial measurement units with convolution attention neural network and small-scale dataset. J Biomech 2024; 167:112093. [PMID: 38615480] [DOI: 10.1016/j.jbiomech.2024.112093]
Abstract
In general, muscle activity can be directly measured using electromyography (EMG) or calculated with musculoskeletal models. However, neither method is suitable for non-technical users or unstructured environments, so more portable and easy-to-use muscle activity estimation methods are desirable. Deep learning (DL) models combined with inertial measurement units (IMUs) have shown great potential for estimating muscle activity. However, clinical scenarios frequently offer only a very small amount of data, limiting the performance of DL models, while augmentation techniques that efficiently expand a small sample size for DL model training are rarely used. The primary aim of the present study was to develop a novel DL model to estimate the EMG envelope during gait from IMUs with high accuracy. A secondary aim was to develop a novel model-based data augmentation method to improve the performance of the estimation model with a small-scale dataset. To this end, a time convolutional network-based generative adversarial network, MuscleGAN, was proposed for data augmentation, and a subject-independent regression DL model was developed to estimate the EMG envelope. Results suggested that the proposed two-stage method has better generalization and estimation performance than commonly used existing methods. The Pearson correlation coefficient and normalized root-mean-square error derived from the proposed method reached up to 0.72 and 0.13, respectively, and MuscleGAN improved the estimation accuracy of the lower limb EMG envelope from 70% to 72%. Thus, even using only two IMUs and a very small-scale dataset, the proposed model is still capable of accurately estimating the lower limb EMG envelope, demonstrating considerable potential for application in clinical and daily life scenarios.
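The two evaluation metrics reported above can be computed as follows. This is a sketch: normalizing the RMSE by the range of the reference signal is an assumption, since the paper's exact normalization convention is not stated here.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D signals."""
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    return float((yt * yp).sum() / np.sqrt((yt ** 2).sum() * (yp ** 2).sum()))

def nrmse(y_true, y_pred):
    """Root-mean-square error normalized by the range of the reference signal
    (one common convention; others divide by the mean or std)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / (y_true.max() - y_true.min()))
```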
Affiliation(s)
- Wenqi Liang, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Hafiz Muhammad Rehan Afzal, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Yongyu Qiao, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Ao Fan, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Fanjie Wang, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Yiwei Hu, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Pengfei Yang, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
15
Gao Y, Lv G, Xiao D, Han X, Sun T, Li Z. Research on steel surface defect classification method based on deep learning. Sci Rep 2024; 14:8254. [PMID: 38589514] [PMCID: PMC11001973] [DOI: 10.1038/s41598-024-58643-1] Open
Abstract
Surface defects on steel, arising from factors like steel composition and manufacturing techniques, pose significant challenges to industrial production. Efficient and precise detection of these defects is crucial for enhancing production efficiency and product quality. To meet these requirements, this paper tackles the detection task with the you only look once (YOLO) algorithm. In this study, we propose a novel approach for surface flaw identification based on YOLOv5, called YOLOv5-KBS, which integrates an attention mechanism and a weighted Bidirectional Feature Pyramid Network (BiFPN) into the YOLOv5 architecture. Our method addresses issues of background interference and defect size variability in images. Experimental results show that the YOLOv5-KBS model achieves a notable 4.2% increase in mean Average Precision (mAP) and reaches a detection speed of 70 Frames Per Second (FPS), outperforming the baseline model. These findings underscore the effectiveness and potential applications of our proposed method in industrial settings.
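The weighted BiFPN fuses feature maps with learnable non-negative per-input weights using fast normalized fusion. A minimal sketch of that fusion step alone (the resampling of feature maps to a common resolution and the surrounding convolutions are omitted):

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion: ReLU keeps each learnable weight
    non-negative, and normalization makes the fused map an (approximately)
    convex combination of the input feature maps."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))
```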
Affiliation(s)
- Yang Gao, The State Key Laboratory of Rolling and Automation, Northeastern University, Shenyang, 110819, China
- Gang Lv, Information Science and Engineering School, Northeastern University, Shenyang, 110819, China
- Dong Xiao, Information Science and Engineering School, Northeastern University, Shenyang, 110819, China
- Xize Han, Information Science and Engineering School, Northeastern University, Shenyang, 110819, China
- Tao Sun, The State Key Laboratory of Rolling and Automation, Northeastern University, Shenyang, 110819, China
- Zhenni Li, Information Science and Engineering School, Northeastern University, Shenyang, 110819, China
16
Zhu Q, Zhuang H, Zhao M, Xu S, Meng R. A study on expression recognition based on improved mobilenetV2 network. Sci Rep 2024; 14:8121. [PMID: 38582772] [PMCID: PMC10998880] [DOI: 10.1038/s41598-024-58736-x] Open
Abstract
This paper proposes an improved version of the MobileNetV2 neural network (I-MobileNetV2) in response to the large parameter counts of existing deep convolutional neural networks and the shortcomings of the lightweight MobileNetV2 in facial emotion recognition tasks, such as easy loss of feature information, poor real-time performance, and low accuracy. The network inherits MobileNetV2's depthwise separable convolutions, reducing computational load while maintaining a lightweight profile. It utilizes a reverse fusion mechanism to retain negative features, making information less likely to be lost. The SELU activation function replaces ReLU6 to avoid vanishing gradients. Meanwhile, to improve feature recognition capability, a channel attention mechanism (Squeeze-and-Excitation Networks, SE-Net) is integrated into the MobileNetV2 network. Experiments conducted on the facial expression datasets FER2013 and CK+ showed that the proposed network model achieved facial expression recognition accuracies of 68.62% and 95.96%, improving upon the MobileNetV2 model by 0.72% and 6.14% respectively, while the parameter count decreased by 83.8%. These results empirically verify the effectiveness of the improvements made to the network model.
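The SE-Net channel attention integrated here follows the standard squeeze-and-excitation design. A minimal NumPy sketch of one SE block (the weight shapes and reduction ratio are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w_reduce, w_expand):
    """Squeeze-and-Excitation: global-average-pool each channel (squeeze),
    pass through a two-layer bottleneck MLP (excitation), and rescale the
    feature maps by the resulting per-channel gates. x: (C, H, W)."""
    squeezed = x.mean(axis=(1, 2))                  # (C,) channel descriptors
    hidden = np.maximum(w_reduce @ squeezed, 0.0)   # ReLU bottleneck, (C/r,)
    scale = sigmoid(w_expand @ hidden)              # per-channel gates in (0, 1)
    return x * scale[:, None, None]
```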
Affiliation(s)
- Qiming Zhu, College of Equipment Support and Management, Engineering University of PAP, Xi'an, 710086, China
- Hongwei Zhuang, College of Equipment Support and Management, Engineering University of PAP, Xi'an, 710086, China
- Mi Zhao, Basic Education, Engineering University of PAP, Xi'an, 710086, China
- Shuangchao Xu, College of Equipment Support and Management, Engineering University of PAP, Xi'an, 710086, China
- Rui Meng, College of Military Basic Education, Engineering University of PAP, Xi'an, 710086, China
17
Romero-Oraá R, Herrero-Tudela M, López MI, Hornero R, García M. Attention-based deep learning framework for automatic fundus image processing to aid in diabetic retinopathy grading. Comput Methods Programs Biomed 2024; 249:108160. [PMID: 38583290] [DOI: 10.1016/j.cmpb.2024.108160]
Abstract
BACKGROUND AND OBJECTIVE Early detection and grading of Diabetic Retinopathy (DR) is essential to determine an adequate treatment and prevent severe vision loss. However, the manual analysis of fundus images is time-consuming, and DR screening programs are challenged by the availability of human graders. Current automatic approaches for DR grading attempt the joint detection of all signs at the same time. However, the classification can be optimized if red lesions and bright lesions are processed independently, since the task is divided and simplified. Furthermore, clinicians would greatly benefit from explainable artificial intelligence (XAI) to support the automatic model predictions, especially when the type of lesion is specified. As a novelty, we propose an end-to-end deep learning framework for automatic DR grading (5 severity degrees) based on separating the attention of the dark structures from the bright structures of the retina. As the main contribution, this approach allowed us to generate independent interpretable attention maps for red lesions, such as microaneurysms and hemorrhages, and bright lesions, such as hard exudates, while using image-level labels only. METHODS Our approach is based on a novel attention mechanism which focuses separately on the dark and the bright structures of the retina by performing a previous image decomposition. This mechanism can be seen as an XAI approach which generates independent attention maps for red lesions and bright lesions. The framework includes an image quality assessment stage and deep learning-related techniques, such as data augmentation, transfer learning and fine-tuning. We used the Xception architecture as a feature extractor and the focal loss function to deal with data imbalance. RESULTS The Kaggle DR detection dataset was used for method development and validation. The proposed approach achieved 83.7% accuracy and a Quadratic Weighted Kappa of 0.78 when classifying DR among 5 severity degrees, outperforming several state-of-the-art approaches. Nevertheless, the main result of this work is the generated attention maps, which reveal the pathological regions on the image, distinguishing red lesions from bright lesions. These maps provide explainability for the model predictions. CONCLUSIONS Our results suggest that our framework is effective for automatically grading DR. The separate attention approach has proven useful for optimizing the classification. On top of that, the obtained attention maps facilitate visual interpretation for clinicians. Therefore, the proposed method could serve as a diagnostic aid for the early detection and grading of DR.
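The Quadratic Weighted Kappa reported above is a standard agreement measure for ordinal labels such as DR severity grades, and can be computed directly from a confusion matrix:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Agreement between predicted and reference ordinal labels, penalizing
    each disagreement by the squared class distance. Returns 1 for perfect
    agreement and 0 for chance-level agreement."""
    o = np.zeros((n_classes, n_classes))          # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        o[t, p] += 1
    i, j = np.indices((n_classes, n_classes))
    w = (i - j) ** 2 / (n_classes - 1) ** 2       # quadratic penalty weights
    # expected matrix under independence of the two raters' marginals
    e = np.outer(o.sum(axis=1), o.sum(axis=0)) / o.sum()
    return float(1.0 - (w * o).sum() / (w * e).sum())
```

This matches the convention implemented by, e.g., scikit-learn's `cohen_kappa_score` with `weights='quadratic'`.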
Affiliation(s)
- Roberto Romero-Oraá, Biomedical Engineering Group, University of Valladolid, Valladolid, 47011, Spain; Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Spain
- María Herrero-Tudela, Biomedical Engineering Group, University of Valladolid, Valladolid, 47011, Spain
- María I López, Biomedical Engineering Group, University of Valladolid, Valladolid, 47011, Spain; Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Spain
- Roberto Hornero, Biomedical Engineering Group, University of Valladolid, Valladolid, 47011, Spain; Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Spain
- María García, Biomedical Engineering Group, University of Valladolid, Valladolid, 47011, Spain; Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Spain
18
Li J, Ai L, Yao R. NVAM-Net: deep learning networks for reconstructing high-quality fiber orientation distributions. Neuroradiology 2024. [PMID: 38563964] [DOI: 10.1007/s00234-024-03341-y]
Abstract
PURPOSE Diffusion magnetic resonance imaging (dMRI) is a widely used non-invasive method for investigating brain anatomical structures. Conventional techniques for estimating fiber orientation distribution (FOD) from dMRI data often neglect voxel-level spatial relationships, leading to ambiguous associations between target voxels and their neighbors, which, in turn, adversely impacts FOD accuracy. This study aims to address this issue by introducing a novel neural network, the neighboring voxel attention mechanism network (NVAM-Net), designed to reconstruct high-quality FOD images. METHODS The NVAM-Net leverages a Transformer architecture and incorporates two innovative attention mechanisms: voxel attention and surface attention. These mechanisms are specifically designed to capture overlooked features among neighboring voxels. The processed features are subsequently passed through two fully connected layers, further enhancing FOD estimation accuracy by separately estimating spherical harmonics (SH) coefficients of varying orders. RESULTS The experimental findings, based on the Human Connectome Project (HCP) dataset, reveal that the reconstructed super-resolution FOD images achieve results comparable to those obtained through more advanced dMRI acquisition protocols. These results underscore the NVAM-Net's robust performance in reconstructing multi-shell multi-tissue constrained spherical deconvolution (MSMT-CSD). CONCLUSION In summary, this research underscores the NVAM-Net's advantages and practical feasibility in reconstructing high-quality FOD images. It provides a reliable reference point for clinical applications in the field of diffusion magnetic resonance imaging.
Affiliation(s)
- Jiahao Li, School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Lingmei Ai, School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Ruoxia Yao, School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
19
Bai X, Wei X, Wang Z, Zhang M. CONet: Crowd and occlusion-aware network for occluded human pose estimation. Neural Netw 2024; 172:106109. [PMID: 38232431] [DOI: 10.1016/j.neunet.2024.106109]
Abstract
Human pose estimation has numerous applications in motion recognition, virtual reality, human-computer interaction, and other related fields. However, multi-person pose estimation in crowded and occluded scenes is challenging. One major issue with current top-down human pose estimation approaches is that they are limited to predicting the pose of a single person, even when the bounding box contains multiple individuals. To address this problem, we propose a novel Crowd and Occlusion-aware Network (CONet) using a divide-and-conquer strategy. Our approach includes a Crowd and Occlusion-aware Head (COHead) which estimates the poses of both the occluder and the occluded person using two separate branches. We also use an attention mechanism to guide the branches toward differentiated learning, aiming to improve feature representation. Additionally, we propose a novel interference point loss to enhance the model's anti-interference ability. CONet is simple yet effective: it outperforms the previous state-of-the-art model by +1.6 AP, achieving 71.6 AP on CrowdPose and demonstrating its effectiveness in improving the accuracy of human pose estimation in crowded and occluded scenes. This achievement highlights the potential of our model in real-world applications where accurate human pose estimation is crucial, such as surveillance, sports analysis, and human-computer interaction.
Affiliation(s)
- Xiuxiu Bai, School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Xing Wei, School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Zengying Wang, School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Miao Zhang, School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
20
Serrão MKM, Costa MGF, Fujimoto LBM, Ogusku MM, Costa Filho CFF. Automatic bright-field smear microscopy for diagnosis of pulmonary tuberculosis. Comput Biol Med 2024; 172:108167. [PMID: 38461699] [DOI: 10.1016/j.compbiomed.2024.108167]
Abstract
In recent decades, many studies have been published on the use of automatic smear microscopy for diagnosing pulmonary tuberculosis (TB). Most of them deal with a preliminary step of the diagnosis, bacilli detection, whereas sputum smear microscopy for diagnosis of pulmonary TB comprises detecting and reporting the number of bacilli found in at least 100 microscopic fields, according to the five-level grading scale (negative, scanty, 1+, 2+ and 3+) endorsed by the World Health Organization (WHO). Pulmonary TB diagnosis in bright-field smear microscopy depends upon the attention of a trained and motivated technician, whereas automated TB diagnosis requires little or no interpretation by a technician. As far as we know, this work proposes the first automatic method for pulmonary TB diagnosis in bright-field smear microscopy according to the WHO recommendations. The proposed method comprises a semantic segmentation step, using a deep neural network, followed by color and shape filtering steps aimed at reducing the number of false positives (false bacilli). In semantic segmentation, different encoder configurations are evaluated, using depthwise separable convolution layers and a channel attention mechanism. The proposed method was evaluated on a large, robust, annotated image dataset designed for this purpose, consisting of 250 testing sets, 50 for each of the 5 TB diagnostic classes. The following performance metrics were obtained for automatic pulmonary TB diagnosis by smear microscopy: mean precision of 0.894, mean recall of 0.896, and mean F1-score of 0.895.
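A depthwise separable convolution, as used in the evaluated encoders, factorizes a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise mix, sharply cutting the parameter count. A naive NumPy sketch (stride 1, 'same' padding, no bias; real implementations use optimized grouped convolutions):

```python
import numpy as np

def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise (k x k per input channel) plus
    pointwise (1 x 1) convolution pair."""
    return c_in * k * k + c_in * c_out

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Naive 'same'-padded depthwise conv followed by a 1x1 pointwise conv.
    x: (C_in, H, W); dw_kernels: (C_in, k, k); pw_weights: (C_out, C_in)."""
    c_in, h, w = x.shape
    k = dw_kernels.shape[-1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    dw = np.zeros_like(x)
    for c in range(c_in):           # each channel filtered independently
        for i in range(h):
            for j in range(w):
                dw[c, i, j] = np.sum(padded[c, i:i + k, j:j + k] * dw_kernels[c])
    return np.einsum('oc,chw->ohw', pw_weights, dw)   # 1x1 channel mixing
```

For a 3x3 layer mapping 64 to 128 channels, the separable version needs 8,768 weights versus 73,728 for the standard convolution.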
21
Sun S, Mei Z, Li X, Tang T, Su Z, Wu Y. A label information fused medical image report generation framework. Artif Intell Med 2024; 150:102823. [PMID: 38553163] [DOI: 10.1016/j.artmed.2024.102823]
Abstract
Medical imaging is an important tool for clinical diagnosis. Nevertheless, preparing imaging diagnosis reports is very time-consuming and error-prone for physicians, so methods to generate medical imaging reports automatically are needed. Currently, the task of medical imaging report generation is challenging in at least two aspects: (1) medical images are very similar to each other, and the differences between normal and abnormal images, and between different abnormal images, are usually subtle; (2) unrelated or incorrect keywords describing abnormal findings in the generated reports lead to miscommunication. In this paper, we propose a medical image report generation framework composed of four modules: a Transformer encoder, a MIX-MLP multi-label classification network, a co-attention mechanism (CAM) based semantic and visual feature fusion, and a hierarchical LSTM decoder. The Transformer encoder learns long-range dependencies between images and labels, effectively extracts visual and semantic features of images, and establishes long-term dependencies between visual and semantic information to accurately extract abnormal features from images. The MIX-MLP multi-label classification network, the co-attention mechanism and the hierarchical LSTM network better identify abnormalities, achieving visual and textual alignment and multi-label diagnostic classification to facilitate report generation. The results of experiments performed on two widely used radiology report datasets, IU X-RAY and MIMIC-CXR, show that our proposed framework outperforms current report generation models in terms of both natural language generation metrics and clinical efficacy assessment metrics. The code of this work is available online at https://github.com/watersunhznu/LIFMRG.
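A minimal sketch of additive co-attention over visual regions conditioned on a semantic (label) embedding, in the spirit of a CAM-style fusion module. This is a generic formulation with assumed weight shapes, not the authors' exact mechanism:

```python
import numpy as np

def co_attention(visual, semantic, wv, ws, u):
    """Score each visual region against a semantic query, softmax-normalize,
    and return the attention-weighted visual context plus the weights.
    visual: (R, Dv) region features; semantic: (Ds,) label embedding;
    wv: (H, Dv), ws: (H, Ds), u: (H,) projection parameters."""
    scores = np.tanh(visual @ wv.T + (ws @ semantic)) @ u   # (R,) region scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                                            # softmax over regions
    return a @ visual, a
```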
Affiliation(s)
- Shuifa Sun, School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China; Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China
- Zhoujunsen Mei, Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, China
- Xiaolong Li, Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Economics and Management, China Three Gorges University, Yichang, 443002, Hubei, China
- Tinglong Tang, Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, China
- Zhanglin Su, School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China
- Yirong Wu, Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University, Zhuhai, 519087, Guangdong, China
Collapse
|
22
|
Cao Z, Wang K, Wen J, Li C, Wu Y, Wang X, Yu W. Fine-grained image classification on bats using VGG16-CBAM: a practical example with 7 horseshoe bats taxa (CHIROPTERA: Rhinolophidae: Rhinolophus) from Southern China. Front Zool 2024; 21:10. [PMID: 38561769 PMCID: PMC10983684 DOI: 10.1186/s12983-024-00531-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Accepted: 03/18/2024] [Indexed: 04/04/2024] Open
Abstract
BACKGROUND Rapid identification and classification of bats are critical for practical applications. However, species identification of bats is typically a laborious and time-consuming manual task that depends on taxonomists and well-trained experts. Deep Convolutional Neural Networks (DCNNs) provide a practical approach to visual feature extraction and object classification, with potential application to bat classification. RESULTS In this study, we investigated the capability of deep learning models to classify 7 horseshoe bat taxa (CHIROPTERA: Rhinolophus) from Southern China. We constructed an image dataset of 879 front, oblique, and lateral targeted facial images of live individuals collected during surveys between 2012 and 2021. All images were taken using a standard photography protocol and settings aimed at enhancing the effectiveness of DCNN classification. The results demonstrated that our customized VGG16-CBAM model achieved up to 92.15% classification accuracy, outperforming other mainstream models. Furthermore, Grad-CAM visualization revealed that the model pays more attention to the taxonomically key regions in its decision-making process, and these regions are often preferred by bat taxonomists for the classification of horseshoe bats, corroborating the validity of our methods. CONCLUSION Our findings will inspire further research on image-based automatic classification of chiropteran species for early detection and potential application in taxonomy.
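CBAM, the attention module grafted onto VGG16 here, gates a feature map first per channel and then per spatial location. A minimal numpy sketch of its channel-attention branch follows (average- and max-pooled descriptors through a shared bottleneck MLP, sigmoid gate, rescale); the spatial branch and all weight shapes are omitted or assumed, so this is an illustration of the mechanism, not the paper's model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_channel_attention(x, w1, w2):
    """Channel-attention branch of CBAM for a feature map x of shape (C, H, W).
    w1: (C, C//r) reduction weights, w2: (C//r, C) expansion weights (shared MLP)."""
    avg = x.mean(axis=(1, 2))                     # (C,) global average pooling
    mx = x.max(axis=(1, 2))                       # (C,) global max pooling
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2  # shared two-layer MLP with ReLU
    gates = sigmoid(mlp(avg) + mlp(mx))           # (C,) per-channel gates in (0, 1)
    return x * gates[:, None, None]               # rescale each channel

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 5, 5))                    # toy feature map, C=8, r=4
y = cbam_channel_attention(x, rng.normal(size=(8, 2)), rng.normal(size=(2, 8)))
print(y.shape)  # (8, 5, 5)
```

Because the gates lie in (0, 1), the block can only attenuate channels, which is what makes the Grad-CAM maps concentrate on the regions the surviving channels respond to.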
Affiliation(s)
- Zhong Cao
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
- Kunhui Wang
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
- Jiawei Wen
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
- Chuxian Li
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, 510006, China
- Yi Wu
- School of Life Sciences, Guangzhou University, Guangzhou, 510006, China
- Xiaoyun Wang
- School of Life Sciences, Guangzhou University, Guangzhou, 510006, China
- Wenhua Yu
- School of Life Sciences, Guangzhou University, Guangzhou, 510006, China
23
Zhou W, Zheng F, Zhao Y, Pang Y, Yi J. MSDCNN: A multiscale dilated convolution neural network for fine-grained 3D shape classification. Neural Netw 2024; 172:106141. [PMID: 38301340 DOI: 10.1016/j.neunet.2024.106141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2023] [Revised: 01/17/2024] [Accepted: 01/21/2024] [Indexed: 02/03/2024]
Abstract
Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multi-view data often lack content information and spatial relationships, which makes it difficult to identify the small variances among subcategories of the same category. To solve this problem, this paper proposes a novel multiscale dilated convolution neural network, termed MSDCNN, for multi-view fine-grained 3D shape classification. First, a sequence of views is rendered from 12 viewpoints around the input 3D shape by the sequential view-capturing module. Then, the first 22 convolution layers of ResNeXt50 are employed to extract the semantic features of each view, and a global mixed feature map is obtained through an element-wise maximum operation over the 12 output feature maps. Furthermore, an attention dilated module (ADM), which combines four concatenated attention dilated blocks (ADBs), is designed to extract larger receptive-field features from the global mixed feature map and enhance the context information among the views. Specifically, each ADB consists of an attention mechanism module and a dilated convolution with a different dilation rate. In addition, a prediction module with label smoothing is proposed to classify the features; it contains a 3 × 3 convolution and adaptive average pooling. The performance of the method is validated experimentally on the ModelNet10, ModelNet40, and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for fine-grained 3D shape classification.
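The point of combining dilated convolutions at several rates, as the ADM does, is that the receptive field grows linearly with the dilation rate at no extra parameter cost. A 1-D numpy sketch makes this measurable with an impulse input (the 2-D case and the attention blocks are omitted; `adm_like_block` is an illustrative name, not the paper's API).

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated convolution (cross-correlation form)."""
    k = len(kernel)
    span = dilation * (k - 1)                  # extra context one layer sees
    xp = np.pad(x, (span // 2, span - span // 2))
    return np.array([sum(kernel[j] * xp[i + j * dilation] for j in range(k))
                     for i in range(len(x))])

def adm_like_block(x, kernel, rates=(1, 2, 4, 8)):
    """Stack responses at several dilation rates, loosely mimicking how a
    multiscale dilated module gathers progressively larger receptive fields."""
    return np.stack([dilated_conv1d(x, kernel, r) for r in rates])

x = np.zeros(64)
x[32] = 1.0                                    # unit impulse probes the receptive field
out = adm_like_block(x, np.ones(3))
for r, row in zip((1, 2, 4, 8), out):
    idx = np.flatnonzero(row)
    print(r, idx.max() - idx.min())            # spread grows as 2 * dilation
```

Each size-3 kernel still touches only 3 samples, but at dilation 8 those samples span 17 input positions, which is how stacked rates enlarge context without pooling away detail.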
Affiliation(s)
- Wei Zhou
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China
- Fujian Zheng
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China; College of Optoelectronic Engineering, Chongqing University, Chongqing 400030, PR China
- Yiheng Zhao
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China
- Yiran Pang
- Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, FL 33431, United States of America
- Jun Yi
- College of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing 401331, PR China
24
Zhang X, Ding T. Style classification of media painting images by integrating ResNet and attention mechanism. Heliyon 2024; 10:e27178. [PMID: 38496868 PMCID: PMC10944206 DOI: 10.1016/j.heliyon.2024.e27178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Revised: 02/24/2024] [Accepted: 02/26/2024] [Indexed: 03/19/2024] Open
Abstract
The progress of deep learning technology has made image classification an important application field. Image style classification is a complex task involving recognition of the whole picture, including both salient and detailed features. This study builds on the ResNet algorithm and improves its high-performing ResNet50 variant. In the model architecture, we introduce a blur-pooling operation and replace the traditional ReLU function with the CELU activation function. In addition, a triplet attention mechanism is integrated to further enhance model performance. A series of experiments shows that the improved ResNet50 model achieves a classification accuracy of 80.6% on a large-scale image dataset, 11.7% higher than the traditional ResNet50 model. In recognizing images of similar styles, the model incorporating triplet attention demonstrated higher average accuracy (74%) and recall (82%). These improvements provide a useful technical reference for image style classification.
Affiliation(s)
- Xinyun Zhang
- School of Arts and Creative Technologies (2013-2014), The University of York, York, YO10 5DD, United Kingdom
- Tao Ding
- University College London, 1-19 Torrington Place, Gower Street, London, WC1E 6BT, United Kingdom
25
Wei X, Wang Z. TCN-attention-HAR: human activity recognition based on attention mechanism time convolutional network. Sci Rep 2024; 14:7414. [PMID: 38548859 PMCID: PMC10978978 DOI: 10.1038/s41598-024-57912-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2023] [Accepted: 03/22/2024] [Indexed: 04/01/2024] Open
Abstract
Wearable sensors are widely used in medical applications and human-computer interaction because of their portability and strong privacy protection. Human activity recognition based on sensor data plays a vital role in these fields, so it is important to improve recognition performance across different types of actions. Aiming at the problems of insufficient time-varying feature extraction and gradient explosion caused by too many network layers, a temporal convolutional network recognition model with an attention mechanism (TCN-Attention-HAR) is proposed. The model effectively recognizes and emphasizes key feature information. The ability of the TCN (temporal convolutional network) to extract temporal features is improved by using an appropriately sized receptive field. In addition, attention mechanisms assign higher weights to important information, enabling the model to learn and identify human activities more effectively. Performance on the public datasets WISDM, PAMAP2, and USC-HAD improves by 1.13%, 1.83%, and 0.51%, respectively, compared with other advanced models; these results clearly show that the proposed network model has excellent recognition performance. In the knowledge distillation experiment, the student model has only about 0.1% of the teacher model's parameters yet greatly improves its accuracy; on the WISDM dataset, its accuracy is 0.14% higher than the teacher model's.
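The building block behind a TCN's receptive-field control is the causal dilated convolution: the output at time t may read x[t], x[t-d], x[t-2d], … but never the future. A minimal numpy sketch (single layer, no residual connection or weight normalization, illustrative names):

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """Causal dilated 1-D convolution: output at t uses x[t], x[t-d], x[t-2d], ...
    Left-padding by dilation * (k - 1) keeps the layer causal."""
    k = len(kernel)
    pad = dilation * (k - 1)
    xp = np.pad(x, (pad, 0))
    return np.array([sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

# Stacking layers with dilations 1, 2, 4 grows the receptive field to
# 1 + (k - 1) * (1 + 2 + 4) = 13 past samples for k = 3.
x = np.arange(8, dtype=float)
y = causal_dilated_conv(x, np.array([1.0, 0.0, 0.0]), 2)
print(y)  # the j = 0 tap reads the current sample, so y equals x
```

Choosing the dilation schedule is exactly the "appropriately sized receptive field" knob the abstract refers to: deeper stacks with doubling rates cover long activity windows without pooling.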
Affiliation(s)
- Xiong Wei
- Wuhan Textile University, Wuhan, China
26
Wei L, Liu P, Ren H, Xiao D. Research on helmet wearing detection method based on deep learning. Sci Rep 2024; 14:7010. [PMID: 38528034 DOI: 10.1038/s41598-024-57433-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2023] [Accepted: 03/18/2024] [Indexed: 03/27/2024] Open
Abstract
The vigorous development of the construction industry has also brought unprecedented safety risks. Wearing safety helmets at construction sites can effectively reduce casualties. This paper therefore proposes a deep learning-based approach for real-time detection of safety helmet usage among construction workers. After selecting the YOLOv5s network through experiments, we analyzed its training results and found its detection of small and occluded objects to be poor. Therefore, multiple attention mechanisms are used to improve the YOLOv5s network: the feature pyramid network is upgraded to a BiFPN bidirectional feature pyramid network, and the NMS post-processing step is replaced with Soft-NMS. On top of these improvements, the loss function is refined to speed up model convergence and improve detection speed. We propose a network model called BiFEL-YOLOv5s, which combines the BiFPN network and Focal-EIoU Loss to improve YOLOv5s. The average precision of the model increases by 0.9% and the recall rate by 2.8%, while detection speed does not decrease much, making it better suited for real-time safety helmet detection across various work scenarios.
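Soft-NMS, which this work swaps in for standard NMS, helps precisely in the occluded-worker case: instead of deleting detections that overlap the current best box, it decays their scores, so a partially hidden helmet behind another one can survive. A self-contained sketch of the Gaussian variant (thresholds and sigma are common defaults, not the paper's settings):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of boxes overlapping the current
    best detection by exp(-iou^2 / sigma) instead of discarding them."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        i = int(np.argmax(scores))
        best_box, best_score = boxes.pop(i), scores.pop(i)
        if best_score < score_thresh:
            break
        keep.append((best_box, best_score))
        scores = [s * np.exp(-iou(best_box, b) ** 2 / sigma)
                  for b, s in zip(boxes, scores)]
    return keep

dets = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = soft_nms(dets, [0.9, 0.8, 0.7])
print(len(kept))  # 3: the overlapping second box survives with a decayed score
```

Hard NMS with a 0.5 IoU threshold would have dropped the second box entirely; here it is merely down-weighted.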
Affiliation(s)
- Lihong Wei
- School of Artificial Intelligence and Big Data, Hulunbeier University, Hailar, 021008, Inner Mongolia, China
- Panpan Liu
- Information Science and Engineering School, Northeastern University, Shenyang, 110004, China
- Haihui Ren
- Information Science and Engineering School, Northeastern University, Shenyang, 110004, China
- Dong Xiao
- Information Science and Engineering School, Northeastern University, Shenyang, 110004, China
- Liaoning Key Laboratory of Intelligent Diagnosis and Safety for Metallurgical Industry, Northeastern University, Shenyang, 110819, China
27
Wang L, Zhang X, Tian C, Chen S, Deng Y, Liao X, Wang Q, Si W. PlaqueNet: deep learning enabled coronary artery plaque segmentation from coronary computed tomography angiography. Vis Comput Ind Biomed Art 2024; 7:6. [PMID: 38514491 DOI: 10.1186/s42492-024-00157-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Accepted: 03/03/2024] [Indexed: 03/23/2024] Open
Abstract
Cardiovascular disease, primarily caused by atherosclerotic plaque formation, is a significant health concern. Early detection of these plaques is crucial for targeted therapies and for reducing the risk of cardiovascular diseases. This study presents PlaqueNet, a solution for segmenting coronary artery plaques from coronary computed tomography angiography (CCTA) images. For feature extraction, an advanced residual net module is utilized, which integrates a depthwise residual optimization module into the network branches, enhancing feature extraction, avoiding information loss, and addressing gradient issues during training. To improve segmentation accuracy, a depthwise atrous spatial pyramid pooling module based on bicubic efficient channel attention (DASPP-BICECA) is introduced. The BICECA component amplifies local feature sensitivity, whereas the DASPP component expands the network's information-gathering scope, resulting in elevated segmentation accuracy. Additionally, BINet, a module for joint network loss evaluation, is proposed. It optimizes the segmentation model without affecting the segmentation results, and when combined with the DASPP-BICECA module, it enhances overall efficiency. The proposed CCTA segmentation algorithm outperformed the three comparative algorithms, achieving an intersection over union of 87.37%, Dice of 93.26%, accuracy of 93.12%, mean intersection over union of 93.68%, mean Dice of 96.63%, and mean pixel accuracy of 96.55%.
Affiliation(s)
- Linyuan Wang
- Department of Cardiovascular Surgery, the Affiliated Hospital of Shanxi Medical University, Shanxi Cardiovascular Hospital (Institute), Shanxi Clinical Medical Research Center for Cardiovascular Disease, Taiyuan, 030024, Shanxi, China
- Xiaofeng Zhang
- Department of Mechanical Engineering, Nantong University, Nantong, 226019, Jiangsu, China
- Congyu Tian
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Shu Chen
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, Hubei, China
- Yongzhi Deng
- Department of Cardiovascular Surgery, the Affiliated Hospital of Shanxi Medical University, Shanxi Cardiovascular Hospital (Institute), Shanxi Clinical Medical Research Center for Cardiovascular Disease, Taiyuan, 030024, Shanxi, China
- Xiangyun Liao
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Qiong Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Weixin Si
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
28
Yuan X, Fu Z, Zhang B, Xie Z, Gan R. Research on lightweight algorithm for gangue detection based on improved Yolov5. Sci Rep 2024; 14:6707. [PMID: 38509164 PMCID: PMC10954748 DOI: 10.1038/s41598-024-57259-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 03/15/2024] [Indexed: 03/22/2024] Open
Abstract
To address the slow detection speed, large parameter counts, and heavy computation of deep learning-based gangue detection methods, we propose an improved gangue detection algorithm based on YOLOv5s. First, the lightweight EfficientViT network is used as the backbone to increase detection speed. Second, C3_Faster replaces the C3 part of the HEAD module, reducing model complexity. Third, the 20 × 20 feature map branch in the Neck region is removed, further reducing model complexity. Fourth, the CIoU loss function is replaced by the MPDIoU loss function. Finally, the SE attention mechanism is introduced so that the model attends to critical features, improving detection performance. Experimental results show that the improved coal gangue detection model is compressed in size by 77.8%, has 78.3% fewer parameters, and requires 77.8% less computation, while the number of frames is reduced by 30.6%, which can serve as a reference for intelligent coal gangue sorting.
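For context on the loss swap above: the baseline CIoU loss (Zheng et al., 2020) penalizes low overlap, center distance, and aspect-ratio mismatch. A self-contained numpy version of that baseline is sketched below so the three terms are visible; this is the standard formula as commonly stated, not code from the paper, and the MPDIoU replacement is not reproduced here.

```python
import numpy as np

def ciou_loss(a, b):
    """CIoU loss between boxes in (x1, y1, x2, y2) format:
    1 - IoU + normalized center distance + aspect-ratio consistency term."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    iou = inter / (wa * ha + wb * hb - inter + 1e-9)
    # squared center distance over squared diagonal of the enclosing box
    rho2 = ((a[0] + a[2] - b[0] - b[2]) ** 2 + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4
    c2 = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2
          + (max(a[3], b[3]) - min(a[1], b[1])) ** 2)
    v = (4 / np.pi ** 2) * (np.arctan(wb / hb) - np.arctan(wa / ha)) ** 2
    alpha = v / (1 - iou + v + 1e-9)              # trade-off weight
    return 1 - iou + rho2 / (c2 + 1e-9) + alpha * v

print(round(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)), 6))  # identical boxes -> 0.0
```

MPDIoU-style losses replace the distance and shape penalties with corner-point distances; the attraction in either case is a gradient signal even when boxes do not overlap (plain IoU loss is flat there).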
Affiliation(s)
- Xinpeng Yuan
- School of Coal Engineering, Shanxi Datong University, Datong, 037000, China
- Zhibo Fu
- School of Coal Engineering, Shanxi Datong University, Datong, 037000, China
- Bowen Zhang
- School of Coal Engineering, Shanxi Datong University, Datong, 037000, China
- Zhengkun Xie
- School of Coal Engineering, Shanxi Datong University, Datong, 037000, China
- Rui Gan
- School of Coal Engineering, Shanxi Datong University, Datong, 037000, China
29
Chen L, Zhu J. Water surface garbage detection based on lightweight YOLOv5. Sci Rep 2024; 14:6133. [PMID: 38480741 PMCID: PMC10937728 DOI: 10.1038/s41598-024-55051-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Accepted: 02/20/2024] [Indexed: 03/17/2024] Open
Abstract
With the development of deep learning technology, researchers are paying increasing attention to how to efficiently salvage surface garbage. Since the 1980s, the growth of plastic products and the economy has led to the accumulation of large amounts of garbage in rivers. Because of the volume of garbage and the high risk of surface operations, manual garbage retrieval is highly inefficient. Among existing methods, using the YOLO algorithm to detect target objects is the most popular; compared with traditional detection algorithms, YOLO is not only more accurate but also more lightweight. This article presents a lightweight YOLOv5 water-surface garbage detection algorithm suitable for deployment on unmanned ships. The approach was validated on the Orca dataset: experimental results show that the detection speed of the improved YOLOv5 increases by 4.3%, mAP reaches 84.9%, precision reaches 88.7%, and the parameter count is only 12% of the original model's. Compared with the original algorithm, the improved algorithm is not only more accurate but, being lighter, can also be deployed on more hardware devices.
Affiliation(s)
- Luya Chen
- College of Engineering Science and Technology, Shanghai Ocean University, Shanghai, 21306, China
- Jianping Zhu
- School of Engineering, Shanghai Ocean University, Shanghai, 21306, China
30
Ma J, Zhao Z, Li T, Liu Y, Ma J, Zhang R. GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction. Interdiscip Sci 2024:10.1007/s12539-024-00609-y. [PMID: 38457109 DOI: 10.1007/s12539-024-00609-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Revised: 01/01/2024] [Accepted: 01/08/2024] [Indexed: 03/09/2024]
Abstract
Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph-structural features within molecules for deep molecular representations. To capture the vital associations between compound atoms and protein residues, we devise a dual-attention mechanism that effectively extracts relational features through cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets, including the human, C. elegans, Davis, and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baselines. Our results demonstrate that GraphsformerCPI outperforms baseline models on classification datasets and achieves competitive performance on regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvements in concordance index (CI) and mean squared error (MSE) are 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into intrinsic interactions and binding mechanisms. Our research holds practical significance for effectively predicting CPIs and binding affinities, identifying key atoms and residues, and enhancing model interpretability.
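A common way to make self-attention "structure-enhanced" in graph transformers of this kind is to bias or mask the attention scores with the molecular graph's adjacency. The numpy sketch below uses a hard mask restricted to bonded pairs plus self-loops; the paper's exact mechanism is not specified here, and a soft additive bias is an equally common design, so treat every name and shape as illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structure_biased_attention(h, adj, neg=-1e9):
    """Self-attention over node features h (N, d) where the adjacency matrix
    confines attention to bonded atom pairs (self-loops kept)."""
    n, d = h.shape
    scores = h @ h.T / np.sqrt(d)            # raw dot-product scores
    mask = adj + np.eye(n)                   # structural constraint
    scores = np.where(mask > 0, scores, neg) # non-neighbors effectively removed
    return softmax(scores, axis=-1) @ h      # attention-weighted node update

h = np.eye(4)                                # 4 atoms, one-hot toy features
adj = np.array([[0, 1, 0, 0],                # a path graph 0-1-2-3
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
out = structure_biased_attention(h, adj)
print(out.shape)  # (4, 4); atom 0 places ~zero weight on the non-bonded atom 3
```

Real models add learned query/key/value projections and multiple heads; the structural term is what injects the graph topology the plain Transformer would otherwise ignore.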
Affiliation(s)
- Jun Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou, 730020, China
- Zhili Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Computer College, Qinghai Normal University, Xi'ning, 810016, China
- Yunwu Liu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Jun Ma
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
31
Yin Y, Tang Z, Weng H. Application of visual transformer in renal image analysis. Biomed Eng Online 2024; 23:27. [PMID: 38439100 PMCID: PMC10913284 DOI: 10.1186/s12938-024-01209-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2023] [Accepted: 01/22/2024] [Indexed: 03/06/2024] Open
Abstract
The Deep Self-Attention Network (Transformer) is an encoder-decoder architecture that excels at establishing long-distance dependencies and was first applied in natural language processing. Because it complements the inductive bias of convolutional neural networks (CNNs), the Transformer has gradually been applied to medical image processing, including kidney image processing, and has become a hot research topic in recent years. To further explore new ideas and directions in renal image processing, this paper outlines the characteristics of the Transformer network model; summarizes the application of Transformer-based models in renal image segmentation, classification, detection, electronic medical records, and decision-making systems; compares them with CNN-based renal image processing algorithms; and analyzes the advantages and disadvantages of the technique in renal image processing. In addition, this paper gives an outlook on the development trend of Transformers in renal image processing, providing a valuable reference for further renal image analysis.
Affiliation(s)
- Yuwei Yin
- The College of Health Sciences and Engineering, University of Shanghai for Science and Technology, 516 Jungong Highway, Yangpu Area, Shanghai, 200093, China
- The College of Medical Technology, Shanghai University of Medicine & Health Sciences, 279 Zhouzhu Highway, Pudong New Area, Shanghai, 201318, China
- Zhixian Tang
- The College of Medical Technology, Shanghai University of Medicine & Health Sciences, 279 Zhouzhu Highway, Pudong New Area, Shanghai, 201318, China
- Huachun Weng
- The College of Health Sciences and Engineering, University of Shanghai for Science and Technology, 516 Jungong Highway, Yangpu Area, Shanghai, 200093, China
- The College of Medical Technology, Shanghai University of Medicine & Health Sciences, 279 Zhouzhu Highway, Pudong New Area, Shanghai, 201318, China
32
Yao X, Jiang X, Luo H, Liang H, Ye X, Wei Y, Cong S. MOCAT: multi-omics integration with auxiliary classifiers enhanced autoencoder. BioData Min 2024; 17:9. [PMID: 38444019 PMCID: PMC10916109 DOI: 10.1186/s13040-024-00360-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2023] [Accepted: 02/29/2024] [Indexed: 03/07/2024] Open
Abstract
BACKGROUND Integrating multi-omics data is emerging as a critical approach to enhancing our understanding of complex diseases. Innovative computational methods capable of managing high-dimensional and heterogeneous datasets are required to unlock the full potential of such rich and diverse data. METHODS We propose a Multi-Omics integration framework with auxiliary Classifiers-enhanced AuToencoders (MOCAT) that comprehensively utilizes intra- and inter-omics information. Additionally, attention mechanisms with confidence learning are incorporated for enhanced feature representation and trustworthy prediction. RESULTS Extensive experiments were conducted on four benchmark datasets (BRCA, ROSMAP, LGG, and KIPAN) to evaluate the effectiveness of the proposed model. Our model significantly improved on most evaluation metrics and consistently surpassed state-of-the-art methods. Ablation studies showed that the auxiliary classifiers significantly boosted classification accuracy on the ROSMAP and LGG datasets. Moreover, the attention mechanisms and confidence evaluation block contributed to improvements in the predictive accuracy and generalizability of our model. CONCLUSIONS The proposed framework exhibits superior performance in disease classification and biomarker discovery, establishing itself as a robust and versatile tool for analyzing multi-layer biological data. This study highlights the significance of elaborately designed deep learning methodologies in dissecting complex disease phenotypes and improving the accuracy of disease predictions.
Affiliation(s)
- Xiaohui Yao
- Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Xiaohan Jiang
- Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
- Haoran Luo
- Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Hong Liang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Xiufen Ye
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Yanhui Wei
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
- Shan Cong
- Qingdao Innovation and Development Center, Harbin Engineering University, 1777 Sansha Rd, Qingdao, 266000, Shandong, China
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, 145 Nantong St, Harbin, 150001, Heilongjiang, China
33
Wang S, Qiao J, Feng S. Prediction of lncRNA and disease associations based on residual graph convolutional networks with attention mechanism. Sci Rep 2024; 14:5185. [PMID: 38431702 DOI: 10.1038/s41598-024-55957-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Accepted: 02/29/2024] [Indexed: 03/05/2024] Open
Abstract
LncRNAs are non-coding RNAs with a length of more than 200 nucleotides. More and more evidence shows that lncRNAs are inextricably linked with diseases. To make up for the shortcomings of traditional methods, researchers began to collect relevant biological data in the database and used bioinformatics prediction tools to predict the associations between lncRNAs and diseases, which greatly improved the efficiency of the study. To improve the prediction accuracy of current methods, we propose a new lncRNA-disease associations prediction method with attention mechanism, called ResGCN-A. Firstly, we integrated lncRNA functional similarity, lncRNA Gaussian interaction profile kernel similarity, disease semantic similarity, and disease Gaussian interaction profile kernel similarity to obtain lncRNA comprehensive similarity and disease comprehensive similarity. Secondly, the residual graph convolutional network was used to extract the local features of lncRNAs and diseases. Thirdly, the new attention mechanism was used to assign the weight of the above features to further obtain the potential features of lncRNAs and diseases. Finally, the training set required by the Extra-Trees classifier was obtained by concatenating potential features, and the potential associations between lncRNAs and diseases were obtained by the trained Extra-Trees classifier. ResGCN-A combines the residual graph convolutional network with the attention mechanism to realize the local and global features fusion of lncRNA and diseases, which is beneficial to obtain more accurate features and improve the prediction accuracy. In the experiment, ResGCN-A was compared with five other methods through 5-fold cross-validation. The results show that the AUC value and AUPR value obtained by ResGCN-A are 0.9916 and 0.9951, which are superior to the other five methods. In addition, case studies and robustness evaluation have shown that ResGCN-A is an effective method for predicting lncRNA-disease associations. 
The source code for ResGCN-A will be available at https://github.com/Wangxiuxiun/ResGCN-A .
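The two building blocks named in the abstract, a residual graph-convolution layer and a softmax attention re-weighting, can be sketched as follows. This is an illustration only, not the authors' code; the symmetric adjacency normalization, the layer width, and the single fixed attention query are assumptions:

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalize an adjacency matrix with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def res_gcn_layer(a_hat, h, w):
    """One residual graph-convolution layer: ReLU(A_hat @ H @ W) + H.
    The skip connection requires W to preserve the feature width."""
    return np.maximum(a_hat @ h @ w, 0.0) + h

def attention_fuse(h, q):
    """Re-weight node features by a softmax attention vector derived
    from a query q (fixed here for illustration)."""
    scores = h @ q
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights[:, None] * h
```

In the paper, the attended lncRNA and disease features are then concatenated and passed to an Extra-Trees classifier.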
Affiliation(s)
- Shengchang Wang
- School of Electronic and Information Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Jiaqing Qiao
- School of Electronic and Information Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Shou Feng
- College of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China.

34
Zhang Y, Chen Z, Yang X. Light-M: An efficient lightweight medical image segmentation framework for resource-constrained IoMT. Comput Biol Med 2024; 170:108088. [PMID: 38320339 DOI: 10.1016/j.compbiomed.2024.108088] [Received: 09/20/2023] [Revised: 12/22/2023] [Accepted: 01/27/2024] [Indexed: 02/08/2024]
Abstract
The Internet of Medical Things (IoMT) is being incorporated into current healthcare systems. This technology aims to connect patients, IoMT devices, and hospitals over mobile networks, allowing for more secure, rapid, and convenient health monitoring and intelligent healthcare services. However, existing intelligent healthcare applications typically rely on large-scale AI models, while standard IoMT devices have significant resource constraints. To resolve this mismatch, in this paper we propose a Knowledge Distillation (KD)-based IoMT end-edge-cloud orchestrated architecture for medical image segmentation tasks, called Light-M, which aims to deploy a lightweight medical model on resource-constrained IoMT devices. Specifically, Light-M trains a large teacher model in the cloud server and performs computation on local nodes with a student model that imitates the teacher through knowledge distillation. Light-M contains two KD strategies for the medical image segmentation task: (1) active exploration and passive transfer (AEPT) and (2) self-attention-based inter-class feature variation (AIFV) distillation. AEPT encourages the student model to learn undiscovered knowledge/features of the teacher model without additional feature layers, aiming to explore new features and outperform the teacher. To improve the student's ability to distinguish between classes, the student learns the self-attention-based feature variation between classes. Since the proposed AEPT and AIFV appear only in the training process, our framework imposes no additional computation burden on the student model when the segmentation task is deployed. Extensive experiments on cardiac images and public real-scene datasets demonstrate that our approach, combining the two knowledge distillation strategies, improves the student model's learned representations and outperforms state-of-the-art methods.
Moreover, when deployed on the IoT device, the distilled student model takes only 29.6 ms for one sample at the inference step.
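AEPT and AIFV are specific to Light-M, but the underlying teacher-student transfer follows the standard knowledge-distillation recipe: the student matches the teacher's softened outputs while also fitting the hard labels. A minimal sketch, where the temperature and loss weighting are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, t=4.0, alpha=0.7):
    """alpha * KL(teacher || student) at temperature t, scaled by t^2,
    plus (1 - alpha) * cross-entropy against the hard labels."""
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    kl = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean() * t * t
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels]).mean()
    return alpha * kl + (1 - alpha) * ce
```

When the student's logits equal the teacher's, the KL term vanishes and only the hard-label term remains; this also illustrates why distillation adds no cost at inference time, since the deployed student is an ordinary network.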
Affiliation(s)
- Yifan Zhang
- Shenzhen University, 3688 Nanhai Ave., Shenzhen, 518060, Guangdong, China
- Zhuangzhuang Chen
- Shenzhen University, 3688 Nanhai Ave., Shenzhen, 518060, Guangdong, China
- Xuan Yang
- Shenzhen University, 3688 Nanhai Ave., Shenzhen, 518060, Guangdong, China.

35
Lin Y, Wang J, Liu Q, Zhang K, Liu M, Wang Y. CFANet: Context fusing attentional network for preoperative CT image segmentation in robotic surgery. Comput Biol Med 2024; 171:108115. [PMID: 38402837 DOI: 10.1016/j.compbiomed.2024.108115] [Received: 10/09/2023] [Revised: 01/30/2024] [Accepted: 02/04/2024] [Indexed: 02/27/2024]
Abstract
Accurate segmentation of CT images is crucial for clinical diagnosis and for preoperative evaluation in robotic surgery, but fuzzy boundaries and small targets make it challenging. In response, a novel 2D segmentation network named Context Fusing Attentional Network (CFANet) is proposed. CFANet incorporates three key modules to address these challenges: a pyramid fusing module (PFM), a parallel dilated convolution module (PDCM), and a scale attention module (SAM). Integrating these modules into an encoder-decoder structure enables effective utilization of multi-level and multi-scale features. Compared with advanced segmentation methods, the Dice score improved by 2.14% on a liver tumor dataset. This improvement is expected to benefit the preoperative evaluation of robotic surgery and to support clinical diagnosis, especially early tumor detection.
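The reported 2.14% improvement is in the Dice score, the standard overlap metric for segmentation. For reference, a minimal implementation for binary masks:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2 * |P intersect T| / (|P| + |T|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```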
Affiliation(s)
- Yao Lin
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, 410082, China
- Jiazheng Wang
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, 410082, China.
- Qinghao Liu
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, 410082, China
- Kang Zhang
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, 410082, China
- Min Liu
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, 410082, China; Research Institute of Hunan University in Chongqing, Chongqing, 401135, China.
- Yaonan Wang
- College of Electrical and Information Engineering, Hunan University, Changsha, 410082, China; National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha, 410082, China

36
Zhou Y, Zheng Y, Tian Y, Bai Y, Cai N, Wang P. SCAN: sequence-based context-aware association network for hepatic vessel segmentation. Med Biol Eng Comput 2024; 62:817-827. [PMID: 38032458 DOI: 10.1007/s11517-023-02975-z] [Received: 03/29/2023] [Accepted: 11/22/2023] [Indexed: 12/01/2023]
Abstract
Accurate segmentation of the hepatic vessels is important for surgeons designing preoperative plans for liver surgery. In this paper, a sequence-based context-aware association network (SCAN) is designed for hepatic vessel segmentation, in which three schemes are incorporated to simultaneously extract the 2D features of hepatic vessels and capture the correlations between adjacent CT slices. Two of the schemes, a slice-level attention module and a graph association module, are designed to bridge feature gaps between the encoder and the decoder in the low- and high-dimensional spaces. A region-edge constrained loss, integrating cross-entropy loss, Dice loss, and an edge-constrained loss, is designed to optimize the proposed SCAN. Experimental results indicate that the proposed SCAN is superior to several existing deep learning frameworks, achieving 0.845 DSC, 0.856 precision, 0.866 sensitivity, and 0.861 F1-score.
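The region-edge constrained loss combines region terms (cross-entropy and Dice) with an edge term. A minimal sketch of such a composite loss; the equal weights and the crude finite-difference edge map are assumptions, not the paper's exact formulation:

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Pixel-wise binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p)).mean()

def soft_dice_loss(p, t, eps=1e-7):
    """1 - soft Dice overlap between probabilities and a binary target."""
    inter = (p * t).sum()
    return 1.0 - (2.0 * inter + eps) / (p.sum() + t.sum() + eps)

def edge_map(mask):
    """Crude boundary map: pixels where the mask changes along x or y."""
    gx = np.abs(np.diff(mask, axis=0, prepend=mask[:1]))
    gy = np.abs(np.diff(mask, axis=1, prepend=mask[:, :1]))
    return np.clip(gx + gy, 0.0, 1.0)

def region_edge_loss(p, t, w=(1.0, 1.0, 1.0)):
    """Weighted sum of cross-entropy, soft Dice, and an edge-constrained
    term (cross-entropy restricted to boundary maps)."""
    return (w[0] * bce(p, t)
            + w[1] * soft_dice_loss(p, t)
            + w[2] * bce(edge_map(p), edge_map(t)))
```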
Affiliation(s)
- Yinghong Zhou
- School of Information Engineering, Guangdong University of Technology, Guangzhou, China
- Yu Zheng
- School of Information Engineering, Guangdong University of Technology, Guangzhou, China
- Yinfeng Tian
- School of Information Engineering, Guangdong University of Technology, Guangzhou, China
- Youfang Bai
- School of Information Engineering, Guangdong University of Technology, Guangzhou, China
- Nian Cai
- School of Information Engineering, Guangdong University of Technology, Guangzhou, China.
- Ping Wang
- Department of Hepatobiliary Surgery in the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.

37
Huang Z, Xiao Q, Xiong T, Shi W, Yang Y, Li G. Predicting Drug-Protein Interactions through Branch-Chain Mining and multi-dimensional attention network. Comput Biol Med 2024; 171:108127. [PMID: 38350397 DOI: 10.1016/j.compbiomed.2024.108127] [Received: 10/24/2023] [Revised: 01/26/2024] [Accepted: 02/06/2024] [Indexed: 02/15/2024]
Abstract
Identifying drug-protein interactions (DPIs) is crucial in drug discovery and repurposing. Computational methods for precise DPI identification can shorten development timelines and reduce expenses compared with conventional experimental methods. Recently, deep learning techniques have been employed to predict DPIs, enhancing these processes. Nevertheless, many prior studies extract features from complete drug and protein entities, overlooking the theoretical foundation that pharmacological responses are often correlated with specific substructures, which can lead to poor predictive performance. Furthermore, some substructure-focused research confines its exploration to a single fragment category, such as functional groups. Addressing these constraints, we present an end-to-end framework termed BCMMDA for predicting DPIs. The framework considers various substructure types, including branch chains, common substructures, and specific fragments. We designed a feature learning module that combines our proposed multi-dimensional attention mechanism with convolutional neural networks (CNNs). Deep CNNs help capture the synergistic effects among these fragment sets, enabling the extraction of relevant drug and protein features. Meanwhile, the multi-dimensional attention mechanism refines the relationship between drug and protein features by assigning attention vectors to each drug compound and amino acid. This empowers the model to concentrate on pivotal substructures and elements, improving its ability to identify essential interactions in DPI prediction. We evaluated BCMMDA on four well-known benchmark datasets; it outperformed state-of-the-art baseline models, demonstrating significant performance improvement.
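The paper's multi-dimensional attention assigns attention vectors to each drug compound and amino acid. A plain dot-product cross-attention between drug-token and protein-residue feature matrices conveys the core idea; the shapes and the unscaled dot-product scoring are illustrative assumptions, not BCMMDA's exact mechanism:

```python
import numpy as np

def softmax(z):
    """Softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(drug_feats, prot_feats):
    """Give each drug token an attention-weighted summary of the protein
    residues, and each residue a summary of the drug tokens."""
    scores = drug_feats @ prot_feats.T          # (n_drug, n_prot)
    drug_ctx = softmax(scores) @ prot_feats     # protein context per drug token
    prot_ctx = softmax(scores.T) @ drug_feats   # drug context per residue
    return drug_ctx, prot_ctx
```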
Affiliation(s)
- Zhuo Huang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China; MOE-LCSM, School of Mathematics and Statistics, Hunan Normal University, Changsha, 410081, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China.
- Tuo Xiong
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Wanwan Shi
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Yide Yang
- Key Laboratory of Molecular Epidemiology of Hunan Province, School of Medicine, Hunan Normal University, Changsha, 410006, China.
- Guanghui Li
- School of Information Engineering, East China Jiaotong University, Nanchang, 330013, China.

38
Wei K, Kong W, Liu L, Wang J, Li B, Zhao B, Li Z, Zhu J, Yu G. CT synthesis from MR images using frequency attention conditional generative adversarial network. Comput Biol Med 2024; 170:107983. [PMID: 38286104 DOI: 10.1016/j.compbiomed.2024.107983] [Received: 04/24/2023] [Revised: 12/24/2023] [Accepted: 01/13/2024] [Indexed: 01/31/2024]
Abstract
Magnetic resonance (MR) image-guided radiotherapy is widely used in the treatment planning of malignant tumors, and MR-only radiotherapy, a representative of this technique, requires synthetic computed tomography (sCT) images for effective radiotherapy planning. Convolutional neural networks (CNNs) have shown remarkable performance in generating sCT images. However, CNN-based models tend to synthesize more low-frequency components, and the pixel-wise loss function usually used to optimize the model can result in blurred images. To address these problems, a frequency attention conditional generative adversarial network (FACGAN) is proposed in this paper. Specifically, a frequency cycle generative model (FCGM) is designed to enhance the inter-mapping between MR and CT and to extract richer tissue structure information. Additionally, a residual frequency channel attention (RFCA) module is proposed and incorporated into the generator to enhance its ability to perceive high-frequency image features. Finally, a high-frequency loss (HFL) and a cycle-consistency high-frequency loss (CHFL) are added to the objective function to optimize model training. The effectiveness of the proposed model is validated on pelvic and brain datasets and compared with state-of-the-art deep learning models. The results show that FACGAN produces higher-quality sCT images while retaining clearer and richer high-frequency texture information.
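A high-frequency loss of this kind penalizes discrepancies in high-frequency content between synthesized and real CT. A minimal sketch using an FFT high-pass filter and an L1 distance; the radial cutoff and the L1 choice are assumptions, not the paper's exact definition:

```python
import numpy as np

def high_pass(img, cutoff=0.25):
    """Zero out spatial frequencies below `cutoff` (fraction of Nyquist)
    in a centred 2D FFT, then transform back to image space."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.sqrt(((yy - h / 2) / (h / 2)) ** 2 + ((xx - w / 2) / (w / 2)) ** 2)
    f[r < cutoff] = 0.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

def high_frequency_loss(fake_ct, real_ct):
    """L1 distance between high-frequency components of the two images."""
    return np.abs(high_pass(fake_ct) - high_pass(real_ct)).mean()
```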
Affiliation(s)
- Kexin Wei
- Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, China
- Weipeng Kong
- Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, China
- Liheng Liu
- Department of Radiology, Zhongshan Hospital, Fudan University, Shanghai, China
- Jian Wang
- Department of Radiology, Central Hospital Affiliated to Shandong First Medical University, Jinan, China
- Baosheng Li
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, No.440, Jiyan Road, Jinan, 250117, Shandong Province, China
- Bo Zhao
- Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, China
- Zhenjiang Li
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, No.440, Jiyan Road, Jinan, 250117, Shandong Province, China
- Jian Zhu
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, No.440, Jiyan Road, Jinan, 250117, Shandong Province, China.
- Gang Yu
- Shandong Key Laboratory of Medical Physics and Image Processing, Shandong Institute of Industrial Technology for Health Sciences and Precision Medicine, School of Physics and Electronics, Shandong Normal University, Jinan, China.

39
Yang K, Song J, Liu M, Xue L, Liu S, Yin X, Liu K. TBACkp: HER2 expression status classification network focusing on intrinsic subenvironmental characteristics of breast cancer liver metastases. Comput Biol Med 2024; 170:108002. [PMID: 38277921 DOI: 10.1016/j.compbiomed.2024.108002] [Received: 08/28/2023] [Revised: 12/24/2023] [Accepted: 01/13/2024] [Indexed: 01/28/2024]
Abstract
The HER2 expression status of breast cancer liver metastases is a crucial indicator for the diagnosis, treatment, and prognosis assessment of patients. Typically, HER2 status is assessed through invasive procedures such as biopsy. However, this approach has drawbacks: tissue samples are difficult to obtain and examination periods are long. To address these limitations, we propose an AI-aided diagnostic model that enables rapid diagnosis. It diagnoses a patient's HER2 expression status on the basis of preprocessed images, namely the lesion region extracted from a CT image rather than an actual tissue sample. The model adopts a parallel structure comprising a Branch Block and a Trunk Block. The Branch Block extracts the gradient characteristics between tumor sub-environments, and the Trunk Block fuses the characteristics extracted by the Branch Block. The Branch Block contains a CNN with self-attention, combining the advantages of CNNs and self-attention to extract more detailed and comprehensive image features. The Trunk Block is designed to fuse the extracted feature information without affecting the transmission of the original image features. Conv-Attention, which uses a kernel dot product, computes the attention in the Trunk Block and provides the weights for the self-attention in the convolution-induced-bias calculation. Reflecting this structure and method, we refer to the model as TBACkp. The dataset comprises enhanced abdominal CT images of 151 patients with liver metastases from breast cancer, together with each patient's HER2 expression level. The experimental results are as follows: AUC 0.915, ACC 0.854, specificity 0.809, precision 0.863, recall 0.881, F1-score 0.872. These results demonstrate that the method can accurately assess HER2 expression status when compared with other advanced deep learning models.
Affiliation(s)
- Kun Yang
- College of Quality and Technical Supervision, Hebei University, Baoding, China; Hebei Technology Innovation Center for Lightweight of New Energy Vehicle Power System, Baoding, China; Scientific Research and Innovation Team of Hebei University, Baoding, China
- Jie Song
- College of Quality and Technical Supervision, Hebei University, Baoding, China; Hebei Technology Innovation Center for Lightweight of New Energy Vehicle Power System, Baoding, China; Scientific Research and Innovation Team of Hebei University, Baoding, China
- Meng Liu
- Department of Radiology, Affiliated Hospital of Hebei University, Baoding, China
- Linyan Xue
- College of Quality and Technical Supervision, Hebei University, Baoding, China; Hebei Technology Innovation Center for Lightweight of New Energy Vehicle Power System, Baoding, China; Scientific Research and Innovation Team of Hebei University, Baoding, China
- Shuang Liu
- College of Quality and Technical Supervision, Hebei University, Baoding, China; Hebei Technology Innovation Center for Lightweight of New Energy Vehicle Power System, Baoding, China; Scientific Research and Innovation Team of Hebei University, Baoding, China
- Xiaoping Yin
- Department of Radiology, Affiliated Hospital of Hebei University, Baoding, China; Hebei Key Laboratory of Precise Imaging of Inflammation Related Tumors, Hebei University, Baoding, China; The Outstanding Young Scientific Research and Innovation Team of Hebei University, Baoding, China.
- Kun Liu
- College of Quality and Technical Supervision, Hebei University, Baoding, China; Hebei Technology Innovation Center for Lightweight of New Energy Vehicle Power System, Baoding, China; Scientific Research and Innovation Team of Hebei University, Baoding, China.

40
Wen W, Zhang H, Wang Z, Gao X, Wu P, Lin J, Zeng N. Enhanced multi-label cardiology diagnosis with channel-wise recurrent fusion. Comput Biol Med 2024; 171:108210. [PMID: 38417383 DOI: 10.1016/j.compbiomed.2024.108210] [Received: 01/08/2024] [Revised: 02/08/2024] [Accepted: 02/25/2024] [Indexed: 03/01/2024]
Abstract
The timely detection of abnormal electrocardiogram (ECG) signals is vital for preventing heart disease. However, traditional automated cardiology diagnostic methods cannot simultaneously identify multiple diseases in a segment of ECG signals and do not consider the potential correlations between the 12-lead ECG signals. To address these issues, this paper presents a novel network architecture, the Branched Convolution and Channel Fusion Network (BCCF-Net), designed for multi-label ECG diagnosis to identify multiple diseases simultaneously. BCCF-Net incorporates the Channel-wise Recurrent Fusion (CRF) network, designed to enhance the exploration of potential correlations among the 12 leads. Furthermore, the squeeze-and-excitation (SE) attention mechanism maximizes the potential of the convolutional neural network (CNN). To efficiently capture complex spatial and temporal patterns across various scales, the multi-branch convolution (MBC) module has been developed. Through extensive experiments on two public datasets with seven subtasks, the efficacy and robustness of the proposed ECG multi-label classification framework have been comprehensively evaluated. The results demonstrate the superior performance of BCCF-Net compared with other state-of-the-art algorithms. The developed framework has practical application in clinical settings, allowing for the refined diagnosis of cardiac arrhythmias through ECG signal analysis.
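Of the components listed, the squeeze-and-excitation (SE) attention is a well-established mechanism: each channel is summarized by global average pooling, passed through a small bottleneck, and used to gate the channels. A minimal sketch for 1D (per-lead) feature maps, with illustrative bottleneck weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(x, w1, w2):
    """Squeeze-and-Excitation for x of shape (channels, length):
    global-average-pool each channel ('squeeze'), run a ReLU bottleneck
    ('excitation'), and rescale channels by the resulting gates."""
    s = x.mean(axis=1)              # squeeze: one scalar per channel
    z = np.maximum(w1 @ s, 0.0)     # bottleneck with ReLU
    gates = sigmoid(w2 @ z)         # per-channel gates in (0, 1)
    return x * gates[:, None]
```

Because the gates lie in (0, 1), SE can only attenuate channels relative to their input, steering the network toward the more informative leads.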
Affiliation(s)
- Weimin Wen
- School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China
- Hongyi Zhang
- School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China
- Zidong Wang
- Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, UK.
- Xingen Gao
- School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China
- Peishu Wu
- Department of Instrumental and Electrical Engineering, Xiamen University, Fujian 361005, China
- Juqiang Lin
- School of Opto-Electronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China
- Nianyin Zeng
- Department of Instrumental and Electrical Engineering, Xiamen University, Fujian 361005, China.

41
Tang X, Luo L, Wang S. TSE-ARF: An adaptive prediction method of effectors across secretion system types. Anal Biochem 2024; 686:115407. [PMID: 38030053 DOI: 10.1016/j.ab.2023.115407] [Received: 09/02/2023] [Revised: 11/12/2023] [Accepted: 11/20/2023] [Indexed: 12/01/2023]
Abstract
Bacterial effector proteins are secreted by a variety of protein secretion systems and play an important role in the interaction between hosts and pathogenic bacteria. It is therefore important to find fast and inexpensive methods to discover bacterial effectors. In this study, we propose a multi-type secretion effector adaptive random forest (TSE-ARF) to adaptively identify secretion effectors across T1SE-T4SE and T6SE based only on protein sequences. First, we proposed two new feature descriptors that capture characteristic protein information and fused them with several universal features to form a 290-dimensional feature vector with good versatility. Then, the TSE-ARF model makes classification predictions, adapting its parameters to the different secretion-effector types by integrating the Shuffled Frog Leaping Algorithm with a random forest. The strong performance of TSE-ARF across different datasets and settings demonstrates considerable generalization ability, and the model was used to screen further candidate effectors across whole genomes. Source code is available at https://github.com/AIMOVE/TSE-ARF.
Affiliation(s)
- Xianjun Tang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Longfei Luo
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China; Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, Yunnan, China.

42
Wang Z, Yu L, Tian S, Huo X. CRMEFNet: A coupled refinement, multiscale exploration and fusion network for medical image segmentation. Comput Biol Med 2024; 171:108202. [PMID: 38402839 DOI: 10.1016/j.compbiomed.2024.108202] [Received: 07/09/2023] [Revised: 12/22/2023] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Accurate segmentation of target areas in medical images, such as lesions, is essential for disease diagnosis and clinical analysis. In recent years, deep learning methods have been intensively researched and have generated significant progress in medical image segmentation tasks. However, most existing methods have limitations in modeling multilevel feature representations and in identifying complex textured pixels at contrasting boundaries. This paper proposes a novel coupled refinement and multiscale exploration and fusion network (CRMEFNet) for medical image segmentation, which explores the optimization and fusion of multiscale features to address the abovementioned limitations. CRMEFNet consists of three main innovations: a coupled refinement module (CRM), a multiscale exploration and fusion module (MEFM), and a cascaded progressive decoder (CPD). The CRM decouples features into low-frequency body features and high-frequency edge features and performs targeted optimization of both to enhance intraclass uniformity and interclass differentiation. The MEFM performs a two-stage exploration and fusion of multiscale features using our proposed multiscale aggregation attention mechanism, which explores the differentiated information within cross-level features and enhances the contextual connections between them to achieve adaptive feature fusion. Compared with existing complex decoders, the CPD decoder (consisting of the CRM and MEFM) can perform fine-grained pixel recognition while retaining complete semantic location information, with a simple design and excellent performance. Experimental results on five medical image segmentation tasks, ten datasets, and twelve comparison models demonstrate the state-of-the-art performance, interpretability, flexibility, and versatility of CRMEFNet.
Affiliation(s)
- Zhi Wang
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Long Yu
- College of Network Center, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China.
- Shengwei Tian
- College of Software, Xinjiang University, Urumqi, 830000, China; Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China
- Xiangzuo Huo
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi, 830000, China; Signal and Signal Processing Laboratory, College of Information Science and Engineering, Xinjiang University, Urumqi, 830000, China

43
Farkhani S, Demnitz N, Boraxbekk CJ, Lundell H, Siebner HR, Petersen ET, Madsen KH. End-to-end volumetric segmentation of white matter hyperintensities using deep learning. Comput Methods Programs Biomed 2024; 245:108008. [PMID: 38290291 DOI: 10.1016/j.cmpb.2024.108008] [Received: 07/12/2023] [Revised: 12/08/2023] [Accepted: 01/03/2024] [Indexed: 02/01/2024]
Abstract
BACKGROUND AND OBJECTIVES Reliable detection of white matter hyperintensities (WMH) is crucial for studying the impact of diffuse white-matter pathology on brain health and for monitoring changes in WMH load over time. However, manual annotation of high-dimensional 3D neuroimages is laborious and prone to biases and errors in the annotation procedure. In this study, we evaluate the performance of deep learning (DL) segmentation tools and propose a novel volumetric segmentation model incorporating self-attention via a transformer-based architecture. Ultimately, we aim to evaluate the diverse factors that influence WMH segmentation, providing a comprehensive analysis of state-of-the-art algorithms in a broader context. METHODS We trained state-of-the-art DL algorithms, incorporating advanced attention mechanisms, on structural fluid-attenuated inversion recovery (FLAIR) image acquisitions. The anatomical MRI data used for model training were obtained from healthy individuals aged 62-70 years in the Live active Successful Aging (LISA) project. Given the potential sparsity of lesion volume among healthy aging individuals, we explored the impact of incorporating a weighted loss function and of ensemble models. To assess the generalizability of the studied DL models, we applied the trained algorithms to an independent subset of data from the MICCAI WMH challenge (MWSC), whose acquisition parameters differ markedly from those of the LISA dataset used for training. RESULTS DL approaches consistently exhibited commendable segmentation performance, reaching inter-rater agreement comparable to expert performance and ensuring high-quality segmentation outcomes. On the out-of-sample dataset, the ensemble models exhibited the most outstanding performance. CONCLUSIONS DL methods generally surpassed conventional approaches in our study. While all DL methods performed comparably, incorporating attention mechanisms could prove advantageous in future applications with wider availability of training data. As expected, our experiments indicate that ensemble-based models enable superior generalization in out-of-distribution settings. We believe that introducing DL methods into the WMH annotation workflow in healthy aging cohorts is promising, not only for reducing the annotation time required, but also for eventually improving accuracy and robustness by incorporating the automatic segmentations into the evaluation procedure.
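The weighted loss function mentioned for sparse lesion volumes can be illustrated with a binary cross-entropy that up-weights the rare lesion class; the weight value below is an arbitrary assumption, not the study's setting:

```python
import numpy as np

def weighted_bce(p, t, pos_weight=10.0, eps=1e-7):
    """Binary cross-entropy with an up-weighted positive (lesion) class,
    compensating for the sparsity of WMH voxels in healthy cohorts."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(pos_weight * t * np.log(p) + (1.0 - t) * np.log(1.0 - p)).mean()
```

With pos_weight greater than 1, a missed lesion voxel costs more than an equally confident false positive, pushing the model away from the trivial all-background prediction.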
Affiliation(s)
- Sadaf Farkhani
- Danish Research Center for Magnetic Resonance, Center for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital-Amager and Hvidovre, Kattegaard Alle 30, Hvidovre, Denmark.
- Naiara Demnitz
- Danish Research Center for Magnetic Resonance, Center for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital-Amager and Hvidovre, Kattegaard Alle 30, Hvidovre, Denmark
- Carl-Johan Boraxbekk
- Danish Research Center for Magnetic Resonance, Center for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital-Amager and Hvidovre, Kattegaard Alle 30, Hvidovre, Denmark; Institute for Clinical Medicine, Faculty of Medical and Health Sciences, University of Copenhagen, Denmark; Department of Neurology, Copenhagen University Hospital Bispebjerg and Frederiksberg, Copenhagen, Denmark; Institute of Sports Medicine Copenhagen (ISMC), Copenhagen University Hospital Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Henrik Lundell
- Danish Research Center for Magnetic Resonance, Center for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital-Amager and Hvidovre, Kattegaard Alle 30, Hvidovre, Denmark; Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Hartwig Roman Siebner
- Danish Research Center for Magnetic Resonance, Center for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital-Amager and Hvidovre, Kattegaard Alle 30, Hvidovre, Denmark; Institute for Clinical Medicine, Faculty of Medical and Health Sciences, University of Copenhagen, Denmark; Department of Neurology, Copenhagen University Hospital Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Esben Thade Petersen
- Danish Research Center for Magnetic Resonance, Center for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital-Amager and Hvidovre, Kattegaard Alle 30, Hvidovre, Denmark; Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Kristoffer Hougaard Madsen
- Danish Research Center for Magnetic Resonance, Center for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital-Amager and Hvidovre, Kattegaard Alle 30, Hvidovre, Denmark; Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark
44
Qi X, Wang H, Ji Y, Li Y, Luo X, Nie R, Liang X. Daily natural gas load prediction method based on APSO optimization and Attention-BiLSTM. PeerJ Comput Sci 2024; 10:e1890. [PMID: 38435580 PMCID: PMC10909168 DOI: 10.7717/peerj-cs.1890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 01/29/2024] [Indexed: 03/05/2024]
Abstract
As the economy continues to develop and technology advances, there is an increasing societal need for an environmentally friendly ecosystem. Consequently, natural gas, known for its minimal greenhouse gas emissions, has been widely adopted as a clean energy alternative. The accurate prediction of short-term natural gas demand poses a significant challenge within this context, as precise forecasts have important implications for gas dispatch and pipeline safety. The incorporation of intelligent algorithms into prediction methodologies has resulted in notable progress in recent times. Nevertheless, certain limitations persist, including a tendency to fall into local optima and inadequate search capability. To address the challenge of accurately predicting daily natural gas loads, we propose a novel methodology that integrates the adaptive particle swarm optimization algorithm, an attention mechanism, and bidirectional long short-term memory (BiLSTM) neural networks. The initial step involves utilizing the BiLSTM network to conduct bidirectional data learning. Following this, the attention mechanism is employed to calculate the weights of the hidden layer in the BiLSTM, with a specific focus on weight distribution. Lastly, the adaptive particle swarm optimization algorithm is utilized to comprehensively optimize the network structure, initial learning rate, and number of training rounds of the BiLSTM model, thereby enhancing its accuracy. The findings revealed that the combined model achieved a mean absolute percentage error (MAPE) of 0.90% and a coefficient of determination (R2) of 0.99. These results surpassed those of the other comparative models, demonstrating superior prediction accuracy as well as favorable generalization and prediction stability.
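The attention step described above, computing weights over the BiLSTM hidden layer and aggregating the timesteps, can be sketched as additive attention pooling. The numpy version below is a generic illustration under assumed shapes (5 timesteps, hidden size 8, random parameters), not the paper's implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, W, v):
    """Additive attention over hidden states H (T x d): score each timestep,
    softmax the scores into weights, return the weighted context vector."""
    scores = np.tanh(H @ W) @ v   # (T,) one scalar score per timestep
    alpha = softmax(scores)       # attention weights, sum to 1
    context = alpha @ H           # (d,) weighted sum of hidden states
    return context, alpha

rng = np.random.default_rng(0)
T, d = 5, 8                       # 5 timesteps, hidden size 8 (illustrative)
H = rng.standard_normal((T, d))   # stand-in for BiLSTM hidden states
W = rng.standard_normal((d, d))
v = rng.standard_normal(d)
context, alpha = attention_pool(H, W, v)
```

The weights `alpha` are what a method like the one above would then feed into (or learn jointly with) the downstream regression head; timesteps with larger scores dominate the context vector.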
Affiliation(s)
- Xinjing Qi
- College of Metrology and Measurement Engineering, China Jiliang University, Hangzhou, Zhejiang, China
- Huan Wang
- Ningbo China Resources Xingguang Gas Co Ltd, Ningbo, Zhejiang, China
- Yubo Ji
- Ningbo China Resources Xingguang Gas Co Ltd, Ningbo, Zhejiang, China
- Yuan Li
- Wuhan Gas & Heat and Design Institute Co Ltd, Wuhan, Hubei, China
- Xuguang Luo
- Wuhan Gas & Heat and Design Institute Co Ltd, Wuhan, Hubei, China
- Rongshan Nie
- College of Quality and Safety Engineering, China Jiliang University, Hangzhou, Zhejiang, China
- Xiaoyu Liang
- College of Metrology and Measurement Engineering, China Jiliang University, Hangzhou, Zhejiang, China
- College of Quality and Safety Engineering, China Jiliang University, Hangzhou, Zhejiang, China
45
Argade D, Khairnar V, Vora D, Patil S, Kotecha K, Alfarhood S. Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism. Heliyon 2024; 10:e26162. [PMID: 38420442 PMCID: PMC10900395 DOI: 10.1016/j.heliyon.2024.e26162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Revised: 01/28/2024] [Accepted: 02/08/2024] [Indexed: 03/02/2024] Open
Abstract
In recent decades, abstractive text summarization using multimodal input has attracted many researchers due to its capability of gathering information from various sources to create a concise summary. However, existing multimodal summarization methodologies produce adequate summaries only for short videos and give poor results for lengthy videos. To address these issues, this research presents Multimodal Abstractive Summarization using Bidirectional Encoder Representations from Transformers (MAS-BERT) with an attention mechanism. The purpose of video summarization is to speed up searching through a large collection of videos, so that users can quickly decide whether a video is relevant by reading its summary. Initially, the data are obtained from the publicly available How2 dataset and encoded using a Bidirectional Gated Recurrent Unit (Bi-GRU) encoder and a Long Short Term Memory (LSTM) encoder: the textual data embedded in the embedding layer are encoded with the bidirectional GRU encoder, while the audio and video features are encoded with the LSTM encoder. After this, a BERT-based attention mechanism is used to combine the modalities, and finally a Bi-GRU-based decoder summarizes the combined multimodal representation. The experimental results show that the proposed MAS-BERT achieved a better Rouge-1 score of 60.2, whereas the existing Decoder-only Multimodal Transformer (D-MmT) and the Factorized Multimodal Transformer based Decoder Only Language model (FLORAL) achieved 49.58 and 56.89, respectively. Our work facilitates users by providing better contextual information and user experience, and would help video-sharing platforms retain customers by allowing users to search for relevant videos by reading their summaries.
Affiliation(s)
- Dakshata Argade
- Terna Engineering College, Nerul, Navi Mumbai, 400706, India
- Deepali Vora
- Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, 412115, India
- Shruti Patil
- Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, 412115, India
- Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India
- Ketan Kotecha
- Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India
- Sultan Alfarhood
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O.Box 51178, Riyadh, 11543, Saudi Arabia
46
Wei W, Zhang L, Yang K, Li J, Cui N, Han Y, Zhang N, Yang X, Tan H, Wang K. A lightweight network for traffic sign recognition based on multi-scale feature and attention mechanism. Heliyon 2024; 10:e26182. [PMID: 38420439 PMCID: PMC10900943 DOI: 10.1016/j.heliyon.2024.e26182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2023] [Revised: 01/29/2024] [Accepted: 02/08/2024] [Indexed: 03/02/2024] Open
Abstract
Traffic sign recognition is an important part of intelligent transportation systems. It uses computer vision and traffic sign recognition technology to detect and recognize traffic signs on the road automatically. In this paper, we propose a lightweight convolutional neural network model for traffic sign recognition called ConvNeSe. Firstly, the feature extraction module of the model is constructed using Depthwise Separable Convolution and Inverted Residuals structures. The model extracts multi-scale features with strong representation ability by optimizing the structure of the convolutional network and fusing features. Then, the model introduces the Squeeze and Excitation Block (SE Block) to increase attention to important features, capturing key information in traffic sign images. Finally, the model achieves an accuracy of 99.85% on the German Traffic Sign Recognition Benchmark (GTSRB). Ablation experiments further indicate that the model is robust.
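The Squeeze-and-Excitation mechanism this abstract refers to can be sketched in a few lines: pool each channel to a scalar ("squeeze"), pass the result through a small two-layer bottleneck ("excitation"), and rescale the channels. The numpy version below is a minimal illustration with random weights and an assumed reduction ratio of 2, not the paper's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation: global-average-pool each channel ("squeeze"),
    run a two-layer ReLU/sigmoid bottleneck ("excitation"), rescale channels."""
    z = x.mean(axis=(1, 2))                    # squeeze: (C,)
    s = sigmoid(np.maximum(z @ w1, 0) @ w2)    # excitation: (C,) scales in (0, 1)
    return x * s[:, None, None]                # channel-wise reweighting

rng = np.random.default_rng(1)
C, H, W_, r = 8, 4, 4, 2                       # channels, height, width, reduction
x = rng.standard_normal((C, H, W_))            # stand-in feature map
w1 = rng.standard_normal((C, C // r))          # bottleneck down-projection
w2 = rng.standard_normal((C // r, C))          # bottleneck up-projection
y = se_block(x, w1, w2)
```

Because each scale lies in (0, 1), the block can only attenuate channels; the network learns to keep informative channels near 1 and suppress the rest, which is the "attention to important features" described above.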
Affiliation(s)
- Wei Wei
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Lili Zhang
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Kang Yang
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Jing Li
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Ning Cui
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Yucheng Han
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Ning Zhang
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Xudong Yang
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Hongxin Tan
- Science and Technology on Complex Aviation Systems Simulation Laboratory, Beijing, 100076, China
- Kai Wang
- Institute of National Defense Science and Technology Innovation, Academy of Military Sciences, Beijing, 100036, China
47
Jakkaladiki SP, Maly F. Integrating hybrid transfer learning with attention-enhanced deep learning models to improve breast cancer diagnosis. PeerJ Comput Sci 2024; 10:e1850. [PMID: 38435578 PMCID: PMC10909230 DOI: 10.7717/peerj-cs.1850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Accepted: 01/10/2024] [Indexed: 03/05/2024]
Abstract
Cancer, with its high fatality rate, instills fear in countless individuals worldwide. However, effective diagnosis and treatment can often lead to a successful cure. Computer-assisted diagnostics, especially in the context of deep learning, have become prominent methods for primary screening of various diseases, including cancer. Deep learning, an artificial intelligence technique that enables computers to reason like humans, has recently gained significant attention. This study focuses on training a deep neural network to predict breast cancer. With the advancements in medical imaging technologies such as X-ray, magnetic resonance imaging (MRI), and computed tomography (CT) scans, deep learning has become essential in analyzing and managing extensive image datasets. The objective of this research is to propose a deep-learning model for the identification and categorization of breast tumors. The system's performance was evaluated using the breast cancer identification (BreakHis) classification datasets from the Kaggle repository and the Wisconsin Breast Cancer Dataset (WBC) from the UCI repository. The study's findings demonstrated an impressive accuracy rate of 100%, surpassing other state-of-the-art approaches. The suggested model was thoroughly evaluated using F1-score, recall, precision, and accuracy metrics on the WBC dataset. Training, validation, and testing were conducted using pre-processed datasets, leading to remarkable results of 99.8% recall rate, 99.06% F1-score, and 100% accuracy rate on the BreakHis dataset. Similarly, on the WBC dataset, the model achieved a 99% accuracy rate, a 98.7% recall rate, and a 99.03% F1-score. These outcomes highlight the potential of deep learning models in accurately diagnosing breast cancer. Based on our research, it is evident that the proposed system outperforms existing approaches in this field.
Affiliation(s)
- Sudha Prathyusha Jakkaladiki
- Faculty of Informatics and Management, University of Hradec Králové, Hradec Kralove, Hradec Kralove, Czech Republic
- Filip Maly
- Faculty of Informatics and Management, University of Hradec Králové, Hradec Kralove, Hradec Kralove, Czech Republic
48
Chen Y, Li X, Lv N, He Z, Wu B. Automatic detection method for tobacco beetles combining multi-scale global residual feature pyramid network and dual-path deformable attention. Sci Rep 2024; 14:4862. [PMID: 38418868 PMCID: PMC10902385 DOI: 10.1038/s41598-024-55347-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2023] [Accepted: 02/22/2024] [Indexed: 03/02/2024] Open
Abstract
To address the difficulty of identifying the tobacco beetle, a storage pest, in images with few object pixels and considerable noise, which therefore lack information and identifiable features, this paper proposes an automatic tobacco beetle monitoring method based on a Multi-scale Global residual Feature Pyramid Network and Dual-path Deformable Attention (MGrFPN-DDrGAM). Firstly, a Multi-scale Global residual Feature Pyramid Network (MGrFPN) is constructed to obtain rich high-level semantic features and more complete low-level feature information, reducing missed detections. Then, a Dual-path Deformable receptive field Guided Attention Module (DDrGAM) is designed to establish long-range channel dependence, guide the effective fusion of features, and improve the localization accuracy of tobacco beetles by fitting spatial geometric deformation features and capturing spatial information from feature maps at different scales, thereby enriching feature information in both the channel and spatial dimensions. Finally, to simulate real scenes, a multi-scene tobacco beetle dataset is created, comprising 28,080 images with manually labeled tobacco beetle objects. The experimental results show that, under the Faster R-CNN framework, the detection precision and recall of this method reach 91.4% and 98.4% at an intersection-over-union (IoU) threshold of 0.5. Compared with Faster R-CNN and FPN, the detection precision improves by 32.9% and 6.9%, respectively, at an IoU threshold of 0.7. The proposed method is superior to current mainstream methods.
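The precision and recall figures quoted at IoU 0.5 and 0.7 rest on box overlap and matching. The sketch below shows the standard IoU computation and a greedy one-to-one matching scheme; the boxes and threshold are toy values for illustration, not the paper's evaluation harness.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(preds, gts, thr=0.5):
    """A prediction counts as a true positive if it overlaps a not-yet-matched
    ground-truth box with IoU >= thr (greedy best match per prediction)."""
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= best_iou:
                best, best_iou = i, iou(p, g)
        if best is not None:
            matched.add(best)
            tp += 1
    return tp / len(preds), tp / len(gts)

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]   # one good box, one stray box
p, r = precision_recall(preds, gts)          # one TP, one FP, one missed GT
```

Raising `thr` from 0.5 to 0.7, as in the comparison above, demands tighter localization, which is why precision typically drops at the stricter threshold.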
Affiliation(s)
- Yuling Chen
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
- Mianyang Teachers' College, Mianyang, 621000, Sichuan, China
- Xiaoxia Li
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
- Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Mianyang, 621010, Sichuan, China
- Nianzu Lv
- Xinjiang Institute of Technology, Aksu, 13558, Xinjiang, China
- Zhenxiang He
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
- Tianfu College of Southwest University of Finance and Economics, Mianyang, 621000, Sichuan, China
- Bin Wu
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China.
- Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Mianyang, 621010, Sichuan, China.
49
He X, Zhang H, Huang J, Zhao D, Li Y, Nie R, Liu X. [Research on fault diagnosis of patient monitor based on text mining]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2024; 41:168-176. [PMID: 38403618 PMCID: PMC10894744 DOI: 10.7507/1001-5515.202306017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 02/27/2024]
Abstract
The conventional fault diagnosis of patient monitors relies heavily on manual experience, resulting in low diagnostic efficiency and ineffective use of fault maintenance text data. To address these issues, this paper proposes an intelligent fault diagnosis method for patient monitors based on multi-feature text representation, an improved bidirectional gated recurrent unit (BiGRU), and an attention mechanism. Firstly, the fault text data were preprocessed, and word vectors containing multiple linguistic features were generated by a linguistically-motivated bidirectional encoder representation from Transformers. Then, bidirectional fault features were extracted and weighted by the improved BiGRU and the attention mechanism, respectively. Finally, a weighted loss function was used to reduce the impact of class imbalance on the model. To validate the effectiveness of the proposed method, this paper uses a patient monitor fault dataset for verification, achieving a macro F1 score of 91.11%. The results show that the model built in this study can automatically classify fault text, and may provide decision support for the intelligent fault diagnosis of patient monitors in the future.
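The weighted loss used above to counter class imbalance can be sketched as a class-weighted softmax cross-entropy: rare fault classes get larger weights so they contribute more to the mean loss. The class weights, logits, and labels below are hypothetical values for illustration only.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Softmax cross-entropy where each sample's loss is scaled by the
    weight of its true class, boosting the influence of rare classes."""
    z = logits - logits.max(axis=1, keepdims=True)          # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    w = class_weights[labels]                               # per-sample weight
    return -(w * log_probs[np.arange(len(labels)), labels]).mean()

# three hypothetical fault classes; class 2 is rare, so it gets a larger weight
weights = np.array([1.0, 1.0, 5.0])
logits = np.array([[2.0, 0.5, 0.1],   # sample of class 0, confidently correct
                   [0.2, 0.1, 0.3]])  # sample of the rare class 2, uncertain
labels = np.array([0, 2])
loss = weighted_cross_entropy(logits, labels, weights)
```

Compared with uniform weights, the weighted loss penalizes mistakes on the rare class more, nudging the model away from always predicting frequent fault types.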
Affiliation(s)
- Xiangfei He
- School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Hehua Zhang
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- School of Biological Information, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Jing Huang
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- Dechun Zhao
- School of Biological Information, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Yang Li
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- Rui Nie
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
- Xianghua Liu
- Department of Medical Engineering, Daping Hospital of Army Medical University, Chongqing 400042, P. R. China
50
Wang X, Liu J. Vegetable disease detection using an improved YOLOv8 algorithm in the greenhouse plant environment. Sci Rep 2024; 14:4261. [PMID: 38383751 PMCID: PMC10881480 DOI: 10.1038/s41598-024-54540-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2023] [Accepted: 02/14/2024] [Indexed: 02/23/2024] Open
Abstract
This study introduces YOLOv8n-vegetable, a model designed to address the imprecise detection of vegetable diseases in greenhouse environments by existing network models. The model incorporates several improvements and optimizations. Firstly, a novel C2fGhost module replaces part of the C2f modules with GhostConv, based on Ghost lightweight convolution, reducing the model's parameters and improving detection performance. Secondly, the Occlusion Perception Attention Module (OAM) is integrated into the Neck section to better preserve feature information after fusion, enhancing vegetable disease detection in greenhouse settings. To address the challenges of detecting small objects and the loss of semantic information across varying scales, an additional small-object detection layer is included; this layer improves the fusion of high-level and low-level semantic information, thereby enhancing overall detection accuracy. Finally, the HIoU boundary loss function is introduced, leading to improved convergence speed and regression accuracy. These improvement strategies were validated through experiments on a self-built vegetable disease detection dataset collected in a greenhouse environment. Multiple experimental comparisons demonstrated the model's effectiveness, achieving the objectives of improving detection speed while maintaining accuracy and real-time detection capability. According to the experimental findings, the enhanced model exhibited a 6.46% rise in mean average precision (mAP) over the original model on the self-built dataset under greenhouse conditions, while the parameter count and model size decreased by 0.16G and 0.21 MB, respectively. The proposed model demonstrates significant advancements over the original algorithm and exhibits strong competitiveness compared with other advanced object detection models.
The lightweight and fast detection of vegetable diseases offered by the proposed model presents promising applications in vegetable disease detection tasks.
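The parameter savings behind lightweight convolution modules like GhostConv come from factorizing the standard convolution. GhostConv itself uses a different construction (cheap linear operations on a reduced set of feature maps), but the flavor of the savings can be illustrated with the classic depthwise separable factorization; the channel sizes below are assumed for illustration.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise stage (one k x k filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# e.g. a 64 -> 128 channel layer with 3 x 3 kernels (illustrative sizes)
std = conv_params(64, 128, 3)                  # 73,728 parameters
dws = depthwise_separable_params(64, 128, 3)   # 8,768 parameters
ratio = std / dws                              # roughly 8.4x fewer parameters
```

This order-of-magnitude reduction in per-layer parameters is what makes such modules attractive for lightweight, real-time detectors like the one described above.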
Affiliation(s)
- Xuewei Wang
- Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, China
- Jun Liu
- Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, China.