1
Jiang R, Zheng X, Sun J, Chen L, Xu G, Zhang R. Classification for Alzheimer's disease and frontotemporal dementia via resting-state electroencephalography-based coherence and convolutional neural network. Cogn Neurodyn 2025; 19:46. PMID: 40051486; PMCID: PMC11880455; DOI: 10.1007/s11571-025-10232-2.
Abstract
This study aimed to diagnose Alzheimer's disease (AD) and frontotemporal dementia (FTD) using brain functional connectivity features extracted from resting-state electroencephalography (EEG) signals, and developed a convolutional neural network (CNN) model, Coherence-CNN, for classification. First, a publicly available dataset of resting-state, eyes-closed EEG recordings containing 36 AD subjects, 23 FTD subjects, and 29 cognitively normal (CN) subjects was used. Then, coherence metrics were used to quantify brain functional connectivity, and the differences in coherence between groups across various frequency bands were investigated. Next, spectral clustering was used to analyze variations and differences in brain functional connectivity related to disease states, revealing distinct connectivity patterns in brain electrode position maps. The results demonstrated that brain functional connectivity between different regions was more robust in the CN group, while the AD and FTD groups exhibited varying degrees of connectivity decline, reflecting the pronounced differences in connectivity patterns associated with each condition. Furthermore, Coherence-CNN was developed for three-class classification based on a CNN and the coherence features, achieving a commendable accuracy of 94.32% under leave-one-out cross-validation. This study showed that Coherence-CNN delivers strong performance in distinguishing the AD, FTD, and CN groups, supporting the finding of disordered brain functional connectivity in AD and FTD.
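As a rough illustration of the coherence feature described above, the sketch below computes band-averaged magnitude-squared coherence between every pair of EEG channels to form a connectivity matrix that a CNN could consume. The channel count, sampling rate, frequency band, and synthetic signals are assumptions for demonstration, not the paper's actual pipeline.

```python
import numpy as np
from scipy.signal import coherence

fs = 250                                   # assumed sampling rate (Hz)
n_channels, n_samples = 19, 10 * fs        # assumed montage and epoch length
rng = np.random.default_rng(0)
eeg = rng.standard_normal((n_channels, n_samples))   # stand-in for resting-state EEG

band = (8.0, 13.0)                         # e.g., the alpha band
conn = np.zeros((n_channels, n_channels))
for i in range(n_channels):
    for j in range(i + 1, n_channels):
        f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=2 * fs)
        mask = (f >= band[0]) & (f <= band[1])
        conn[i, j] = conn[j, i] = cxy[mask].mean()   # band-averaged coherence

print(conn.shape)                          # (19, 19) symmetric connectivity matrix
```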
Affiliation(s)
- Rundong Jiang: School of Mathematics, Northwest University, Xi’an, China
- Xiaowei Zheng: School of Mathematics, Northwest University, Xi’an, China; Medical Big Data Research Center, Northwest University, Xi’an, China
- Jiamin Sun: School of Mathematics, Northwest University, Xi’an, China
- Lei Chen: School of Mathematics, Northwest University, Xi’an, China
- Guanghua Xu: School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, China; State Key Laboratory for Manufacturing Systems Engineering, Xi’an Jiaotong University, Xi’an, China
- Rui Zhang: School of Mathematics, Northwest University, Xi’an, China; Medical Big Data Research Center, Northwest University, Xi’an, China
2
Hong Y, Pan H, Jia Y, Sun W, Gao H. ResDNet: Efficient Dense Multi-Scale Representations With Residual Learning for High-Level Vision Tasks. IEEE Trans Neural Netw Learn Syst 2025; 36:3904-3915. PMID: 35533173; DOI: 10.1109/tnnls.2022.3169779.
Abstract
Deep feature fusion plays a significant role in the strong learning ability of convolutional neural networks (CNNs) for computer vision tasks. Recent works have continually demonstrated the advantages of efficient aggregation strategies, and some of them use multiscale representations. In this article, we describe a novel network architecture for high-level computer vision tasks in which densely connected feature fusion provides multiscale representations for the residual network. We term our method ResDNet: a simple and efficient backbone made up of sequential ResDNet modules containing variants of dense blocks named sliding dense blocks (SDBs). Compared with DenseNet, ResDNet enhances feature fusion and reduces redundancy through shallower densely connected architectures. Experimental results on three classification benchmarks, CIFAR-10, CIFAR-100, and ImageNet, demonstrate the effectiveness of ResDNet. ResDNet consistently outperforms DenseNet on CIFAR-100 while using much less computation. On ImageNet, ResDNet-B-129 achieves 1.94% and 0.89% top-1 accuracy improvements over ResNet-50 and DenseNet-201, respectively, with similar complexity. Moreover, ResDNet with more than 1000 layers achieves remarkable accuracy on CIFAR compared with other state-of-the-art results. Based on the MMDetection implementation of RetinaNet, ResDNet-B-129 improves mAP from 36.3 to 39.5 over ResNet-50 on the COCO dataset.
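For readers unfamiliar with the dense connectivity that the SDBs build on, here is a toy PyTorch dense block showing the concatenative feature reuse involved; the layer count, growth rate, and widths are illustrative, not the ResDNet configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer consumes the concatenation of all earlier feature maps."""
    def __init__(self, in_channels: int, growth_rate: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + k * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + k * growth_rate, growth_rate, 3, padding=1),
            )
            for k in range(n_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

x = torch.randn(2, 16, 32, 32)
print(DenseBlock(16, growth_rate=12, n_layers=4)(x).shape)  # (2, 64, 32, 32)
```

A "sliding" variant in the paper's spirit would bound how far back the concatenation reaches, which is what shrinks the redundancy relative to a full DenseNet block.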
3
Huang J, Li Y, Meng B, Zhang Y, Wei Y, Dai X, An D, Zhao Y, Fang X. ProteoNet: A CNN-based framework for analyzing proteomics MS-RGB images. iScience 2024; 27:111362. PMID: 39679296; PMCID: PMC11638609; DOI: 10.1016/j.isci.2024.111362.
Abstract
Proteomics is crucial in clinical research, yet the clinical application of proteomic data remains challenging. Transforming proteomic mass spectrometry (MS) data into red, green, and blue color (MS-RGB) image formats and applying deep learning (DL) techniques has shown great potential to enhance analysis efficiency. However, current DL models often fail to extract subtle, crucial features from MS-RGB data. To address this, we developed ProteoNet, a deep learning framework that refines MS-RGB data analysis. ProteoNet incorporates semantic partitioning, adaptive average pooling, and weighted factors into the Convolutional Neural Network (CNN) model, thus enhancing data analysis accuracy. Our experiments with proteomics data from urine, blood, and tissue samples related to liver, kidney, and thyroid diseases demonstrate that ProteoNet outperforms existing models in accuracy. ProteoNet also provides a direct conversion method for MS-RGB data, enabling a seamless workflow. Moreover, its compatibility with various CNN architectures, including lightweight models like MobileNetV2, underscores its scalability and clinical potential.
Affiliation(s)
- Jinze Huang: Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
- Yimin Li: College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Bo Meng: Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
- Yong Zhang: Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu 610041, China
- Yaoguang Wei: College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Xinhua Dai: Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
- Dong An: College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Yang Zhao: Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China; College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Xiang Fang: Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
4
Yuan J, Zhu H, Li S, Thierry B, Yang CT, Zhang C, Zhou X. Truncated M13 phage for smart detection of E. coli under dark field. J Nanobiotechnology 2024; 22:599. PMID: 39363262; PMCID: PMC11451008; DOI: 10.1186/s12951-024-02881-y.
Abstract
BACKGROUND: The urgent need for affordable and rapid detection methodologies for foodborne pathogens, particularly Escherichia coli (E. coli), highlights the importance of developing efficient and widely accessible diagnostic systems. Dark field microscopy, although effective, requires specific isolation of the target bacteria, which can be hindered by the high cost of producing specialized antibodies. Alternatively, M13 bacteriophage, which naturally targets E. coli, offers a cost-efficient option with well-established techniques for its display and modification. Nevertheless, its filamentous structure, with a large length-to-diameter ratio, contributes to nonspecific binding and low separation efficiency, posing significant challenges. Consequently, refining M13 phage methodologies and integrating them with advanced microscopy techniques is a critical path toward improving detection specificity and efficiency in food safety diagnostics.

METHODS: We employed a dual-plasmid strategy to generate a truncated M13 phage (tM13). This engineered tM13 incorporates two key genetic modifications: a partial mutation at the N-terminus of pIII and biotinylation at the hydrophobic end of pVIII. These alterations enable efficient attachment of tM13 to diverse E. coli strains, facilitating rapid magnetic separation. For detection, we additionally implemented a convolutional neural network (CNN)-based algorithm for precise identification and quantification of bacterial cells under dark field microscopy.

RESULTS: Spike-in and clinical sample analyses demonstrated the accuracy, high sensitivity (detection limit of 10 CFU/μL), and speed (30 min) of our tM13-based immunomagnetic enrichment approach combined with AI-enabled analytics, supporting its potential to identify diverse E. coli strains in complex samples.

CONCLUSION: The study established a rapid and accurate detection strategy for E. coli that uses truncated M13 phages as capture probes together with a dark field microscopy platform integrating an image processing model and a convolutional neural network.
Affiliation(s)
- Jiasheng Yuan: College of Veterinary Medicine, Institute of Comparative Medicine, Yangzhou University, Yangzhou, 225009, China; Jiangsu Coinnovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, 225009, China; Joint International Research Laboratory of Agriculture and Agri-Product Safety, The Ministry of Education of China, Yangzhou University, Yangzhou, 225009, China
- Huquan Zhu: College of Veterinary Medicine, Institute of Comparative Medicine, Yangzhou University, Yangzhou, 225009, China
- Shixinyi Li: College of Veterinary Medicine, Institute of Comparative Medicine, Yangzhou University, Yangzhou, 225009, China
- Benjamin Thierry: Future Industries Institute, University of South Australia, Mawson Lakes Campus, Adelaide, SA, 5095, Australia
- Chih-Tsung Yang: Future Industries Institute, University of South Australia, Mawson Lakes Campus, Adelaide, SA, 5095, Australia
- Chen Zhang: School of Information Engineering, Yangzhou University, Yangzhou, 225127, China; Jiangsu Province Engineering Research Centre of Knowledge Management and Intelligent Service, Yangzhou University, Yangzhou, 225127, China
- Xin Zhou: College of Veterinary Medicine, Institute of Comparative Medicine, Yangzhou University, Yangzhou, 225009, China; Jiangsu Coinnovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, 225009, China; Joint International Research Laboratory of Agriculture and Agri-Product Safety, The Ministry of Education of China, Yangzhou University, Yangzhou, 225009, China
5
Liu Y, Zhang Y, Wang Y, Hou F, Yuan J, Tian J, Zhang Y, Shi Z, Fan J, He Z. A Survey of Visual Transformers. IEEE Trans Neural Netw Learn Syst 2024; 35:7478-7498. PMID: 37015131; DOI: 10.1109/tnnls.2022.3227717.
Abstract
Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some pioneering works have recently applied Transformer-like architectures in the computer vision (CV) field, demonstrating their effectiveness on three fundamental CV tasks (classification, detection, and segmentation) as well as multiple sensory data streams (images, point clouds, and vision-language data). Because of their competitive modeling capabilities, visual Transformers have achieved impressive performance improvements over multiple benchmarks compared with modern convolutional neural networks (CNNs). In this survey, we comprehensively review over 100 different visual Transformers according to three fundamental CV tasks and different data stream types, and propose a taxonomy that organizes the representative methods according to their motivations, structures, and application scenarios. Because of their differences in training settings and dedicated vision tasks, we also evaluate and compare all these existing visual Transformers under different configurations. Furthermore, we reveal a series of essential but unexploited aspects that may empower such visual Transformers to stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between the visual Transformers and the sequential ones. Finally, two promising research directions are suggested for future investigation. We will continue to update the latest articles and their released source codes at https://github.com/liuyang-ict/awesome-visual-transformers.
6
Kumari S, Chowdhry J, Chandra Garg M. AI-enhanced adsorption modeling: Challenges, applications, and bibliographic analysis. J Environ Manage 2024; 351:119968. PMID: 38171130; DOI: 10.1016/j.jenvman.2023.119968.
Abstract
Inorganic and organic contaminants, such as fertilisers, heavy metals, and dyes, are the primary causes of water pollution. The field of artificial intelligence (AI) has received significant interest due to its capacity to address challenges across various fields. The use of AI techniques in water treatment and desalination has recently proven useful for optimising processes and dealing with the challenges of water pollution and scarcity. The utilization of AI in the water treatment industry is anticipated to reduce operational expenditures by lowering procedure costs and optimising chemical utilization. The predictive capabilities of artificial intelligence models have accurately assessed the efficacy of different adsorbents in removing contaminants from wastewater. This article provides an overview of the various AI techniques and how they can be used in the adsorption of contaminants during the water treatment process. The reviewed publications were analysed for their diversity in journal type, publication year, research methodology, and initial study context. Citation network analysis, an objective method, and tools like VOSviewer are used to identify these groups. The primary issues that need to be addressed include the availability and selection of data, low reproducibility, and little proof of use in real water treatment. Addressing these challenges is essential to the prospective success of AI-associated technologies. This brief overview is important for everyone involved in the field of water, encompassing scientists, engineers, students, and stakeholders.
Affiliation(s)
- Sheetal Kumari: Amity Institute of Environmental Science (AIES), Amity University Uttar Pradesh, Sector-125, Noida, 201313, Gautam Budh Nagar, India
- Manoj Chandra Garg: Amity Institute of Environmental Science (AIES), Amity University Uttar Pradesh, Sector-125, Noida, 201313, Gautam Budh Nagar, India
7
Fan G, Gan M, Fan B, Chen CLP. Multiscale Cross-Connected Dehazing Network With Scene Depth Fusion. IEEE Trans Neural Netw Learn Syst 2024; 35:1598-1612. PMID: 35776818; DOI: 10.1109/tnnls.2022.3184164.
Abstract
In this article, we propose a multiscale cross-connected dehazing network with scene depth fusion. We focus on the correlation between a hazy image and the corresponding depth image. The model encodes and decodes the hazy image and the depth image separately and includes cross connections at the decoding end to directly generate a clean image in an end-to-end manner. Specifically, we first construct an input pyramid to obtain the receptive fields of the depth image and the hazy image at multiple levels. Then, we add the features of the corresponding dimensions in the input pyramid to the encoder. Finally, the two paths of the decoder are cross-connected. In addition, the proposed model uses wavelet pooling and residual channel attention modules (RCAMs) as components. A series of ablation experiments shows that the wavelet pooling and RCAMs effectively improve the performance of the model. We conducted extensive experiments on multiple dehazing datasets, and the results show that the model is superior to other advanced methods in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and subjective visual effects. The source code and supplementary materials are available at https://github.com/CCECfgd/MSCDN-master.
8
Luo Q, Zhu H, Zhu J, Li Y, Yu Y, Lei L, Lin F, Zhou M, Cui L, Zhu T, Li X, Zuo H, Yang X. Artificial intelligence-enabled 8-lead ECG detection of atrial septal defect among adults: a novel diagnostic tool. Front Cardiovasc Med 2023; 10:1279324. PMID: 38028503; PMCID: PMC10679442; DOI: 10.3389/fcvm.2023.1279324.
Abstract
Background: Patients with atrial septal defect (ASD) exhibit distinctive electrocardiogram (ECG) patterns. However, ASD cannot be diagnosed solely based on these differences. Artificial intelligence (AI) has been widely used for diagnosing cardiovascular diseases other than arrhythmia. Our study aimed to develop an artificial intelligence-enabled 8-lead ECG to detect ASD among adults.

Method: In this study, our AI model was trained and validated using 526 ECGs from patients with ASD and 2,124 ECGs from a control group with normal cardiac structure at our hospital. External testing was conducted at Wuhan Central Hospital, involving 50 ECGs from the ASD group and 46 ECGs from the normal group. The model was based on a convolutional neural network (CNN) with a residual network to classify 8-lead ECG data into either the ASD or the normal group. We employed a 10-fold cross-validation approach.

Results: Statistically significant differences (p < 0.05) were observed in the cited ECG features between the ASD and normal groups. Our AI model performed well in identifying ECGs in both the ASD group [accuracy of 0.97, precision of 0.90, recall of 0.97, specificity of 0.97, F1 score of 0.93, and area under the curve (AUC) of 0.99] and the normal group within the training and validation datasets from our hospital. The corresponding indices were also impressive on the external test dataset, with an accuracy of 0.82, precision of 0.90, recall of 0.74, specificity of 0.91, F1 score of 0.81, and AUC of 0.87. A series of subgroup experiments addressing specific clinical situations also yielded notable results.

Conclusion: ECG-based detection of ASD using an artificial intelligence algorithm can achieve high diagnostic performance, and it shows great clinical promise. Our research on AI-enabled 8-lead ECG detection of ASD in adults is expected to provide robust references for early detection of ASD, healthy pregnancies, and related decision-making. A lower number of leads also favors deployment on portable devices, and this technology is expected to bring significant economic and societal benefits.
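As a concrete, hypothetical rendering of a CNN-with-residual-blocks over 8-lead input, the sketch below builds a small residual 1D CNN; depths, kernel sizes, and the two-class head are assumptions for illustration, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, 7, padding=3),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 7, padding=3),
            nn.BatchNorm1d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))    # identity shortcut

model = nn.Sequential(
    nn.Conv1d(8, 32, 15, stride=2, padding=7),  # 8 leads in
    ResBlock1d(32),
    ResBlock1d(32),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 2),                           # ASD vs. normal logits
)
logits = model(torch.randn(4, 8, 2500))         # e.g., 5 s of ECG at 500 Hz
print(logits.shape)                             # (4, 2)
```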
Affiliation(s)
- Qiushi Luo: Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hongling Zhu: Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jiabing Zhu: Wuhan Zoncare Bio-Medical Electronics Co., Ltd, Wuhan, China
- Yi Li: Wuhan Zoncare Bio-Medical Electronics Co., Ltd, Wuhan, China
- Yang Yu: Division of Cardiology, the Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Lei Lei: Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Fan Lin: Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Minghe Zhou: Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Longyan Cui: School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Tao Zhu: Wuhan Zoncare Bio-Medical Electronics Co., Ltd, Wuhan, China
- Xuefei Li: Wuhan National High Magnetic Field Center, Huazhong University of Science and Technology, Wuhan, China
- Huakun Zuo: Wuhan National High Magnetic Field Center, Huazhong University of Science and Technology, Wuhan, China
- Xiaoyun Yang: Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
9
Cao J, Pang Y, Han J, Li X. Hierarchical Regression and Classification for Accurate Object Detection. IEEE Trans Neural Netw Learn Syst 2023; 34:2425-2439. PMID: 34695000; DOI: 10.1109/tnnls.2021.3106641.
Abstract
Accurate object detection requires correct classification and high-quality localization. Currently, most single-shot detectors (SSDs) conduct simultaneous classification and regression using a fully convolutional network. Despite its high efficiency, this structure has some design flaws for accurate object detection. The first is the mismatch of bounding-box classification, where the classification results of the default bounding boxes are improperly treated as the results of the regressed bounding boxes during inference. The second is that one-time regression is not good enough for high-quality object localization. To solve the classification mismatch problem, we propose a novel reg-offset-cls (ROC) module including three hierarchical steps: regression of the default bounding box, prediction of new feature sampling locations, and classification of the regressed bounding box with more accurate features. For high-quality localization, we stack two ROC modules together: the input of the second ROC module is the output of the first. In addition, we inject a feature-enhanced (FE) module between the two stacked ROC modules to extract more contextual information. Experiments on three different datasets (i.e., MS COCO, PASCAL VOC, and UAVDT) demonstrate the effectiveness and superiority of our method. Without any bells or whistles, the proposed method outperforms state-of-the-art one-stage methods at real-time speed. The source code is available at https://github.com/JialeCao001/HSD.
10
Cao J, Pang Y, Anwer RM, Cholakkal H, Khan FS, Shao L. SipMaskv2: Enhanced Fast Image and Video Instance Segmentation. IEEE Trans Pattern Anal Mach Intell 2023; 45:3798-3812. PMID: 37815954; DOI: 10.1109/tpami.2022.3180564.
Abstract
We propose a fast single-stage method for both image and video instance segmentation, called SipMask, that preserves instance spatial information by performing multiple sub-region mask predictions. The main module in our method is a light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for the sub-regions within a bounding box, enabling better delineation of spatially adjacent instances. To better correlate mask prediction with object detection, we further propose a mask alignment weighting loss and a feature alignment scheme. In addition, we identify two issues that impede the performance of single-stage instance segmentation and introduce two modules, a sample selection scheme and an instance refinement module, to address them. Experiments are performed on both the image instance segmentation dataset MS COCO and the video instance segmentation dataset YouTube-VIS. On the MS COCO test-dev set, our method achieves state-of-the-art performance. In terms of real-time capabilities, it outperforms YOLACT by a gain of 3.0% (mask AP) under similar settings, while operating at a comparable speed. On the YouTube-VIS validation set, our method also achieves promising results. The source code is available at https://github.com/JialeCao001/SipMask.
11
Dong W, Hou S, Xiao S, Qu J, Du Q, Li Y. Generative Dual-Adversarial Network With Spectral Fidelity and Spatial Enhancement for Hyperspectral Pansharpening. IEEE Trans Neural Netw Learn Syst 2022; 33:7303-7317. PMID: 34111007; DOI: 10.1109/tnnls.2021.3084745.
Abstract
Hyperspectral (HS) pansharpening is of great importance in improving the spatial resolution of HS images for remote sensing tasks. An HS image comprises abundant spectral content, whereas a panchromatic (PAN) image provides spatial information. HS pansharpening makes it possible to produce a pansharpened image with both high spatial and high spectral resolution. This article develops a specific pansharpening framework based on a generative dual-adversarial network (called PS-GDANet). Specifically, the pansharpening problem is formulated as a dual task that can be solved by a generative adversarial network (GAN) with two discriminators. The spatial discriminator forces the intensity component of the pansharpened image to be as consistent as possible with the PAN image, and the spectral discriminator helps to preserve the spectral information of the original HS image. Instead of designing a deep network, PS-GDANet extends GANs to two discriminators and provides a high-resolution pansharpened image in a fraction of the iterations. The experimental results demonstrate that PS-GDANet outperforms several widely accepted state-of-the-art pansharpening methods in terms of qualitative and quantitative assessment.
12
Xu H, Zhao Z. NetBCE: An Interpretable Deep Neural Network for Accurate Prediction of Linear B-cell Epitopes. Genomics Proteomics Bioinformatics 2022; 20:1002-1012. PMID: 36526218; PMCID: PMC10025766; DOI: 10.1016/j.gpb.2022.11.009.
Abstract
Identification of B-cell epitopes (BCEs) plays an essential role in the development of peptide vaccines and immunodiagnostic reagents, as well as antibody design and production. In this work, we generated a large benchmark dataset comprising 124,879 experimentally supported linear epitope-containing regions in 3567 protein clusters from over 1.3 million B-cell assays. Analysis of this curated dataset showed large pathogen diversity covering 176 different families. The accuracy of linear BCE prediction was found to vary strongly with different features, while all sequence-derived and structural features were informative. To search for more efficient and interpretable feature representations, a ten-layer deep learning framework for linear BCE prediction, named NetBCE, was developed. NetBCE achieved high accuracy and robust performance, with an average area under the curve (AUC) of 0.8455 in five-fold cross-validation, by automatically learning informative classification features. NetBCE substantially outperformed conventional machine learning algorithms and other tools, with an improvement of more than 22.06% in AUC over other tools on an independent dataset. Investigation of the outputs of important network modules in NetBCE showed that epitopes and non-epitopes tended to be presented in distinct regions, with efficient feature representation along the network layer hierarchy. NetBCE is freely available at https://github.com/bsml320/NetBCE.
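The five-fold cross-validated AUC reported above can be reproduced in outline with scikit-learn; the sketch below uses a generic classifier on synthetic data rather than NetBCE itself, so treat it as the evaluation protocol only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
aucs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
print(f"mean AUC over 5 folds: {np.mean(aucs):.4f}")
```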
Affiliation(s)
- Haodong Xu: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zhongming Zhao: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
13
Luo Y, Lu J, Jiang X, Zhang B. Learning From Architectural Redundancy: Enhanced Deep Supervision in Deep Multipath Encoder-Decoder Networks. IEEE Trans Neural Netw Learn Syst 2022; 33:4271-4284. PMID: 33587717; DOI: 10.1109/tnnls.2021.3056384.
Abstract
Deep encoder-decoders are the model of choice for pixel-level estimation due to their redundant deep architectures. Yet they still suffer from the vanishing supervision information issue that affects convergence because of their overly deep architectures. In this work, we propose and theoretically derive an enhanced deep supervision (EDS) method which improves on conventional deep supervision (DS) by incorporating variance minimization into the optimization. A new structure variance loss is introduced to build a bridge between deep encoder-decoders and variance minimization, and provides a new way to minimize the variance by forcing different intermediate decoding outputs (paths) to reach an agreement. We also design a focal weighting strategy to effectively combine multiple losses in a scale-balanced way, so that the supervision information is sufficiently enforced throughout the encoder-decoders. To evaluate the proposed method on the pixel-level estimation task, a novel multipath residual encoder is proposed and extensive experiments are conducted on four challenging density estimation and crowd counting benchmarks. The experimental results demonstrate the superiority of our EDS over other paradigms, and improved estimation performance is reported using our deeply supervised encoder-decoder.
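One plausible reading of the structure variance loss is a per-pixel variance penalty across the intermediate decoding paths, pushing them toward agreement; the sketch below implements that reading, and the paper's exact formulation may differ.

```python
import torch

def structure_variance_loss(outputs: list) -> torch.Tensor:
    """Penalize disagreement between decoding paths: per-pixel variance of
    the stacked predictions, averaged over the batch and image."""
    stacked = torch.stack(outputs, dim=0)            # (n_paths, B, 1, H, W)
    return stacked.var(dim=0, unbiased=False).mean()

paths = [torch.rand(2, 1, 64, 64) for _ in range(3)]  # three intermediate outputs
print(structure_variance_loss(paths))                  # scalar agreement penalty
```

In training, this term would be combined with the per-path supervision losses, e.g., under the focal weighting strategy the abstract mentions.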
14
Wang B, Xue B, Zhang M. Surrogate-Assisted Particle Swarm Optimization for Evolving Variable-Length Transferable Blocks for Image Classification. IEEE Trans Neural Netw Learn Syst 2022; 33:3727-3740. PMID: 33556026; DOI: 10.1109/tnnls.2021.3054400.
Abstract
Deep convolutional neural networks (CNNs) have demonstrated promising performance on image classification tasks, but the manual design process becomes more and more complex due to fast depth growth and the increasingly complex topologies of CNNs. As a result, neural architecture search (NAS) has emerged to automatically design CNNs that outperform handcrafted counterparts. However, the computational cost is immense, e.g., 22,400 GPU-days and 2,000 GPU-days for the two outstanding NAS works named NAS and NASNet, respectively, which motivates this work. A new effective and efficient surrogate-assisted particle swarm optimization (PSO) algorithm is proposed to automatically evolve CNNs. This is achieved by proposing a novel surrogate model, a new method of creating a surrogate data set, and a new encoding strategy to encode variable-length blocks of CNNs, all of which are integrated into a PSO algorithm to form the proposed method. The proposed method shows its effectiveness by achieving competitive error rates of 3.49% on the CIFAR-10 data set, 18.49% on the CIFAR-100 data set, and 1.82% on the SVHN data set. The CNN blocks are efficiently learned by the proposed method from CIFAR-10 within 3 GPU-days, owing to the acceleration achieved by the surrogate model and the surrogate data set, which avoid training 80.1% of the CNN blocks represented by the particles. Without any further search, the evolved blocks from CIFAR-10 can be successfully transferred to CIFAR-100, SVHN, and ImageNet, which exhibits the transferability of the blocks learned by the proposed method.
15
TR-Net: A Transformer-Based Neural Network for Point Cloud Processing. Machines 2022. DOI: 10.3390/machines10070517.
Abstract
The point cloud is a versatile geometric representation that can be applied in computer vision tasks. Because point clouds are unordered, it is challenging to design a deep neural network for point cloud analysis. Furthermore, most existing frameworks for point cloud processing either hardly consider local neighboring information or ignore context-aware and spatially-aware features. To deal with these problems, we propose a novel point cloud processing architecture named TR-Net, which is based on the transformer. This architecture reformulates the point cloud processing task as a set-to-set translation problem. TR-Net directly operates on raw point clouds without any data transformation or annotation, which reduces the consumption of computing resources and memory usage. Firstly, a neighborhood embedding backbone is designed to effectively extract local neighboring information from the point cloud. Then, an attention-based sub-network is constructed to learn a semantically rich and discriminative representation from the embedded features. Finally, effective global features are produced by feeding the features extracted by the attention-based sub-network into a residual backbone. For different downstream tasks, we build different decoders. Extensive experiments on public datasets illustrate that our approach outperforms other state-of-the-art methods. For example, TR-Net achieves 93.1% overall accuracy on the ModelNet40 dataset and a mIoU of 85.3% on the ShapeNet dataset for part segmentation.
16
Zhang H. A Review of Convolutional Neural Network Development in Computer Vision. EAI Endorsed Trans Internet Things 2022. DOI: 10.4108/eetiot.v7i28.445.
Abstract
Convolutional neural networks have made admirable progress in computer vision. As a fast-growing area of computing, CNNs are among the classical and most widely used network structures. The Internet of Things (IoT) has received a lot of attention in recent years, and this has directly fueled the vigorous development of AI technology, such as IoT-based intelligent luggage security inspection systems, intelligent fire alarm systems, driverless cars, drone technology, and other cutting-edge directions. This paper first outlines the structure of CNNs, including the convolutional layer, the downsampling layer, and the fully connected layer, all of which play an important role. Then some modules of classical networks are described; these modules are rapidly driving the development of CNNs. Finally, the current state of CNN research in image classification, object segmentation, and object detection is discussed.
17
CAFC-Net: A Critical and Align Feature Constructing Network for Oriented Ship Detection in Aerial Images. Comput Intell Neurosci 2022; 2022:3391391. PMID: 35251146; PMCID: PMC8894055; DOI: 10.1155/2022/3391391.
Abstract
Ship detection is one of the fundamental tasks in computer vision. In recent years, methods based on convolutional neural networks have made great progress. However, improvement of ship detection in aerial images is limited by large scale variation, extreme aspect ratios, and dense distribution. In this paper, a Critical and Align Feature Constructing Network (CAFC-Net), an end-to-end single-stage rotation detector, is proposed to improve ship detection accuracy. The framework is formed by three modules: a Biased Attention Module (BAM), a Feature Alignment Module (FAM), and a Distinctive Detection Module (DDM). Specifically, the BAM extracts biased critical features for classification and regression. With the extracted biased regression features, the FAM generates high-quality anchor boxes. Through a novel Alignment Convolution, convolutional features can be aligned according to the anchor boxes. The DDM produces orientation-sensitive features and reconstructs orientation-invariant features to alleviate the inconsistency between classification and localization accuracy. Extensive experiments on two remote sensing datasets, HRSC2016 and a self-built ship dataset, show the state-of-the-art performance of our detector.
18
Kashyap R. Breast Cancer Histopathological Image Classification Using Stochastic Dilated Residual Ghost Model. Int J Inf Retr Res 2022. DOI: 10.4018/ijirr.289655.
Abstract
A new deep learning-based classification model called the Stochastic Dilated Residual Ghost (SDRG) model was proposed in this work for categorizing histopathology images of breast cancer. The SDRG model used the proposed Multiscale Stochastic Dilated Convolution (MSDC) model, a ghost unit, and stochastic upsampling and downsampling units to categorize breast cancer accurately. This study addresses four primary issues. First, stain normalization was used to manage color divergence, and data augmentation with several factors was used to handle overfitting. The second challenge is extracting and enhancing tiny, low-level information such as edge, contour, and color accuracy; this is handled by the proposed multiscale stochastic and dilation unit. The third contribution is the removal of redundant or similar information from the convolutional neural network using a ghost unit. According to the assessment findings, the SDRG model achieved an overall accuracy of 95.65 percent in categorizing images, with a precision of 99.17 percent, superior to state-of-the-art approaches.
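The ghost unit's redundancy removal can be sketched as a few "intrinsic" maps from an ordinary convolution plus cheap depthwise "ghost" maps, concatenated; the ratio and kernel sizes here are illustrative, not the SDRG model's settings.

```python
import torch
import torch.nn as nn

class GhostUnit(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        intrinsic = out_ch // ratio
        self.primary = nn.Conv2d(in_ch, intrinsic, 1)           # few real features
        self.cheap = nn.Conv2d(intrinsic, out_ch - intrinsic,   # cheap depthwise
                               3, padding=1, groups=intrinsic)  # "ghost" features

    def forward(self, x):
        base = self.primary(x)
        return torch.cat([base, self.cheap(base)], dim=1)

print(GhostUnit(16, 32)(torch.randn(1, 16, 56, 56)).shape)      # (1, 32, 56, 56)
```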
Affiliation(s)
- Ramgopal Kashyap: Amity School of Engineering and Technology, Amity University, Raipur, India
19
Jovel J, Greiner R. An Introduction to Machine Learning Approaches for Biomedical Research. Front Med (Lausanne) 2021; 8:771607. PMID: 34977072; PMCID: PMC8716730; DOI: 10.3389/fmed.2021.771607.
Abstract
Machine learning (ML) approaches are a collection of algorithms that attempt to extract patterns from data and to associate such patterns with discrete classes of samples in the data. For example, given a series of features describing persons, an ML model predicts whether a person is diseased or healthy; given features of animals, it predicts whether an animal is treated or a control, or whether molecules have the potential to interact, etc. ML approaches can also find such patterns in an agnostic manner, i.e., without having information about the classes. These methods are referred to as supervised and unsupervised ML, respectively. A third type of ML is reinforcement learning, which attempts to find a sequence of actions that contribute to achieving a specific goal. All of these methods are becoming increasingly popular in biomedical research in quite diverse areas, including drug design, stratification of patients, medical image analysis, molecular interactions, prediction of therapy outcomes, and many more. We describe several supervised and unsupervised ML techniques and illustrate a series of prototypical examples using state-of-the-art computational approaches. Given the complexity of reinforcement learning, it is not discussed in detail here; instead, interested readers are referred to excellent reviews on that topic. We focus on concepts rather than procedures, as our goal is to attract the attention of researchers in biomedicine toward the plethora of powerful ML methods and their potential to leverage basic and applied research programs.
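A minimal supervised-learning example in the review's spirit, using scikit-learn: given features describing samples, fit a classifier and predict a discrete class on held-out data. The dataset and model choices here are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # features plus diseased/healthy-style labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```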
Affiliation(s)
- Juan Jovel (corresponding author): The Metabolomics Innovation Centre, University of Alberta, Edmonton, AB, Canada
- Russell Greiner: Faculty of Science-Computing Science, University of Alberta, Edmonton, AB, Canada
21
Apicella A, Donnarumma F, Isgrò F, Prevete R. A survey on modern trainable activation functions. Neural Netw 2021; 138:14-32. DOI: 10.1016/j.neunet.2021.01.026.
22
An Innovative Intelligent System with Integrated CNN and SVM: Considering Various Crops through Hyperspectral Image Data. ISPRS Int J Geo-Inf 2021. DOI: 10.3390/ijgi10040242.
Abstract
Generation of a thematic map is important for scientists and agricultural engineers analyzing different crops in a given field. Remote sensing data are well accepted for image classification over vast areas of crop investigation. However, most research has so far focused on the classification of pixel-based image data. This study was carried out to develop a multi-category crop hyperspectral image classification system to identify the major crops in the Chiayi Golden Corridor. Hyperspectral image data from CASI (Compact Airborne Spectrographic Imager) were used as the experimental data. A two-stage classification was designed to demonstrate image classification performance. More specifically, the study used multi-class classification by support vector machine (SVM) + convolutional neural network (CNN). SVM is a supervised learning model that analyzes data used for classification; CNN is a class of deep neural networks applied to analyzing visual imagery. The image classification comparison was made among four crops (paddy rice, potatoes, cabbages, and peanuts), roads, and structures. In the first stage, the support vector machine handled hyperspectral image classification through pixel-based analysis; in the second stage, the convolutional neural network improved the classification of image details through various blocks (cells) of segmentation. A series of discussions and analyses of the results is presented. A repair module was also designed to link the CNN and SVM stages and remove classification errors.
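The first, pixel-based stage can be pictured as an SVM classifying each pixel's spectrum into a land-cover class, which a CNN would then refine block by block; the band count, labels, and random spectra below are placeholders, not the CASI data.

```python
import numpy as np
from sklearn.svm import SVC

n_pixels, n_bands = 500, 48                   # assumed hyperspectral band count
rng = np.random.default_rng(0)
spectra = rng.random((n_pixels, n_bands))     # stand-in pixel spectra
labels = rng.integers(0, 6, n_pixels)         # rice/potato/cabbage/peanut/road/structure

svm = SVC(kernel="rbf").fit(spectra, labels)  # stage 1: pixel-wise SVM
pixel_map = svm.predict(spectra)              # stage 2 (CNN) would refine this map
print(pixel_map[:10])
```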
24
Hui L, Bo Z, Linquan H, Jiabao G, Yifan L. FoolChecker: A platform to evaluate the robustness of images against adversarial attacks. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.05.062.
26
Sun W, Su Y, Wu X, Wu X. A novel end-to-end 1D-ResCNN model to remove artifact from EEG signals. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.04.029.
27
Defect Detection of Industry Wood Veneer Based on NAS and Multi-Channel Mask R-CNN. Sensors 2020; 20:4398. PMID: 32781740; PMCID: PMC7472158; DOI: 10.3390/s20164398.
Abstract
Wood veneer defect detection plays a vital role in the wood veneer production industry. Studies on wood veneer defect detection have usually focused on detection accuracy for industrial applications but ignored algorithm execution speed; thus, their methods do not meet the required speed of online detection. In this paper, a new detection method is proposed that achieves high accuracy and a suitable speed for online production. Firstly, 2838 wood veneer images were collected using data collection equipment developed in the laboratory and labeled by experienced workers from a wood company. Then, an integrated model, the glance multiple channel mask region convolutional neural network (R-CNN), was constructed to detect wood veneer defects; it comprises a glance network and a multiple channel mask R-CNN. Neural architecture search technology was used to automatically construct the glance network with the lowest number of floating-point operations, to pick out potential defect images from numerous original wood veneer images. A genetic algorithm was used to merge the intermediate features extracted by the glance network. The multi-channel mask R-CNN was then used to classify and locate the defects. The experimental results show that the proposed method achieves a 98.70% overall classification accuracy and a 95.31% mean average precision, and only 2.5 s was needed to detect a batch of 50 standard images and 50 defective images. Compared with other wood veneer defect detection methods, the proposed method is more accurate and faster.
28
Jiang X, Zhang L, Lv P, Guo Y, Zhu R, Li Y, Pang Y, Li X, Zhou B, Xu M. Learning Multi-Level Density Maps for Crowd Counting. IEEE Trans Neural Netw Learn Syst 2020; 31:2705-2715. PMID: 31562106; DOI: 10.1109/tnnls.2019.2933920.
Abstract
People in crowd scenes often exhibit an imbalanced distribution. On the one hand, people's size varies greatly due to the camera perspective: people far away from the camera look smaller and are likely to occlude each other, whereas people near the camera look larger and are relatively sparse. On the other hand, the number of people also varies greatly within the same scene or across different scenes. This article aims to develop a novel model that can accurately estimate the crowd count for a given scene with an imbalanced people distribution. To this end, we propose an effective multi-level convolutional neural network (MLCNN) architecture that first adaptively learns multi-level density maps and then fuses them to predict the final output. The density map at each level focuses on people of certain sizes; as a result, the fusion of multi-level density maps is able to handle the large variation in people's size. In addition, we introduce a new loss function named balanced loss (BL) to impose relatively balanced feedback during training, which helps further improve the performance of the proposed network. Furthermore, we introduce a new dataset including 1111 images with a total of 49,061 head annotations. MLCNN is easy to train with only one end-to-end training stage. Experimental results demonstrate that our MLCNN achieves state-of-the-art performance. In particular, our MLCNN reaches a mean absolute error (MAE) of 242.4 on the UCF_CC_50 dataset, which is 37.2 lower than the second-best result.
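The fuse-then-count idea can be sketched as a weighted sum of per-level density maps whose integral gives the crowd count, scored by MAE; the weights, shapes, and random maps are illustrative, and MLCNN learns its fusion rather than fixing it.

```python
import torch

def fuse_and_count(density_maps, weights):
    fused = sum(w * d for w, d in zip(weights, density_maps))  # weighted fusion
    return fused.sum(dim=(1, 2, 3))                            # integrate to a count

preds = [torch.rand(4, 1, 96, 96) for _ in range(3)]           # three density levels
counts = fuse_and_count(preds, [0.5, 0.3, 0.2])
gt = torch.tensor([120.0, 80.0, 45.0, 300.0])                  # ground-truth counts
print(torch.mean(torch.abs(counts - gt)))                      # MAE, as on UCF_CC_50
```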
29
Chen M, Hao Y. Label-less Learning for Emotion Cognition. IEEE Trans Neural Netw Learn Syst 2020; 31:2430-2440. PMID: 31425055; DOI: 10.1109/tnnls.2019.2929071.
Abstract
In this paper, we propose label-less learning for emotion cognition (LLEC) to utilize large amounts of unlabeled data. We first inspect the unlabeled data from two perspectives, i.e., the feature layer and the decision layer. By utilizing a similarity model and an entropy model, this paper presents a hybrid label-less learning scheme that can automatically label data without human intervention. Then, we design an enhanced hybrid label-less learning scheme to purify the automatically labeled data. To further improve the accuracy of the emotion detection model and increase the utilization of unlabeled data, we apply enhanced hybrid label-less learning to multimodal unlabeled emotion data. Finally, we build a real-world test bed to evaluate the LLEC algorithm. The experimental results show that the LLEC algorithm improves the accuracy of emotion detection significantly.
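The decision-layer filtering can be pictured as accepting a pseudo-label only when the model's predictive entropy is low; the toy probabilities and threshold below are placeholders, not the LLEC formulation.

```python
import numpy as np

def entropy(p: np.ndarray) -> np.ndarray:
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

probs = np.array([[0.97, 0.02, 0.01],   # confident prediction
                  [0.40, 0.35, 0.25]])  # uncertain prediction
keep = entropy(probs) < 0.5             # accept only low-entropy samples
pseudo_labels = probs.argmax(axis=1)
print(keep, pseudo_labels)              # sample 0 auto-labeled, sample 1 left unlabeled
```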
30
Fu X, Liang B, Huang Y, Ding X, Paisley J. Lightweight Pyramid Networks for Image Deraining. IEEE Trans Neural Netw Learn Syst 2020; 31:1794-1807. PMID: 31329133; DOI: 10.1109/tnnls.2019.2926481.
Abstract
Existing deep convolutional neural networks (CNNs) have found major success in image deraining, but at the expense of an enormous number of parameters. This limits their potential applications, e.g., in mobile devices. In this paper, we propose a lightweight pyramid network (LPNet) for single-image deraining. Instead of designing a complex network structure, we use domain-specific knowledge to simplify the learning process. In particular, we find that by introducing the mature Gaussian-Laplacian image pyramid decomposition technology to the neural network, the learning problem at each pyramid level is greatly simplified and can be handled by a relatively shallow network with few parameters. We adopt recursive and residual network structures to build the proposed LPNet, which has fewer than 8K parameters while still achieving state-of-the-art performance on rain removal. We also discuss the potential value of LPNet for other low- and high-level vision tasks.
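The Gaussian-Laplacian decomposition LPNet builds on is a classical operation; a minimal OpenCV version is below, with the level count and random image as placeholders.

```python
import cv2
import numpy as np

img = np.random.rand(256, 256, 3).astype(np.float32)   # stand-in rainy image

gaussian = [img]
for _ in range(3):
    gaussian.append(cv2.pyrDown(gaussian[-1]))          # Gaussian pyramid

laplacian = []
for k in range(3):
    up = cv2.pyrUp(gaussian[k + 1], dstsize=gaussian[k].shape[1::-1])
    laplacian.append(gaussian[k] - up)                  # band-pass residual per level
laplacian.append(gaussian[-1])                          # coarsest approximation

print([lvl.shape for lvl in laplacian])
```

Each band-limited level is simpler than the full image, which is why a shallow subnetwork per level can suffice.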
31
Li Y, Pang Y, Wang K, Li X. Toward improving ECG biometric identification using cascaded convolutional neural networks. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.01.019.
32
Low-rank discriminative regression learning for image classification. Neural Netw 2020; 125:245-257. PMID: 32146355; DOI: 10.1016/j.neunet.2020.02.007.
Abstract
As a well-known multivariate analysis technique, regression methods, such as ridge regression, are widely used for image representation and dimensionality reduction. However, the metric of ridge regression and its variants is always the Frobenius norm (F-norm), which is sensitive to outliers and noise in the data. At the same time, the performance of ridge regression and its extensions is limited by the number of classes in the data. To address these problems, we propose a novel regression learning method, named low-rank discriminative regression learning (LDRL), for image representation. LDRL assumes that the input data are corrupted, so the L1 norm can be used as a sparse constraint on the noise matrix to recover the clean data for regression, which improves the robustness of the algorithm. Because it learns a novel projection matrix that is not limited by the number of classes, LDRL is suitable for classifying data sets with either a small or a large number of classes. The performance of the proposed LDRL is evaluated on six public image databases. The experimental results show that LDRL obtains better performance than existing regression methods.
33
Chung JH, Kim DW, Kang TK, Lim MT. Traffic Sign Recognition in Harsh Environment Using Attention Based Convolutional Pooling Neural Network. Neural Process Lett 2020. DOI: 10.1007/s11063-020-10211-0.
34
Cao J, Pang Y, Han J, Gao B, Li X. Taking a Look at Small-Scale Pedestrians and Occluded Pedestrians. IEEE Trans Image Process 2019; 29:3143-3152. PMID: 31831419; DOI: 10.1109/tip.2019.2957927.
Abstract
Small-scale pedestrian detection and occluded pedestrian detection are two challenging tasks. However, most state-of-the-art methods handle only one task at a time, giving rise to relatively poor performance when, as in practice, the two tasks are required simultaneously. In this paper, we find that small-scale pedestrian detection and occluded pedestrian detection share a common problem: inaccurate localization. Solving this problem therefore improves the performance of both tasks. To this end, we pay more attention to predicted bounding boxes with poor location precision and extract more contextual information around objects, via two proposed modules (location bootstrap and semantic transition). The location bootstrap reweights the regression loss: the loss of a predicted bounding box far from its ground truth is upweighted, and the loss of a predicted bounding box near its ground truth is downweighted. Additionally, the semantic transition adds more contextual information and relieves the semantic inconsistency of skip-layer fusion. Since the location bootstrap is not used at the test stage and the semantic transition is lightweight, the proposed method adds little extra computational cost during inference. Experiments on the challenging CityPersons and Caltech datasets show that the proposed method outperforms the state-of-the-art methods on small-scale and occluded pedestrians (e.g., 5.20% and 4.73% improvements on Caltech).
Collapse
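A sketch of the location-bootstrap reweighting as the abstract describes it (the weighting function below is an assumption; the paper's exact form may differ): boxes whose IoU with the ground truth is low receive a larger regression weight.

import torch
import torch.nn.functional as F

def iou(boxes_a, boxes_b):
    """Element-wise IoU for (N, 4) boxes in (x1, y1, x2, y2) format."""
    lt = torch.max(boxes_a[:, :2], boxes_b[:, :2])
    rb = torch.min(boxes_a[:, 2:], boxes_b[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def bootstrapped_reg_loss(pred, target, gamma=1.0):
    per_box = F.smooth_l1_loss(pred, target, reduction="none").sum(dim=1)
    weight = (1.0 - iou(pred, target)).detach() ** gamma  # far boxes -> larger weight
    return (weight * per_box).mean()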
|
35
|
Rehman SU, Tu S, Waqas M, Huang Y, Rehman OU, Ahmad B, Ahmad S. Unsupervised pre-trained filter learning approach for efficient convolution neural network. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.06.084] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
|
36
|
Xie G, Yang K, Zhang T, Wang J, Lai J. Balanced Decoupled Spatial Convolution for CNNs. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:3419-3432. [PMID: 30714934 DOI: 10.1109/tnnls.2019.2892035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
In this paper, we are interested in designing lightweight CNNs by decoupling the convolution along the spatial and channel dimensions. Most existing decoupling techniques focus on approximating the filter matrix through decomposition. In contrast, we provide a decoupled view of the standard convolution that separates spatial information from channel information; the resulting decoupled process is exactly equivalent to the standard convolution. Inspired by this decoupled view, we propose an effective structure, balanced decoupled spatial convolution (BDSC), which relaxes the sparsity of the filter in spatial aggregation by learning a spatial configuration and reduces redundancy by reducing the number of intermediate channels. We also design an adaptive spatial configuration, which simply adds a nonlinear activation layer (rectified linear units, ReLU) after the intermediate output. Our experiments verify that the adaptive spatial configuration can improve classification performance without extra cost. In addition, our BDSC achieves classification performance comparable to the standard convolution but with a smaller model size on Canadian Institute for Advanced Research (CIFAR)-100, CIFAR-10, and ImageNet. To show the potential for further reducing the redundancy of the channel-domain convolution, we also report experiments with a designed lightweight channel-domain convolution. Finally, our experiments show that our models achieve superior performance to the state-of-the-art models.
Collapse
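One plausible PyTorch rendering of a BDSC block based on the description above (my reading of the abstract, not the authors' code): a k x k spatial convolution into a reduced number of intermediate channels, an optional ReLU as the "adaptive spatial configuration", then a 1 x 1 convolution for channel fusion.

import torch.nn as nn

class BDSCBlock(nn.Module):
    def __init__(self, in_ch, out_ch, mid_ch, k=3, adaptive=True):
        super().__init__()
        # spatial aggregation into fewer intermediate channels reduces redundancy
        self.spatial = nn.Conv2d(in_ch, mid_ch, k, padding=k // 2, bias=False)
        self.act = nn.ReLU(inplace=True) if adaptive else nn.Identity()
        # 1 x 1 channel fusion recovers the desired output width
        self.channel = nn.Conv2d(mid_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.channel(self.act(self.spatial(x)))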
|
37
|
Pang Y, Zhou B, Nie F. Simultaneously Learning Neighborship and Projection Matrix for Supervised Dimensionality Reduction. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:2779-2793. [PMID: 30640633 DOI: 10.1109/tnnls.2018.2886317] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Explicitly or implicitly, most dimensionality reduction methods need to determine which samples are neighbors and the similarities between those neighbors in the original high-dimensional space. The projection matrix is then learned on the assumption that this neighborhood information (e.g., the similarities) is known and fixed prior to learning. However, it is difficult to precisely measure the intrinsic similarities of samples in high-dimensional space because of the curse of dimensionality. Consequently, the neighbors selected according to such similarities, and the projection matrix obtained from those similarities and neighbors, might not be optimal in the sense of classification and generalization. To overcome this drawback, we propose to treat the similarities and neighbors as variables and model them in a low-dimensional space. Both the optimal similarity and the projection matrix are obtained by minimizing a unified objective function, with nonnegative and sum-to-one constraints on the similarity. Instead of setting the regularization parameter empirically, we treat it as a variable to be optimized. Interestingly, the optimal regularization parameter adapts to the neighbors in the low-dimensional space and has an intuitive meaning. Experimental results on the YALE B, COIL-100, and MNIST data sets demonstrate the effectiveness of the proposed method.
Collapse
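The paper's unified objective is not reproduced here, but a toy alternating scheme conveys the structure (the update forms below are stand-ins of my choosing, not the paper's): similarities are recomputed from distances in the projected space under nonnegative, sum-to-one constraints, and the projection is refit to keep similar pairs close (a locality-preserving step).

import numpy as np

def update_similarity(Z, sigma=1.0):
    """Row-stochastic, nonnegative similarities from projected data Z: (n, k)."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # no self-similarity
    S = np.exp(-d2 / (2 * sigma ** 2))
    return S / S.sum(axis=1, keepdims=True)       # sum-to-one per row

def update_projection(X, S, k):
    """Minimize sum_ij S_ij ||W^T x_i - W^T x_j||^2 under orthonormal W:
    the k smallest eigenvectors of the Laplacian scatter X^T L X."""
    D = np.diag(S.sum(axis=1))
    L = D - (S + S.T) / 2
    vals, vecs = np.linalg.eigh(X.T @ L @ X)
    return vecs[:, :k]                            # directions with least local spread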
|
38
|
Liu Y, Han J, Zhang Q, Shan C. Deep Salient Object Detection with Contextual Information Guidance. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:360-374. [PMID: 31380760 DOI: 10.1109/tip.2019.2930906] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Integration of multi-level contextual information, such as feature maps and side outputs, is crucial for convolutional neural network (CNN)-based salient object detection. However, most existing methods either simply concatenate multi-level feature maps or compute the element-wise addition of multi-level side outputs, thus failing to take full advantage of them. In this work, we propose a new strategy for guiding multi-level contextual information integration, in which feature maps and side outputs across layers are fully engaged. Specifically, shallower-level feature maps are guided by deeper-level side outputs to learn more accurate properties of the salient object. In turn, the deeper-level side outputs can be propagated to high-resolution versions with spatial details complemented by the shallower-level feature maps. Moreover, a group convolution module is proposed to obtain highly discriminative feature maps: the backbone feature maps are divided into a number of groups, and the convolution is applied to the channels within each group. The group convolution module is then incorporated into the guidance module to further strengthen the guidance. Experiments on three public benchmark datasets verify the effectiveness and superiority of the proposed method over state-of-the-art methods.
Collapse
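The group convolution module maps directly onto a standard grouped convolution; a minimal sketch (the channel counts and group number are placeholders):

import torch
import torch.nn as nn

# Backbone feature maps are split into `groups` sets of channels and a 3x3
# convolution is applied within each set independently.
group_conv = nn.Conv2d(in_channels=256, out_channels=256,
                       kernel_size=3, padding=1, groups=8)

x = torch.randn(1, 256, 32, 32)   # a dummy backbone feature map
y = group_conv(x)                 # same spatial size; channels mixed only per group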
|
40
|
Yang L, Song Q, Wu Y, Hu M. Attention Inspiring Receptive-Fields Network for Learning Invariant Representations. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:1744-1755. [PMID: 30371393 DOI: 10.1109/tnnls.2018.2873722] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, we describe a simple and highly efficient module for image classification, which we term the "Attention Inspiring Receptive-fields" (Air) module. We effectively convert the spatial attention mechanism into a plug-in module. In addition, we reveal the relationship between the spatial attention mechanism and receptive fields, showing that proper use of spatial attention can effectively increase the receptive fields of the module, which enhances the translation and scale invariance of the network. By integrating the Air module into advanced convolutional neural networks (such as ResNet and ResNeXt), we construct AirNet architectures for learning invariant representations and obtain significant improvements on challenging data sets. We present extensive experiments on the CIFAR and ImageNet data sets to verify the effectiveness and feature invariance of the Air module and to explore more concise and efficient designs of the proposed module. On ImageNet classification, our AirNet-50 and AirNet-101 (ResNet-50/101 with the Air module) achieve 1.69% and 1.50% top-1 accuracy improvements with a small amount of extra computation and parameters compared with the original ResNet. We make the models and code publicly available at https://github.com/soeaver/AirNet-PyTorch. We further demonstrate that AirNet transfers well, reporting performance on Microsoft Common Objects in Context (COCO) object detection, instance segmentation, and pose estimation.
Collapse
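The released code is at the link above; as an illustration of the mechanism only (my simplification, not the official AirNet block), a spatial-attention plug-in can compute its mask on a downsampled copy of the features, which is what couples attention to a larger receptive field:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AirBlock(nn.Module):
    def __init__(self, ch, reduction=2):
        super().__init__()
        mid = ch // reduction
        self.down = nn.Conv2d(ch, mid, 1)
        self.conv = nn.Conv2d(mid, mid, 3, padding=1)
        self.up = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        a = F.avg_pool2d(x, 2)            # operate at half resolution: larger receptive field
        a = self.up(F.relu(self.conv(F.relu(self.down(a)))))
        a = torch.sigmoid(F.interpolate(a, size=x.shape[2:], mode="bilinear",
                                        align_corners=False))
        return x * a                      # spatially reweighted features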
|
41
|
Li Y, Pang Y, Wang J, Li X. Patient-specific ECG classification by deeper CNN from generic to dedicated. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.06.068] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
42
|
Yu Y, Ji Z, Guo J, Pang Y. Transductive Zero-Shot Learning With Adaptive Structural Embedding. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:4116-4127. [PMID: 29035229 DOI: 10.1109/tnnls.2017.2753852] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Zero-shot learning (ZSL) endows a computer vision system with the inferential capability to recognize new categories that have never been seen before. Its two fundamental challenges are visual-semantic embedding and domain adaptation, arising in the cross-modality learning and unseen-class prediction steps, respectively. This paper presents two corresponding methods, Adaptive STructural Embedding (ASTE) and Self-PAced Selective Strategy (SPASS). Specifically, ASTE formulates the visual-semantic interactions in a latent structural support vector machine framework, adaptively adjusting the slack variables to reflect the differing reliability of training instances. To alleviate the domain shift problem in ZSL, SPASS borrows the idea of self-paced learning, iteratively selecting unseen instances from reliable to less reliable so as to gradually adapt knowledge from the seen domain to the unseen domain. By combining SPASS and ASTE, we present a self-paced Transductive ASTE (TASTE) method that progressively reinforces the classification capacity. Extensive experiments on three benchmark data sets (AwA, CUB, and aPY) demonstrate the superiority of ASTE and TASTE. Furthermore, we propose a fast training (FT) strategy to improve the efficiency of most existing ZSL methods. The FT strategy is surprisingly simple and general, speeding up the training of most existing ZSL methods by 4-300 times while maintaining their performance.
Collapse
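A schematic of the self-paced selection loop (interfaces assumed, scikit-learn style; the paper's reliability measure may differ): unseen instances enter the training pool from most to least reliable, gradually adapting the model to the unseen domain.

import numpy as np

def self_paced_transduction(clf, X_seen, y_seen, X_unseen, rounds=5, frac=0.2):
    """clf: any classifier with fit / predict / predict_proba."""
    X_pool, y_pool = X_seen.copy(), y_seen.copy()
    remaining = X_unseen.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        clf.fit(X_pool, y_pool)
        conf = clf.predict_proba(remaining).max(axis=1)  # reliability proxy
        k = max(1, int(frac * len(remaining)))
        pick = np.argsort(-conf)[:k]                     # most reliable first
        X_pool = np.vstack([X_pool, remaining[pick]])
        y_pool = np.concatenate([y_pool, clf.predict(remaining[pick])])
        remaining = np.delete(remaining, pick, axis=0)
    return clf.fit(X_pool, y_pool)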
|
43
|
A CNN-SIFT Hybrid Pedestrian Navigation Method Based on First-Person Vision. REMOTE SENSING 2018. [DOI: 10.3390/rs10081229] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
The emergence of new wearable technologies, such as action cameras and smart glasses, has driven the use of the first-person perspective in computer applications. This field is now attracting the attention and investment of researchers aiming to develop methods to process first-person vision (FPV) video. Current approaches combine different image features and quantitative methods to accomplish specific objectives, such as object detection, activity recognition, and user-machine interaction. FPV-based navigation is necessary in areas where the Global Positioning System (GPS) and other radio-signal-strength methods are blocked, and it is especially helpful for visually impaired people. In this paper, we propose a hybrid structure with a convolutional neural network (CNN) and local image features to achieve FPV pedestrian navigation. A novel end-to-end trainable global pooling operator, called AlphaMEX, has been designed to improve the scene classification accuracy of CNNs. A scale-invariant feature transform (SIFT)-based tracking algorithm is employed for movement estimation and trajectory tracking of the person through each frame of the FPV images. Experimental results demonstrate the effectiveness of the proposed method. The proposed AlphaMEX-ResNet improves on the top-1 error rate of the original ResNet (k = 12) by 1.7% on the ImageNet dataset. The CNN-SIFT hybrid pedestrian navigation system reaches 0.57 m average absolute error, an adequate accuracy for pedestrian navigation. Both positions and movements can be well estimated by the proposed algorithm with a single wearable camera.
Collapse
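The exact AlphaMEX parameterization is given in the paper; as a sketch of the family it belongs to, here is a global log-mean-exp pooling with a learnable sharpness beta (beta near 0 approaches average pooling; large beta approaches max pooling):

import math
import torch
import torch.nn as nn

class SmoothGlobalPool(nn.Module):
    """(1/beta) * log(mean(exp(beta * x))) over each channel's spatial positions."""
    def __init__(self, beta_init=1.0):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(beta_init))

    def forward(self, x):                    # x: (N, C, H, W)
        flat = x.flatten(2)                  # (N, C, H*W)
        beta = self.beta.clamp(min=1e-3)     # keep the operator well-defined
        return (torch.logsumexp(beta * flat, dim=2) - math.log(flat.shape[2])) / beta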
|
44
|
Jiang X, Pang Y, Sun M, Li X. Cascaded Subpatch Networks for Effective CNNs. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2684-2694. [PMID: 28504949 DOI: 10.1109/tnnls.2017.2689098] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Conventional convolutional neural networks use either a linear or a nonlinear filter to extract features from an image patch (region) of spatial size w x h (typically, w is small and equal to h, e.g., w is 5 or 7). Generally, the size of the filter is equal to the size of the input patch. We argue that the representational ability of this equal-size strategy is not strong enough. To overcome this drawback, we propose a subpatch filter whose spatial size is smaller than w x h. The proposed subpatch filter consists of two subsequent filters. The first is a linear filter of spatial size s x s (with s < w), aimed at extracting features from the spatial domain. The second is of spatial size 1 x 1 and is used to strengthen the connections between different input feature channels and to reduce the number of parameters. The subpatch filter convolves with the input patch, and the resulting network is called a subpatch network. Taking the output of one subpatch network as input, we repeat the construction of subpatch networks until the output contains only one neuron in the spatial domain. These subpatch networks form a new network called the cascaded subpatch network (CSNet), and the feature layer generated by CSNet is called the csconv layer. For the whole input image, we construct a deep neural network by stacking a sequence of csconv layers. Experimental results on five benchmark data sets demonstrate the effectiveness and compactness of the proposed CSNet. For example, our CSNet reaches a test error of 5.68% on the CIFAR10 data set without model averaging. To the best of our knowledge, this is the best result ever obtained on the CIFAR10 data set.
Collapse
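Under the notation reconstructed above, a csconv stage amounts to a small spatial convolution followed by a 1 x 1 convolution; a minimal PyTorch sketch (sizes are illustrative assumptions, not the paper's exact configuration):

import torch.nn as nn

def csconv(in_ch, mid_ch, out_ch, s=3):
    """One cascaded-subpatch stage: s x s spatial filtering, then 1 x 1 mixing."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, s),     # spatial feature extraction, no padding
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 1),    # cross-channel connection, few parameters
        nn.ReLU(inplace=True),
    )

# Stacking stages shrinks the spatial extent: three 3 x 3 stages reduce a
# 7 x 7 input patch to a single spatial neuron (7 -> 5 -> 3 -> 1).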
|
45
|
Chen B, Li J, Wei G, Ma B. M-SAC-VLADNet: A Multi-Path Deep Feature Coding Model for Visual Classification. ENTROPY 2018; 20:e20050341. [PMID: 33265431 PMCID: PMC7512860 DOI: 10.3390/e20050341] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/19/2018] [Revised: 04/21/2018] [Accepted: 04/26/2018] [Indexed: 11/16/2022]
Abstract
Vector of locally aggregated descriptors (VLAD) coding has become an efficient feature coding model for retrieval and classification. In recent work, the VLAD coding method has been extended to a deep feature coding model called NetVLAD, which improves significantly over the original VLAD method. Although the NetVLAD model has shown its potential for retrieval and classification, its discriminative ability has not been fully explored. In this paper, we propose a new end-to-end feature coding network that is more discriminative than the NetVLAD model. First, we propose a sparsely-adaptive and covariance VLAD model. Next, we derive the back-propagation rules for all the proposed layers and extend the proposed feature coding model into an end-to-end neural network. Finally, we construct a multi-path feature coding network that aggregates multiple newly designed feature coding networks for visual classification. Experimental results show that our feature coding network is very effective for visual classification.
Collapse
Affiliation(s)
| | - Jie Li
- Correspondence: ; Tel.: +86-020-2223-6361
Collapse
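For reference, a minimal NetVLAD layer in the standard formulation (the baseline this paper extends, not its sparsely-adaptive covariance variant): local descriptors are soft-assigned to K learned centers, and the assignment-weighted residuals are aggregated and normalized.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    def __init__(self, dim, num_clusters):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_clusters, dim))
        self.assign = nn.Conv2d(dim, num_clusters, 1)    # soft-assignment logits

    def forward(self, x):                                # x: (N, D, H, W)
        a = F.softmax(self.assign(x).flatten(2), dim=1)  # (N, K, H*W)
        feats = x.flatten(2)                             # (N, D, H*W)
        # sum_m a[k,m] * (x_m - c_k), split into two terms for efficiency
        vlad = torch.einsum('nkm,ndm->nkd', a, feats) \
             - a.sum(-1).unsqueeze(-1) * self.centers
        vlad = F.normalize(vlad, dim=2)                  # intra-normalization
        return F.normalize(vlad.flatten(1), dim=1)       # final L2 normalization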
|
46
|
Cao J, Pang Y, Li X. Learning Multilayer Channel Features for Pedestrian Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:3210-3220. [PMID: 28459686 DOI: 10.1109/tip.2017.2694224] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
Pedestrian detection based on the combination of convolutional neural networks (CNNs) and traditional handcrafted features (i.e., HOG+LUV) has achieved great success. In general, HOG+LUV features are used to generate candidate proposals, which a CNN then classifies. Despite this success, there is still room for improvement: the CNN classifies these proposals using only fully connected layer features, while the proposal scores and the features in the inner layers of the CNN are ignored. In this paper, we propose a unifying framework called multi-layer channel features (MCF) to overcome this drawback. It first integrates HOG+LUV with each layer of the CNN into multi-layer image channels. Based on these channels, a multi-stage cascade AdaBoost is learned, where the weak classifiers in each stage are learned from the image channels of the corresponding layer. Experiments are conducted on the Caltech, INRIA, ETH, TUD-Brussels, and KITTI data sets. With these richer features, MCF achieves the state of the art on the Caltech pedestrian data set (10.40% miss rate); using the new, accurate annotations, it achieves a 7.98% miss rate. As many non-pedestrian detection windows are quickly rejected by the first few stages, detection is accelerated by 1.43 times; by further eliminating highly overlapped detection windows with lower scores after the first stage, it is 4.07 times faster with negligible performance loss.
Collapse
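A simplified sketch of the cascade mechanics described above (scikit-learn stand-ins; real MCF boosts over HOG+LUV and CNN channel features with its own weak learners): each stage trains on the samples that survived earlier stages, and low-scoring windows are rejected early so that deeper, costlier features are computed only for hard candidates.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_cascade(feature_blocks, y, reject_thresh=-0.5):
    """feature_blocks: list of (n_samples, n_features) arrays, shallow to deep.
    y: binary labels (1 = pedestrian). Returns one AdaBoost stage per block."""
    stages, keep = [], np.ones(len(y), dtype=bool)
    for X in feature_blocks:
        clf = AdaBoostClassifier(n_estimators=100).fit(X[keep], y[keep])
        stages.append(clf)
        score = clf.decision_function(X)
        keep &= (score > reject_thresh) | (y == 1)  # never drop positives in training
    return stages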
|