1. Zhang K, Zhu D, Min X, Zhai G. Unified Approach to Mesh Saliency: Evaluating Textured and Non-Textured Meshes Through VR and Multifunctional Prediction. IEEE Transactions on Visualization and Computer Graphics 2025; 31:3151-3160. PMID: 40063447. DOI: 10.1109/tvcg.2025.3549550.
Abstract
Mesh saliency aims to empower artificial intelligence with strong adaptability to highlight regions that naturally attract visual attention. Existing advances primarily emphasize the crucial role of geometric shapes in determining mesh saliency, but it remains challenging to flexibly sense the unique visual appeal brought by the realism of complex texture patterns. To investigate the interaction between geometric shapes and texture features in visual perception, we establish a comprehensive mesh saliency dataset, capturing saliency distributions for identical 3D models under both non-textured and textured conditions. Additionally, we propose a unified saliency prediction model applicable to various mesh types, providing valuable insights for both detailed modeling and realistic rendering applications. This model effectively analyzes the geometric structure of the mesh while seamlessly incorporating texture features into the topological framework, ensuring coherence throughout appearance-enhanced modeling. Through extensive theoretical and empirical validation, our approach not only enhances performance across different mesh types, but also demonstrates the model's scalability and generalizability, particularly through cross-validation of various visual features.

2. Zhou J, Ren F. Scene categorization by Hessian-regularized active perceptual feature selection. Sci Rep 2025; 15:739. PMID: 39753661. PMCID: PMC11698863. DOI: 10.1038/s41598-024-84181-x.
Abstract
Decoding the semantic categories of complex sceneries is fundamental to numerous artificial intelligence (AI) infrastructures. This work presents an advanced selection of multi-channel perceptual visual features for recognizing scenic images with elaborate spatial structures, focusing on developing a deep hierarchical model dedicated to learning human gaze behavior. Utilizing the BING objectness measure, we efficiently localize objects or their details across varying scales within scenes. To emulate humans observing semantically or visually significant areas within scenes, we propose a robust deep active learning (RDAL) strategy. This strategy progressively generates gaze shifting paths (GSP) and calculates deep GSP representations within a unified architecture. A notable advantage of RDAL is the robustness to label noise, which is implemented by a carefully-designed sparse penalty term. This mechanism ensures that irrelevant or misleading deep GSP features are intelligently discarded. Afterward, a novel Hessian-regularized Feature Selector (HFS) is proposed to select high-quality features from the deep GSP features, wherein (i) the spatial composition of scenic patches can be optimally maintained, and (ii) a linear SVM is learned simultaneously. Empirical evaluations across six standard scenic datasets demonstrated our method's superior performance, highlighting its exceptional ability to differentiate various sophisticated scenery categories.

Affiliations
- Junwu Zhou: School of Higher Vocational and Technical College, Shanghai Dianji University, Shanghai 201306, China
- Fuji Ren: College of Computer Sciences, Anhui University, Hefei 230039, China

3. Mamalakis M, Macfarlane SC, Notley SV, Gad AKB, Panoutsos G. A novel pipeline employing deep multi-attention channels network for the autonomous detection of metastasizing cells through fluorescence microscopy. Comput Biol Med 2024; 181:109052. PMID: 39216406. DOI: 10.1016/j.compbiomed.2024.109052.
Abstract
Metastasis driven by cancer cell migration is the leading cause of cancer-related deaths. It involves significant changes in the organization of the cytoskeleton, which includes the actin microfilaments and the vimentin intermediate filaments. Understanding how these filaments change cells from normal to invasive offers insights that can be used to improve cancer diagnosis and therapy. We have developed a computational, transparent, large-scale, imaging-based pipeline that can distinguish between normal human cells and their isogenically matched, oncogenically transformed, invasive and metastasizing counterparts, based on the spatial organization of actin and vimentin filaments in the cell cytoplasm. Because of the intricacy of these subcellular structures, their annotation is not trivial to automate. We used established deep learning methods together with our new multi-attention channel architecture. To ensure a high level of interpretability of the network, which is crucial for the application area, we developed an interpretable, global, explainable approach that correlates the weighted geometric mean of the total cell images with their local GradCam scores. The methods offer a detailed, objective and measurable understanding of how different components of the cytoskeleton contribute to metastasis, insights that can be used for the future development of novel diagnostic tools, such as a nanometer-level, vimentin filament-based biomarker for digital pathology, and of new treatments that can significantly increase patient survival.

Affiliations
- Michail Mamalakis: School of Electrical and Electronic Engineering, University of Sheffield, Sheffield, UK; Insigneo Institute for In Silico Medicine, University of Sheffield, Sheffield, UK; Department of Infection, Immunity and Cardiovascular Disease and Department of Computer Science, University of Sheffield, Sheffield, UK; Department of Psychiatry, University of Cambridge, Cambridge, UK
- Sarah C Macfarlane: Department of Oncology and Metabolism, The Medical School, University of Sheffield, Sheffield, UK
- Scott V Notley: Insigneo Institute for In Silico Medicine, University of Sheffield, Sheffield, UK; Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, UK
- Annica K B Gad: Insigneo Institute for In Silico Medicine, University of Sheffield, Sheffield, UK; Department of Oncology and Metabolism, The Medical School, University of Sheffield, Sheffield, UK; Madeira Chemistry Research Centre, University of Madeira, Funchal, Portugal; Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
- George Panoutsos: School of Electrical and Electronic Engineering, University of Sheffield, Sheffield, UK; Insigneo Institute for In Silico Medicine, University of Sheffield, Sheffield, UK; Department of Oncology and Metabolism, The Medical School, University of Sheffield, Sheffield, UK

4. Liu G, Zhang J, Chan AB, Hsiao JH. Human attention guided explainable artificial intelligence for computer vision models. Neural Netw 2024; 177:106392. PMID: 38788290. DOI: 10.1016/j.neunet.2024.106392.
Abstract
Explainable artificial intelligence (XAI) has been increasingly investigated to enhance the transparency of black-box artificial intelligence models, promoting better user understanding and trust. Developing an XAI that is faithful to models and plausible to users is both a necessity and a challenge. This work examines whether embedding human attention knowledge into saliency-based XAI methods for computer vision models could enhance their plausibility and faithfulness. Two novel XAI methods for object detection models, namely FullGrad-CAM and FullGrad-CAM++, were first developed to generate object-specific explanations by extending the current gradient-based XAI methods for image classification models. Using human attention as the objective plausibility measure, these methods achieve higher explanation plausibility. Interestingly, all current XAI methods when applied to object detection models generally produce saliency maps that are less faithful to the model than human attention maps from the same object detection task. Accordingly, human attention-guided XAI (HAG-XAI) was proposed to learn from human attention how to best combine explanatory information from the models to enhance explanation plausibility by using trainable activation functions and smoothing kernels to maximize the similarity between XAI saliency map and human attention map. The proposed XAI methods were evaluated on widely used BDD-100K, MS-COCO, and ImageNet datasets and compared with typical gradient-based and perturbation-based XAI methods. Results suggest that HAG-XAI enhanced explanation plausibility and user trust at the expense of faithfulness for image classification models, and it enhanced plausibility, faithfulness, and user trust simultaneously and outperformed existing state-of-the-art XAI methods for object detection models.
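
The fusion idea in HAG-XAI lends itself to a compact sketch: combine several model-derived saliency maps through trainable weights and a learnable smoothing kernel, then optimize their similarity to a recorded human attention map. The sketch below is a hypothetical reconstruction in PyTorch, not the authors' released code; the single learnable box kernel, the softplus weighting, and the Pearson-correlation objective are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

class HAGFusion(torch.nn.Module):
    """Hypothetical fusion head: learns per-map weights and a smoothing kernel
    so the fused XAI saliency map matches human attention (plausibility)."""
    def __init__(self, n_maps, ksize=15):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.ones(n_maps) / n_maps)
        self.kernel = torch.nn.Parameter(torch.ones(1, 1, ksize, ksize) / ksize**2)

    def forward(self, maps):  # maps: (n_maps, H, W)
        fused = (F.softplus(self.weights)[:, None, None] * maps).sum(0)
        fused = F.conv2d(fused[None, None], self.kernel,
                         padding=self.kernel.shape[-1] // 2)
        return fused.squeeze()

def pearson_cc(a, b):
    # Pearson correlation, a common plausibility proxy for saliency maps.
    a, b = a.flatten(), b.flatten()
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + 1e-8)

# One training step: maximize similarity between fused map and human attention.
model = HAGFusion(n_maps=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
xai_maps = torch.rand(4, 64, 64)   # stand-in gradient-based saliency maps
human_map = torch.rand(64, 64)     # stand-in human attention map
loss = -pearson_cc(model(xai_maps), human_map)
opt.zero_grad()
loss.backward()
opt.step()
```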

Affiliations
- Guoyang Liu: School of Integrated Circuits, Shandong University, Jinan, China; Department of Psychology, University of Hong Kong, Pokfulam Road, Hong Kong
- Antoni B Chan: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Janet H Hsiao: Division of Social Science, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong; Department of Psychology, University of Hong Kong, Pokfulam Road, Hong Kong

5. Liu X, Wang L. MSRMNet: Multi-scale skip residual and multi-mixed features network for salient object detection. Neural Netw 2024; 173:106144. PMID: 38335792. DOI: 10.1016/j.neunet.2024.106144.
Abstract
Current models for salient object detection (SOD) have made remarkable progress through multi-scale feature fusion strategies. However, existing models deviate considerably when detecting objects at different scales, and the target boundaries in the predicted images remain blurred. In this paper, we propose a new model that addresses these issues using a transformer backbone to capture multiple feature layers. The model uses multi-scale skip residual connections during encoding to improve the accuracy of the predicted object positions and edge pixel information. Furthermore, to extract richer multi-scale semantic information, we perform multiple mixed feature operations in the decoding stage. In addition, we add a weighted structural similarity index measure (SSIM) term to the loss function to improve boundary prediction. Experiments demonstrate that our algorithm achieves state-of-the-art results on five public datasets and improves the performance metrics of existing SOD tasks. Codes and results are available at: https://github.com/xxwudi508/MSRMNet.
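
The boundary-aware loss term described above can be illustrated with a short PyTorch sketch: a pixel-wise BCE term plus a weighted (1 - SSIM) term. This is a generic SSIM-augmented saliency loss under assumed weights (bce_w, ssim_w) and a uniform averaging window, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def ssim(pred, target, window=11, C1=0.01**2, C2=0.03**2):
    # Local means/variances via uniform average pooling (Gaussian windows are
    # also common); inputs are (N, 1, H, W) maps with values in [0, 1].
    pad = window // 2
    mu_p = F.avg_pool2d(pred, window, 1, pad)
    mu_t = F.avg_pool2d(target, window, 1, pad)
    var_p = F.avg_pool2d(pred * pred, window, 1, pad) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, window, 1, pad) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, window, 1, pad) - mu_p * mu_t
    num = (2 * mu_p * mu_t + C1) * (2 * cov + C2)
    den = (mu_p ** 2 + mu_t ** 2 + C1) * (var_p + var_t + C2)
    return (num / den).mean()

def sod_loss(pred, target, bce_w=1.0, ssim_w=0.5):
    # Pixel-wise BCE plus an SSIM term rewarding structural (boundary) agreement.
    bce = F.binary_cross_entropy(pred, target)
    return bce_w * bce + ssim_w * (1.0 - ssim(pred, target))

pred = torch.rand(2, 1, 64, 64, requires_grad=True)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = sod_loss(pred, target)
loss.backward()
```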

Affiliations
- Xinlong Liu: Sun Yat-Sen University, Guangzhou 510275, China
- Luping Wang: Sun Yat-Sen University, Guangzhou 510275, China

6. Xu Z, Zhou T, Ma M, Deng C, Dai Q, Fang L. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 2024; 384:202-209. PMID: 38603505. DOI: 10.1126/science.adl1203.
Abstract
The pursuit of artificial general intelligence (AGI) continuously demands higher computing performance. Despite the superior processing speed and efficiency of integrated photonic circuits, their capacity and scalability are restricted by unavoidable errors, such that only simple tasks and shallow models have been realized. To support modern AGI, we designed Taichi, large-scale photonic chiplets based on an integrated diffractive-interference hybrid design and a general distributed computing architecture with millions-of-neurons capability and 160 tera-operations per second per watt (TOPS/W) energy efficiency. Taichi experimentally achieved on-chip 1000-category-level classification (testing at 91.89% accuracy on the 1623-category Omniglot dataset) and high-fidelity artificial-intelligence-generated content with up to two orders of magnitude improvement in efficiency. Taichi paves the way for large-scale photonic computing and advanced tasks, further exploiting the flexibility and potential of photonics for modern AGI.

Affiliations
- Zhihao Xu: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; Tsinghua Shenzhen International Graduate School, Shenzhen, China
- Tiankuang Zhou: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; Institute for Brain and Cognitive Science, Tsinghua University (THUIBCS), Beijing, China
- Muzhou Ma: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China
- ChenChen Deng: Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China
- Qionghai Dai: Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; Institute for Brain and Cognitive Science, Tsinghua University (THUIBCS), Beijing, China; Department of Automation, Tsinghua University, Beijing, China
- Lu Fang: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; Institute for Brain and Cognitive Science, Tsinghua University (THUIBCS), Beijing, China

7. Li S, Seger CA, Zhang J, Liu M, Dong W, Liu W, Chen Q. Alpha oscillations encode Bayesian belief updating underlying attentional allocation in dynamic environments. Neuroimage 2023; 284:120464. PMID: 37984781. DOI: 10.1016/j.neuroimage.2023.120464.
Abstract
In a dynamic environment, expectations of the future constantly change based on updated evidence and affect the dynamic allocation of attention. To further investigate the neural mechanisms underlying attentional expectancies, we employed a modified Central Cue Posner Paradigm in which the probability of cues being valid (that is, accurately indicating the upcoming target location) was manipulated. Attentional deployment to the cued location (α), which was governed by the precision of predictions on previous trials, was estimated using a hierarchical Bayesian model and was included as a regressor in the analyses of electrophysiological (EEG) data. Our results revealed that before the target appeared, alpha oscillations (8-13 Hz) for high-predictability cues (88% valid) were significantly predicted by precision-dependent attention (α). This relationship was not observed under low-predictability conditions (69% and 50% valid cues). After the target appeared, precision-dependent attention (α) correlated with alpha band oscillations only in the valid cue condition and not in the invalid condition. Further analysis under conditions of significant attentional modulation by precision suggested a separate effect of cue orientation. These results provide new insights into how trial-by-trial Bayesian belief updating relates to alpha band encoding of environmentally sensitive allocation of visual spatial attention.
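
To make the precision-weighting idea concrete, here is a deliberately simplified stand-in for the paper's hierarchical Bayesian model: a beta-Bernoulli belief over cue validity, updated trial by trial, whose posterior mean tracks expected validity and whose posterior precision (inverse variance) is the quantity that would govern attentional deployment (α). The actual study fit a hierarchical model, so treat this only as an illustration of the update logic.

```python
import numpy as np

def track_cue_validity(outcomes, a0=1.0, b0=1.0):
    """Beta-Bernoulli belief over cue validity, updated trial by trial.
    A simplified stand-in for the paper's hierarchical model: the posterior
    mean is the expected validity, and the posterior precision (inverse
    variance) is what would govern attentional deployment (alpha)."""
    a, b = a0, b0
    means, precisions = [], []
    for valid in outcomes:                 # 1 = cue correctly marked target
        a, b = a + valid, b + (1 - valid)  # conjugate posterior update
        mean = a / (a + b)
        var = a * b / ((a + b) ** 2 * (a + b + 1))
        means.append(mean)
        precisions.append(1.0 / var)
    return np.array(means), np.array(precisions)

rng = np.random.default_rng(0)
mean, prec = track_cue_validity(rng.binomial(1, 0.88, size=200))  # 88%-valid block
```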

Affiliations
- Siying Li: School of Psychology, Shenzhen University, No. 3688 Nanhai Avenue, Shenzhen 518060, China
- Carol A Seger: School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China; Department of Psychology, Colorado State University, Fort Collins, United States
- Jianfeng Zhang: School of Psychology, Shenzhen University, No. 3688 Nanhai Avenue, Shenzhen 518060, China
- Meng Liu: School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Wenshan Dong: School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Wanting Liu: School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Qi Chen: School of Psychology, Shenzhen University, No. 3688 Nanhai Avenue, Shenzhen 518060, China

8. Wang H, Zhang T, Zhang C, Shi L, Ng SYL, Yan HC, Yeung KCM, Wong JSH, Cheung KMC, Shea GKH. An intelligent composite model incorporating global / regional X-rays and clinical parameters to predict progressive adolescent idiopathic scoliosis curvatures and facilitate population screening. EBioMedicine 2023; 95:104768. PMID: 37619449. PMCID: PMC10470293. DOI: 10.1016/j.ebiom.2023.104768.
Abstract
BACKGROUND: Adolescent idiopathic scoliosis (AIS) affects up to 5% of the population. The efficacy of school-aged screening remains controversial, since it is uncertain which curvatures will progress following diagnosis and require treatment. Patient demographics, vertebral morphology, skeletal maturity, and bone quality represent individual risk factors for progression but have yet to be integrated towards accurate prognostication. The objective of this work was to develop a composite machine-learning-based prediction model to accurately predict AIS curves at risk of progression.
METHODS: 1870 AIS patients with remaining growth potential were identified. Curve progression was defined by a Cobb angle increase in the major curve of ≥6° between first visit and skeletal maturity, in curves that exceeded 25°. Separate prediction modules were developed for (i) clinical data, (ii) global/regional spine X-rays, and (iii) hand X-rays. The hand X-ray module performed automated image classification and segmentation tasks to estimate skeletal maturity and bone mineral density. A late fusion strategy integrated these domains to predict progressive curves at the first clinic visit.
FINDINGS: Composite model performance was assessed on a validation cohort and achieved an accuracy of 83.2% (79.3-83.6%, 95% confidence interval), sensitivity of 80.9% (78.2-81.9%), specificity of 83.6% (78.8-84.1%) and an AUC of 0.84 (0.81-0.85), outperforming single-modality prediction models (AUC 0.65-0.78).
INTERPRETATION: The composite prediction model achieved a high degree of accuracy. Upon incorporation into school-aged screening programs, patients at risk of progression may be prioritized to receive urgent specialist attention, more frequent follow-up, and pre-emptive treatment.
FUNDING: Funding from The Society for the Relief of Disabled Children was awarded to GKHS.
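
The late-fusion step can be pictured as combining the three per-module progression probabilities into one risk score. The sketch below uses fixed, hand-set weights and a 0.5 threshold purely for illustration; the paper learns the fusion rather than fixing it by hand.

```python
import numpy as np

def fuse_modules(p_clinical, p_spine_xray, p_hand_xray,
                 weights=(0.3, 0.4, 0.3), threshold=0.5):
    """Late fusion of per-module progression probabilities.
    Weights and threshold here are placeholders, not the learned fusion."""
    p = float(np.dot(weights, [p_clinical, p_spine_xray, p_hand_xray]))
    return p, p >= threshold

prob, at_risk = fuse_modules(0.42, 0.71, 0.66)
print(f"fused probability {prob:.2f}, flag progression: {at_risk}")
```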

Affiliations
- Hongfei Wang, Teng Zhang, Changmeng Zhang, Liangyu Shi, Samuel Yan-Lik Ng, Ho-Cheong Yan, Janus Siu-Him Wong, Kenneth Man-Chee Cheung, Graham Ka-Hon Shea: Department of Orthopaedics and Traumatology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, China

9. Choi Y, Yu W, Nagarajan MB, Teng P, Goldin JG, Raman SS, Enzmann DR, Kim GHJ, Brown MS. Translating AI to Clinical Practice: Overcoming Data Shift with Explainability. Radiographics 2023; 43:e220105. PMID: 37104124. PMCID: PMC10190133. DOI: 10.1148/rg.220105.
Abstract
To translate artificial intelligence (AI) algorithms into clinical practice requires generalizability of models to real-world data. One of the main obstacles to generalizability is data shift, a data distribution mismatch between model training and real environments. Explainable AI techniques offer tools to detect and mitigate the data shift problem and to develop reliable AI for clinical practice. Most medical AI is trained with datasets gathered from limited environments, such as restricted disease populations and center-dependent acquisition conditions. The data shift that commonly exists in the limited training set often causes a significant performance decrease in the deployment environment. To develop a medical application, it is important to detect potential data shift and its impact on clinical translation. During AI training stages, from premodel analysis to in-model and post hoc explanations, explainability can play a key role in detecting model susceptibility to data shift, which is otherwise hidden because the test data have the same biased distribution as the training data. Performance-based model assessments cannot effectively detect model overfitting to training-data bias without test sets enriched with data from external environments. In the absence of such external data, explainability techniques can aid in translating AI to clinical practice as a tool to detect and mitigate potential failures due to data shift.
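
One simple, generic premodel-analysis check in the spirit of the article is to compare feature distributions between the training set and the deployment environment. The sketch below flags shifted features with a two-sample Kolmogorov-Smirnov test; it is an illustrative check under synthetic data, not the article's specific pipeline.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_shift(train_feats, deploy_feats, alpha=0.05):
    """Flag features whose training vs. deployment distributions differ
    according to a two-sample Kolmogorov-Smirnov test."""
    shifted = []
    for j in range(train_feats.shape[1]):
        stat, p = ks_2samp(train_feats[:, j], deploy_feats[:, j])
        if p < alpha:
            shifted.append((j, stat))
    return shifted

# Synthetic example: shift half the features in the "deployment" data.
rng = np.random.default_rng(1)
train = rng.normal(0, 1, size=(500, 8))
deploy = train + rng.normal(0.5, 1, size=(500, 8)) * (rng.random(8) > 0.5)
print(detect_feature_shift(train, deploy))
```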

Affiliations
- Youngwon Choi, Wenxi Yu, Mahesh B. Nagarajan, Pangyu Teng, Jonathan G. Goldin, Steven S. Raman, Dieter R. Enzmann, Grace Hyun J. Kim, Matthew S. Brown
- Center for Computer Vision and Imaging Biomarkers, 924 Westwood Blvd, Los Angeles, CA 90024 (Y.C., W.Y., M.B.N., P.T., J.G.G., G.H.J.K., M.S.B.); Department of Radiology, University of California–Los Angeles, Los Angeles, Calif (Y.C., W.Y., M.B.N., P.T., J.G.G., S.S.R., D.R.E., G.H.J.K., M.S.B.)

10. Diaz-Guerra F, Jimenez-Molina A. Continuous Prediction of Web User Visual Attention on Short Span Windows Based on Gaze Data Analytics. Sensors (Basel) 2023; 23:2294. PMID: 36850892. PMCID: PMC9960063. DOI: 10.3390/s23042294.
Abstract
Understanding users' visual attention on websites is paramount to enhancing the browsing experience, for example by providing emergent information or dynamically adapting Web interfaces. Existing approaches to these challenges are generally based on computing salience maps of static Web interfaces, while websites increasingly become more dynamic and interactive. This paper proposes a method, and provides a proof of concept, to predict a user's visual attention on specific regions of a website with dynamic components. The method predicts the regions of a user's visual attention without requiring a constant recording of the website's current layout, but rather by knowing the structure it presented in a past period. To address this challenge, the concept of visit intention is introduced, defined as the probability that a user, while browsing, will fixate their gaze on a specific region of the website in the next period. Our approach uses the gaze patterns of a population that browsed a specific website, captured via an eye-tracker device, to aid personalized prediction models built with individual visual-kinetics features. We show experimentally that such a prediction can be made with multilabel classification models using a small number of users, obtaining an average area under the curve (AUC) of 84.3% and an average accuracy of 79%. Furthermore, the user's visual-kinetics features are consistently selected in every fold of the cross-validation evaluation.
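
The prediction task reduces to multilabel classification: given a user's visual-kinetics features, predict which website regions will be fixated in the next period. A toy scikit-learn sketch with made-up feature and region counts illustrates the setup and the macro-AUC evaluation; none of it reflects the paper's actual features or models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.multioutput import MultiOutputClassifier

# Toy stand-in: gaze-kinetics features -> per-region visit-intention labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))                     # visual-kinetics features
Y = (rng.random(size=(300, 5)) < 0.3).astype(int)  # 5 website regions

clf = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X[:200], Y[:200])

# Stack per-region positive-class probabilities and score macro AUC.
probs = np.stack([p[:, 1] for p in clf.predict_proba(X[200:])], axis=1)
print("macro AUC:", roc_auc_score(Y[200:], probs, average="macro"))
```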

Affiliations
- Angel Jimenez-Molina: Department of Industrial Engineering, University of Chile, Santiago 8370456, Chile; Engineering Complex Systems Institute, Santiago 8370398, Chile

11. Cui J, Zheng L, Yu Y, Lin Y, Ni H, Xu X, Zhang Z. Deeply-Recursive Attention Network for video steganography. CAAI Transactions on Intelligence Technology 2023. DOI: 10.1049/cit2.12191.
Affiliations
- Jiabao Cui: College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Liangli Zheng: School of Software Technology, Zhejiang University, Ningbo, China
- Yunlong Yu: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
- Yining Lin: Shanghai SUPREMIND Technology Co., Ltd., Shanghai, China
- Huajian Ni: Shanghai SUPREMIND Technology Co., Ltd., Shanghai, China
- Xin Xu: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
- Zhongfei Zhang: Department of Computer Science, Binghamton University, Binghamton, New York, USA

12. MSRT: multi-scale representation transformer for regression-based human pose estimation. Pattern Anal Appl 2023. DOI: 10.1007/s10044-023-01130-6.

13. Jezequel L, Vu NS, Beaudet J, Histace A. Efficient Anomaly Detection Using Self-Supervised Multi-Cue Tasks. IEEE Transactions on Image Processing 2023; 32:807-821. PMID: 37018555. DOI: 10.1109/tip.2022.3231532.
Abstract
Anomaly detection is important in many real-life applications. Recently, self-supervised learning has greatly helped deep anomaly detection by recognizing several geometric transformations. However, these methods lack finer features, usually depend strongly on the anomaly type, and do not perform well on fine-grained problems. To address these issues, we first introduce three novel and efficient discriminative and generative tasks with complementary strengths: (i) a piece-wise jigsaw puzzle task focuses on structure cues; (ii) a tint-rotation recognition task is used within each piece, taking colorimetry information into account; and (iii) a partial re-colorization task considers the image texture. To make the re-colorization task more object-oriented than background-oriented, we propose to include the contextual color information of the image border via an attention mechanism. We then present a new out-of-distribution detection function and highlight its better stability compared with existing methods. Along with it, we also experiment with different score-fusion functions. Finally, we evaluate our method on an extensive protocol composed of various anomaly types, from object anomalies and style anomalies with fine-grained classification to local anomalies with face anti-spoofing datasets. Our model significantly outperforms the state of the art, with up to 36% relative error improvement on object anomalies and 40% on face anti-spoofing problems.

14. Fan N, Liu Q, Li X, Zhou Z, He Z. Siamese Residual Network for Efficient Visual Tracking. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2022.12.082.

15. Jian M, Jin H, Liu X, Zhang L. Multiscale Cascaded Attention Network for Saliency Detection Based on ResNet. Sensors (Basel) 2022; 22:9950. PMID: 36560319. PMCID: PMC9783234. DOI: 10.3390/s22249950.
Abstract
Saliency detection is a key research topic in the field of computer vision. Humans are quickly and accurately drawn to areas of interest in complex and changing scenes through the visual perception areas of the brain. Although existing saliency-detection methods achieve competent performance, they have deficiencies such as unclear margins of salient objects and interference of background information on the saliency map. In this study, to remedy these defects, a multiscale cascaded attention network was designed based on ResNet34. Different from the typical U-shaped encoding-decoding architecture, we devised a contextual feature extraction module to enhance high-level semantic feature extraction. Specifically, a multiscale cascade block (MCB) and a lightweight channel attention (CA) module were added between the encoding and decoding networks for optimization. To address the blurred-edge issue, which is neglected by many previous approaches, we adopted an edge-thinning module to carry out a deeper edge-thinning process on the output-layer image. The experimental results illustrate that this method achieves competitive saliency-detection performance, with improved accuracy and recall compared with other representative methods.

Affiliations
- Muwei Jian: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China; School of Information Science and Technology, Linyi University, Linyi 276012, China
- Haodong Jin: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
- Xiangyu Liu: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
- Linsong Zhang: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China

16. Risnandar. DeSa COVID-19: Deep salient COVID-19 image-based quality assessment. Journal of King Saud University - Computer and Information Sciences 2022; 34:9501-9512. PMID: 38620925. PMCID: PMC8647162. DOI: 10.1016/j.jksuci.2021.11.013.
Abstract
This study offers an advanced method to evaluate coronavirus disease 2019 (COVID-19) image quality. A salient COVID-19 image map is incorporated into a deep convolutional neural network (DCNN), named DeSa COVID-19, which uses an n-convex method for full-reference image quality assessment (FR-IQA). The results show that DeSa COVID-19 and the proposed DCNN architecture perform remarkably well on the COVID-chestxray and COVID-CT datasets, respectively. The salient COVID-19 image map is also evaluated on small COVID-19 image patches. The experimental results confirm that DeSa COVID-19 and the proposed DCNN outperform other advanced methods on the COVID-chestxray and COVID-CT datasets, respectively. The proposed DCNN also achieves improved results against several advanced full-reference medical image quality assessment (FR-MIQA) techniques under fast fading (FF), blocking artifact (BA), white Gaussian noise (WG), JPEG, and JPEG2000 (JP2K) distortions, in both distorted and undistorted COVID-19 images. Spearman's rank-order correlation coefficient (SROCC) and the linear correlation coefficient (LCC) are used to compare DeSa COVID-19 and the proposed DCNN against recent FR-MIQA methods: DeSa COVID-19 scores 2.63% and 2.62% higher than the proposed DCNN, and 28.53% and 29.01% higher than all advanced FR-MIQA methods, on the SROCC and LCC measures, respectively. The shift-add operations of trigonometric, logarithmic, and exponential functions are reduced in the computational complexity of DeSa COVID-19 and the proposed DCNN. Overall, DeSa COVID-19 outperforms both the proposed DCNN and the other recent full-reference medical image quality assessment methods.
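
The two agreement criteria used in the study, SROCC and LCC, are straightforward to compute with SciPy, as the sketch below shows for a handful of made-up quality scores.

```python
from scipy.stats import pearsonr, spearmanr

def iqa_agreement(predicted_quality, subjective_scores):
    """SROCC and LCC between model scores and subjective quality ratings."""
    srocc, _ = spearmanr(predicted_quality, subjective_scores)
    lcc, _ = pearsonr(predicted_quality, subjective_scores)
    return srocc, lcc

# Made-up predicted scores vs. mean opinion scores.
print(iqa_agreement([0.9, 0.4, 0.7, 0.2], [4.5, 2.1, 3.8, 1.2]))
```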

Affiliations
- Risnandar: The Intelligent Systems Research Group, School of Computing, Telkom University, Jl. Telekomunikasi No. 1, Terusan Buahbatu-Dayeuhkolot, Bandung, West Java 40257, Indonesia; The Computer Vision Research Group, Research Center for Informatics, Indonesian Institute of Sciences (LIPI) and the National Research and Innovation Agency (BRIN), Republic of Indonesia, Jl. Sangkuriang/Cisitu No. 21/154D, LIPI Building 20th, 3rd Floor, Bandung, West Java 40135, Indonesia

17. Audio–visual collaborative representation learning for Dynamic Saliency Prediction. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109675.

18. Liu N, Li L, Zhao W, Han J, Shao L. Instance-Level Relative Saliency Ranking With Graph Reasoning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:8321-8337. PMID: 34437057. DOI: 10.1109/tpami.2021.3107872.
Abstract
Conventional salient object detection models cannot differentiate the importance of different salient objects. Recently, two works have been proposed to detect saliency ranking by assigning different degrees of saliency to different objects. However, one of these models cannot differentiate object instances, and the other focuses more on inferring the sequential attention shift order. In this paper, we investigate a practical problem setting that requires simultaneously segmenting salient instances and inferring their relative saliency rank order. We present a novel unified model as the first end-to-end solution, in which an improved Mask R-CNN first segments salient instances and a saliency ranking branch then infers their relative saliency. For relative saliency ranking, we build a new graph reasoning module by combining four graphs to incorporate the instance interaction relation, local contrast, global contrast, and a high-level semantic prior, respectively. A novel loss function is also proposed to effectively train the saliency ranking branch. Besides, a new dataset and an evaluation metric are proposed for this task, aiming to push this field of research forward. Finally, experimental results demonstrate that our proposed model is more effective than previous methods. We also show an example of its practical usage on adaptive image retargeting.

19. Zhou Y, Chang H, Lu X, Lu Y. DenseUNet: Improved image classification method using standard convolution and dense transposed convolution. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109658.

20. Saliency Transfer Learning and Central-Cropping Network for Prostate Cancer Classification. Neural Process Lett 2022. DOI: 10.1007/s11063-022-10999-z.

21. Fuzzy Color Aura Matrices for Texture Image Segmentation. J Imaging 2022; 8:244. PMID: 36135409. PMCID: PMC9504691. DOI: 10.3390/jimaging8090244.
Abstract
Fuzzy gray-level aura matrices have been developed from fuzzy set theory and the aura concept to characterize texture images. They have proven to be powerful descriptors for color texture classification. However, using them for color texture segmentation is difficult because of their high memory and computation requirements. To overcome this problem, we propose to extend fuzzy gray-level aura matrices to fuzzy color aura matrices, which would allow us to apply them to color texture image segmentation. Unlike the marginal approach that requires one fuzzy gray-level aura matrix for each color channel, a single fuzzy color aura matrix is required to locally characterize the interactions between colors of neighboring pixels. Furthermore, all works about fuzzy gray-level aura matrices consider the same neighborhood function for each site. Another contribution of this paper is to define an adaptive neighborhood function based on information about neighboring sites provided by a pre-segmentation method. For this purpose, we propose a modified simple linear iterative clustering algorithm that incorporates a regional feature in order to partition the image into superpixels. All in all, the proposed color texture image segmentation boils down to a superpixel classification using a simple supervised classifier, each superpixel being characterized by a fuzzy color aura matrix. Experimental results on the Prague texture segmentation benchmark show that our method outperforms the classical state-of-the-art supervised segmentation methods and is similar to recent methods based on deep learning.
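
The overall segmentation recipe above (superpixel pre-segmentation followed by per-superpixel supervised classification) can be sketched with off-the-shelf tools. The snippet below uses the standard SLIC from scikit-image and plain mean-color descriptors as stand-ins; the paper's modified SLIC with a regional feature and its fuzzy color aura matrix descriptors are not reproduced here.

```python
import numpy as np
from skimage.data import astronaut
from skimage.segmentation import slic

# Pre-segment into superpixels, then build one feature vector per superpixel
# for a supervised classifier (an SVM in the paper).
img = astronaut()
labels = slic(img, n_segments=250, compactness=10, start_label=0)
feats = np.array([img[labels == s].mean(axis=0) for s in np.unique(labels)])
print(feats.shape)  # (n_superpixels, 3): one mean-color descriptor each
```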

22. A Gated Fusion Network for Dynamic Saliency Prediction. IEEE Trans Cogn Dev Syst 2022. DOI: 10.1109/tcds.2021.3094974.

23. Yan K, Wang X, Kim J, Zuo W, Feng D. Deep Cognitive Gate: Resembling Human Cognition for Saliency Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:4776-4792. PMID: 33755558. DOI: 10.1109/tpami.2021.3068277.
Abstract
Saliency detection by humans refers to the ability to identify pertinent information using our perceptive and cognitive capabilities. While human perception is attracted by visual stimuli, our cognitive capability derives from the inspiration of constructing concepts of reasoning. Saliency detection has gained intensive interest with the aim of resembling the human 'perceptual' system. However, saliency related to human 'cognition', particularly the analysis of complex salient regions (the 'cogitating' process), is yet to be fully exploited. We propose to resemble human cognition, coupled with human perception, to improve saliency detection. We recognize saliency in three phases ('Seeing', 'Perceiving', 'Cogitating'), mimicking humans' perceptive and cognitive thinking about an image. In our method, the 'Seeing' phase relates to human perception, and we formulate the 'Perceiving' and 'Cogitating' phases, which relate to the human cognition systems, via deep neural networks (DNNs) to construct a new module (Cognitive Gate) that enhances the DNN features for saliency detection. To the best of our knowledge, this is the first work to establish DNNs that resemble human cognition for saliency detection. In our experiments, our approach outperformed 17 benchmark DNN methods on six well-recognized datasets, demonstrating that resembling human cognition improves saliency detection.

24. Xu Z, Yuan X, Zhou T, Fang L. A multichannel optical computing architecture for advanced machine vision. Light: Science & Applications 2022; 11:255. PMID: 35977940. PMCID: PMC9385649. DOI: 10.1038/s41377-022-00945-y.
Abstract
Endowed with superior computing speed and energy efficiency, optical neural networks (ONNs) have attracted ever-growing attention in recent years. Existing optical computing architectures are mainly single-channel, owing to the lack of advanced optical connection and interaction operators, and solve only simple tasks such as hand-written digit classification and saliency detection. The limited computing capacity and scalability of single-channel ONNs restrict the optical implementation of advanced machine vision. Herein, we develop Monet: a multichannel optical neural network architecture for universal multiple-input multiple-channel optical computing, based on a novel projection-interference-prediction framework in which the inter- and intra-channel connections are mapped to optical interference and diffraction. In our Monet, optical interference patterns are generated by projecting and interfering the multichannel inputs in a shared domain. These patterns, encoding the correspondences together with feature embeddings, are iteratively produced through the projection-interference process to predict the final output optically. For the first time, Monet validates that multichannel processing properties can be implemented optically with high efficiency, enabling real-world intelligent multichannel-processing tasks, including 3D/motion detection, to be solved via optical computing. Extensive experiments on different scenarios demonstrate the effectiveness of Monet in handling advanced machine vision tasks with accuracy comparable to electronic counterparts while achieving a ten-fold improvement in computing efficiency. For intelligent computing, the trend toward dealing with real-world advanced tasks is irreversible. By breaking the capacity and scalability limitations of single-channel ONNs and further exploring the multichannel processing potential of wave optics, we anticipate that the proposed technique will accelerate the development of more powerful optical AI as critical support for modern advanced machine vision.

Affiliations
- Zhihao Xu: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; Tsinghua Shenzhen International Graduate School, Shenzhen, China
- Xiaoyun Yuan: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China; Institute for Brain and Cognitive Science, Tsinghua University (THUIBCS), Beijing, China
- Tiankuang Zhou: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China; Tsinghua Shenzhen International Graduate School, Shenzhen, China
- Lu Fang: Sigma Laboratory, Department of Electronic Engineering, Tsinghua University, Beijing, China; Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China; Institute for Brain and Cognitive Science, Tsinghua University (THUIBCS), Beijing, China

25. Wang Q, Han T, Gao J, Yuan Y. Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:3238-3250. PMID: 33502985. DOI: 10.1109/tnnls.2021.3051371.
Abstract
Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety. The purpose of CDCC is to alleviate the domain shift between the source and target domains. Recently, typical methods attempt to extract domain-invariant features via image translation and adversarial learning. When it comes to specific tasks, we find that domain shifts are reflected in differences between model parameters. To describe the domain gap directly at the parameter level, we propose a neuron linear transformation (NLT) method, exploiting domain factors and bias weights to learn the domain shift. Specifically, for a specific neuron of a source model, NLT exploits few labeled target data to learn domain-shift parameters. Finally, the target neuron is generated via a linear transformation. Extensive experiments and analysis on six real-world datasets validate that NLT achieves top performance compared with other domain adaptation methods. An ablation study also shows that NLT is robust and more effective than supervised and fine-tuning training. Code is available at https://github.com/taohan10200/NLT.
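
The core of NLT as described is a learned linear map from source-model parameters to target-model parameters. A minimal PyTorch sketch follows; the granularity of the learned factors (one per output channel of a conv layer) is chosen here for illustration and is an assumption, not the paper's exact parameterization.

```python
import torch

def neuron_linear_transform(source_weight, gamma, beta):
    """Generate a target-domain parameter from a source parameter via a
    learned linear map (domain factor gamma and bias beta)."""
    return gamma * source_weight + beta

# Example: adapt one conv layer's weights with per-output-channel factors,
# which would be optimized on the few labeled target samples.
w_src = torch.randn(64, 32, 3, 3)                      # frozen source weights
gamma = torch.ones(64, 1, 1, 1, requires_grad=True)    # learned domain factor
beta = torch.zeros(64, 1, 1, 1, requires_grad=True)    # learned domain bias
w_tgt = neuron_linear_transform(w_src, gamma, beta)    # target-domain weights
```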

26. Marques-Villarroya S, Castillo JC, Gamboa-Montero JJ, Sevilla-Salcedo J, Salichs MA. A Bio-Inspired Endogenous Attention-Based Architecture for a Social Robot. Sensors (Basel) 2022; 22:5248. PMID: 35890931. PMCID: PMC9323278. DOI: 10.3390/s22145248.
Abstract
A robust perception system is crucial for natural human-robot interaction. An essential capability of these systems is to provide a rich representation of the robot's environment, typically using multiple sensory sources. This information allows the robot to react to both external stimuli and user responses. The novel contribution of this paper is a perception architecture based on the bio-inspired concept of endogenous attention, integrated into a real social robot. The architecture is defined at a theoretical level, to provide insights into the underlying bio-inspired mechanisms, and at a practical level, to integrate and test it within the complete architecture of a robot. We also define mechanisms to establish the most salient stimulus for the detection or task in question. Furthermore, the attention-based architecture uses information from the robot's decision-making system to produce user responses and robot decisions. Finally, this paper presents preliminary test results from the integration of this architecture into a real social robot.

Affiliations
- Sara Marques-Villarroya: RoboticsLab, Universidad Carlos III de Madrid, 28911 Leganés, Spain (J.J.G.-M.; J.S.-S.; M.A.S.)
- Jose Carlos Castillo: RoboticsLab, Universidad Carlos III de Madrid, 28911 Leganés, Spain (J.J.G.-M.; J.S.-S.; M.A.S.)

27. Pei J, Zhou T, Tang H, Liu C, Chen C. FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction. Appl Intell 2022. DOI: 10.1007/s10489-022-03647-5.

28. Liu Y, Cheng MM, Zhang XY, Nie GY, Wang M. DNA: Deeply Supervised Nonlinear Aggregation for Salient Object Detection. IEEE Transactions on Cybernetics 2022; 52:6131-6142. PMID: 33531332. DOI: 10.1109/tcyb.2021.3051350.
Abstract
Recent progress on salient object detection mainly aims at exploiting how to effectively integrate multiscale convolutional features in convolutional neural networks (CNNs). Many popular methods impose deep supervision to perform side-output predictions that are linearly aggregated for final saliency prediction. In this article, we theoretically and experimentally demonstrate that linear aggregation of side-output predictions is suboptimal, and it only makes limited use of the side-output information obtained by deep supervision. To solve this problem, we propose deeply supervised nonlinear aggregation (DNA) for better leveraging the complementary information of various side-outputs. Compared with existing methods, it: 1) aggregates side-output features rather than predictions and 2) adopts nonlinear instead of linear transformations. Experiments demonstrate that DNA can successfully break through the bottleneck of the current linear approaches. Specifically, the proposed saliency detector, a modified U-Net architecture with DNA, performs favorably against state-of-the-art methods on various datasets and evaluation metrics without bells and whistles.
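
The DNA idea, aggregating side-output features rather than predictions and doing so nonlinearly, can be sketched as a small fusion head. The layer sizes and the conv-ReLU-conv structure below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class NonlinearAggregation(nn.Module):
    """Aggregate side-output *features* (not predictions) with a nonlinear
    transformation, per the DNA idea; exact layer sizes are illustrative."""
    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1))  # final saliency logits

    def forward(self, side_feats):
        # side_feats: list of (N, C_i, H, W) maps resized to a common H, W.
        return self.fuse(torch.cat(side_feats, dim=1))

agg = NonlinearAggregation(in_channels=16 * 4)
feats = [torch.randn(2, 16, 56, 56) for _ in range(4)]  # four side-outputs
logits = agg(feats)  # (2, 1, 56, 56)
```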

30. Zhu D, Chen Y, Zhao D, Zhu Y, Zhou Q, Zhai G, Yang X. Multiscale Brain-Like Neural Network for Saliency Prediction on Omnidirectional Images. IEEE Trans Cogn Dev Syst 2022. DOI: 10.1109/tcds.2021.3052526.
Affiliations
- Dandan Zhu: MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
- Yongqing Chen: School of Information and Communication Engineering, Hainan University, Haikou, China
- Defang Zhao: School of Software Engineering, Tongji University, Shanghai, China
- Yucheng Zhu: Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
- Qiangqiang Zhou: School of Software, Jiangxi Normal University, Nanchang, China
- Guangtao Zhai: Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
- Xiaokang Yang: MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

31. Peng P, Yang KF, Liang SQ, Li YJ. Contour-guided saliency detection with long-range interactions. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.03.006.

32. Spatiotemporal context-aware network for video salient object detection. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07330-1.

33. Martin D, Serrano A, Bergman AW, Wetzstein G, Masia B. ScanGAN360: A Generative Model of Realistic Scanpaths for 360° Images. IEEE Transactions on Visualization and Computer Graphics 2022; 28:2003-2013. PMID: 35167469. DOI: 10.1109/tvcg.2022.3150502.
Abstract
Understanding and modeling the dynamics of human gaze behavior in 360° environments is crucial for creating, improving, and developing emerging virtual reality applications. However, recruiting human observers and acquiring enough data to analyze their behavior when exploring virtual environments requires complex hardware and software setups, and can be time-consuming. Being able to generate virtual observers can help overcome this limitation, and thus stands as an open problem in this medium. Particularly, generative adversarial approaches could alleviate this challenge by generating a large number of scanpaths that reproduce human behavior when observing new scenes, essentially mimicking virtual observers. However, existing methods for scanpath generation do not adequately predict realistic scanpaths for 360° images. We present ScanGAN360, a new generative adversarial approach to address this problem. We propose a novel loss function based on dynamic time warping and tailor our network to the specifics of 360° images. The quality of our generated scanpaths outperforms competing approaches by a large margin, and is almost on par with the human baseline. ScanGAN360 allows fast simulation of large numbers of virtual observers, whose behavior mimics real users, enabling a better understanding of gaze behavior, facilitating experimentation, and aiding novel applications in virtual reality and beyond.
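
ScanGAN360's loss builds on dynamic time warping (DTW) between scanpaths. The paper uses a soft, differentiable DTW; the plain dynamic-programming version below only shows the alignment idea, and it ignores the longitude wrap-around of 360° images.

```python
import numpy as np

def dtw_distance(path_a, path_b):
    """Classic DTW distance between two scanpaths (sequences of gaze points).
    Plain, non-differentiable version; the paper's loss uses a soft variant."""
    n, m = len(path_a), len(path_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(path_a[i - 1] - path_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.random.rand(30, 2)  # (x, y) gaze samples of one scanpath
b = np.random.rand(25, 2)
print(dtw_distance(a, b))
```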
Collapse
|
34
|
Lv Z, Zhu S, Wang D, Liang Z. Whole constraint and partial triplet-center loss for infrared-visible re-identification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07276-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
35
|
Zhou L, Wang Y, Lei B, Yang W. Regional Self-Attention Convolutional Neural Network for Facial Expression Recognition. INT J PATTERN RECOGN 2022. [DOI: 10.1142/s0218001422560134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
36
|
Pfeiffer C, Wengeler S, Loquercio A, Scaramuzza D. Visual attention prediction improves performance of autonomous drone racing agents. PLoS One 2022; 17:e0264471. [PMID: 35231038 PMCID: PMC8887736 DOI: 10.1371/journal.pone.0264471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Accepted: 02/10/2022] [Indexed: 11/18/2022] Open
Abstract
Humans race drones faster than neural networks trained for end-to-end autonomous flight. This may be related to the ability of human pilots to select task-relevant visual information effectively. This work investigates whether neural networks capable of imitating human eye gaze behavior and attention can improve neural networks' performance for the challenging task of vision-based autonomous drone racing. We hypothesize that gaze-based attention prediction can be an efficient mechanism for visual information selection and decision making in a simulator-based drone racing task. We test this hypothesis using eye gaze and flight trajectory data from 18 human drone pilots to train a visual attention prediction model. We then use this visual attention prediction model to train an end-to-end controller for vision-based autonomous drone racing using imitation learning. We compare the drone racing performance of the attention-prediction controller to that of controllers using raw image inputs and image-based abstractions (i.e., feature tracks). Comparing success rates for completing a challenging race track by autonomous flight, our results show that the attention-prediction-based controller (88% success rate) outperforms the RGB-image (61% success rate) and feature-track (55% success rate) controller baselines. Furthermore, the visual-attention-prediction and feature-track based models showed better generalization performance than image-based models when evaluated on held-out reference trajectories. Our results demonstrate that human visual attention prediction improves the performance of autonomous vision-based drone racing agents and provides an essential step towards vision-based, fast, and agile autonomous flight that can eventually reach and even exceed human performance.
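The pipeline lends itself to a compact sketch: a gaze-prediction network compresses the frame into an attention heatmap, and an imitation-learned policy maps the heatmap to a command. The module sizes, layer choices, and four-dimensional command below are our illustrative assumptions, not the released model:

```python
# Minimal PyTorch sketch of an attention-prediction controller (hedged reading
# of the pipeline, not the authors' code; shapes and sizes are assumptions).
import torch
import torch.nn as nn

class AttentionController(nn.Module):
    def __init__(self):
        super().__init__()
        # stand-in for the trained gaze/attention predictor
        self.attention = nn.Sequential(
            nn.Conv2d(3, 8, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(8, 1, 5, stride=2, padding=2),
        )
        # controller head consuming the flattened attention map
        self.policy = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU(),
            nn.Linear(128, 4),  # e.g. thrust + 3 body rates (assumed)
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        heatmap = torch.sigmoid(self.attention(frame))  # (B, 1, 32, 32)
        return self.policy(heatmap)

cmd = AttentionController()(torch.rand(1, 3, 128, 128))
print(cmd.shape)  # torch.Size([1, 4])
```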
Collapse
Affiliation(s)
- Christian Pfeiffer
- Robotics and Perception Group, Department of Informatics, University of Zurich, Zurich, Switzerland
- Department of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Simon Wengeler
- Robotics and Perception Group, Department of Informatics, University of Zurich, Zurich, Switzerland
- Department of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Antonio Loquercio
- Robotics and Perception Group, Department of Informatics, University of Zurich, Zurich, Switzerland
- Department of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Davide Scaramuzza
- Robotics and Perception Group, Department of Informatics, University of Zurich, Zurich, Switzerland
- Department of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
Collapse
|
37
|
Zhou L, Zhou T, Khan S, Sun H, Shen J, Shao L. Weakly Supervised Visual Saliency Prediction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:3111-3124. [PMID: 35380961 DOI: 10.1109/tip.2022.3158064] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The success of current deep saliency models heavily depends on large amounts of annotated human fixation data to fit the highly non-linear mapping between the stimuli and visual saliency. Such fully supervised data-driven approaches are annotation-intensive and often fail to consider the underlying mechanisms of visual attention. In contrast, in this paper, we introduce a model based on various cognitive theories of visual saliency, which learns visual attention patterns in a weakly supervised manner. Our approach incorporates insights from cognitive science as differentiable submodules, resulting in a unified, end-to-end trainable framework. Specifically, our model encapsulates the following important components motivated from biological vision. (a) As scene semantics are closely related to visually attentive regions, our model encodes discriminative spatial information for scene understanding through spatial visual semantics embedding. (b) To model the objectness factors in visual attention deployment, we incorporate object-level semantics embedding and object relation information. (c) Considering the "winner-take-all" mechanism in visual stimuli processing, we model the competition mechanism among objects with softmax based neural attention. (d) Lastly, a conditional center prior is learned to mimic the spatial distribution bias of visual attention. Furthermore, we propose novel loss functions to utilize supervision cues from image-level semantics, saliency prior knowledge, and self-information compression. Experiments show that our method achieves promising results, and even outperforms many of its fully supervised counterparts. Overall, our weakly supervised saliency method makes an essential step towards reducing the annotation budget of current approaches, as well as providing a more comprehensive understanding of the visual attention mechanism. Our code is available at: https://github.com/ashleylqx/WeakFixation.git.
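Component (c) is easy to make concrete. A toy sketch of softmax-based "winner-take-all" competition among per-object scores follows; the temperature value and function names are our assumptions, not the paper's implementation:

```python
# Illustrative sketch of softmax-based object competition (not the paper's code).
import numpy as np

def object_competition(object_scores: np.ndarray, temperature: float = 0.5):
    """Softmax over per-object saliency scores: high-scoring objects suppress
    the rest, mimicking winner-take-all deployment of attention."""
    z = object_scores / temperature
    z -= z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

scores = np.array([2.1, 0.3, 1.7, 0.2])   # hypothetical objectness scores
print(object_competition(scores))          # sharply favors objects 0 and 2
```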
Collapse
|
38
|
Zhang J, Feng X, Wu Q, Yang G, Tao M, Yang Y, He Y. Rice bacterial blight resistant cultivar selection based on visible/near-infrared spectrum and deep learning. PLANT METHODS 2022; 18:49. [PMID: 35428329 PMCID: PMC9013134 DOI: 10.1186/s13007-022-00882-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Accepted: 03/31/2022] [Indexed: 05/10/2023]
Abstract
BACKGROUND Rice bacterial blight (BB) has caused serious damage to rice yield and quality, leading to huge economic losses and food safety problems. Breeding disease-resistant cultivars is the most eco-friendly and effective way to control its outbreak, since it restrains the propagation of the pathogenic bacteria. However, BB-resistant cultivar selection suffers from tremendous labor costs, low efficiency, and subjective human error, and dynamic rice BB phenotyping studies exploring the pattern of BB growth across different genotypes are absent. RESULTS In this paper, with the aim of alleviating the labor burden of plant breeding experts in resistant cultivar screening and exploring how the disease-resistance phenotype varies, visible/near-infrared (VIS-NIR) hyperspectral images of rice leaves from three varieties after inoculation were collected and fed into a self-built deep learning model, LPnet, for disease severity assessment. The growth status of BB lesions over time was fully revealed. On the strength of the attention mechanism inside LPnet, the spectral features most informative about lesion proportion were further extracted and combined into a novel and refined leaf spectral index. The effectiveness and feasibility of the proposed wavelength combination were verified by identifying the resistant cultivar, assessing resistance ability, and visualizing spectral images. CONCLUSIONS This study illustrated that informative VIS-NIR spectra coupled with attention-based deep learning have great potential not only to directly assess disease severity but also to excavate spectral characteristics for rapidly screening disease-resistant cultivars in high-throughput phenotyping.
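The index-building step admits a brief sketch: rank wavelengths by attention weight and combine the top bands. Everything below (band count, the normalized-difference form, the random stand-in data) is a hedged illustration, not the published LPnet code:

```python
# Hedged sketch of attention-guided wavelength selection (illustrative only).
import numpy as np

def build_spectral_index(spectrum, attention, top_k=2):
    """Pick the top-k attended bands and form a normalized-difference index."""
    top = np.argsort(attention)[-top_k:]      # most informative wavelengths
    a, b = spectrum[top[-1]], spectrum[top[-2]]
    return (a - b) / (a + b + 1e-8)

wavelengths = np.linspace(400, 1000, 300)     # VIS-NIR range, nm
spectrum = np.random.rand(300)                # hypothetical leaf reflectance
attention = np.random.rand(300)               # hypothetical attention weights
print(build_spectral_index(spectrum, attention))
```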
Collapse
Affiliation(s)
- Jinnuo Zhang
- College of Biosystems Engineering and Food Science, Key Laboratory of Spectroscopy, Ministry of Agriculture and Rural Affairs, Zhejiang University, Hangzhou, 310058, China
- Xuping Feng
- College of Biosystems Engineering and Food Science, Key Laboratory of Spectroscopy, Ministry of Agriculture and Rural Affairs, Zhejiang University, Hangzhou, 310058, China
- Qingguan Wu
- College of Biosystems Engineering and Food Science, Key Laboratory of Spectroscopy, Ministry of Agriculture and Rural Affairs, Zhejiang University, Hangzhou, 310058, China
- Guofeng Yang
- College of Biosystems Engineering and Food Science, Key Laboratory of Spectroscopy, Ministry of Agriculture and Rural Affairs, Zhejiang University, Hangzhou, 310058, China
- Mingzhu Tao
- College of Biosystems Engineering and Food Science, Key Laboratory of Spectroscopy, Ministry of Agriculture and Rural Affairs, Zhejiang University, Hangzhou, 310058, China
- Yong Yang
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Key Laboratory of Biotechnology for Plant Protection, Ministry of Agriculture and Rural Affairs, Zhejiang Provincial Key Laboratory of Biotechnology for Plant Protection, Institute of Virology and Biotechnology, Zhejiang Academy of Agricultural Science, Hangzhou, 310021, China.
- Yong He
- College of Biosystems Engineering and Food Science, Key Laboratory of Spectroscopy, Ministry of Agriculture and Rural Affairs, Zhejiang University, Hangzhou, 310058, China.
Collapse
|
39
|
A Saliency Prediction Model Based on Re-Parameterization and Channel Attention Mechanism. ELECTRONICS 2022. [DOI: 10.3390/electronics11081180] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
Deep saliency models can effectively imitate the attention mechanism of human vision, and they perform considerably better than classical models that rely on handcrafted features. However, deep models also require higher-level information, such as context or emotional content, to further approach human performance. Therefore, this study proposes a multilevel saliency prediction network that aims to use a combination of spatial and channel information to find possible high-level features, further improving the performance of a saliency model. Firstly, we use a VGG-style network with an identity block as the primary network architecture. With the help of re-parameterization, we can obtain rich features similar to those of multiscale networks and effectively reduce computational cost. Secondly, a subnetwork with a channel attention mechanism is designed to find potential saliency regions and possible high-level semantic information in an image. Finally, image spatial features and a channel enhancement vector are combined after quantization to improve the overall performance of the model. Compared with classical models and other deep models, our model exhibits superior overall performance.
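The re-parameterization the abstract alludes to is the RepVGG-style branch fusion; whether this model fuses exactly these branches is our assumption, but the trick itself can be verified in a few lines (batch-norm folding omitted for brevity):

```python
# Sketch of RepVGG-style re-parameterization: at inference time a 3x3 branch,
# a 1x1 branch, and an identity branch collapse into one 3x3 convolution.
import torch
import torch.nn.functional as F

def fuse_branches(w3x3, w1x1, channels):
    """Return a single 3x3 kernel equivalent to 3x3 + 1x1 + identity branches."""
    fused = w3x3.clone()
    fused += F.pad(w1x1, [1, 1, 1, 1])        # embed 1x1 at the kernel center
    identity = torch.zeros_like(w3x3)
    for ch in range(channels):                # identity = centered delta kernel
        identity[ch, ch, 1, 1] = 1.0
    return fused + identity

c = 4
w3, w1 = torch.randn(c, c, 3, 3), torch.randn(c, c, 1, 1)
x = torch.randn(1, c, 8, 8)
multi = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1) + x
single = F.conv2d(x, fuse_branches(w3, w1, c), padding=1)
print(torch.allclose(multi, single, atol=1e-5))   # True
```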
Collapse
|
40
|
Han Y, Chen X, Zhang S, Qi D. iNL: Implicit non-local network. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
41
|
A novel spatiotemporal attention enhanced discriminative network for video salient object detection. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02649-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
42
|
Lu X, Wang W, Shen J, Crandall D, Luo J. Zero-Shot Video Object Segmentation With Co-Attention Siamese Networks. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:2228-2242. [PMID: 33232224 DOI: 10.1109/tpami.2020.3040258] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We introduce a novel network, called CO-attention siamese network (COSNet), to address the zero-shot video object segmentation task in a holistic fashion. We exploit the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve the state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in COSNet provide efficient and competent stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. COSNet is a unified and end-to-end trainable framework where different co-attention variants can be derived for capturing diverse properties of the learned joint feature space. We train COSNet with pairs (or groups) of video frames, and this naturally augments training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to infer the frequently reappearing and salient foreground objects better. Our extensive experiments over three large benchmarks demonstrate that COSNet outperforms the current alternatives by a large margin. Our implementations are available at https://github.com/carrierlxk/COSNet.
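The co-attention core can be sketched directly from the description: an affinity matrix between two frames' flattened features, softmax-normalized in each direction, lets each frame summarize the other's globally correlated regions. Function and variable names below are ours, not COSNet's:

```python
# Minimal co-attention sketch (illustrative naming, not the released code).
import torch

def co_attention(feat_a, feat_b, W):
    """feat_*: (C, N) flattened frame features; W: (C, C) learned weight."""
    S = feat_a.t() @ W @ feat_b                         # (N_a, N_b) affinity
    attended_a = feat_b @ torch.softmax(S, dim=1).t()   # summarize B for A
    attended_b = feat_a @ torch.softmax(S, dim=0)       # summarize A for B
    return attended_a, attended_b

C, N = 16, 64                             # channels, spatial positions (8x8)
fa, fb, W = torch.randn(C, N), torch.randn(C, N), torch.randn(C, C)
za, zb = co_attention(fa, fb, W)
print(za.shape, zb.shape)                 # (16, 64) each
```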
Collapse
|
43
|
Wang Z, Zhou Z, Lu H, Hu Q, Jiang J. Video Saliency Prediction via Joint Discrimination and Local Consistency. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1490-1501. [PMID: 32452797 DOI: 10.1109/tcyb.2020.2989158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
While saliency detection on static images has been widely studied, research on video saliency detection is still at an early stage and requires more effort, owing to the challenge of bringing both the local and global consistency of salient objects into full consideration. In this article, we propose a novel dynamic saliency network based on both local consistency and global discrimination, via which semantic features across video frames are simultaneously extracted, and a recurrent feature optimization structure is designed to further enhance its performance. To ensure that the generated dynamic saliency map is more concentrated, we design a lightweight discriminator with a local consistency loss LC to identify subtle differences between predicted maps and ground truths. As a result, the proposed network can be further stimulated to produce more realistic saliency maps with smoother boundaries and simpler layer transitions. The added LC loss forces the network to pay more attention to the local consistency between consecutive saliency maps. Both qualitative and quantitative experiments are carried out on three large datasets, and the results demonstrate that our proposed network not only achieves improved performance but also shows good robustness.
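One plausible reading of the LC idea, offered only as a hypothetical sketch rather than the paper's loss: penalize disagreement between consecutive predicted maps wherever the ground truth is temporally stable:

```python
# Hypothetical local-consistency-style temporal loss (our reading, not LC itself).
import torch
import torch.nn.functional as F

def local_consistency_loss(pred_t, pred_t1, gt_t, gt_t1, margin=0.1):
    """pred_*/gt_*: (B, 1, H, W) saliency maps for frames t and t+1."""
    stable = (torch.abs(gt_t - gt_t1) < margin).float()  # temporally stable pixels
    return F.l1_loss(pred_t * stable, pred_t1 * stable)

maps = [torch.rand(2, 1, 64, 64) for _ in range(4)]
print(local_consistency_loss(*maps))
```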
Collapse
|
44
|
Che Z, Borji A, Zhai G, Ling S, Li J, Min X, Guo G, Le Callet P. SMGEA: A New Ensemble Adversarial Attack Powered by Long-Term Gradient Memories. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:1051-1065. [PMID: 33296311 DOI: 10.1109/tnnls.2020.3039295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of source models transfer to other target models and, thus, pose a security threat to black-box applications (where attackers have no access to the target models). Current transfer-based ensemble attacks, however, consider only a limited number of source models to craft an adversarial example and, thus, obtain poor transferability. Besides, recent query-based black-box attacks, which require numerous queries to the target model, not only arouse the suspicion of the target model but also incur expensive query costs. In this article, we propose a novel transfer-based black-box attack, dubbed serial-minigroup-ensemble-attack (SMGEA). Concretely, SMGEA first divides a large number of pretrained white-box source models into several "minigroups." For each minigroup, we design three new ensemble strategies to improve the intragroup transferability. Moreover, we propose a new algorithm that recursively accumulates the "long-term" gradient memories of the previous minigroup into the subsequent minigroup. This way, the learned adversarial information can be preserved, and the intergroup transferability can be improved. Experiments indicate that SMGEA not only achieves state-of-the-art black-box attack ability over several datasets but also deceives two online black-box saliency prediction systems in the real world, i.e., DeepGaze-II (https://deepgaze.bethgelab.org/) and SALICON (http://salicon.net/demo/). Finally, we contribute a new code repository to promote research on adversarial attack and defense over ubiquitous pixel-to-pixel computer vision tasks. We share our code together with the pretrained substitute model zoo at https://github.com/CZHQuality/AAA-Pix2pix.
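The long-term gradient memory admits a hedged sketch: an iterative attack walks through the minigroups serially while an exponentially decayed memory of normalized gradients carries adversarial information forward. Decay factor, step sizes, and normalization below are illustrative assumptions, not the released SMGEA settings:

```python
# Hedged sketch of a serial-minigroup attack with gradient memory
# (illustrative, not the authors' released code).
import torch
import torch.nn.functional as F

def minigroup_attack(x, y, minigroups, loss_fn, steps=10, eps=8/255, alpha=2/255):
    x_adv, memory = x.clone(), torch.zeros_like(x)
    for group in minigroups:                        # serial minigroup schedule
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = sum(loss_fn(m(x_adv), y) for m in group) / len(group)
            grad = torch.autograd.grad(loss, x_adv)[0]
            memory = 0.9 * memory + grad / (grad.abs().mean() + 1e-12)
            with torch.no_grad():                   # project back into eps-ball
                x_adv = x_adv + alpha * memory.sign()
                x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)
    return x_adv.detach()

# Toy usage with one minigroup containing one tiny stand-in model.
models = [[torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(12, 3))]]
x, y = torch.rand(1, 3, 2, 2), torch.tensor([1])
adv = minigroup_attack(x, y, models, F.cross_entropy)
print((adv - x).abs().max() <= 8/255 + 1e-6)  # perturbation stays bounded
```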
Collapse
|
45
|
Sun B, Ren Y, Lu X. Semisupervised Consistent Projection Metric Learning for Person Reidentification. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:738-747. [PMID: 32310811 DOI: 10.1109/tcyb.2020.2979262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Person reidentification is a hot topic in the computer vision field. Many efforts have been devoted to modeling a discriminative distance metric. However, existing metric-learning-based methods lack generalization. In this article, the poor generalization of the metric model is attributed to a biased-estimation problem: the independent-and-identically-distributed hypothesis does not hold. The verification experiment shows that there is a sharp difference between the training and test samples in the metric subspace. A semisupervised consistent projection metric-learning method is proposed to ease the biased-estimation problem by learning a consistency-constrained metric subspace in which the identified pairs are forced to follow the distribution of the positive training pairs. First, a semisupervised method is proposed to generate potential matching pairs from the k-nearest neighbors of the test samples. The potential matching pairs are used to estimate the center of the distance distribution of the positive test pairs. Second, the metric subspace is improved by forcing this estimation to be close to the center of the positive training pairs. Finally, extensive experiments are conducted on five datasets, and the results demonstrate that the proposed method achieves the best performance, especially on the rank-1 identification rate.
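The pair-generation step can be illustrated in a few lines: treat each test sample's k nearest neighbors as potential positive matches and use their mean distance as the estimated distribution center. This is our construction of the idea, not the authors' code:

```python
# Illustrative sketch of potential-pair center estimation (hedged construction).
import numpy as np

def potential_pair_center(test_feats: np.ndarray, k: int = 3) -> float:
    """Mean distance from each test sample to its k nearest neighbors."""
    d = np.linalg.norm(test_feats[:, None] - test_feats[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    knn = np.sort(d, axis=1)[:, :k]             # k smallest distances per sample
    return float(knn.mean())

test_embeddings = np.random.rand(50, 16)        # hypothetical metric-space features
print(potential_pair_center(test_embeddings))
```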
Collapse
|
46
|
Video saliency aware intelligent HD video compression with the improvement of visual quality and the reduction of coding complexity. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-06895-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
47
|
|
48
|
Graph-based few-shot learning with transformed feature propagation and optimal class allocation. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.110] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
49
|
Wang W. Infrared Pedestrian Detection Method Based on Attention Model. PATTERN RECOGNITION AND IMAGE ANALYSIS 2021. [DOI: 10.1134/s1054661821040271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
50
|
A Probabilistic Re-Interpretation of Confidence Scores in Multi-Exit Models. ENTROPY 2021; 24:e24010001. [PMID: 35052027 PMCID: PMC8774619 DOI: 10.3390/e24010001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Revised: 12/11/2021] [Accepted: 12/18/2021] [Indexed: 11/30/2022]
Abstract
In this paper, we propose a new approach to train a deep neural network with multiple intermediate auxiliary classifiers branching from it. These ‘multi-exit’ models can be used to reduce inference time by performing an early exit at an intermediate branch if the confidence of the prediction is higher than a threshold. They rely on the assumption that not all samples require the same amount of processing to yield a good prediction. In this paper, we propose a way to jointly train all the branches of a multi-exit model without hyper-parameters, by weighting the predictions from each branch with a trained confidence score. Each confidence score is an approximation of the real one produced by its branch, and it is calculated and regularized while training the rest of the model. We evaluate our proposal on a set of image classification benchmarks, using different neural models and early-exit stopping criteria.
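The early-exit rule is simple enough to sketch; module names and the 0.9 threshold below are placeholders, not the paper's configuration:

```python
# Minimal early-exit inference sketch (placeholder modules and threshold).
import torch
import torch.nn as nn

def early_exit(x, stages, branches, threshold=0.9):
    """stages[i]: backbone block; branches[i]: auxiliary classifier after it."""
    for stage, branch in zip(stages, branches):
        x = stage(x)
        probs = torch.softmax(branch(x), dim=1)
        conf, pred = probs.max(dim=1)
        if conf.item() >= threshold:            # confident enough: exit early
            return pred, conf
    return pred, conf                           # fall through to the last exit

stages = [nn.Linear(8, 8), nn.Linear(8, 8)]     # stand-in backbone blocks
branches = [nn.Linear(8, 3), nn.Linear(8, 3)]   # stand-in auxiliary classifiers
print(early_exit(torch.rand(1, 8), stages, branches))
```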
Collapse
|