1. Hernández-Cámara P, Vila-Tomás J, Laparra V, Malo J. Dissecting the effectiveness of deep features as metric of perceptual image quality. Neural Netw 2025;185:107189. PMID: 39874824. DOI: 10.1016/j.neunet.2025.107189.
Abstract
There is an open debate on the role of artificial networks in understanding the visual brain. Internal representations of images in artificial networks develop human-like properties. In particular, evaluating distortions using differences between internal features correlates with human perception of distortion. However, the origins of this correlation are not well understood. Here, we dissect the different factors involved in the emergence of human-like behavior: function, architecture, and environment. To do so, we evaluate the aforementioned human-network correlation at different depths of 46 pre-trained model configurations that include no psycho-visual information. The results show that most of the models correlate better with human opinion than SSIM (a de facto standard in subjective image quality). Moreover, some models are better than state-of-the-art networks specifically tuned for the application (LPIPS, DISTS). Regarding function, supervised classification leads to networks that correlate better with humans than the explored models for self-supervised and unsupervised tasks. However, we found that better performance on the task does not imply more human-like behavior. Regarding architecture, simpler models correlate better with humans than very deep networks, and the highest correlation is generally not achieved in the last layer. Finally, regarding the environment, training with large natural datasets leads to higher correlations than training on smaller databases with restricted content, as expected. We also found that the best classification models are not the best at predicting human distances. In the general debate about understanding human vision, our empirical findings imply that explanations should not focus on a single abstraction level: function, architecture, and environment are all relevant.
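The feature-based perceptual distance the abstract refers to can be sketched in a few lines. Below, a toy random-convolution "network" stands in for the pre-trained models (the study uses real networks such as those behind LPIPS and DISTS), and the distance is the mean per-layer RMS difference between unit-normalized feature maps. Kernel size, depth, and normalization here are illustrative assumptions, not the paper's configurations.

```python
import numpy as np

def toy_features(img, n_layers=3, seed=0):
    """Toy stand-in for a pretrained CNN: each 'layer' is one random
    positive 3x3 convolution followed by a ReLU. Purely illustrative."""
    rng = np.random.default_rng(seed)
    feats, x = [], img.astype(float)
    for _ in range(n_layers):
        k = np.abs(rng.standard_normal((3, 3)))  # positive kernel keeps units alive
        h, w = x.shape[0] - 2, x.shape[1] - 2
        y = np.empty((h, w))
        for i in range(h):                        # 'valid' 2D convolution
            for j in range(w):
                y[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
        x = np.maximum(y, 0.0)                    # ReLU
        feats.append(x)
    return feats

def feature_distance(img_a, img_b):
    """LPIPS-style perceptual distance: mean per-layer RMS difference
    between unit-normalized feature maps (no learned weights here)."""
    fa, fb = toy_features(img_a), toy_features(img_b)
    d = 0.0
    for a, b in zip(fa, fb):
        a = a / (np.linalg.norm(a) + 1e-8)
        b = b / (np.linalg.norm(b) + 1e-8)
        d += np.sqrt(np.mean((a - b) ** 2))
    return d / len(fa)
```

The question the paper asks is, in these terms, why such a distance correlates with human opinion at all, and at which depth the correlation peaks.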
Affiliation(s)
- Jorge Vila-Tomás
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Valero Laparra
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Jesús Malo
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
2. An NM, Roh H, Kim S, Kim JH, Im M. Machine Learning Techniques for Simulating Human Psychophysical Testing of Low-Resolution Phosphene Face Images in Artificial Vision. Adv Sci (Weinh) 2025;12:e2405789. PMID: 39985243. PMCID: PMC12005743. DOI: 10.1002/advs.202405789.
Abstract
To evaluate the quality of artificial visual percepts generated by emerging methodologies, researchers often rely on labor-intensive and tedious human psychophysical experiments. These experiments necessitate repeated iterations upon any major or minor modification of the hardware or software configuration. Here, the capacity of standard machine learning (ML) models to accurately replicate quaternary match-to-sample tasks is investigated, using low-resolution facial images represented by arrays of phosphenes as input stimuli. First, the performance of ML models trained to approximate innate human facial recognition abilities is analyzed across a dataset comprising 3600 phosphene images of human faces. Then, owing to time constraints and the potential for subject fatigue, the psychophysical test is limited to presenting only 720 low-resolution phosphene images to 36 human subjects. Notably, the best model closely mirrors the behavioral trend of the human subjects, offering precise predictions for 8 out of 9 phosphene quality levels on the overlapping test queries. Finally, human recognition performance for untested phosphene images is predicted, streamlining the process and minimizing the need for additional psychophysical tests. The findings underscore the transformative potential of ML in reshaping the research paradigm of visual prosthetics, facilitating the expedited advancement of prostheses.
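Low-resolution phosphene stimuli of the kind described above are commonly approximated by block-averaging an image onto a coarse grid and rendering each cell as a Gaussian dot. The sketch below is a generic rendering of that idea; grid size, dot profile, and sigma are assumptions, not the study's parameters.

```python
import numpy as np

def phosphene_render(img, grid=16, cell=8, sigma=0.35):
    """Render a grayscale image as a grid x grid array of Gaussian
    'phosphene' dots whose brightness tracks local mean intensity."""
    h, w = img.shape
    bh, bw = h // grid, w // grid
    # block-average the image down to the phosphene grid
    small = img[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    # one isotropic Gaussian blob, reused for every phosphene
    yy, xx = (np.mgrid[0:cell, 0:cell] + 0.5) / cell - 0.5
    blob = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    out = np.zeros((grid * cell, grid * cell))
    for r in range(grid):
        for c in range(grid):
            out[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = small[r, c] * blob
    return out
```

Varying the grid resolution gives the different "phosphene quality levels" over which human and model recognition can be compared.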
Affiliation(s)
- Na Min An
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Present address: Kim Jaechul Graduate School of AI, KAIST, Seoul 02455, Republic of Korea
- Hyeonhee Roh
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Sein Kim
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Jae Hun Kim
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Sensor System Research Center, Advanced Materials and Systems Research Division, KIST, Seoul 02792, Republic of Korea
- Maesoon Im
- Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Division of Bio-Medical Science and Technology, University of Science and Technology (UST), Seoul 02792, Republic of Korea
- KHU-KIST Department of Converging Science and Technology, Kyung Hee University, Seoul 02447, Republic of Korea
3. Gupta P, Dobs K. Human-like face pareidolia emerges in deep neural networks optimized for face and object recognition. PLoS Comput Biol 2025;21:e1012751. PMID: 39869654. PMCID: PMC11790231. DOI: 10.1371/journal.pcbi.1012751.
Abstract
The human visual system possesses a remarkable ability to detect and process faces across diverse contexts, including the phenomenon of face pareidolia: seeing faces in inanimate objects. Despite extensive research, it remains unclear why the visual system employs such broadly tuned face detection capabilities. We hypothesized that face pareidolia results from the visual system's optimization for recognizing both faces and objects. To test this hypothesis, we used task-optimized deep convolutional neural networks (CNNs) and evaluated their alignment with human behavioral signatures and neural responses, measured via magnetoencephalography (MEG), related to pareidolia processing. Specifically, we trained CNNs on tasks involving combinations of face identification, face detection, object categorization, and object detection. Using representational similarity analysis, we found that CNNs that included object categorization in their training tasks represented pareidolia faces, real faces, and matched objects more similarly to neural responses than those that did not. Although these CNNs showed similar overall alignment with neural data, a closer examination of their internal representations revealed that specific training tasks had distinct effects on how pareidolia faces were represented across layers. Finally, interpretability methods revealed that only a CNN trained for both face identification and object categorization relied on face-like features, such as 'eyes', to classify pareidolia stimuli as faces, mirroring findings in human perception. Our results suggest that human-like face pareidolia may emerge from the visual system's optimization for face identification within the context of generalized object categorization.
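The representational similarity analysis used above compares dissimilarity structure rather than raw activations: each system (CNN layer, MEG responses) is summarized by a representational dissimilarity matrix (RDM) over stimuli, and the RDMs are then correlated. A minimal sketch, assuming activations arrive as (stimuli x units) matrices:

```python
import numpy as np

def rdm(activations):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the activation patterns evoked by each pair of stimuli.
    `activations` has shape (n_stimuli, n_units)."""
    z = activations - activations.mean(axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return 1.0 - z @ z.T

def rsa_score(act_model, act_brain):
    """Second-order similarity: correlate the upper triangles of the
    model RDM and the neural (e.g., MEG) RDM."""
    a, b = rdm(act_model), rdm(act_brain)
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]
```

Because only pairwise structure is compared, model and brain need not share dimensionality, which is what makes CNN-to-MEG comparison possible.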
Affiliation(s)
- Pranjul Gupta
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Katharina Dobs
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Center for Mind, Brain, and Behavior, Universities of Marburg, Giessen and Darmstadt, Marburg, Germany
4. Wu S, Zhou L, Hu Z, Liu J. Hierarchical Context-Based Emotion Recognition With Scene Graphs. IEEE Trans Neural Netw Learn Syst 2024;35:3725-3739. PMID: 36018874. DOI: 10.1109/tnnls.2022.3196831.
Abstract
To better infer intentions, we often try to figure out the emotional states of other people in social communication. Many studies in affective computing have inferred emotions by perceiving human states, i.e., facial expression and body posture. Such methods work well in a controlled environment, but they often lead to misestimation in unconstrained circumstances due to the deficiency of effective inputs; this is where context-aware emotion recognition comes in. Taking inspiration from the advanced reasoning pattern humans use in perceived emotion recognition, we propose a hierarchical context-based emotion recognition method with scene graphs. We extract three contexts from the image: the entity context, the global context, and the scene context. The scene context contains abstract information about entity labels and their relationships, similar to the information processing of the human visual sensing mechanism. These contexts are then fused to perform emotion recognition. We carried out extensive experiments on widely used context-aware emotion datasets, i.e., CAER-S, EMOTIC, and the BOdy Language Dataset (BoLD). We demonstrate that hierarchical contexts benefit emotion recognition, improving the SOTA accuracy from 84.82% to 90.83% on CAER-S. Ablation experiments show that the hierarchical contexts provide complementary information. Our method improves the F1 score of the SOTA result from 29.33% to 30.24% (C-F1) on EMOTIC. We also build an image-based emotion recognition task, BoLD-Img, from BoLD and obtain a better emotion recognition score (ERS) of 0.2153.
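The fusion step, combining entity, global, and scene contexts before classification, can be sketched as a late-fusion softmax head. This is a bare-bones illustration only: the actual model extracts each context with dedicated sub-networks and a scene-graph encoder, and the dimensions and linear head below are invented.

```python
import numpy as np

def fuse_contexts(entity_ctx, global_ctx, scene_ctx, w):
    """Late fusion: concatenate the three context embeddings and apply
    a linear emotion classifier with a numerically stable softmax."""
    x = np.concatenate([entity_ctx, global_ctx, scene_ctx])
    logits = w @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The ablation logic in the abstract corresponds to zeroing out one of the three embeddings and measuring the drop in accuracy.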
5.
Abstract
Deep neural networks (DNNs) are machine learning algorithms that have revolutionized computer vision due to their remarkable successes in tasks like object classification and segmentation. The success of DNNs as computer vision algorithms has led to the suggestion that DNNs may also be good models of human visual perception. In this article, we review evidence regarding current DNNs as adequate behavioral models of human core object recognition. To this end, we argue that it is important to distinguish between statistical tools and computational models and to understand model quality as a multidimensional concept in which clarity about modeling goals is key. Reviewing a large number of psychophysical and computational explorations of core object recognition performance in humans and DNNs, we argue that DNNs are highly valuable scientific tools but that, as of today, DNNs should only be regarded as promising, but not yet adequate, computational models of human core object recognition behavior. On the way, we dispel several myths surrounding DNNs in vision science.
Affiliation(s)
- Felix A Wichmann
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
6. Dobs K, Yuan J, Martinez J, Kanwisher N. Behavioral signatures of face perception emerge in deep neural networks optimized for face recognition. Proc Natl Acad Sci U S A 2023;120:e2220642120. PMID: 37523537. PMCID: PMC10410721. DOI: 10.1073/pnas.2220642120.
Abstract
Human face recognition is highly accurate and exhibits a number of distinctive and well-documented behavioral "signatures", such as the use of a characteristic representational space, the disproportionate performance cost when stimuli are presented upside down, and the drop in accuracy for faces from races the participant is less familiar with. These and other phenomena have long been taken as evidence that face recognition is "special". But why does human face perception exhibit these properties in the first place? Here, we use deep convolutional neural networks (CNNs) to test the hypothesis that all of these signatures of human face perception result from optimization for the task of face recognition. Indeed, as predicted by this hypothesis, these phenomena are all found in CNNs trained on face recognition, but not in CNNs trained on object recognition, even when additionally trained to detect faces while matching the amount of face experience. To test whether these signatures are in principle specific to faces, we optimized a CNN on car discrimination and tested it on upright and inverted car images. As we found for face perception, the car-trained network showed a drop in performance for inverted vs. upright cars. Similarly, CNNs trained on inverted faces produced an inverted face inversion effect. These findings show that the behavioral signatures of human face perception are well explained as the result of optimization for the task of face recognition, and that the nature of the computations underlying this task may not be so special after all.
Affiliation(s)
- Katharina Dobs
- Department of Psychology, Justus Liebig University Giessen, Giessen 35394, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Marburg 35302, Germany
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Joanne Yuan
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Julio Martinez
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Psychology, Stanford University, Stanford, CA 94305
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
7. Vila-Tomás J, Hernández-Cámara P, Malo J. Artificial psychophysics questions classical hue cancellation experiments. Front Neurosci 2023;17:1208882. PMID: 37483357. PMCID: PMC10358728. DOI: 10.3389/fnins.2023.1208882.
Abstract
We show that classical hue cancellation experiments lead to human-like opponent curves even if the task is done by trivial (identity) artificial networks. Specifically, human-like opponent spectral sensitivities always emerge in artificial networks as long as (i) the retina converts the input radiation into any tristimulus-like representation, and (ii) the post-retinal network solves the standard hue cancellation task, i.e., the network looks for the weights of the cancelling lights such that every monochromatic stimulus plus the weighted cancelling lights matches a grey reference in the (arbitrary) color representation used by the network. In fact, the specific cancellation lights (and not the network architecture) are key to obtaining human-like curves: results show that the classical choice of the lights is the one that leads to the most human-like result, and any other choice leads to progressively different spectral sensitivities. We show this in two ways: through artificial psychophysics using a range of networks with different architectures and a range of cancellation lights, and through a change-of-basis theoretical analogy of the experiments. This suggests that the opponent curves of the classical experiment are just a by-product of the front-end photoreceptors and of a very specific experimental choice, but they do not inform about the downstream color representation. In fact, the architecture of the post-retinal network (signal recombination or internal color space) seems irrelevant to the emergence of the curves in the classical experiment. This result in artificial networks questions the conventional interpretation of the classical result in humans by Jameson and Hurvich.
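The change-of-basis analogy can be made concrete: in any 3D color code, the hue cancellation task reduces to a linear solve for the cancelling-light weights. Sweeping the stimulus over monochromatic lights then traces "opponent" curves whose shape depends on the chosen cancellation lights, not on the network, which is the abstract's point. The numbers below are invented for illustration.

```python
import numpy as np

def cancellation_weights(t_stim, T_cancel, t_grey):
    """Weights w of the cancelling lights such that, in the network's
    (arbitrary) 3D color code, t_stim + T_cancel @ w equals the grey
    reference: a plain change-of-basis linear solve."""
    return np.linalg.solve(T_cancel, t_grey - t_stim)

# Columns of T_cancel: tristimulus vectors of the cancelling lights.
T_cancel = np.array([[1.0, 0.2, 0.1],
                     [0.1, 1.0, 0.3],
                     [0.2, 0.1, 1.0]])
t_grey = np.array([0.5, 0.5, 0.5])     # grey reference
t_stim = np.array([0.9, 0.4, 0.2])     # one 'monochromatic' stimulus
w = cancellation_weights(t_stim, T_cancel, t_grey)
```

Plotting each component of `w` as a function of the stimulus wavelength yields the opponent-like curves; changing `T_cancel` changes the curves even though the "network" here is the identity.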
8. Kanwisher N, Khosla M, Dobs K. Using artificial neural networks to ask 'why' questions of minds and brains. Trends Neurosci 2023;46:240-254. PMID: 36658072. DOI: 10.1016/j.tins.2022.12.008.
Abstract
Neuroscientists have long characterized the properties and functions of the nervous system, and are increasingly succeeding in answering how brains perform the tasks they do. But the question of why brains work the way they do is asked less often. The new ability to optimize artificial neural networks (ANNs) for performance on human-like tasks now enables us to approach these 'why' questions by asking when the properties of networks optimized for a given task mirror the behavioral and neural characteristics of humans performing the same task. Here we highlight the recent success of this strategy in explaining why the visual and auditory systems work the way they do, at both behavioral and neural levels.
Affiliation(s)
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Meenakshi Khosla
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Katharina Dobs
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University, Giessen, Germany.
9. Kirubeswaran OR, Storrs KR. Inconsistent illusory motion in predictive coding deep neural networks. Vision Res 2023;206:108195. PMID: 36801664. DOI: 10.1016/j.visres.2023.108195.
Abstract
Why do we perceive illusory motion in some static images? Several accounts point to eye movements, response latencies to different image elements, or interactions between image patterns and motion energy detectors. Recently PredNet, a recurrent deep neural network (DNN) based on predictive coding principles, was reported to reproduce the "Rotating Snakes" illusion, suggesting a role for predictive coding. We begin by replicating this finding, then use a series of "in silico" psychophysics and electrophysiology experiments to examine whether PredNet behaves consistently with human observers and non-human primate neural data. A pretrained PredNet predicted illusory motion for all subcomponents of the Rotating Snakes pattern, consistent with human observers. However, we found no simple response delays in internal units, unlike evidence from electrophysiological data. PredNet's detection of motion in gradients seemed to depend on contrast, whereas in humans it depends predominantly on luminance. Finally, we examined the robustness of the illusion across ten PredNets of identical architecture, retrained on the same video data. There was large variation across network instances in whether they reproduced the Rotating Snakes illusion, and in what motion, if any, they predicted for simplified variants. Unlike human observers, no network predicted motion for greyscale variants of the Rotating Snakes pattern. Our results sound a cautionary note: even when a DNN successfully reproduces some idiosyncrasy of human vision, more detailed investigation can reveal inconsistencies between humans and the network, and between different instances of the same network. These inconsistencies suggest that predictive coding does not reliably give rise to human-like illusory motion.
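One crude way to probe what motion, if any, a predictive model hallucinates for a static pattern is to compare the input frame with the model's predicted next frame and find the spatial shift that maximizes their correlation. The sketch below is a hypothetical analysis along those lines, not the procedure used in the paper.

```python
import numpy as np

def motion_shift(frame_a, frame_b, max_shift=3):
    """Estimate horizontal motion between two frames as the circular
    shift of frame_b that maximizes overlap correlation with frame_a.
    Applied to (static image, model prediction), a nonzero result
    indicates hallucinated drift."""
    best_corr, best_shift = -np.inf, 0
    for s in range(-max_shift, max_shift + 1):
        corr = float(np.sum(frame_a * np.roll(frame_b, s, axis=1)))
        if corr > best_corr:
            best_corr, best_shift = corr, s
    return best_shift
```

Running such a probe over several retrained instances of the same architecture is one way to quantify the instance-to-instance inconsistency the abstract reports.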
Affiliation(s)
- Katherine R Storrs
- Department of Experimental Psychology, Justus Liebig University Giessen, Germany; Centre for Mind, Brain and Behaviour (CMBB), University of Marburg and Justus Liebig University Giessen, Germany; School of Psychology, University of Auckland, New Zealand
10. Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022;46:e385. PMID: 36453586. DOI: 10.1017/s0140525x22002813.
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK
- Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK
- Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK
- Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK
- Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
11. Perception without preconception: comparison between the human and machine learner in recognition of tissues from histological sections. Sci Rep 2022;12:16420. PMID: 36180472. PMCID: PMC9525725. DOI: 10.1038/s41598-022-20012-1.
Abstract
Deep neural networks (DNNs) have shown success in image classification, with high accuracy in recognition of everyday objects. Performance of DNNs has traditionally been measured assuming human accuracy is perfect. In specific problem domains, however, human accuracy is less than perfect, and a comparison between humans and machine learning (ML) models can be performed. In recognising everyday objects, humans have the advantage of a lifetime of experience, whereas DNN models are trained only with a limited image dataset. We therefore compared the performance of human learners and two DNN models on an image dataset which is novel to both, i.e. histological images, aiming to eliminate the advantage of prior experience that humans have over DNN models in image classification. Ten classes of tissues were randomly selected from the undergraduate first year histology curriculum of a Medical School in North India. Two ML models were developed based on the VGG16 (VML) and Inception V2 (IML) DNNs, using transfer learning, to produce a 10-class classifier. One thousand images belonging to the ten classes (i.e. 100 images from each class) were split into training (700) and validation (300) sets. After training, the VML and IML models achieved 85.67% and 89% accuracy on the validation set, respectively. The training set was also circulated to medical students (MS) of the college for a week. An online quiz, consisting of a random selection of 100 images from the validation set, was conducted on students (after obtaining informed consent) who volunteered for the study. Sixty-six students participated in the quiz, providing 6557 responses. In addition, we prepared a set of 10 images which belonged to different classes of tissue not present in the training set (i.e. out-of-training-scope or OTS images). A second quiz was conducted on medical students with the OTS images, and the ML models were also run on them.
The overall accuracy of MS in the first quiz was 55.14%. The two ML models were also run on the first quiz questionnaire, producing accuracies between 91% and 93%; the ML models thus outperformed more than 80% of the medical students. Analysis of the confusion matrices of both ML models and all medical students showed dissimilar error profiles. However, when comparing the subset of students who achieved accuracy similar to the ML models, the error profiles were also similar. For 4 images in the first quiz set, both the VML model and the medical students produced highly equivocal responses. Within these images, a pattern of bias was uncovered: the tendency of medical students to misclassify 'liver' tissue. The 'stomach' class proved most difficult for both MS and the VML model, producing 34.84% of all MS errors and 41.17% of all VML errors; the IML model, however, committed most errors in recognising the 'skin' class (27.5% of all errors). Analysis of the convolution layers of the DNN outlined features in the original image which might have led to misclassification by the VML model. On the OTS images, however, the medical students produced a better overall score than both ML models, i.e. they successfully recognised patterns of similarity between tissues and could generalise their training to a novel dataset. Our findings suggest that, within the scope of training, ML models perform better than 80% of medical students, with a distinct error profile; students who reach accuracy close to the ML models, however, tend to replicate the error profile of the ML models. This suggests a degree of similarity between how machines and humans extract features from an image. When asked to recognise images outside the scope of training, humans are better at recognising patterns and likeness between tissues.
This suggests that 'training' is not the same as 'learning', and that humans can extend their pattern-based learning to domains outside of the training set.
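The error-profile comparison described above can be made quantitative by correlating the off-diagonal cells of row-normalized confusion matrices: a high correlation means two classifiers confuse the same pairs of classes. A simple sketch; the matrices in the example are invented, not the study's data.

```python
import numpy as np

def error_profile_similarity(cm_a, cm_b):
    """Correlate the off-diagonal (error) cells of two confusion
    matrices after row-normalizing each to class-conditional rates."""
    def error_rates(cm):
        p = cm / cm.sum(axis=1, keepdims=True)
        off_diag = ~np.eye(cm.shape[0], dtype=bool)
        return p[off_diag]
    return np.corrcoef(error_rates(cm_a), error_rates(cm_b))[0, 1]
```

Applied per student, such a score would separate the students whose error profile tracks the ML models from those whose does not.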
12. van Dyck LE, Denzler SJ, Gruber WR. Guiding visual attention in deep convolutional neural networks based on human eye movements. Front Neurosci 2022;16:975639. PMID: 36177359. PMCID: PMC9514055. DOI: 10.3389/fnins.2022.975639.
Abstract
Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision, have evolved into the best current computational models of object recognition, and consequently show strong architectural and functional parallels with the ventral visual pathway in comparisons with neuroimaging and neural time series data. As recent advances in deep learning seem to decrease this similarity, computational neuroscience is challenged to reverse-engineer biological plausibility to obtain useful models. While previous studies have shown that biologically inspired architectures are able to amplify the human-likeness of the models, in this study we investigate a purely data-driven approach. We use human eye tracking data to directly modify training examples and thereby guide the models' visual attention during object recognition in natural images, either toward or away from the focus of human fixations. We compare and validate the different manipulation types (standard, human-like, and non-human-like attention) through GradCAM saliency maps against human participant eye tracking data. Our results demonstrate that the proposed guided focus manipulation works as intended in the negative direction: non-human-like models focus on significantly dissimilar image parts compared to humans. The observed effects were highly category-specific, enhanced by animacy and face presence, developed only after feedforward processing was completed, and indicated a strong influence on face detection. With this approach, however, no significantly increased human-likeness was found. Possible applications of overt visual attention in DCNNs and further implications for theories of face detection are discussed.
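The fixation-guided manipulation of training examples can be sketched as multiplicative Gaussian weighting around a fixation point, with the inverted mask giving the "non-human-like" condition. Sigma and the multiplicative form are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def attention_mask(shape, fixation, sigma=8.0, invert=False):
    """Gaussian weighting centred on a human fixation; with
    invert=True the weight is pushed away from the fixation,
    mimicking the 'non-human-like' manipulation."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    g = np.exp(-((yy - fixation[0]) ** 2 + (xx - fixation[1]) ** 2)
               / (2 * sigma ** 2))
    return 1.0 - g if invert else g

def guide_attention(img, fixation, invert=False):
    """Re-weight a training image toward (or away from) the fixation."""
    return img * attention_mask(img.shape, fixation, invert=invert)
```

Training on images weighted this way biases which pixels carry the class-discriminative signal, which is the mechanism by which the manipulation steers the model's learned attention.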
Affiliation(s)
- Leonard Elia van Dyck
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Centre for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Walter Roland Gruber
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Centre for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
13
Puebla G, Bowers JS. Can deep convolutional neural networks support relational reasoning in the same-different task? J Vis 2022; 22:11. PMID: 36094524; PMCID: PMC9482325; DOI: 10.1167/jov.22.10.11.
Abstract
Same-different visual reasoning is a basic skill central to abstract combinatorial thought. This fact has led neural network researchers to test same-different classification on deep convolutional neural networks (DCNNs), which has resulted in a controversy regarding whether this skill is within the capacity of these models. However, most tests of same-different classification rely on testing with images that come from the same pixel-level distribution as the training images, rendering the results inconclusive. In this study, we tested relational same-different reasoning in DCNNs. In a series of simulations we show that models based on the ResNet architecture are capable of visual same-different classification, but only when the test images are similar to the training images at the pixel level. In contrast, when there is a shift in the testing distribution that does not change the relation between the objects in the image, the performance of DCNNs decreases substantially. This finding holds even when the DCNNs' training regime is expanded to include images taken from a wide range of different pixel-level distributions, or when the model is trained on the testing distribution but on a different task in a multitask learning context. Furthermore, we show that the relation network, a deep learning architecture specifically designed to tackle visual relational reasoning problems, suffers from the same kind of limitations. Overall, the results of this study suggest that learning same-different relations is beyond the scope of current DCNNs.
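The failure mode above has a compact numerical analogue: a pixel-level shortcut (here, a thresholded mean absolute difference, standing in for what a network may learn from one pixel distribution) solves same/different in-distribution but collapses under a simple contrast shift, while the abstract relation is unaffected. The functions, the 8x8 "objects", and the threshold are illustrative assumptions, not the paper's stimuli.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pair(same, scale=1.0):
    """Two 8x8 'objects'. `scale` shifts the pixel-level distribution
    (image contrast) without changing the same/different relation."""
    a = scale * rng.random((8, 8))
    b = a.copy() if same else scale * rng.random((8, 8))
    return a, b

def pixel_shortcut(a, b, thresh=0.15):
    """A shortcut tuned on scale=1.0 images: threshold the mean
    absolute pixel difference. Returns True for 'same'."""
    return bool(np.abs(a - b).mean() < thresh)

def relational_rule(a, b):
    """The abstract relation itself, invariant to contrast."""
    return bool(np.allclose(a, b))
```

At scale 1.0 the shortcut is accurate; at scale 0.3 every "different" pair falls under the threshold and is misclassified as "same", while `relational_rule` is unaffected — a toy version of the distribution-shift result.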
14
Gomez-Villa A, Martín A, Vazquez-Corral J, Bertalmío M, Malo J. On the synthesis of visual illusions using deep generative models. J Vis 2022; 22:2. PMID: 35833884; PMCID: PMC9290318; DOI: 10.1167/jov.22.8.2.
Abstract
Visual illusions expand our understanding of the visual system by imposing constraints on models in two different ways: (i) visual illusions for humans should induce equivalent illusions in the model, and (ii) illusions synthesized from the model should be compelling for human viewers too. These constraints are alternative strategies for finding good vision models. Following the first strategy, recent studies have shown that artificial neural network architectures also have human-like illusory percepts when stimulated with classical hand-crafted stimuli designed to fool humans. In this work we focus on the second (less explored) strategy: we propose a framework to synthesize new visual illusions using the optimization abilities of current automatic differentiation techniques. The proposed framework can be used with classical vision models as well as with more recent artificial neural network architectures. This framework, validated by psychophysical experiments, can be used to study the differences between a vision model and actual human perception, and to optimize the vision model to decrease these differences.
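A toy version of the synthesis-by-optimization idea: take a trivially differentiable "vision model" (response = pixel minus its local mean, a crude stand-in for lateral inhibition) and run gradient ascent on the surround so that two physically identical targets are predicted to look maximally different, which yields a simultaneous-contrast-style stimulus. The model, the finite-difference gradients, and all parameters are simplifications of this sketch; the paper optimizes through real vision models with automatic differentiation.

```python
import numpy as np

def perceived(x, i, w=3):
    """Toy lateral-inhibition model: response at i is the pixel value
    minus the mean of its local neighbourhood."""
    lo, hi = max(0, i - w), min(len(x), i + w + 1)
    return x[i] - x[lo:hi].mean()

def synthesize(n=21, t1=5, t2=15, steps=200, lr=0.05):
    """Gradient ascent on the surround so that two physically equal
    targets (value 0.5) look maximally different to the model."""
    x = np.full(n, 0.5)
    eps = 1e-4
    for _ in range(steps):
        g = np.zeros(n)
        for j in range(n):
            if j in (t1, t2):
                continue  # targets stay physically identical
            x[j] += eps
            up = perceived(x, t1) - perceived(x, t2)
            x[j] -= 2 * eps
            down = perceived(x, t1) - perceived(x, t2)
            x[j] += eps  # restore
            g[j] = (up - down) / (2 * eps)
        x = np.clip(x + lr * g, 0.0, 1.0)
        x[t1] = x[t2] = 0.5
    return x
```

The optimizer rediscovers the classic recipe by itself: it darkens the surround of one target and brightens the surround of the other, so two equal pixels are predicted (and, for a human, perceived) as different.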
Affiliation(s)
- Alex Gomez-Villa
- Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
- Adrián Martín
- Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Javier Vazquez-Corral
- Computer Science Department, Universitat Autònoma de Barcelona and Computer Vision Center, Barcelona, Spain
- Jesús Malo
- Image Processing Lab, Faculty of Physics, Universitat de València, Spain
15
Nicholson DA, Prinz AA. Could simplified stimuli change how the brain performs visual search tasks? A deep neural network study. J Vis 2022; 22:3. PMID: 35675057; PMCID: PMC9187944; DOI: 10.1167/jov.22.7.3.
Abstract
Visual search is a complex behavior influenced by many factors. To control for these factors, many studies use highly simplified stimuli. However, the statistics of these stimuli are very different from the statistics of the natural images that the human visual system is optimized, by evolution and experience, to perceive. Could this difference change search behavior? If so, simplified stimuli may contribute to effects typically attributed to cognitive processes, such as selective attention. Here we use deep neural networks to test how optimizing models for the statistics of one distribution of images constrains performance on a task using images from a different distribution. We train four deep neural network architectures on one of three source datasets (natural images, faces, or x-ray images) and then adapt them to a visual search task using simplified stimuli. This adaptation produces models that exhibit performance limitations similar to humans, whereas models trained on the search task alone exhibit no such limitations. However, we also find that deep neural networks trained to classify natural images exhibit similar limitations when adapted to a search task that uses a different set of natural images. Therefore, the distribution of the data alone cannot explain this effect. We discuss how future work might integrate an optimization-based approach into existing models of visual search behavior.
Affiliation(s)
- David A Nicholson
- Emory University, Department of Biology, O. Wayne Rollins Research Center, Atlanta, Georgia
| | - Astrid A Prinz
- Emory University, Department of Biology, O. Wayne Rollins Research Center, Atlanta, Georgia
16
Makino T, Jastrzębski S, Oleszkiewicz W, Chacko C, Ehrenpreis R, Samreen N, Chhor C, Kim E, Lee J, Pysarenko K, Reig B, Toth H, Awal D, Du L, Kim A, Park J, Sodickson DK, Heacock L, Moy L, Cho K, Geras KJ. Differences between human and machine perception in medical diagnosis. Sci Rep 2022; 12:6877. PMID: 35477730; PMCID: PMC9046399; DOI: 10.1038/s41598-022-10526-z.
Abstract
Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since they can fail for reasons unrelated to underlying pathology. Humans are less likely to make such superficial mistakes, since they use features that are grounded in medical science. It is therefore important to know whether DNNs use different features than humans. Toward this end, we propose a framework for comparing human and machine perception in medical diagnosis. We frame the comparison in terms of perturbation robustness, and mitigate Simpson's paradox by performing a subgroup analysis. The framework is demonstrated with a case study in breast cancer screening, where we separately analyze microcalcifications and soft tissue lesions. While it is inconclusive whether humans and DNNs use different features to detect microcalcifications, we find that for soft tissue lesions, DNNs rely on high-frequency components ignored by radiologists. Moreover, these features are located outside of the regions of the images found most suspicious by radiologists. This difference between humans and machines was only visible through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into the comparison.
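Simpson's paradox, the motivation for the subgroup analysis above, is easy to reproduce with made-up numbers: below, the model's robustness gap (clean accuracy minus perturbed accuracy) is worse than the human's in every subgroup, yet looks better in aggregate, purely because the two read different case mixes. All counts and gaps are hypothetical, chosen only to illustrate the effect; they are not the paper's data.

```python
def aggregate_gap(subgroups):
    """Case-weighted robustness gap (clean minus perturbed accuracy)
    pooled over subgroups given as (n_cases, gap) pairs."""
    total = sum(n for n, _ in subgroups)
    return sum(n * g for n, g in subgroups) / total

# Hypothetical per-subgroup gaps (smaller = more robust):
human = {"microcalcifications": (20, 0.05), "soft_tissue": (80, 0.30)}
dnn   = {"microcalcifications": (80, 0.06), "soft_tissue": (20, 0.31)}

# In every subgroup the DNN is slightly LESS robust than the human...
less_robust_everywhere = all(dnn[k][1] > human[k][1] for k in human)
# ...yet pooling makes it look MORE robust, because its caseload is
# dominated by the easier subgroup. This is the reversal the paper's
# subgroup analysis is designed to expose.
pooled_reversal = aggregate_gap(dnn.values()) < aggregate_gap(human.values())
```

Here the pooled human gap is 0.25 and the pooled DNN gap is 0.11, even though the DNN loses in both subgroups — exactly why aggregate robustness comparisons can mislead.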
Affiliation(s)
- Taro Makino
- Center for Data Science, New York University, New York, NY, USA
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Stanisław Jastrzębski
- Center for Data Science, New York University, New York, NY, USA
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Center for Advanced Imaging Innovation and Research, NYU Langone Health, New York, NY, USA
- Witold Oleszkiewicz
- Faculty of Electronics and Information Technology, Warsaw University of Technology, Warszawa, Poland
- Celin Chacko
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Naziya Samreen
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Chloe Chhor
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Eric Kim
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Jiyon Lee
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Beatriu Reig
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Hildegard Toth
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Divya Awal
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Linda Du
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Alice Kim
- Department of Radiology, NYU Langone Health, New York, NY, USA
- James Park
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Daniel K Sodickson
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Center for Advanced Imaging Innovation and Research, NYU Langone Health, New York, NY, USA
- Vilcek Institute of Graduate Biomedical Sciences, NYU Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Laura Heacock
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Linda Moy
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Center for Advanced Imaging Innovation and Research, NYU Langone Health, New York, NY, USA
- Vilcek Institute of Graduate Biomedical Sciences, NYU Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, NY, USA
- Department of Computer Science, Courant Institute, New York University, New York, NY, USA
- Krzysztof J Geras
- Center for Data Science, New York University, New York, NY, USA
- Department of Radiology, NYU Langone Health, New York, NY, USA
- Center for Advanced Imaging Innovation and Research, NYU Langone Health, New York, NY, USA
- Vilcek Institute of Graduate Biomedical Sciences, NYU Grossman School of Medicine, New York, NY, USA
17
Kobayashi T, Kitaoka A, Kosaka M, Tanaka K, Watanabe E. Motion illusion-like patterns extracted from photo and art images using predictive deep neural networks. Sci Rep 2022; 12:3893. PMID: 35273206; PMCID: PMC8913633; DOI: 10.1038/s41598-022-07438-3.
Abstract
In our previous study, we successfully reproduced the illusory motion perceived in the rotating snakes illusion using deep neural networks incorporating predictive coding theory. In the present study, we further examined the properties of the networks using a set of 1500 images, including ordinary static images of paintings and photographs and images of various types of motion illusions. Results showed that the networks clearly separated a group of illusory images from the others and reproduced illusory motion for various types of illusions, in a manner similar to human perception. Notably, the networks occasionally detected anomalous motion vectors even in ordinary static images in which humans were unable to perceive any illusory motion. Additionally, illusion-like designs with repeating patterns were generated from the areas where anomalous vectors were detected, and psychophysical experiments confirmed that observers perceived illusory motion in the generated designs. The observed inaccuracies of the networks provide useful information for further understanding the information processing associated with human vision.
Affiliation(s)
- Taisuke Kobayashi
- Laboratory of Neurophysiology, National Institute for Basic Biology, Higashiyama 5-1, Myodaiji-cho, Okazaki, Aichi, 444-8787, Japan.
- Akiyoshi Kitaoka
- College of Comprehensive Psychology, Ritsumeikan University, Iwakura-cho 2-150, Ibaraki, Osaka, 567-8570, Japan
- Manabu Kosaka
- Code_monsters group, Laboratory of Neurophysiology, National Institute for Basic Biology, Higashiyama 5-1, Myodaiji-cho, Okazaki, Aichi, 444-8787, Japan
- Kenta Tanaka
- Code_monsters group, Laboratory of Neurophysiology, National Institute for Basic Biology, Higashiyama 5-1, Myodaiji-cho, Okazaki, Aichi, 444-8787, Japan
- Eiji Watanabe
- Laboratory of Neurophysiology, National Institute for Basic Biology, Higashiyama 5-1, Myodaiji-cho, Okazaki, Aichi, 444-8787, Japan
- Department of Basic Biology, The Graduate University for Advanced Studies (SOKENDAI), Miura, Kanagawa, 240-0193, Japan
18
Vaishnav M, Cadene R, Alamia A, Linsley D, VanRullen R, Serre T. Understanding the Computational Demands Underlying Visual Reasoning. Neural Comput 2022; 34:1075-1099. PMID: 35231926; DOI: 10.1162/neco_a_01485.
Abstract
Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability of modern deep convolutional neural networks (CNNs) to learn to solve the synthetic visual reasoning test (SVRT) challenge, a collection of 23 visual reasoning problems. Our analysis reveals a novel taxonomy of visual reasoning tasks, which can be primarily explained by both the type of relations (same-different versus spatial-relation judgments) and the number of relations used to compose the underlying rules. Prior cognitive neuroscience work suggests that attention plays a key role in humans' visual reasoning ability. To test this hypothesis, we extended the CNNs with spatial and feature-based attention mechanisms. In a second series of experiments, we evaluated the ability of these attention networks to learn to solve the SVRT challenge and found the resulting architectures to be much more efficient at solving the hardest of these visual reasoning tasks. Most important, the corresponding improvements on individual tasks partially explained our novel taxonomy. Overall, this work provides a granular computational account of visual reasoning and yields testable neuroscience predictions regarding the differential need for feature-based versus spatial attention depending on the type of visual reasoning problem.
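The two attention mechanisms the study adds to its CNNs map onto two simple tensor operations: spatial attention re-weights feature maps by a 2-D priority map, while feature-based attention gates whole channels. A minimal numpy sketch (the shapes, the softmax gating, and the function names are choices of this sketch, not the paper's exact architecture):

```python
import numpy as np

def spatial_attention(fmap, prior):
    """Spatial attention: scale every channel by a 2-D priority map.
    fmap: (C, H, W); prior: (H, W) with values in [0, 1]."""
    return fmap * prior[None, :, :]

def feature_attention(fmap, query):
    """Feature-based attention: gate whole channels by a softmax over
    a per-channel relevance query. fmap: (C, H, W); query: (C,)."""
    gate = np.exp(query - query.max())  # numerically stable softmax
    gate /= gate.sum()
    return fmap * gate[:, None, None]
```

The abstract's finding, that such mechanisms make the hardest same-different tasks far more learnable, amounts to letting the network dynamically choose which locations or which feature channels feed the comparison.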
Affiliation(s)
- Mohit Vaishnav
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, 31052 Toulouse, France
- Carney Institute for Brain Science, Department of Cognitive Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, USA
- Remi Cadene
- Carney Institute for Brain Science, Department of Cognitive Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, USA
- Andrea Alamia
- Centre de Recherche Cerveau et Cognition, CNRS, Université de Toulouse, 31052 Toulouse, France
- Drew Linsley
- Carney Institute for Brain Science, Department of Cognitive Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, USA
- Rufin VanRullen
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, and Centre de Recherche Cerveau et Cognition, CNRS, Université de Toulouse, 31052 Toulouse, France
- Thomas Serre
- Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, 31052 Toulouse, France
- Carney Institute for Brain Science, Department of Cognitive Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, USA
19
Thompson JAF. Noise increases the correspondence between artificial and human vision. PLoS Biol 2021; 19:e3001477. PMID: 34890404; PMCID: PMC8664186; DOI: 10.1371/journal.pbio.3001477.
Abstract
This Primer explores the implications of a recent PLOS Biology study, arguing that noise-robustness, a property of human vision that standard computer vision models fail to mimic, provides an opportunity to probe the neural mechanisms underlying visual object recognition and refine computational models of the ventral visual stream.
Affiliation(s)
- Jessica A. F. Thompson
- Human Information Processing Lab, Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
20
Daube C, Xu T, Zhan J, Webb A, Ince RA, Garrod OG, Schyns PG. Grounding deep neural network predictions of human categorization behavior in understandable functional features: The case of face identity. Patterns (N Y) 2021; 2:100348. PMID: 34693374; PMCID: PMC8515012; DOI: 10.1016/j.patter.2021.100348.
Abstract
Deep neural networks (DNNs) can resolve real-world categorization tasks with apparent human-level performance. However, true equivalence of behavioral performance between humans and their DNN models requires that their internal mechanisms process equivalent features of the stimulus. To develop such feature equivalence, our methodology leveraged an interpretable and experimentally controlled generative model of the stimuli (realistic three-dimensional textured faces). Humans rated the similarity of randomly generated faces to four familiar identities. We predicted these similarity ratings from the activations of five DNNs trained with different optimization objectives. Using information theoretic redundancy, reverse correlation, and the testing of generalization gradients, we show that DNN predictions of human behavior improve because their shape and texture features overlap with those that subsume human behavior. Thus, we must equate the functional features that subsume the behavioral performances of the brain and its models before comparing where, when, and how these features are processed.
Affiliation(s)
- Christoph Daube
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Tian Xu
- Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, England, UK
- Jiayu Zhan
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Andrew Webb
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Robin A.A. Ince
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Oliver G.B. Garrod
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Philippe G. Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
21
Abstract
Conceptual abstraction and analogy-making are key abilities underlying humans' capacity to learn, reason, and robustly adapt their knowledge to new domains. Despite a long history of research on constructing artificial intelligence (AI) systems with these abilities, no current AI system comes anywhere close to forming humanlike abstractions or analogies. This paper reviews the advantages and limitations of several approaches toward this goal, including symbolic methods, deep learning, and probabilistic program induction. The paper concludes with several proposals for designing challenge tasks and evaluation measures in order to make quantifiable and generalizable progress in this area.
22
Unsupervised learning predicts human perception and misperception of gloss. Nat Hum Behav 2021; 5:1402-1417. PMID: 33958744; PMCID: PMC8526360; DOI: 10.1038/s41562-021-01097-6.
Abstract
Reflectance, lighting and geometry combine in complex ways to create images. How do we disentangle these to perceive individual properties, such as surface glossiness? We suggest that brains disentangle properties by learning to model statistical structure in proximal images. To test this hypothesis, we trained unsupervised generative neural networks on renderings of glossy surfaces and compared their representations with human gloss judgements. The networks spontaneously cluster images according to distal properties such as reflectance and illumination, despite receiving no explicit information about these properties. Intriguingly, the resulting representations also predict the specific patterns of ‘successes’ and ‘errors’ in human perception. Linearly decoding specular reflectance from the model’s internal code predicts human gloss perception better than ground truth, supervised networks or control models, and it predicts, on an image-by-image basis, illusions of gloss perception caused by interactions between material, shape and lighting. Unsupervised learning may underlie many perceptual dimensions in vision and beyond.
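The key readout step, linearly decoding specular reflectance from the unsupervised model's internal code, amounts to least squares on the latent vectors. A self-contained sketch with synthetic codes; the 10-d latent, the linear mixing with nuisance factors, and all sizes are stand-ins for illustration, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for an unsupervised model's latent codes:
# gloss (specular reflectance) is linearly entangled with other
# scene factors (lighting, shape, ...) in a 10-d code.
n, d = 200, 10
gloss = rng.random(n)
nuisance = rng.random((n, d - 1))
mixing = rng.standard_normal((d, d))
codes = np.column_stack([gloss, nuisance]) @ mixing

# Linear decoder (least squares), as used to read reflectance out
# of the network's internal representation.
w, *_ = np.linalg.lstsq(codes, gloss, rcond=None)
pred = codes @ w
r = np.corrcoef(pred, gloss)[0, 1]
```

Because gloss here is a linear function of the code, the decoder recovers it almost perfectly; the interesting empirical result in the paper is that the same linear readout of a code learned *without* gloss labels tracks human judgements, including their systematic errors.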