1
Hernández-Cámara P, Vila-Tomás J, Laparra V, Malo J. Dissecting the effectiveness of deep features as metric of perceptual image quality. Neural Netw 2025; 185:107189. [PMID: 39874824] [DOI: 10.1016/j.neunet.2025.107189]
Abstract
There is an open debate on the role of artificial networks in understanding the visual brain. Internal representations of images in artificial networks develop human-like properties. In particular, evaluating distortions using differences between internal features correlates with human perception of distortion. However, the origins of this correlation are not well understood. Here, we dissect the different factors involved in the emergence of human-like behavior: function, architecture, and environment. To do so, we evaluate the aforementioned human-network correlation at different depths of 46 pre-trained model configurations that include no psycho-visual information. The results show that most of the models correlate better with human opinion than SSIM (a de-facto standard in subjective image quality). Moreover, some models are better than state-of-the-art networks specifically tuned for the application (LPIPS, DISTS). Regarding the function, supervised classification leads to nets that correlate better with humans than the explored models for self- and non-supervised tasks. However, we found that better performance on the task does not imply more human-like behavior. Regarding the architecture, simpler models correlate better with humans than very deep nets, and generally the highest correlation is not achieved in the last layer. Finally, regarding the environment, training with large natural datasets leads to bigger correlations than training on smaller databases with restricted content, as expected. We also found that the best classification models are not the best at predicting human distances. In the general debate about understanding human vision, our empirical findings imply that explanations should not focus on a single abstraction level: function, architecture, and environment are all relevant.
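The generic recipe behind such feature-space metrics can be sketched in a few lines. The following is a minimal illustration only, not any of the 46 models evaluated in the paper: the random-weight "network", the layer sizes, and the unit-norm layer normalization are all assumptions standing in for a real pre-trained feature hierarchy.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(img, weights):
    """Layer activations of a toy feed-forward net. The random weights
    stand in for a pre-trained network (assumption: any linear/conv
    feature hierarchy could be plugged in here instead)."""
    acts, x = [], img.ravel()
    for W in weights:
        x = np.maximum(W @ x, 0.0)   # linear layer + ReLU
        acts.append(x)
    return acts

def deep_feature_distance(img_a, img_b, weights):
    """LPIPS-style score: unit-normalize each layer's features and
    average the squared feature differences across layers."""
    fa, fb = features(img_a, weights), features(img_b, weights)
    d = 0.0
    for a, b in zip(fa, fb):
        a = a / (np.linalg.norm(a) + 1e-8)
        b = b / (np.linalg.norm(b) + 1e-8)
        d += np.sum((a - b) ** 2)
    return d / len(fa)

# Toy 8x8 "images": an original plus a mildly and a heavily distorted copy.
img = rng.random((8, 8))
mild = img + 0.01 * rng.standard_normal((8, 8))
heavy = img + 0.5 * rng.standard_normal((8, 8))

weights = [rng.standard_normal((32, 64)) / 8.0,
           rng.standard_normal((16, 32)) / 6.0]

d_mild = deep_feature_distance(img, mild, weights)
d_heavy = deep_feature_distance(img, heavy, weights)
print(d_mild < d_heavy)  # stronger distortion -> larger feature distance
```

Correlating such scores with human opinion, layer by layer, is the kind of measurement the paper performs across models.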
Affiliation(s)
- Jorge Vila-Tomás
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Valero Laparra
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
- Jesús Malo
- Image Processing Lab., Universitat de València, 46980 Paterna, Spain.
2
Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgments of natural stimuli. Sci Rep 2025; 15:5316. [PMID: 39939679] [PMCID: PMC11821992] [DOI: 10.1038/s41598-025-88910-8]
Abstract
In natural visually guided behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on blank backgrounds. Natural images, however, contain task-irrelevant background elements that might interfere with the perception of object features. Recent studies suggest that visual feature estimation can be modeled through the linear decoding of task-relevant information from visual cortex. So, if the representations of task-relevant and irrelevant features are not orthogonal in the neural population, then variation in the task-irrelevant features would impair task performance. We tested this hypothesis using human psychophysics and monkey neurophysiology combined with parametrically variable naturalistic stimuli. We demonstrate that (1) the neural representation of one feature (the position of an object) in visual area V4 is orthogonal to those of several background features, (2) the ability of human observers to precisely judge object position was largely unaffected by those background features, and (3) many features of the object and the background (and of objects from a separate stimulus set) are orthogonally represented in V4 neural population responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of object features despite the richness of natural visual scenes.
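The core statistical idea, that task-irrelevant variation is harmless when the linear readouts of the two features are orthogonal, can be illustrated with simulated data. This is a toy sketch, not the paper's V4 analysis: the population size, the orthogonal tuning vectors, and the noise level are all made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated population of 100 neurons responding linearly to two stimulus
# features (e.g. object position and a background feature), plus noise.
# The tuning vectors are made orthogonal by construction (an assumption
# mimicking, not reproducing, the orthogonality found in V4).
n_neurons, n_trials = 100, 2000
w_pos = rng.standard_normal(n_neurons)
w_bg = rng.standard_normal(n_neurons)
w_bg -= w_pos * (w_pos @ w_bg) / (w_pos @ w_pos)   # Gram-Schmidt step

pos = rng.standard_normal(n_trials)
bg = rng.standard_normal(n_trials)
R = (np.outer(pos, w_pos) + np.outer(bg, w_bg)
     + 0.1 * rng.standard_normal((n_trials, n_neurons)))

# Fit a linear decoder for each feature by least squares.
d_pos, *_ = np.linalg.lstsq(R, pos, rcond=None)
d_bg, *_ = np.linalg.lstsq(R, bg, rcond=None)

# Orthogonal representations -> near-orthogonal decoders, and the position
# readout is almost unaffected by background variation.
cos = (d_pos @ d_bg) / (np.linalg.norm(d_pos) * np.linalg.norm(d_bg))
err = np.mean((R @ d_pos - pos) ** 2)
print(abs(cos) < 0.1, err < 0.05)
```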
Affiliation(s)
- Ramanujan Srinath
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- Amy M Ni
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Marlene R Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL, 60637, USA
- David H Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, 19104, USA.
3
Hernández-Cámara P, Daudén-Oliver P, Laparra V, Malo J. Alignment of color discrimination in humans and image segmentation networks. Front Psychol 2024; 15:1415958. [PMID: 39507086] [PMCID: PMC11538077] [DOI: 10.3389/fpsyg.2024.1415958]
Abstract
The experiments allowed by current machine learning models imply a revival of the debate on the causes of specific trends in human visual psychophysics. Machine learning facilitates the exploration of the effect of specific visual goals (such as image segmentation) by different neural architectures in different statistical environments in an unprecedented manner. In this way, (1) the principles behind psychophysical facts such as the non-Euclidean nature of human color discrimination and (2) the emergence of human-like behaviour in artificial systems can be explored in a new light. In this work, we show for the first time that the tolerance or invariance of image segmentation networks for natural images under changes of illuminant in the color space (a sort of insensitivity region around the white) is an ellipsoid oriented similarly to a (human) MacAdam ellipse. This striking similarity between an artificial system and human vision motivates a set of experiments checking the relevance of the statistical environment to the emergence of such insensitivity regions. Results suggest that, in this case, the statistics of the environment may be more relevant than the architecture selected to perform the image segmentation.
4
Bertalmío M, Durán Vizcaíno A, Malo J, Wichmann FA. Plaid masking explained with input-dependent dendritic nonlinearities. Sci Rep 2024; 14:24856. [PMID: 39438555] [PMCID: PMC11496684] [DOI: 10.1038/s41598-024-75471-5]
Abstract
A serious obstacle for understanding early spatial vision comes from the failure of the so-called standard model (SM) to predict the perception of plaid masking. But the SM originated from a major oversimplification of single neuron computations, ignoring fundamental properties of dendrites. Here we show that a spatial vision model including computations mimicking the input-dependent nature of dendritic nonlinearities, i.e. including nonlinear neural summation, has the potential to explain plaid masking data.
Affiliation(s)
- Jesús Malo
- Universitat de València, València, Spain
5
Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgements of natural stimuli. bioRxiv 2024:2024.02.14.580134. [PMID: 38464018] [PMCID: PMC10925131] [DOI: 10.1101/2024.02.14.580134]
Abstract
In natural behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on simple backgrounds. Natural viewing, however, carries a set of challenges that are inaccessible using artificial stimuli, including neural responses to background objects that are task-irrelevant. An emerging body of evidence suggests that the visual abilities of humans and animals can be modeled through the linear decoding of task-relevant information from visual cortex. This idea suggests the hypothesis that irrelevant features of a natural scene should impair performance on a visual task only if their neural representations intrude on the linear readout of the task-relevant feature, as would occur if the representations of task-relevant and irrelevant features are not orthogonal in the underlying neural population. We tested this hypothesis using human psychophysics and monkey neurophysiology, in response to parametrically variable naturalistic stimuli. We demonstrate that (1) the neural representation of one feature (the position of a central object) in visual area V4 is orthogonal to those of several background features, (2) the ability of human observers to precisely judge object position was largely unaffected by task-irrelevant variation in those background features, and (3) many features of the object and the background are orthogonally represented by V4 neural responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of objects and features despite the tremendous richness of natural visual scenes.
Affiliation(s)
- Ramanujan Srinath
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- Amy M. Ni
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marlene R. Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- equal contribution
- David H. Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- equal contribution
6
Vila-Tomás J, Hernández-Cámara P, Malo J. Artificial psychophysics questions classical hue cancellation experiments. Front Neurosci 2023; 17:1208882. [PMID: 37483357] [PMCID: PMC10358728] [DOI: 10.3389/fnins.2023.1208882]
Abstract
We show that classical hue cancellation experiments lead to human-like opponent curves even if the task is done by trivial (identity) artificial networks. Specifically, human-like opponent spectral sensitivities always emerge in artificial networks as long as (i) the retina converts the input radiation into any tristimulus-like representation, and (ii) the post-retinal network solves the standard hue cancellation task, e.g. the network looks for the weights of the cancelling lights such that every monochromatic stimulus plus the weighted cancelling lights matches a grey reference in the (arbitrary) color representation used by the network. In fact, the specific cancellation lights (and not the network architecture) are key to obtaining human-like curves: results show that the classical choice of the lights is the one that leads to the best (most human-like) result, and any other choice leads to progressively different spectral sensitivities. We show this in two ways: through artificial psychophysics using a range of networks with different architectures and a range of cancellation lights, and through a change-of-basis theoretical analogy of the experiments. This suggests that the opponent curves of the classical experiment are just a by-product of the front-end photoreceptors and of a very specific experimental choice, but they do not inform about the downstream color representation. In fact, the architecture of the post-retinal network (signal recombination or internal color space) seems irrelevant for the emergence of the curves in the classical experiment. This result in artificial networks calls into question the conventional interpretation of the classical result in humans by Jameson and Hurvich.
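The hue cancellation task, as described here, reduces to a small linear problem in the network's color representation. A minimal sketch, with a made-up 3-channel "retina" and arbitrary cancelling wavelengths (none of these numbers come from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary linear "retina": 3 spectral sensitivity curves over 31 wavelength
# bins (made-up numbers; any tristimulus-like front end works the same way).
S = np.abs(rng.standard_normal((3, 31)))

def tristimulus(spectrum):
    return S @ spectrum

# Monochromatic test light and four cancelling lights (delta spectra at
# fixed wavelength bins -- per the paper's argument, it is this choice of
# lights, not the network, that shapes the resulting curves).
test = np.zeros(31)
test[10] = 1.0
cancel_bins = [2, 9, 17, 28]
C = np.stack([tristimulus(np.eye(31)[b]) for b in cancel_bins], axis=1)  # 3x4

grey = tristimulus(np.full(31, 0.2))

# Hue cancellation: find weights w so that the test light plus the weighted
# cancelling lights match grey in the color representation: T(test) + C w = grey.
w, *_ = np.linalg.lstsq(C, grey - tristimulus(test), rcond=None)

residual = np.linalg.norm(tristimulus(test) + C @ w - grey)
print(w.shape, residual < 1e-6)
```

Sweeping the test wavelength and plotting the fitted weights traces out the opponent-like curves the abstract refers to.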
7
Abstract
The THINGS database is a freely available stimulus set that has the potential to facilitate the generation of theory that bridges multiple areas within cognitive neuroscience. The database consists of 26,107 high-quality digital photos that are sorted into 1,854 concepts. While a valuable resource, relatively few technical details relevant to the design of studies in cognitive neuroscience have been described. We present an analysis of two key low-level properties of THINGS images, luminance and luminance contrast. These image statistics are known to influence common physiological and neural correlates of perceptual and cognitive processes. In general, we found that the distributions of luminance and contrast are in close agreement with the statistics of natural images reported previously. However, we found that image concepts are separable in their luminance and contrast: we show that luminance and contrast alone are sufficient to classify images into their concepts with above-chance accuracy. We describe how these factors may confound studies using the THINGS images, and suggest simple controls that can be implemented a priori or post hoc. We discuss the importance of using such natural images as stimuli in psychological research.
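The two image statistics analyzed can be computed in a few lines. A minimal sketch, assuming grayscale images in [0, 1] and the standard-deviation-over-mean definition of RMS contrast (the paper may use a different variant):

```python
import numpy as np

def luminance_contrast(img):
    """Mean luminance and RMS contrast of a grayscale image in [0, 1].
    RMS contrast here is the standard deviation of pixel values divided
    by their mean (one common definition among several)."""
    img = np.asarray(img, dtype=float)
    mean_lum = img.mean()
    rms_contrast = img.std() / mean_lum
    return mean_lum, rms_contrast

# A flat image has zero contrast; adding structure raises it.
flat = np.full((16, 16), 0.5)
patterned = flat + 0.25 * np.sign(np.indices((16, 16)).sum(0) % 2 - 0.5)

lum_f, con_f = luminance_contrast(flat)       # 0.5, 0.0
lum_p, con_p = luminance_contrast(patterned)  # 0.5, 0.5
print(con_p > con_f)
```

Computing these two numbers per image and checking whether they predict the image's concept label is the kind of a-priori control the abstract suggests.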
Affiliation(s)
- William J Harrison
- Queensland Brain Institute and School of Psychology, The University of Queensland
8
Malo J. Spatio-chromatic information available from different neural layers via Gaussianization. J Math Neurosci 2020; 10:18. [PMID: 33175257] [PMCID: PMC7658285] [DOI: 10.1186/s13408-020-00095-8]
Abstract
How much visual information about the retinal images can be extracted from the different layers of the visual pathway? This question depends on the complexity of the visual input, the set of transforms applied to this multivariate input, and the noise of the sensors in the considered layer. Separate subsystems (e.g. opponent channels, spatial filters, nonlinearities of the texture sensors) have been suggested to be organized for optimal information transmission. However, the efficiency of these different layers has not been measured when they operate together on colorimetrically calibrated natural images, using multivariate information-theoretic units over the joint spatio-chromatic array of responses. In this work, we present a statistical tool to address this question in an appropriate (multivariate) way. Specifically, we propose an empirical estimate of the information transmitted by the system based on a recent Gaussianization technique. The total correlation measured using the proposed estimator is consistent with predictions based on the analytical Jacobian of a standard spatio-chromatic model of the retina-cortex pathway. If the noise at a certain representation is proportional to the dynamic range of the response, and one assumes sensors of equivalent noise level, then the transmitted information shows the following trends: (1) progressively deeper representations are better in terms of the amount of captured information, (2) the transmitted information up to the cortical representation follows the probability of natural scenes over the chromatic and achromatic dimensions of the stimulus space, (3) the contribution of spatial transforms to capturing visual information is substantially greater than the contribution of chromatic transforms, and (4) nonlinearities of the responses contribute substantially to the transmitted information, but less than the linear transforms.
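The quantity the proposed estimator targets, total correlation T(x) = Σᵢ H(xᵢ) − H(x), has a closed form for Gaussian data that is handy for building intuition: T = −½ log det(R), with R the correlation matrix. The sketch below only illustrates this target quantity on correlated Gaussian samples; it is not the paper's Gaussianization estimator, which handles non-Gaussian responses.

```python
import numpy as np

rng = np.random.default_rng(3)

# Total correlation T = sum_i H(x_i) - H(x) measures multivariate redundancy.
# For a Gaussian with correlation matrix R: T = -0.5 * log(det(R)) (in nats).
rho = 0.8
cov = np.array([[1.0, rho],
                [rho, 1.0]])
T_true = -0.5 * np.log(np.linalg.det(cov))   # analytic value

# Crude sample-based estimate: plug the empirical correlation matrix into
# the same closed form (valid only because the data really are Gaussian).
x = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
R_hat = np.corrcoef(x, rowvar=False)
T_hat = -0.5 * np.log(np.linalg.det(R_hat))

print(abs(T_hat - T_true) < 0.01)
```

For the non-Gaussian joint spatio-chromatic responses the paper studies, this Gaussian shortcut is invalid, which is precisely why a Gaussianization-based estimator is needed.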
Affiliation(s)
- Jesús Malo
- Image Processing Lab, Universitat de València, Catedrático Escardino, 46980, Valencia, Paterna, Spain.
9
Canonical Retina-to-Cortex Vision Model Ready for Automatic Differentiation. Brain Inform 2020. [DOI: 10.1007/978-3-030-59277-6_30]