1. Laparra V, Johnson JE, Camps-Valls G, Santos-Rodriguez R, Malo J. Estimating Information Theoretic Measures via Multidimensional Gaussianization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:1293-1308. PMID: 39527441. DOI: 10.1109/TPAMI.2024.3495827.
Abstract
Information theory is an outstanding framework for measuring uncertainty, dependence, and relevance in data and systems. It has several desirable properties for real-world applications: it naturally deals with multivariate data, handles heterogeneous data types, and yields interpretable measures. However, it has not been adopted by a wider audience because estimating information from multidimensional data is challenging due to the curse of dimensionality. We propose an indirect way of estimating information based on a multivariate iterative Gaussianization transform. The proposed method has a multivariate-to-univariate property: it reduces the challenging estimation of multivariate measures to a composition of marginal operations applied in each iteration of the Gaussianization. Therefore, the convergence of the resulting estimates depends on the convergence of well-understood univariate entropy estimates, and the global error depends linearly on the number of times the marginal estimator is invoked. We introduce Gaussianization-based estimates for Total Correlation, Entropy, Mutual Information, and Kullback-Leibler Divergence. Results on artificial data show that our approach is superior to previous estimators, particularly in high-dimensional scenarios. We also illustrate the method's performance in different fields to obtain interesting insights. We make the tools and datasets publicly available to provide a test bed for analyzing future methodologies.
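As a rough illustration of the multivariate-to-univariate property described in the abstract, the following sketch accumulates univariate (marginal) entropy terms across Gaussianization iterations to estimate Total Correlation. This is a minimal sketch under simplifying assumptions (histogram entropy estimator, PCA rotations, fixed number of iterations); the function names are ours and this is not the authors' released toolbox.

```python
# Minimal RBIG-style sketch: Total Correlation from sums of univariate entropies.
# Assumptions (ours): histogram entropy estimator, PCA rotations, fixed iteration count.
import numpy as np
from scipy.stats import norm, rankdata


def marginal_gaussianization(x):
    """Map each column to an approximately standard Gaussian via its empirical CDF."""
    n = x.shape[0]
    u = np.apply_along_axis(rankdata, 0, x) / (n + 1.0)  # empirical CDF values in (0, 1)
    return norm.ppf(u)


def univariate_entropy(v, bins=64):
    """Crude histogram estimate of the differential entropy of a 1-D sample (nats)."""
    p, edges = np.histogram(v, bins=bins, density=True)
    dx = edges[1] - edges[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * dx


def total_correlation_rbig(x, n_iters=25):
    """Accumulate, per iteration, the marginal non-Gaussianity exposed by the rotation.

    Every term in the sum is a univariate entropy: this is the
    multivariate-to-univariate reduction described in the abstract.
    """
    h_gauss = 0.5 * np.log(2.0 * np.pi * np.e)  # entropy of a standard Gaussian
    tc = 0.0
    for _ in range(n_iters):
        x = marginal_gaussianization(x)
        # Rotation step (PCA here; random rotations are another common choice).
        _, _, vt = np.linalg.svd(x - x.mean(axis=0), full_matrices=False)
        x = x @ vt.T
        # Dependence removed so far re-appears as marginal non-Gaussianity.
        tc += sum(h_gauss - univariate_entropy(col) for col in x.T)
    return tc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((5000, 1))
    data = np.hstack([z, z + 0.1 * rng.standard_normal((5000, 1))])
    # Sanity check: for a bivariate Gaussian the exact value is -0.5 * log(1 - rho**2).
    rho = np.corrcoef(data.T)[0, 1]
    print("closed form :", -0.5 * np.log(1.0 - rho**2))
    print("RBIG sketch :", total_correlation_rbig(data))
```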
2. Hernández-Cámara P, Daudén-Oliver P, Laparra V, Malo J. Alignment of color discrimination in humans and image segmentation networks. Front Psychol 2024; 15:1415958. PMID: 39507086. PMCID: PMC11538077. DOI: 10.3389/fpsyg.2024.1415958.
Abstract
The experiments allowed by current machine learning models imply a revival of the debate on the causes of specific trends in human visual psychophysics. Machine learning facilitates the exploration of the effect of specific visual goals (such as image segmentation) achieved by different neural architectures in different statistical environments in an unprecedented manner. In this way, (1) the principles behind psychophysical facts such as the non-Euclidean nature of human color discrimination and (2) the emergence of human-like behaviour in artificial systems can be explored in a new light. In this work, we show for the first time that the tolerance, or invariance, of image segmentation networks for natural images under changes of illuminant in color space (a sort of insensitivity region around the white point) is an ellipsoid oriented similarly to a (human) MacAdam ellipse. This striking similarity between an artificial system and human vision motivates a set of experiments checking the relevance of the statistical environment to the emergence of such insensitivity regions. Results suggest that, in this case, the statistics of the environment may be more relevant than the architecture selected to perform the image segmentation.
3. Vila-Tomás J, Hernández-Cámara P, Malo J. Artificial psychophysics questions classical hue cancellation experiments. Front Neurosci 2023; 17:1208882. PMID: 37483357. PMCID: PMC10358728. DOI: 10.3389/fnins.2023.1208882.
Abstract
We show that classical hue cancellation experiments lead to human-like opponent curves even if the task is done by trivial (identity) artificial networks. Specifically, human-like opponent spectral sensitivities always emerge in artificial networks as long as (i) the retina converts the input radiation into any tristimulus-like representation, and (ii) the post-retinal network solves the standard hue cancellation task, i.e., the network looks for the weights of the cancelling lights such that every monochromatic stimulus plus the weighted cancelling lights matches a grey reference in the (arbitrary) color representation used by the network. In fact, the specific cancellation lights (and not the network architecture) are key to obtaining human-like curves: results show that the classical choice of lights is the one that leads to the best (most human-like) result, and any other choice leads to progressively different spectral sensitivities. We show this in two ways: through artificial psychophysics using a range of networks with different architectures and a range of cancellation lights, and through a change-of-basis theoretical analogy of the experiments. This suggests that the opponent curves of the classical experiment are just a by-product of the front-end photoreceptors and of a very specific experimental choice, but they do not inform about the downstream color representation. In fact, the architecture of the post-retinal network (signal recombination or internal color space) seems irrelevant to the emergence of the curves in the classical experiment. This result in artificial networks questions the conventional interpretation of the classical result in humans by Jameson and Hurvich.
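The change-of-basis argument can be summarized compactly; the notation below is ours, introduced only for orientation. Let t(λ) be the tristimulus vector of a monochromatic stimulus, t_k the cancellation lights with weights w_k(λ), and o = M t any internal color code obtained through an invertible linear map M. The cancellation condition is then

\[
M\Big(\mathbf{t}(\lambda) + \sum_k w_k(\lambda)\,\mathbf{t}_k\Big) = M\,\mathbf{t}_{\text{grey}}
\;\Longleftrightarrow\;
\mathbf{t}(\lambda) + \sum_k w_k(\lambda)\,\mathbf{t}_k = \mathbf{t}_{\text{grey}},
\]

so the solution w_k(λ), and hence the measured curves, is the same for every invertible M: it reflects the front-end tristimulus encoding and the chosen cancellation lights rather than the downstream color representation.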
4. Information Flow in Biological Networks for Color Vision. Entropy 2022; 24:1442. PMCID: PMC9601526. DOI: 10.3390/e24101442.
Abstract
Biological neural networks for color vision (also known as color appearance models) consist of a cascade of linear + nonlinear layers that modify the linear measurements at the retinal photoreceptors, leading to an internal (nonlinear) representation of color that correlates with psychophysical experience. The basic layers of these networks include: (1) chromatic adaptation (normalization of the mean and covariance of the color manifold); (2) change to opponent color channels (a PCA-like rotation in the color space); and (3) saturating nonlinearities to obtain perceptually Euclidean color representations (similar to dimension-wise equalization). The Efficient Coding Hypothesis argues that these transforms should emerge from information-theoretic goals. If this hypothesis holds in color vision, the question is: what is the coding gain due to the different layers of the color appearance networks? In this work, a representative family of color appearance models is analyzed in terms of how the redundancy among the chromatic components is modified along the network and how much information is transferred from the input data to the noisy response. The proposed analysis is performed using data and methods that were not available before: (1) new colorimetrically calibrated scenes under different CIE illuminations for the proper evaluation of chromatic adaptation; and (2) new statistical tools to estimate (multivariate) information-theoretic quantities between multidimensional sets based on Gaussianization. The results confirm that the efficient coding hypothesis holds for current color vision models, and identify the psychophysical mechanisms critically responsible for gains in information transference: opponent channels and their nonlinear nature are more important than chromatic adaptation at the retina.
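For reference, the two multivariate quantities tracked along the network can be written as follows (generic notation, ours):

\[
T(\mathbf{x}) = \sum_i h(x_i) - h(\mathbf{x}),
\qquad
I(\mathbf{x};\mathbf{r}) = h(\mathbf{r}) - h(\mathbf{r}\,|\,\mathbf{x}),
\quad \mathbf{r} = S(\mathbf{x}) + \mathbf{n},
\]

where x is the input chromatic vector, S the cascade of adaptation, opponent, and nonlinear layers, and n the sensor noise. The coding gain of a given layer is then read as the reduction in the redundancy T and the change in the transmitted information I across that layer.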
5. Gomez-Villa A, Martín A, Vazquez-Corral J, Bertalmío M, Malo J. On the synthesis of visual illusions using deep generative models. J Vis 2022; 22:2. PMID: 35833884. PMCID: PMC9290318. DOI: 10.1167/jov.22.8.2.
Abstract
Visual illusions expand our understanding of the visual system by imposing constraints on the models in two different ways: (i) visual illusions for humans should induce equivalent illusions in the model, and (ii) illusions synthesized from the model should be compelling for human viewers too. These constraints are alternative strategies to find good vision models. Following the first research strategy, recent studies have shown that artificial neural network architectures also have human-like illusory percepts when stimulated with classical hand-crafted stimuli designed to fool humans. In this work we focus on the second (less explored) strategy: we propose a framework to synthesize new visual illusions using the optimization abilities of current automatic differentiation techniques. The proposed framework can be used with classical vision models as well as with more recent artificial neural network architectures. This framework, validated by psychophysical experiments, can be used to study the difference between a vision model and actual human perception and to optimize the vision model to decrease this difference.
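One way to read this framework, in schematic notation of our own (the paper's exact losses and constraints may differ), is as an optimization over the latent code z of a generative model G guided by a differentiable vision model M:

\[
\mathbf{z}^{\star} = \arg\max_{\mathbf{z}}\; d\Big(M\big(G(\mathbf{z})\big),\, M(\mathbf{x}_{\text{ref}})\Big)
\quad \text{subject to} \quad \big\|G(\mathbf{z}) - \mathbf{x}_{\text{ref}}\big\| \le \epsilon,
\]

i.e., find a stimulus that stays physically close to a reference while its model-predicted appearance departs from it as much as possible; automatic differentiation supplies the gradients of both M and G.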
Affiliation(s)
- Alex Gomez-Villa: Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain.
- Adrián Martín: Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
- Javier Vazquez-Corral: Computer Science Department, Universitat Autònoma de Barcelona and Computer Vision Center, Barcelona, Spain.
- Jesús Malo: Image Processing Lab, Faculty of Physics, Universitat de València, Spain.
6. Li L, Guedj B. Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly. Entropy 2021; 23:e23111534. PMID: 34828234. PMCID: PMC8622390. DOI: 10.3390/e23111534.
Abstract
When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA poses theoretical and algorithmic challenges. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called slpc, for sequential learning of principal curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and its performance on synthetic and real-life data.
Affiliation(s)
- Le Li: Department of Statistics, Central China Normal University, Wuhan 430079, China.
- Benjamin Guedj (corresponding author): Inria, Lille-Nord Europe Research Centre and Inria London, France; Centre for Artificial Intelligence, Department of Computer Science, University College London, London WC1V 6LJ, UK.
7. Malo J. Spatio-chromatic information available from different neural layers via Gaussianization. Journal of Mathematical Neuroscience 2020; 10:18. PMID: 33175257. PMCID: PMC7658285. DOI: 10.1186/s13408-020-00095-8.
Abstract
How much visual information about the retinal images can be extracted from the different layers of the visual pathway? The answer depends on the complexity of the visual input, the set of transforms applied to this multivariate input, and the noise of the sensors in the considered layer. Separate subsystems (e.g. opponent channels, spatial filters, nonlinearities of the texture sensors) have been suggested to be organized for optimal information transmission. However, the efficiency of these different layers has not been measured when they operate together on colorimetrically calibrated natural images and using multivariate information-theoretic units over the joint spatio-chromatic array of responses. In this work, we present a statistical tool to address this question in an appropriate (multivariate) way. Specifically, we propose an empirical estimate of the information transmitted by the system based on a recent Gaussianization technique. The total correlation measured using the proposed estimator is consistent with predictions based on the analytical Jacobian of a standard spatio-chromatic model of the retina-cortex pathway. If the noise at a certain representation is proportional to the dynamic range of the response, and one assumes sensors of equivalent noise level, then the transmitted information shows the following trends: (1) progressively deeper representations are better in terms of the amount of captured information; (2) the transmitted information up to the cortical representation follows the probability of natural scenes over the chromatic and achromatic dimensions of the stimulus space; (3) the contribution of spatial transforms to capturing visual information is substantially greater than the contribution of chromatic transforms; and (4) nonlinearities of the responses contribute substantially to the transmitted information, but less than the linear transforms.
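The cross-check against the analytical Jacobian relies on the standard change-of-variables identity, written here in generic notation: for an invertible deterministic response y = S(x),

\[
h(\mathbf{y}) = h(\mathbf{x}) + \mathbb{E}_{\mathbf{x}}\big[\log\big|\det \nabla_{\mathbf{x}} S(\mathbf{x})\big|\big],
\qquad
T(\mathbf{y}) = \sum_i h(y_i) - h(\mathbf{x}) - \mathbb{E}_{\mathbf{x}}\big[\log\big|\det \nabla_{\mathbf{x}} S(\mathbf{x})\big|\big],
\]

so the total correlation computed from the model Jacobian can be compared with the empirical Gaussianization estimate obtained directly from the responses.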
Affiliation(s)
- Jesús Malo: Image Processing Lab, Universitat de València, Catedrático Escardino, 46980 Paterna, Valencia, Spain.
8. Gomez-Villa A, Martín A, Vazquez-Corral J, Bertalmío M, Malo J. Color illusions also deceive CNNs for low-level vision tasks: Analysis and implications. Vision Res 2020; 176:156-174. PMID: 32896717. DOI: 10.1016/j.visres.2020.07.010.
Abstract
The study of visual illusions has proven to be a very useful approach in vision science. In this work we start by showing that, while convolutional neural networks (CNNs) trained for low-level visual tasks on natural images may be deceived by brightness and color illusions, some network illusions can be inconsistent with human perception. Next, we analyze where these similarities and differences may come from. On the one hand, the proposed linear eigenanalysis explains the overall similarities: in simple CNNs trained for tasks like denoising or deblurring, the linear version of the network has center-surround receptive fields, and the global transfer functions are very similar to the human achromatic and chromatic contrast sensitivity functions in human-like opponent color spaces. These similarities are consistent with the long-standing hypothesis that considers low-level visual illusions a by-product of the optimization to natural environments. Specifically, here human-like features emerge from error minimization. On the other hand, the observed differences must be due to behavior of the human visual system not explained by the linear approximation. However, our study also shows that more 'flexible' network architectures, with more layers and a higher degree of nonlinearity, may actually have a worse capability of reproducing visual illusions. This implies, in line with other works in the vision science literature, a word of caution on using CNNs to study human vision: on top of the intrinsic limitations of the L + NL formulation of artificial networks to model vision, the nonlinear behavior of flexible architectures may easily be markedly different from that of the visual system.
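The linearization behind the eigenanalysis can be written, in our paraphrase, as a first-order expansion of the trained network around a base image x_0:

\[
\mathrm{CNN}(\mathbf{x}) \approx \mathrm{CNN}(\mathbf{x}_0) + J(\mathbf{x}_0)\,(\mathbf{x} - \mathbf{x}_0),
\]

where the rows of the Jacobian J(x_0) act as equivalent linear receptive fields; their spatial structure (center-surround) and the associated transfer functions are what is compared with the human achromatic and chromatic contrast sensitivity functions.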
Affiliation(s)
- A. Gomez-Villa: Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
- A. Martín: Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
- J. Vazquez-Corral: Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
- M. Bertalmío: Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
- J. Malo: Image Processing Lab, Universitat de València, València, Spain.
9. Martinez-Garcia M, Bertalmío M, Malo J. In Praise of Artifice Reloaded: Caution With Natural Image Databases in Modeling Vision. Front Neurosci 2019; 13:8. PMID: 30894796. PMCID: PMC6414813. DOI: 10.3389/fnins.2019.00008.
Abstract
Subjective image quality databases are a major source of raw data on how the visual system works in naturalistic environments. These databases describe the sensitivity of many observers to a wide range of distortions of different nature and intensity seen on top of a variety of natural images. Data of this kind seem to open a number of possibilities for the vision scientist to check models in realistic scenarios. However, while these natural databases are great benchmarks for models developed in some other way (e.g., by using the well-controlled artificial stimuli of traditional psychophysics), they should be used carefully when trying to fit vision models. Given the high dimensionality of the image space, it is very likely that some basic phenomena are under-represented in the database. Therefore, a model fitted on these large-scale natural databases will not reproduce the under-represented basic phenomena that could otherwise be easily illustrated with well-selected artificial stimuli. In this work we study a specific example of the above statement. A standard cortical model using wavelets and divisive normalization, tuned to reproduce subjective opinion on a large image quality dataset, fails to reproduce basic cross-masking. Here we outline a solution for this problem by using artificial stimuli and by proposing a modification that makes the model easier to tune. Then, we show that the modified model is still competitive on the large-scale database. Our simulations with these artificial stimuli show that when using steerable wavelets, the conventional unit-norm Gaussian kernels in divisive normalization should be multiplied by high-pass filters to reproduce basic trends in masking. Basic visual phenomena may be misrepresented in large natural image datasets, but this can be solved with model-interpretable stimuli. This is an additional argument in praise of artifice, in line with Rust and Movshon (2005).
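For context, the divisive normalization stage referred to above has the generic form (our notation),

\[
r_i \;=\; \frac{\mathrm{sign}(z_i)\,|z_i|^{\gamma}}{b_i + \sum_j H_{ij}\,|z_j|^{\gamma}},
\]

where z are the wavelet responses, b the semisaturation constants, and H the interaction kernel. The modification discussed in the abstract amounts to replacing the conventional unit-norm Gaussian kernel in H by its product with a high-pass filter so that basic cross-masking trends are reproduced.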
Affiliation(s)
- Marina Martinez-Garcia: Image Processing Lab, Universitat de València, Valencia, Spain; Instituto de Neurociencias, CSIC, Alicante, Spain.
- Marcelo Bertalmío: Departamento de Tecnologías de la Información y las Comunicaciones, Universitat Pompeu Fabra, Barcelona, Spain.
- Jesús Malo: Image Processing Lab, Universitat de València, Valencia, Spain.
10. Derivatives and inverse of cascaded linear+nonlinear neural models. PLoS One 2018; 13:e0201326. PMID: 30321175. PMCID: PMC6188639. DOI: 10.1371/journal.pone.0201326.
Abstract
In vision science, cascades of Linear+Nonlinear transforms are very successful in modeling a number of perceptual experiences. However, the conventional literature usually focuses only on describing the forward input-output transform. Instead, in this work we present the mathematics of such cascades beyond the forward transform, namely the Jacobian matrices and the inverse. The fundamental reason for this analytical treatment is that it offers useful analytical insight into the psychophysics, the physiology, and the function of the visual system. For instance, we show how the trends of the sensitivity (volume of the discrimination regions) and the adaptation of the receptive fields can be identified in the expression of the Jacobian w.r.t. the stimulus. This matrix also tells us which regions of the stimulus space are encoded more efficiently in multi-information terms. The Jacobian w.r.t. the parameters shows which aspects of the model have a bigger impact on the response, and hence their relative relevance. The analytic inverse implies conditions on the response and the model parameters to ensure appropriate decoding. From the experimental and applied perspective, (a) the Jacobian w.r.t. the stimulus is necessary in new experimental methods based on the synthesis of visual stimuli with interesting geometrical properties, (b) the Jacobian matrices w.r.t. the parameters are convenient to learn the model from classical experiments or alternative goal optimization, and (c) the inverse is a promising model-based alternative to blind machine-learning methods for neural decoding that do not include meaningful biological information. The theory is checked by building and testing a vision model that actually follows a modular Linear+Nonlinear program. Our illustrative differentiable and invertible model consists of a cascade of modules that account for brightness, contrast, energy masking, and wavelet masking. To stress the generality of this modular setting we show examples where some of the canonical Divisive Normalization modules are substituted by equivalent modules such as the Wilson-Cowan interaction model (at the V1 cortex) or a tone-mapping model (at the retina).
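In schematic form (generic notation), for a cascade of modules S = S^(K) ∘ ... ∘ S^(1) with intermediate responses x^(k) = S^(k)(x^(k-1)), the chain rule and module-wise invertibility give

\[
\nabla_{\mathbf{x}} S(\mathbf{x}) \;=\; \nabla S^{(K)}\big(\mathbf{x}^{(K-1)}\big)\,\cdots\,\nabla S^{(1)}\big(\mathbf{x}^{(0)}\big),
\qquad
S^{-1} \;=\; \big(S^{(1)}\big)^{-1} \circ \cdots \circ \big(S^{(K)}\big)^{-1},
\]

which is why the analysis of sensitivity, adaptation, and decoding reduces to the Jacobians and inverses of the individual linear+nonlinear modules.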
11.
Abstract
Chromatically perceptive observers are endowed with a sense of similarity between colors. For example, two shades of green that are only slightly discriminable are perceived as similar, whereas other pairs of colors, for example blue and yellow, typically elicit markedly different sensations. The notion of similarity need not be shared by different observers. Dichromat and trichromat subjects perceive colors differently, and two dichromats (or two trichromats, for that matter) may judge chromatic differences inconsistently. Moreover, there is ample evidence that different animal species sense colors diversely. To capture the subjective metric of color perception, here we construct a notion of distance in color space that is based on the physiology of the retina and is thereby individually tailored to different observers. By applying the Fisher metric to an analytical model of color representation, we construct a notion of distance that reproduces behavioral experiments of classical discrimination tasks. We then derive a coordinate transformation that defines a new chromatic space in which the Euclidean distance between any two colors is equal to the perceptual distance, as seen by one individual subject, endowed with an arbitrary number of color-sensitive photoreceptors, each with arbitrary absorption probability curves and appearing in arbitrary proportions.
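The construction sketched above follows the usual Fisher-information recipe (generic notation, ours): given the distribution p(r|s) of photoreceptor responses r to a color s, the local metric and the induced perceptual line element are

\[
g_{ij}(\mathbf{s}) = \mathbb{E}_{\mathbf{r}|\mathbf{s}}\!\big[\partial_{s_i}\log p(\mathbf{r}|\mathbf{s})\;\partial_{s_j}\log p(\mathbf{r}|\mathbf{s})\big],
\qquad
d\ell^2 = \sum_{ij} g_{ij}(\mathbf{s})\, ds_i\, ds_j,
\]

and the sought coordinate change s → u(s) is one in which this metric becomes the identity, so that Euclidean distances between the new coordinates match the perceptual distances for that observer.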
Affiliation(s)
- María da Fonseca: Medical Physics Department, Instituto Balseiro and Centro Atómico Bariloche, 8400 San Carlos de Bariloche, Argentina.
- Inés Samengo: Medical Physics Department, Instituto Balseiro and Centro Atómico Bariloche, 8400 San Carlos de Bariloche, Argentina.
12.
Abstract
Neurons in the primary visual cortex (V1) of humans and other species are edge filters organized in orientation maps. In these maps, neurons with similar orientation preference are clustered together in iso-orientation domains. These maps have two fundamental properties: (1) retinotopy, i.e. a correspondence between displacements in the image space and displacements on the cortical surface, and (2) a trade-off between good coverage of the visual field with all orientations and continuity of iso-orientation domains in the cortical space. There is an active debate on the origin of these locally continuous maps. While most of the existing descriptions take purely geometric/mechanistic approaches that disregard the network function, a clear exception to this trend in the literature is the original approach of Hyvärinen and Hoyer based on infomax and Topographic Independent Component Analysis (TICA). Although TICA successfully addresses a number of other properties of V1 simple and complex cells, in this work we question the validity of the orientation maps obtained from TICA. We argue that the maps predicted by TICA can be analyzed in the retinal space, and when doing so, it is apparent that they lack the required continuity and retinotopy. Here we show that in the orientation maps reported in the TICA literature it is easy to find examples of violations of the continuity between similarly tuned mechanisms in the retinal space, which suggest a random scrambling incompatible with the maps found in primates. The new experiments in the retinal space presented here confirm this suspicion: TICA basis vectors actually follow a random salt-and-pepper organization back in the image space. Therefore, the interesting clusters found in the TICA topology cannot be interpreted as the actual cortical orientation maps found in cats, primates, or humans. In conclusion, Topographic ICA does not reproduce cortical orientation maps.