1. White DN, Burge J. How distinct sources of nuisance variability in natural images and scenes limit human stereopsis. PLoS Comput Biol 2025; 21:e1012945. PMID: 40233309; PMCID: PMC12080933; DOI: 10.1371/journal.pcbi.1012945.
Abstract
Stimulus variability, a form of nuisance variability, is a primary source of perceptual uncertainty in everyday natural tasks. How do different properties of natural images and scenes contribute to this uncertainty? Using binocular disparity as a model system, we report a systematic investigation of how various forms of natural stimulus variability impact performance in a stereo-depth discrimination task. With stimuli sampled from a stereo-image database of real-world scenes having pixel-by-pixel ground-truth distance data, three human observers completed two closely related double-pass psychophysical experiments. In the two experiments, each human observer responded twice to ten thousand unique trials, in which twenty thousand unique stimuli were presented. New analytical methods reveal, from these data, the specific and nearly dissociable effects of two distinct sources of natural stimulus variability (variation in luminance-contrast patterns and variation in local-depth structure) on discrimination performance, as well as the relative importance of stimulus-driven variability and internal noise in determining performance limits. Between-observer analyses show that both stimulus-driven sources of uncertainty are responsible for a large proportion of total variance, have strikingly similar effects on different people, and, surprisingly, make stimulus-by-stimulus responses more predictable (not less). The consistency across observers raises the intriguing prospect that image-computable models can make reasonably accurate performance predictions in natural viewing. Overall, the findings provide a rich picture of stimulus factors that contribute to human perceptual performance in natural scenes. The approach should have broad application to other animal models and other sensory-perceptual tasks with natural or naturalistic stimuli.
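The double-pass design supports a standard partition of response variability: because the stimulus-driven component of the decision variable repeats exactly across the two passes while internal noise does not, the between-pass agreement bounds how much of the variability is stimulus-driven versus internal. Below is a minimal sketch of that logic with simulated continuous decision variables; it is illustrative only (real double-pass analyses of binary responses work with percent agreement, and none of the names or values come from the paper).

```python
import numpy as np

def double_pass_partition(dv_pass1, dv_pass2):
    """Partition decision-variable variance from a double-pass experiment.

    Assumes the stimulus-driven component is identical across passes and the
    internal noise is independent across passes, so the between-pass
    correlation equals the stimulus-driven fraction of the total variance.
    """
    rho = np.corrcoef(dv_pass1, dv_pass2)[0, 1]
    return {"stimulus_driven": rho, "internal_noise": 1.0 - rho}

# Simulated example: shared stimulus-driven component plus independent internal noise.
rng = np.random.default_rng(0)
stim = rng.normal(size=10_000)                 # one value per unique trial
dv1 = stim + 0.8 * rng.normal(size=10_000)     # internal noise, pass 1
dv2 = stim + 0.8 * rng.normal(size=10_000)     # internal noise, pass 2
print(double_pass_partition(dv1, dv2))         # ~{'stimulus_driven': 0.61, 'internal_noise': 0.39}
```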
Affiliation(s)
- David N. White
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Department of Electrical Engineering & Computer Science, York University, Toronto, Ontario, Canada
- Johannes Burge
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
2. Herrera-Esposito D, Burge J. Optimal Estimation of Local Motion-in-Depth with Naturalistic Stimuli. J Neurosci 2025; 45:e0490242024. PMID: 39592236; PMCID: PMC11841760; DOI: 10.1523/jneurosci.0490-24.2024.
Abstract
Estimating the motion of objects in depth is important for behavior and is strongly supported by binocular visual cues. To understand both how the brain should estimate motion in depth and how natural constraints shape and limit performance in two local 3D motion tasks, we develop image-computable ideal observers from a large number of binocular video clips created from a dataset of natural images. The observers spatiotemporally filter the videos and nonlinearly decode 3D motion from the filter responses. The optimal filters and decoder are dictated by the task-relevant image statistics and are specific to each task. Multiple findings emerge. First, two distinct filter subpopulations are spontaneously learned for each task. For 3D speed estimation, filters emerge for processing either changing disparities over time or interocular velocity differences, cues that are used by humans. For 3D direction estimation, filters emerge for discriminating either left-right or toward-away motion. Second, the filter responses, conditioned on the latent variable, are well-described as jointly Gaussian, and the covariance of the filter responses carries the information about the task-relevant latent variable. Quadratic combination is thus necessary for optimal decoding, which can be implemented by biologically plausible neural computations. Finally, the ideal observer yields nonobvious, and in some cases counterintuitive, patterns of performance like those exhibited by humans. Important characteristics of human 3D motion processing and estimation may therefore result from optimal information processing in the early visual system.
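The decoding step follows from the Gaussian description: if the filter responses conditioned on each candidate motion value are Gaussian with value-dependent covariance, the log-likelihood of each candidate is a quadratic function of the responses, which is why quadratic combination is required for optimal decoding. A small sketch of that computation with made-up statistics (the learned filters and their fitted means and covariances are not reproduced here):

```python
import numpy as np

def quadratic_log_likelihoods(r, means, covs):
    """Log-likelihood of a filter-response vector r under each candidate latent value.

    means[k], covs[k] are the class-conditional mean and covariance of the
    filter responses for candidate k. Because the expression is quadratic in r,
    it can be computed by weighting, squaring, and summing filter responses.
    """
    lls = []
    for mu, cov in zip(means, covs):
        diff = r - mu
        _, logdet = np.linalg.slogdet(cov)
        lls.append(-0.5 * (diff @ np.linalg.solve(cov, diff)) - 0.5 * logdet)
    return np.array(lls)

# Toy usage: two candidate 3D speeds, four filters; the information is in the covariance.
rng = np.random.default_rng(1)
means = [np.zeros(4), np.zeros(4)]
covs = [np.eye(4), 2.0 * np.eye(4)]
r = rng.multivariate_normal(means[1], covs[1])
print(quadratic_log_likelihoods(r, means, covs).argmax())
```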
Affiliation(s)
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104
3. Srinath R, Ni AM, Marucci C, Cohen MR, Brainard DH. Orthogonal neural representations support perceptual judgements of natural stimuli. bioRxiv [Preprint] 2024:2024.02.14.580134. PMID: 38464018; PMCID: PMC10925131; DOI: 10.1101/2024.02.14.580134.
Abstract
In natural behavior, observers must separate relevant information from a barrage of irrelevant information. Many studies have investigated the neural underpinnings of this ability using artificial stimuli presented on simple backgrounds. Natural viewing, however, carries a set of challenges that are inaccessible using artificial stimuli, including neural responses to background objects that are task-irrelevant. An emerging body of evidence suggests that the visual abilities of humans and animals can be modeled through the linear decoding of task-relevant information from visual cortex. This idea suggests the hypothesis that irrelevant features of a natural scene should impair performance on a visual task only if their neural representations intrude on the linear readout of the task-relevant feature, as would occur if the representations of task-relevant and irrelevant features are not orthogonal in the underlying neural population. We tested this hypothesis using human psychophysics and monkey neurophysiology, in response to parametrically variable naturalistic stimuli. We demonstrate that 1) the neural representation of one feature (the position of a central object) in visual area V4 is orthogonal to those of several background features, 2) the ability of human observers to precisely judge object position was largely unaffected by task-irrelevant variation in those background features, and 3) many features of the object and the background are orthogonally represented by V4 neural responses. Our observations are consistent with the hypothesis that orthogonal neural representations can support stable perception of objects and features despite the tremendous richness of natural visual scenes.
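One concrete way to phrase the orthogonality test is in terms of linear readouts: fit a linear decoder for the task-relevant feature and another for a background feature from the same population responses, then check whether the two weight vectors are close to orthogonal; if they are, variation along the irrelevant axis barely projects onto the relevant readout. A rough illustrative sketch of that check (not the authors' analysis code; all names are placeholders):

```python
import numpy as np
from numpy.linalg import norm

def readout_alignment(responses, relevant, irrelevant):
    """Cosine of the angle between linear readout axes for two scene features.

    responses  : (n_trials, n_neurons) population responses
    relevant   : (n_trials,) values of the task-relevant feature
    irrelevant : (n_trials,) values of a task-irrelevant background feature
    A cosine near zero means the decoding axes are close to orthogonal, so
    variation in the irrelevant feature projects weakly onto the relevant readout.
    """
    w_rel, *_ = np.linalg.lstsq(responses, relevant, rcond=None)
    w_irr, *_ = np.linalg.lstsq(responses, irrelevant, rcond=None)
    return float(w_rel @ w_irr / (norm(w_rel) * norm(w_irr)))
```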
Affiliation(s)
- Ramanujan Srinath
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- Amy M. Ni
- equal contribution
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Claire Marucci
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marlene R. Cohen
- Department of Neurobiology and Neuroscience Institute, The University of Chicago, Chicago, IL 60637, USA
- equal contribution
- David H. Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
- equal contribution
4. Burge J, Cormack LK. Continuous psychophysics shows millisecond-scale visual processing delays are faithfully preserved in movement dynamics. J Vis 2024; 24:4. PMID: 38722274; PMCID: PMC11094763; DOI: 10.1167/jov.24.5.4.
Abstract
Image differences between the eyes can cause interocular discrepancies in the speed of visual processing. Millisecond-scale differences in visual processing speed can cause dramatic misperceptions of the depth and three-dimensional direction of moving objects. Here, we develop a monocular and binocular continuous target-tracking psychophysics paradigm that can quantify such tiny differences in visual processing speed. Human observers continuously tracked a target undergoing Brownian motion with a range of luminance levels in each eye. Suitable analyses recover the time course of the visuomotor response in each condition, the dependence of visual processing speed on luminance level, and the temporal evolution of processing differences between the eyes. Importantly, using a direct within-observer comparison, we show that continuous target-tracking and traditional forced-choice psychophysical methods provide estimates of interocular delays that agree on average to within a fraction of a millisecond. Thus, visual processing delays are preserved in the movement dynamics of the hand. Finally, we show analytically, and partially confirm experimentally, that differences between the temporal impulse response functions in the two eyes predict how lateral target motion causes misperceptions of motion in depth and associated tracking responses. Because continuous target tracking can accurately recover millisecond-scale differences in visual processing speed and has multiple advantages over traditional psychophysics, it should facilitate the study of temporal processing in the future.
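A standard way to recover the visuomotor time course in continuous tracking is to cross-correlate target velocity with response velocity: for Brownian target motion the correlogram approximates the visuomotor impulse response, and the shift of its peak between monocular conditions estimates the interocular delay. A minimal sketch under those textbook assumptions (not the paper's analysis code; function and variable names are illustrative):

```python
import numpy as np

def tracking_impulse_response(target_vel, response_vel, dt, max_lag_s=1.0):
    """Cross-correlogram between target and response velocity.

    For a target undergoing Brownian motion (velocity approximately white),
    the target-response cross-correlogram approximates the visuomotor
    impulse response; the lag of its peak estimates the processing delay.
    """
    n = len(target_vel)
    max_lag = int(max_lag_s / dt)
    t = target_vel - np.mean(target_vel)
    r = response_vel - np.mean(response_vel)
    lags = np.arange(max_lag)
    xcorr = np.array([np.dot(t[: n - k], r[k:]) / (n - k) for k in lags])
    delay_s = lags[np.argmax(xcorr)] * dt
    return lags * dt, xcorr, delay_s
```

Comparing the delay estimated from a dim-eye condition with that from a bright-eye condition would then give the interocular difference in processing speed.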
Affiliation(s)
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Lawrence K Cormack
- Department of Psychology, University of Texas at Austin, Austin, TX, USA
- Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA
- Institute for Neuroscience, University of Texas at Austin, Austin, TX, USA
5. Abdulkarim Z, Guterstam A, Hayatou Z, Ehrsson HH. Neural Substrates of Body Ownership and Agency during Voluntary Movement. J Neurosci 2023; 43:2362-2380. PMID: 36801824; PMCID: PMC10072298; DOI: 10.1523/jneurosci.1492-22.2023.
Abstract
Body ownership and the sense of agency are two central aspects of bodily self-consciousness. While multiple neuroimaging studies have investigated the neural correlates of body ownership and agency separately, few studies have investigated the relationship between these two aspects during voluntary movement when such experiences naturally combine. By eliciting the moving rubber hand illusion with active or passive finger movements during functional magnetic resonance imaging, we isolated activations reflecting the sense of body ownership and agency, respectively, as well as their interaction, and assessed their overlap and anatomic segregation. We found that perceived hand ownership was associated with activity in premotor, posterior parietal, and cerebellar regions, whereas the sense of agency over the movements of the hand was related to activity in the dorsal premotor cortex and superior temporal cortex. Moreover, one section of the dorsal premotor cortex showed overlapping activity for ownership and agency, and somatosensory cortical activity reflected the interaction of ownership and agency with higher activity when both agency and ownership were experienced. We further found that activations previously attributed to agency in the left insular cortex and right temporoparietal junction reflected the synchrony or asynchrony of visuoproprioceptive stimuli rather than agency. Collectively, these results reveal the neural bases of agency and ownership during voluntary movement. Although the neural representations of these two experiences are largely distinct, there are interactions and functional neuroanatomical overlap during their combination, which has bearing on theories on bodily self-consciousness.
SIGNIFICANCE STATEMENT: How does the brain generate the sense of being in control of bodily movement (agency) and the sense that body parts belong to one's body (body ownership)? Using fMRI and a bodily illusion triggered by movement, we found that agency is associated with activity in premotor cortex and temporal cortex, and body ownership with activity in premotor, posterior parietal, and cerebellar regions. The activations reflecting the two sensations were largely distinct, but there was overlap in premotor cortex and an interaction in somatosensory cortex. These findings advance our understanding of the neural bases of and interplay between agency and body ownership during voluntary movement, which has implications for the development of advanced controllable prosthetic limbs that feel like real limbs.
Affiliation(s)
- Arvid Guterstam
- Department of Clinical Neuroscience, Karolinska Institutet, 171 77 Stockholm, Sweden
- Zineb Hayatou
- Université Paris-Saclay, CNRS, Institut Des Neurosciences Paris-Saclay, 91190 Gif-sur-Yvette, France
- H Henrik Ehrsson
- Department of Neuroscience, Karolinska Institutet, 171 77 Stockholm, Sweden
6. Burg MF, Cadena SA, Denfield GH, Walker EY, Tolias AS, Bethge M, Ecker AS. Learning divisive normalization in primary visual cortex. PLoS Comput Biol 2021; 17:e1009028. PMID: 34097695; PMCID: PMC8211272; DOI: 10.1371/journal.pcbi.1009028.
Abstract
Divisive normalization (DN) is a prominent computational building block in the brain that has been proposed as a canonical cortical operation. Numerous experimental studies have verified its importance for capturing nonlinear neural response properties to simple, artificial stimuli, and computational studies suggest that DN is also an important component for processing natural stimuli. However, we lack quantitative models of DN that are directly informed by measurements of spiking responses in the brain and applicable to arbitrary stimuli. Here, we propose a DN model that is applicable to arbitrary input images. We test its ability to predict how neurons in macaque primary visual cortex (V1) respond to natural images, with a focus on nonlinear response properties within the classical receptive field. Our model consists of one layer of subunits followed by learned orientation-specific DN. It outperforms linear-nonlinear and wavelet-based feature representations and makes a significant step towards the performance of state-of-the-art convolutional neural network (CNN) models. Unlike deep CNNs, our compact DN model offers a direct interpretation of the nature of normalization. By inspecting the learned normalization pool of our model, we gained insights into a long-standing question about the tuning properties of DN that update the current textbook description: we found that, within the receptive field, oriented features were normalized preferentially by features with similar orientation rather than non-specifically as currently assumed.
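The normalization stage itself takes the standard divisive form: each subunit's driven response is divided by a semisaturation constant plus a weighted sum over a pool of other subunits, with the pool weights learned from the neural data (and, per the paper's result, concentrated on similarly oriented features). A minimal numpy sketch of that operation (parameter names and values are illustrative, not fitted values from the paper):

```python
import numpy as np

def divisive_normalization(drive, pool_weights, sigma=1.0, exponent=2.0):
    """Divisively normalize rectified subunit drives.

    drive        : (n_units,) nonnegative subunit responses to an image
    pool_weights : (n_units, n_units) normalization-pool weights; in the fitted
                   model, units with similar orientation preference weight each
                   other most strongly
    sigma        : semisaturation constant
    """
    numerator = drive ** exponent
    denominator = sigma ** exponent + pool_weights @ (drive ** exponent)
    return numerator / denominator
```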
Affiliation(s)
- Max F. Burg
- Institute for Theoretical Physics and Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Santiago A. Cadena
- Institute for Theoretical Physics and Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- George H. Denfield
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Edgar Y. Walker
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Andreas S. Tolias
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Electrical and Computer Engineering, Rice University, Houston, Texas, United States of America
- Matthias Bethge
- Institute for Theoretical Physics and Centre for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen, Germany
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Alexander S. Ecker
- Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Göttingen, Germany
- Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
7. Kim S, Burge J. Natural scene statistics predict how humans pool information across space in surface tilt estimation. PLoS Comput Biol 2020; 16:e1007947. PMID: 32579559; PMCID: PMC7340327; DOI: 10.1371/journal.pcbi.1007947.
Abstract
Visual systems estimate the three-dimensional (3D) structure of scenes from information in two-dimensional (2D) retinal images. Visual systems use multiple sources of information to improve the accuracy of these estimates, including statistical knowledge of the probable spatial arrangements of natural scenes. Here, we examine how 3D surface tilts are spatially related in real-world scenes, and show that humans pool information across space when estimating surface tilt in accordance with these spatial relationships. We develop a hierarchical model of surface tilt estimation that is grounded in the statistics of tilt in natural scenes and images. The model computes a global tilt estimate by pooling local tilt estimates within an adaptive spatial neighborhood. The spatial neighborhood in which local estimates are pooled changes according to the value of the local estimate at a target location. The hierarchical model provides more accurate estimates of ground-truth tilt in natural scenes and provides a better account of human performance than the local estimates. Taken together, the results imply that the human visual system pools information about surface tilt across space in accordance with natural scene statistics.
Visual systems estimate three-dimensional (3D) properties of scenes from two-dimensional images on the retinas. To solve this difficult problem as accurately as possible, visual systems use many available sources of information, including information about how the 3D properties of the world are spatially arranged. This manuscript reports a systematic analysis of 3D surface tilt in natural scenes, a model of surface tilt estimation that makes use of these scene statistics, and human psychophysical data on the estimation of surface tilt from natural images. The results show that the regularities present in the natural environment predict both how to maximize the accuracy of tilt estimation and how to maximize the prediction of human performance. This work contributes to a growing line of work that establishes links between rigorous measurements of natural scenes and the function of sensory and perceptual systems.
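The pooling stage can be summarized as a weighted average of local tilt estimates within a neighborhood whose shape is selected by the local estimate at the target location; because tilt is a circular variable, the averaging is done on unit vectors. A schematic sketch of that computation (illustrative only; the published model's neighborhoods and weights are estimated from natural-scene statistics):

```python
import numpy as np

def pool_tilt_estimates(local_tilts, weights):
    """Pool local surface-tilt estimates (in radians) into a global estimate.

    local_tilts : (n,) local tilt estimates in the neighborhood of the target
    weights     : (n,) spatial pooling weights; in the hierarchical model these
                  depend on the local estimate at the target location
    Tilt is circular, so pooling averages unit vectors rather than raw angles.
    """
    w = weights / np.sum(weights)
    x = np.sum(w * np.cos(local_tilts))
    y = np.sum(w * np.sin(local_tilts))
    return np.arctan2(y, x)
```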
Affiliation(s)
- Seha Kim
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
8.
Abstract
An ideal observer is a theoretical model observer that performs a specific sensory-perceptual task optimally, making the best possible use of the available information given physical and biological constraints. An image-computable ideal observer (pixels in, estimates out) is a particularly powerful type of ideal observer that explicitly models the flow of visual information from the stimulus-encoding process to the eventual decoding of a sensory-perceptual estimate. Image-computable ideal observer analyses underlie some of the most important results in vision science. However, most of what we know from ideal observers about visual processing and performance derives from relatively simple tasks and relatively simple stimuli. This review describes recent efforts to develop image-computable ideal observers for a range of tasks with natural stimuli and shows how these observers can be used to predict and understand perceptual and neurophysiological performance. The reviewed results establish principled links among models of neural coding, computational methods for dimensionality reduction, and sensory-perceptual performance in tasks with natural stimuli.
Affiliation(s)
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
9. Chin BM, Burge J. Predicting the Partition of Behavioral Variability in Speed Perception with Naturalistic Stimuli. J Neurosci 2020; 40:864-879. PMID: 31772139; PMCID: PMC6975300; DOI: 10.1523/jneurosci.1904-19.2019.
Abstract
A core goal of visual neuroscience is to predict human perceptual performance from natural signals. Performance in any natural task can be limited by at least three sources of uncertainty: stimulus variability, internal noise, and suboptimal computations. Determining the relative importance of these factors has been a focus of interest for decades but requires methods for predicting the fundamental limits imposed by stimulus variability on sensory-perceptual precision. Most successes have been limited to simple stimuli and simple tasks. But perception science ultimately aims to understand how vision works with natural stimuli. Successes in this domain have proven elusive. Here, we develop a model of humans based on an image-computable (images in, estimates out) Bayesian ideal observer. Given biological constraints, the ideal optimally uses the statistics relating local intensity patterns in moving images to speed, specifying the fundamental limits imposed by natural stimuli. Next, we propose a theoretical link between two key decision-theoretic quantities that suggests how to experimentally disentangle the impacts of internal noise and deterministic suboptimal computations. In several interlocking discrimination experiments with three male observers, we confirm this link and determine the quantitative impact of each candidate performance-limiting factor. Human performance is near-exclusively limited by natural stimulus variability and internal noise, and humans use near-optimal computations to estimate speed from naturalistic image movies. The findings indicate that the partition of behavioral variability can be predicted from a principled analysis of natural images and scenes. The approach should be extendable to studies of neural variability with natural signals.
SIGNIFICANCE STATEMENT: Accurate estimation of speed is critical for determining motion in the environment, but humans cannot perform this task without error. Different objects moving at the same speed cast different images on the eyes. This stimulus variability imposes fundamental external limits on the human ability to estimate speed. Predicting these limits has proven difficult. Here, by analyzing natural signals, we predict the quantitative impact of natural stimulus variability on human performance given biological constraints. With integrated experiments, we compare its impact to well-studied performance-limiting factors internal to the visual system. The results suggest that the deterministic computations humans perform are near optimal, and that behavioral responses to natural stimuli can be studied with the rigor and interpretability defining work with simpler stimuli.
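The partition described here can be summarized schematically: if the decision variable on each trial is the sum of a stimulus-driven component and independent internal noise, then (in the usual double-pass notation, with symbols introduced here for illustration rather than taken from the paper)

```latex
\sigma^2_{\mathrm{total}} = \sigma^2_{\mathrm{stim}} + \sigma^2_{\mathrm{int}},
\qquad
\rho = \frac{\sigma^2_{\mathrm{stim}}}{\sigma^2_{\mathrm{stim}} + \sigma^2_{\mathrm{int}}},
```

where rho is the between-pass correlation of the decision variable estimated from double-pass experiments. Comparing the stimulus-driven term with the stimulus-limited variance of the image-computable ideal observer then indicates how close the human deterministic computations are to optimal.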
Affiliation(s)
- Johannes Burge
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104