1. Noneman KK, Mayo JP. Decoding Continuous Tracking Eye Movements from Cortical Spiking Activity. Int J Neural Syst 2025; 35:2450070. PMID: 39545725; PMCID: PMC12049095; DOI: 10.1142/s0129065724500709.
Abstract
Eye movements are the primary way primates interact with the world. Understanding how the brain controls the eyes is therefore crucial for improving human health and designing visual rehabilitation devices. However, brain activity is challenging to decipher. Here, we leveraged machine learning algorithms to reconstruct tracking eye movements from high-resolution neuronal recordings. We found that continuous eye position could be decoded with high accuracy using spiking data from only a few dozen cortical neurons. We tested eight decoders and found that neural network models yielded the highest decoding accuracy. Simpler models performed well above chance with a substantial reduction in training time. We measured the impact of data quantity (e.g. number of neurons) and data format (e.g. bin width) on training time, inference time, and generalizability. Training models with more input data improved performance, as expected, but the format of the behavioral output was critical for emphasizing or omitting specific oculomotor events. Our results provide the first demonstration, to our knowledge, of continuously decoded eye movements across a large field of view. Our comprehensive investigation of predictive power and computational efficiency for common decoder architectures provides a much-needed foundation for future work on real-time gaze-tracking devices.
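As an illustration of the kind of decoding pipeline this abstract describes, the sketch below fits one of the simpler model classes (ridge regression) to map binned spike counts onto continuous eye position. The bin width, neuron count, and data arrays are hypothetical placeholders, not the authors' recordings or their exact models.

```python
# Illustrative sketch only: decode continuous eye position from binned spike
# counts with ridge regression. All data here are simulated placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_bins, n_neurons = 5000, 40            # e.g. a few dozen cortical neurons, 50 ms bins
spikes = rng.poisson(2.0, size=(n_bins, n_neurons)).astype(float)
eye_xy = rng.normal(size=(n_bins, 2))   # continuous (x, y) eye position, degrees

split = int(0.8 * n_bins)               # chronological train/test split
decoder = Ridge(alpha=1.0).fit(spikes[:split], eye_xy[:split])
pred = decoder.predict(spikes[split:])
print("held-out R^2:", r2_score(eye_xy[split:], pred))
```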
Affiliation(s)
- Kendra K. Noneman
- Neuroscience Institute, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA
- J. Patrick Mayo
- Department of Ophthalmology, University of Pittsburgh, 1622 Locust Street, Pittsburgh, PA 15219, USA
2. Theiss JD, Silver MA. Top-Down Priors Disambiguate Target and Distractor Features in Simulated Covert Visual Search. Neural Comput 2024; 36:2201-2224. PMID: 39141806; PMCID: PMC11430503; DOI: 10.1162/neco_a_01700.
Abstract
Several models of visual search consider visual attention as part of a perceptual inference process, in which top-down priors disambiguate bottom-up sensory information. Many of these models have focused on gaze behavior, but there are relatively fewer models of covert spatial attention, in which attention is directed to a peripheral location in visual space without a shift in gaze direction. Here, we propose a biologically plausible model of covert attention during visual search that helps to bridge the gap between Bayesian modeling and neurophysiological modeling by using (1) top-down priors over target features that are acquired through Hebbian learning, and (2) spatial resampling of modeled cortical receptive fields to enhance local spatial resolution of image representations for downstream target classification. By training a simple generative model using a Hebbian update rule, top-down priors for target features naturally emerge without the need for hand-tuned or predetermined priors. Furthermore, the implementation of covert spatial attention in our model is based on a known neurobiological mechanism, providing a plausible process through which Bayesian priors could locally enhance the spatial resolution of image representations. We validate this model during simulated visual search for handwritten digits among nondigit distractors, demonstrating that top-down priors improve accuracy for estimation of target location and classification, relative to bottom-up signals alone. Our results support previous reports in the literature that demonstrated beneficial effects of top-down priors on visual search performance, while extending this literature to incorporate known neural mechanisms of covert spatial attention.
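The Hebbian acquisition of top-down priors described above can be sketched as a simple outer-product weight update. Everything below (array shapes, the normalization step, the digit-classification framing) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of Hebbian learning of top-down priors: weights from an input
# (feature) layer to a target-category unit grow with pre/post co-activity.
import numpy as np

def hebbian_update(W, pre, post, lr=0.01):
    """W: (n_categories, n_features); pre: (n_features,); post: (n_categories,)."""
    W = W + lr * np.outer(post, pre)           # Hebbian co-activity term
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.maximum(norms, 1e-8)         # keep weight rows bounded

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 784))     # e.g. 10 digit classes, 28x28 images
x = rng.random(784)                            # one flattened training image
y = np.eye(10)[3]                              # its category as a one-hot "post" signal
W = hebbian_update(W, x, y)                    # learned rows act as top-down priors
```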
3. Morales-Torres R, Wing EA, Deng L, Davis SW, Cabeza R. Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations. J Neurosci 2024; 44:e1479232024. PMID: 38569925; PMCID: PMC11112637; DOI: 10.1523/jneurosci.1479-23.2024.
Abstract
When we perceive a scene, our brain processes various types of visual information simultaneously, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using both early and late layers of a deep convolutional neural network. Then, we applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. In line with the previous result, only for the categorical model, the average recognition performance of each scene exhibited a positive correlation with the average visual dissimilarity between the item in question and its respective lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be primarily influenced by categorical rather than sensory representations.
Affiliation(s)
- Erik A Wing
- Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario M6A 2E1, Canada
- Lifu Deng
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Simon W Davis
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
- Department of Neurology, Duke University School of Medicine, Durham, North Carolina 27708
- Roberto Cabeza
- Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708
4. Wang Y, Cao R, Chakravarthula PN, Yu H, Wang S. Atypical neural encoding of faces in individuals with autism spectrum disorder. Cereb Cortex 2024; 34:172-186. PMID: 38696606; PMCID: PMC11065108; DOI: 10.1093/cercor/bhae060.
Abstract
Individuals with autism spectrum disorder (ASD) experience pervasive difficulties in processing social information from faces. However, the behavioral and neural mechanisms underlying social trait judgments of faces in ASD remain largely unclear. Here, we comprehensively addressed this question by employing functional neuroimaging and parametrically generated faces that vary in facial trustworthiness and dominance. Behaviorally, participants with ASD exhibited reduced specificity but increased inter-rater variability in social trait judgments. Neurally, participants with ASD showed hypo-activation across broad face-processing areas. Multivariate analysis based on trial-by-trial face responses could discriminate participant groups in the majority of the face-processing areas. Encoding social traits in ASD engaged vastly different face-processing areas compared to controls, and encoding different social traits engaged different brain areas. Interestingly, the idiosyncratic brain areas encoding social traits in ASD were still flexible and context-dependent, similar to neurotypicals. Additionally, participants with ASD also showed an altered encoding of facial saliency features in the eyes and mouth. Together, our results provide a comprehensive understanding of the neural mechanisms underlying social trait judgments in ASD.
Affiliation(s)
- Yue Wang
- Department of Radiology, Washington University in St. Louis, 4525 Scott Ave, St. Louis, MO 63110, United States
- Runnan Cao
- Department of Radiology, Washington University in St. Louis, 4525 Scott Ave, St. Louis, MO 63110, United States
- Puneeth N Chakravarthula
- Department of Radiology, Washington University in St. Louis, 4525 Scott Ave, St. Louis, MO 63110, United States
- Hongbo Yu
- Department of Psychological & Brain Sciences, University of California Santa Barbara, Santa Barbara, CA 93106, United States
- Shuo Wang
- Department of Radiology, Washington University in St. Louis, 4525 Scott Ave, St. Louis, MO 63110, United States
5. Mocz V, Vaziri-Pashkam M, Chun M, Xu Y. Predicting Identity-Preserving Object Transformations in Human Posterior Parietal Cortex and Convolutional Neural Networks. J Cogn Neurosci 2022; 34:2406-2435. PMID: 36122358; PMCID: PMC9988239; DOI: 10.1162/jocn_a_01916.
Abstract
Previous research shows that, within human occipito-temporal cortex (OTC), we can use a general linear mapping function to link visual object responses across nonidentity feature changes, including Euclidean features (e.g., position and size) and non-Euclidean features (e.g., image statistics and spatial frequency). Although the learned mapping is capable of predicting responses of objects not included in training, these predictions are better for categories included than those not included in training. These findings demonstrate a near-orthogonal representation of object identity and nonidentity features throughout human OTC. Here, we extended these findings to examine the mapping across both Euclidean and non-Euclidean feature changes in human posterior parietal cortex (PPC), including functionally defined regions in inferior and superior intraparietal sulcus. We additionally examined responses in five convolutional neural networks (CNNs) pretrained with object classification, as CNNs are considered the current best model of the primate ventral visual system. We separately compared results from PPC and CNNs with those of OTC. We found that a linear mapping function could successfully link object responses in different states of nonidentity transformations in human PPC and CNNs for both Euclidean and non-Euclidean features. Overall, we found that object identity and nonidentity features are represented in a near-orthogonal, rather than completely orthogonal, manner in PPC and CNNs, just as they are in OTC. Meanwhile, some differences existed among OTC, PPC, and CNNs. These results demonstrate the similarities and differences in how visual object information across an identity-preserving image transformation may be represented in OTC, PPC, and CNNs.
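A minimal sketch of the general linear mapping analysis, under the assumption of simulated response matrices: fit a linear map between responses in two transformation states on a training set of objects, then test how well it generalizes to held-out objects. In the actual studies, the rows would be fMRI voxel patterns or CNN layer activations rather than random data.

```python
# Sketch of a cross-transformation linear mapping analysis on simulated data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_objects, n_voxels = 60, 200
resp_state_a = rng.normal(size=(n_objects, n_voxels))        # e.g. original images
true_map = rng.normal(scale=0.1, size=(n_voxels, n_voxels))
resp_state_b = resp_state_a @ true_map + 0.1 * rng.normal(size=(n_objects, n_voxels))

train, test = slice(0, 45), slice(45, None)                   # hold out 15 objects
mapping = LinearRegression().fit(resp_state_a[train], resp_state_b[train])
pred_b = mapping.predict(resp_state_a[test])

# correlate predicted and actual patterns per held-out object
r = [np.corrcoef(p, t)[0, 1] for p, t in zip(pred_b, resp_state_b[test])]
print("mean prediction correlation for untrained objects:", np.mean(r))
```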
6. Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022; 263:119635. PMID: 36116617; PMCID: PMC11283825; DOI: 10.1016/j.neuroimage.2022.119635.
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, despite the ability of convolutional neural networks (CNNs) to exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust object response rank-order preservation across feature changes. This is indicative of functional smoothness in tolerance at the fMRI meso-scale level that has never been reported before. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes) showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained with ImageNet images with varying network architecture, depth, the presence/absence of recurrent processing, or whether a network was pretrained with the original or stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT 06520, USA
7. Keles U, Kliemann D, Byrge L, Saarimäki H, Paul LK, Kennedy DP, Adolphs R. Atypical gaze patterns in autistic adults are heterogeneous across but reliable within individuals. Mol Autism 2022; 13:39. PMID: 36153629; PMCID: PMC9508778; DOI: 10.1186/s13229-022-00517-2.
Abstract
BACKGROUND: Across behavioral studies, autistic individuals show greater variability than typically developing individuals. However, it remains unknown to what extent this variability arises from heterogeneity across individuals, or from unreliability within individuals. Here, we focus on eye tracking, which provides rich dependent measures that have been used extensively in studies of autism. Autistic individuals have an atypical gaze onto both static visual images and dynamic videos that could be leveraged for diagnostic purposes if the above open question could be addressed.
METHODS: We tested three competing hypotheses: (1) that gaze patterns of autistic individuals are less reliable or noisier than those of controls, (2) that atypical gaze patterns are individually reliable but heterogeneous across autistic individuals, or (3) that atypical gaze patterns are individually reliable and also homogeneous among autistic individuals. We collected desktop-based eye tracking data from two different full-length television sitcom episodes, at two independent sites (Caltech and Indiana University), in a total of over 150 adult participants (N = 48 autistic individuals with IQ in the normal range, 105 controls) and quantified gaze onto features of the videos using automated computer vision-based feature extraction.
RESULTS: We found support for the second of these hypotheses. Autistic people and controls showed equivalently reliable gaze onto specific features of videos, such as faces, so much so that individuals could be identified significantly above chance using a fingerprinting approach from video epochs as short as 2 min. However, classification of participants into diagnostic groups based on their eye tracking data failed to produce clear group classifications, due to heterogeneity in the autistic group.
LIMITATIONS: Three limitations are the relatively small sample size, assessment across only two videos (from the same television series), and the absence of other dependent measures (e.g., neuroimaging or genetics) that might have revealed individual-level variability that was not evident with eye tracking. Future studies should expand to larger samples across longer longitudinal epochs, an aim that is now becoming feasible with Internet- and phone-based eye tracking.
CONCLUSIONS: These findings pave the way for the investigation of autism subtypes, and for elucidating the specific visual features that best discriminate gaze patterns, directions that will also combine with and inform neuroimaging and genetic studies of this complex disorder.
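The fingerprinting analysis mentioned in the Results can be illustrated with a simple correlation-based matcher. The gaze-feature vectors below are random placeholders, and the particular feature set is an assumption for illustration only, not the study's pipeline.

```python
# Sketch of gaze "fingerprinting": identify a participant by correlating their
# gaze-feature vector from one video epoch with every participant's vector
# from a second epoch, taking the best match.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_features = 150, 50     # e.g. fraction of gaze on faces, eyes, text, ...
epoch1 = rng.normal(size=(n_subjects, n_features))
epoch2 = epoch1 + 0.5 * rng.normal(size=(n_subjects, n_features))  # same people, later epoch

def identify(probe, gallery):
    """Return index of the gallery row most correlated with the probe vector."""
    r = [np.corrcoef(probe, g)[0, 1] for g in gallery]
    return int(np.argmax(r))

hits = sum(identify(epoch2[i], epoch1) == i for i in range(n_subjects))
print(f"fingerprinting accuracy: {hits / n_subjects:.2f} (chance ~ {1 / n_subjects:.3f})")
```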
Affiliation(s)
- Umit Keles
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, USA
- Dorit Kliemann
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, USA
- Department of Psychological and Brain Sciences, The University of Iowa, Iowa City, USA
- Lisa Byrge
- Department of Psychology, University of North Florida, Jacksonville, USA
- Heini Saarimäki
- Faculty of Social Sciences, Tampere University, Tampere, Finland
- Lynn K Paul
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, USA
- Daniel P Kennedy
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, USA
- Ralph Adolphs
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Chen Neuroscience Institute, California Institute of Technology, Pasadena, USA
8. Tang K, Chin M, Chun M, Xu Y. The contribution of object identity and configuration to scene representation in convolutional neural networks. PLoS One 2022; 17:e0270667. PMID: 35763531; PMCID: PMC9239439; DOI: 10.1371/journal.pone.0270667.
Abstract
Scene perception involves extracting the identities of the objects comprising a scene in conjunction with their configuration (the spatial layout of the objects in the scene). How object identity and configuration information is weighted during scene processing, and how this weighting evolves over the course of scene processing, however, is not fully understood. Recent developments in convolutional neural networks (CNNs) have demonstrated their aptitude at scene processing tasks and identified correlations between processing in CNNs and in the human brain. Here we examined four CNN architectures (Alexnet, Resnet18, Resnet50, Densenet161) and their sensitivity to changes in object and configuration information over the course of scene processing. Despite differences among the four CNN architectures, across all CNNs, we observed a common pattern in the CNNs' response to object identity and configuration changes. Each CNN demonstrated greater sensitivity to configuration changes in early stages of processing and stronger sensitivity to object identity changes in later stages. This pattern persists regardless of the spatial structure present in the image background, the accuracy of the CNN in classifying the scene, and even the task used to train the CNN. Importantly, CNNs' sensitivity to a configuration change is not the same as their sensitivity to any type of position change, such as that induced by a uniform translation of the objects without a configuration change. These results provide one of the first documentations of how object identity and configuration information are weighted in CNNs during scene processing.
Affiliation(s)
- Kevin Tang
- Department of Psychology, Yale University, New Haven, CT, United States of America
- Matthew Chin
- Department of Psychology, Yale University, New Haven, CT, United States of America
- Marvin Chun
- Department of Psychology, Yale University, New Haven, CT, United States of America
- Yaoda Xu
- Department of Psychology, Yale University, New Haven, CT, United States of America
9. Coggan DD, Watson DM, Wang A, Brownbridge R, Ellis C, Jones K, Kilroy C, Andrews TJ. The representation of shape and texture in category-selective regions of ventral-temporal cortex. Eur J Neurosci 2022; 56:4107-4120. PMID: 35703007; PMCID: PMC9545892; DOI: 10.1111/ejn.15737.
Abstract
Neuroimaging studies using univariate and multivariate approaches have shown that the fusiform face area (FFA) and parahippocampal place area (PPA) respond selectively to images of faces and places. The aim of this study was to determine the extent to which this selectivity to faces or places is based on the shape or texture properties of the images. Faces and houses were filtered to manipulate their texture properties, while preserving the shape properties (spatial envelope) of the images. In Experiment 1, multivariate pattern analysis (MVPA) showed that patterns of fMRI response to faces and houses in FFA and PPA were predicted by the shape properties, but not by the texture properties of the image. In Experiment 2, a univariate analysis (fMR‐adaptation) showed that responses in the FFA and PPA were sensitive to changes in both the shape and texture properties of the image. These findings can be explained by the spatial scale of the representation of images in the FFA and PPA. At a coarser scale (revealed by MVPA), the neural selectivity to faces and houses is sensitive to variation in the shape properties of the image. However, at a finer scale (revealed by fMR‐adaptation), the neural selectivity is sensitive to the texture properties of the image. By combining these neuroimaging paradigms, our results provide insights into the spatial scale of the neural representation of faces and places in the ventral‐temporal cortex.
Affiliation(s)
- David D Coggan
- Department of Psychology, University of York, York, UK
- Department of Psychology, Vanderbilt University, Nashville, Tennessee, USA
- Ao Wang
- Department of Psychology, University of York, York, UK
- Kathryn Jones
- Department of Psychology, University of York, York, UK
10. Callahan-Flintoft C, Barentine C, Touryan J, Ries AJ. A Case for Studying Naturalistic Eye and Head Movements in Virtual Environments. Front Psychol 2022; 12:650693. PMID: 35035362; PMCID: PMC8759101; DOI: 10.3389/fpsyg.2021.650693.
Abstract
Using head mounted displays (HMDs) in conjunction with virtual reality (VR), vision researchers are able to capture more naturalistic vision in an experimentally controlled setting. Namely, eye movements can be accurately tracked as they occur in concert with head movements as subjects navigate virtual environments. A benefit of this approach is that, unlike other mobile eye tracking (ET) set-ups in unconstrained settings, the experimenter has precise control over the location and timing of stimulus presentation, making it easier to compare findings between HMD studies and those that use monitor displays, which account for the bulk of previous work in eye movement research and vision sciences more generally. Here, a visual discrimination paradigm is presented as a proof of concept to demonstrate the applicability of collecting eye and head tracking data from an HMD in VR for vision research. The current work's contribution is threefold: first, we present results demonstrating both the strengths and the weaknesses of recording and classifying eye and head tracking data in VR; second, we offer a highly flexible graphical user interface (GUI), used to generate the current experiment, to lower the software development start-up cost for future researchers transitioning to a VR space; and finally, the dataset analyzed here (behavioral, eye, and head tracking data synchronized with environmental variables, from a task specifically designed to elicit a variety of eye and head movements) could be an asset for testing future eye movement classification algorithms.
Affiliation(s)
- Chloe Callahan-Flintoft
- Humans in Complex System Directorate, United States Army Research Laboratory, Adelphi, MD, United States
- Christian Barentine
- Warfighter Effectiveness Research Center, United States Air Force Academy, Colorado Springs, CO, United States
- Jonathan Touryan
- Humans in Complex System Directorate, United States Army Research Laboratory, Adelphi, MD, United States
- Anthony J Ries
- Humans in Complex System Directorate, United States Army Research Laboratory, Adelphi, MD, United States
- Warfighter Effectiveness Research Center, United States Air Force Academy, Colorado Springs, CO, United States
11. Rusch KM. Combining fMRI and Eye-tracking for the Study of Social Cognition. Neurosci Insights 2021; 16:26331055211065497. PMID: 34950876; PMCID: PMC8689432; DOI: 10.1177/26331055211065497.
Abstract
The study of social cognition with functional magnetic resonance imaging (fMRI) affords the use of complex stimulus material. Visual attention to distinct aspects of these stimuli can result in the involvement of remarkably different neural systems. Usually, the influence of gaze on the neural signal is either disregarded or dealt with by controlling the gaze of participants through instructions or tasks. However, behavioral restrictions like this limit the study's ecological validity. Thus, it would be preferable if participants could freely look at the stimuli while their gaze traces are measured. Yet several impediments hamper a combination of fMRI and eye-tracking. In our recent work on neural Theory of Mind processes in alexithymia, we propose a simple way of integrating dwell time on specific stimulus features into general linear models of fMRI data. By parametrically modeling fixations, we were able to distinguish neural processes associated with specific stimulus features looked at. Here, I discuss opportunities and obstacles of this approach in more detail. My goal is to motivate a wider use of parametric models (usually implemented in common fMRI software packages) to combine fMRI and eye-tracking data.
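A minimal sketch of the parametric-modulation idea described here, assuming hypothetical fixation onsets and dwell times: each fixation event is weighted by its (mean-centered) dwell time, convolved with a hemodynamic response function, and resampled to the scan grid to form one column of the GLM design matrix. The gamma-shaped HRF and all timings below are illustrative, not the toolbox defaults of any specific fMRI package.

```python
# Sketch: build a dwell-time-modulated GLM regressor for "looking at feature X".
import numpy as np
from scipy.stats import gamma

tr, n_scans = 2.0, 300
dt = 0.1                                          # high-resolution time grid (s)
t_hr = np.arange(0, n_scans * tr, dt)

onsets = np.array([12.0, 55.3, 120.7, 208.4])     # fixation onsets on feature X (s)
dwell = np.array([0.8, 2.1, 1.4, 0.6])            # dwell times (s)

stick = np.zeros_like(t_hr)
stick[np.searchsorted(t_hr, onsets)] = dwell - dwell.mean()   # mean-centered modulator

hrf = gamma.pdf(np.arange(0, 32, dt), a=6)        # simple canonical-like HRF
regressor = np.convolve(stick, hrf)[: t_hr.size]  # convolve events with the HRF
step = int(round(tr / dt))
design_column = regressor[::step]                 # one value per TR, for the design matrix
```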
Affiliation(s)
- Kristin Marie Rusch
- Laboratory for Multimodal Neuroimaging, Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- Department of Neurology and Neurorehabilitation, Hospital zum Heiligen Geist, Academic Teaching Hospital of the Heinrich-Heine-University Düsseldorf, Kempen, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen
12. Frey M, Nau M, Doeller CF. Magnetic resonance-based eye tracking using deep neural networks. Nat Neurosci 2021; 24:1772-1779. PMID: 34750593; PMCID: PMC10097595; DOI: 10.1038/s41593-021-00947-w.
Abstract
Viewing behavior provides a window into many central aspects of human cognition and health, and it is an important variable of interest or confound in many functional magnetic resonance imaging (fMRI) studies. To make eye tracking freely and widely available for MRI research, we developed DeepMReye, a convolutional neural network (CNN) that decodes gaze position from the magnetic resonance signal of the eyeballs. It performs cameraless eye tracking at subimaging temporal resolution in held-out participants with little training data and across a broad range of scanning protocols. Critically, it works even in existing datasets and when the eyes are closed. Decoded eye movements explain network-wide brain activity also in regions not associated with oculomotor function. This work emphasizes the importance of eye tracking for the interpretation of fMRI results and provides an open source software solution that is widely applicable in research and clinical settings.
Affiliation(s)
- Markus Frey
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Matthias Nau
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Christian F Doeller
- Kavli Institute for Systems Neuroscience, Centre for Neural Computation, The Egil and Pauline Braathen and Fred Kavli Centre for Cortical Microcircuits, Jebsen Centre for Alzheimer's Disease, Norwegian University of Science and Technology, Trondheim, Norway
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Institute of Psychology, Leipzig University, Leipzig, Germany
13. Yates TS, Ellis CT, Turk-Browne NB. The promise of awake behaving infant fMRI as a deep measure of cognition. Curr Opin Behav Sci 2021. DOI: 10.1016/j.cobeha.2020.11.007.
14. Son J, Ai L, Lim R, Xu T, Colcombe S, Franco AR, Cloud J, LaConte S, Lisinski J, Klein A, Craddock RC, Milham M. Evaluating fMRI-Based Estimation of Eye Gaze During Naturalistic Viewing. Cereb Cortex 2021; 30:1171-1184. PMID: 31595961; DOI: 10.1093/cercor/bhz157.
Abstract
The collection of eye gaze information during functional magnetic resonance imaging (fMRI) is important for monitoring variations in attention and task compliance, particularly for naturalistic viewing paradigms (e.g., movies). However, the complexity and setup requirements of current in-scanner eye tracking solutions can preclude many researchers from accessing such information. Predictive eye estimation regression (PEER) is a previously developed support vector regression-based method for retrospectively estimating eye gaze from the fMRI signal in the eye's orbit using a 1.5-min calibration scan. Here, we provide confirmatory validation of the PEER method's ability to infer eye gaze on a TR-by-TR basis during movie viewing, using simultaneously acquired eye tracking data in five individuals (median angular deviation < 2°). Then, we examine variations in the predictive validity of PEER models across individuals in a subset of data (n = 448) from the Child Mind Institute Healthy Brain Network Biobank, identifying head motion as a primary determinant. Finally, we accurately classify which of the two movies is being watched based on the predicted eye gaze patterns (area under the curve = 0.90 ± 0.02) and map the neural correlates of eye movements derived from PEER. PEER is a freely available and easy-to-use tool for determining eye fixations during naturalistic viewing.
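The support-vector-regression idea behind PEER, as described in this abstract, can be sketched as below. The voxel counts, calibration targets, and movie run are simulated placeholders, and wrapping scikit-learn's single-output SVR in MultiOutputRegressor is a convenience assumption; the published method may differ in its exact preprocessing and model details.

```python
# Sketch of SVR-based gaze estimation from eye-orbit fMRI signal (PEER-style).
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
n_calib_trs, n_orbit_voxels = 27, 180               # e.g. a ~1.5-min calibration scan
calib_voxels = rng.normal(size=(n_calib_trs, n_orbit_voxels))
calib_targets = rng.uniform(-10, 10, size=(n_calib_trs, 2))   # known (x, y) fixations, deg

model = MultiOutputRegressor(SVR(kernel="linear", C=1.0))
model.fit(calib_voxels, calib_targets)               # train on the calibration scan

movie_voxels = rng.normal(size=(250, n_orbit_voxels))          # naturalistic-viewing run
gaze_estimates = model.predict(movie_voxels)                   # one (x, y) estimate per TR
```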
Affiliation(s)
- Jake Son
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
- MATTER Lab, Child Mind Institute, New York, NY, USA
- Lei Ai
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
- Ryan Lim
- Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research, New York, NY, USA
- Ting Xu
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
- Stanley Colcombe
- Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research, New York, NY, USA
- Alexandre Rosa Franco
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
- Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research, New York, NY, USA
- Jessica Cloud
- Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research, New York, NY, USA
- Stephen LaConte
- Fralin Biomedical Research Institute, Virginia Tech Carilion Research Institute, Blacksburg, VA, USA
- Jonathan Lisinski
- Fralin Biomedical Research Institute, Virginia Tech Carilion Research Institute, Blacksburg, VA, USA
- Arno Klein
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
- MATTER Lab, Child Mind Institute, New York, NY, USA
- R Cameron Craddock
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
- Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research, New York, NY, USA
- Department of Diagnostic Medicine, Dell Medical School, Austin, TX, USA
- Michael Milham
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
- Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research, New York, NY, USA
15. Xu Y, Vaziri-Pashkam M. Limits to visual representational correspondence between convolutional neural networks and the human brain. Nat Commun 2021; 12:2065. PMID: 33824315; PMCID: PMC8024324; DOI: 10.1038/s41467-021-22244-7.
Abstract
Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs' impressive ability to fully capture lower level visual representation of real-world objects, we show that CNNs do not fully capture higher level visual representations of real-world objects, nor those of artificial objects, either at lower or higher levels of visual representations. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates some fundamental differences exist in how the brain and CNNs represent visual information.
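A minimal sketch of the representational similarity analysis used in comparisons like this one, assuming simulated CNN activations and fMRI patterns: build a representational dissimilarity matrix (RDM) for each, then correlate the two RDMs. A real analysis would extract activations from a pretrained CNN layer and voxel patterns from visual regions of interest.

```python
# Sketch of CNN-brain representational similarity analysis on simulated data.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_images = 80
cnn_acts = rng.normal(size=(n_images, 4096))       # e.g. one CNN layer, flattened per image
fmri_patterns = rng.normal(size=(n_images, 300))   # e.g. voxels in one visual ROI

rdm_cnn = pdist(cnn_acts, metric="correlation")    # condensed RDM (1 - Pearson r)
rdm_brain = pdist(fmri_patterns, metric="correlation")
rho, _ = spearmanr(rdm_cnn, rdm_brain)             # CNN-brain representational match
print("RDM correlation (Spearman rho):", rho)
```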
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT, USA
- Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD, USA
16. Examining the Coding Strength of Object Identity and Nonidentity Features in Human Occipito-Temporal Cortex and Convolutional Neural Networks. J Neurosci 2021; 41:4234-4252. PMID: 33789916; DOI: 10.1523/jneurosci.1993-20.2021.
Abstract
A visual object is characterized by multiple visual features, including its identity, position and size. Despite the usefulness of identity and nonidentity features in vision and their joint coding throughout the primate ventral visual processing pathway, they have so far been studied relatively independently. Here in both female and male human participants, the coding of identity and nonidentity features was examined together across the human ventral visual pathway. The nonidentity features tested included two Euclidean features (position and size) and two non-Euclidean features (image statistics and spatial frequency (SF) content of an image). Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with identity outweighing the non-Euclidean but not the Euclidean features at higher levels of visual processing. In 14 convolutional neural networks (CNNs) pretrained for object categorization with varying architecture, depth, and with/without recurrent processing, nonidentity feature representation showed an initial large increase from early to mid-stage of processing, followed by a decrease at later stages of processing, different from brain responses. Additionally, from lower to higher levels of visual processing, position became more underrepresented and image statistics and SF became more overrepresented compared with identity in CNNs than in the human brain. Similar results were obtained in a CNN trained with stylized images that emphasized shape representations. Overall, by measuring the coding strength of object identity and nonidentity features together, our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.
SIGNIFICANCE STATEMENT: This study examined the coding strength of object identity and four types of nonidentity features along the human ventral visual processing pathway and compared brain responses with those of 14 convolutional neural networks (CNNs) pretrained to perform object categorization. Overall, identity representation increased and nonidentity feature representation decreased along the ventral visual pathway, with some notable differences among the different nonidentity features. CNNs differed from the brain in a number of aspects in their representations of identity and nonidentity features over the course of visual processing. Our approach provides a new tool for characterizing feature coding in the human brain and the correspondence between the brain and CNNs.
17. Chen X, Zhou M, Gong Z, Xu W, Liu X, Huang T, Zhen Z, Liu J. DNNBrain: A Unifying Toolbox for Mapping Deep Neural Networks and Brains. Front Comput Neurosci 2020; 14:580632. PMID: 33328946; PMCID: PMC7734148; DOI: 10.3389/fncom.2020.580632.
Abstract
Deep neural networks (DNNs) have attained human-level performance on dozens of challenging tasks via an end-to-end deep learning strategy. Deep learning allows data representations that have multiple levels of abstraction; however, it does not explicitly provide any insights into the internal operations of DNNs. Deep learning's success is appealing to neuroscientists not only as a method for applying DNNs to model biological neural systems but also as a means of adopting concepts and methods from cognitive neuroscience to understand the internal representations of DNNs. Although general deep learning frameworks, such as PyTorch and TensorFlow, could be used to allow such cross-disciplinary investigations, the use of these frameworks typically requires high-level programming expertise and comprehensive mathematical knowledge. A toolbox specifically designed as a mechanism for cognitive neuroscientists to map both DNNs and brains is urgently needed. Here, we present DNNBrain, a Python-based toolbox designed for exploring the internal representations of DNNs as well as brains. Through the integration of DNN software packages and well-established brain imaging tools, DNNBrain provides application programming and command line interfaces for a variety of research scenarios. These include extracting DNN activation, probing and visualizing DNN representations, and mapping DNN representations onto the brain. We expect that our toolbox will accelerate scientific research by both applying DNNs to model biological neural systems and utilizing paradigms of cognitive neuroscience to unveil the black box of DNNs.
Affiliation(s)
- Xiayu Chen
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Ming Zhou
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Zhengxin Gong
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Wei Xu
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Xingyu Liu
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Taicheng Huang
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Zonglei Zhen
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
- Jia Liu
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
18. Visual fixation prediction with incomplete attention map based on brain storm optimization. Appl Soft Comput 2020. DOI: 10.1016/j.asoc.2020.106653.
19.
Abstract
Does the human mind resemble the machines that can behave like it? Biologically inspired machine-learning systems approach "human-level" accuracy in an astounding variety of domains, and even predict human brain activity, raising the exciting possibility that such systems represent the world like we do. However, even seemingly intelligent machines fail in strange and "unhumanlike" ways, threatening their status as models of our minds. How can we know when human-machine behavioral differences reflect deep disparities in their underlying capacities, vs. when such failures are only superficial or peripheral? This article draws on a foundational insight from cognitive science, the distinction between performance and competence, to encourage "species-fair" comparisons between humans and machines. The performance/competence distinction urges us to consider whether the failure of a system to behave as ideally hypothesized, or the failure of one creature to behave like another, arises not because the system lacks the relevant knowledge or internal capacities ("competence"), but instead because of superficial constraints on demonstrating that knowledge ("performance"). I argue that this distinction has been neglected by research comparing human and machine behavior, and that it should be essential to any such comparison. Focusing on the domain of image classification, I identify three factors contributing to the species-fairness of human-machine comparisons, extracted from recent work that equates such constraints. Species-fair comparisons level the playing field between natural and artificial intelligence, so that we can separate more superficial differences from those that may be deep and enduring.
Affiliation(s)
- Chaz Firestone
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
20. A naturalistic viewing paradigm using 360° panoramic video clips and real-time field-of-view changes with eye-gaze tracking. Neuroimage 2020; 216:116617. DOI: 10.1016/j.neuroimage.2020.116617.
21. Geng Z, Wang Y. Automated design of a convolutional neural network with multi-scale filters for cost-efficient seismic data classification. Nat Commun 2020; 11:3311. PMID: 32620867; PMCID: PMC7335201; DOI: 10.1038/s41467-020-17123-6.
Abstract
Geoscientists mainly identify subsurface geologic features using exploration-derived seismic data. Classification or segmentation of 2D/3D seismic images commonly relies on conventional deep learning methods for image recognition. However, complex reflections of seismic waves tend to form high-dimensional and multi-scale signals, making traditional convolutional neural networks (CNNs) computationally costly. Here we propose a highly efficient and resource-saving CNN architecture (SeismicPatchNet) with topological modules and multi-scale-feature fusion units for classifying seismic data, which was discovered by an automated data-driven search strategy. The storage volume of the architecture parameters (0.73 M) is only ~2.7 MB, ~0.5% of the well-known VGG-16 architecture. SeismicPatchNet predicts nearly 18 times faster than ResNet-50 and shows an overwhelming advantage in identifying Bottom Simulating Reflection (BSR), an indicator of marine gas-hydrate resources. Saliency mapping demonstrated that our architecture captured key features well. These results suggest the prospect of end-to-end interpretation of multiple seismic datasets at extremely low computational cost.
Affiliation(s)
- Zhi Geng
- Key Laboratory of Petroleum Resources Research, Institute of Geology and Geophysics, Chinese Academy of Sciences, 100029, Beijing, P. R. China
- Innovation Academy for Earth Science, Chinese Academy of Sciences, 100029, Beijing, P. R. China
- Yanfei Wang
- Key Laboratory of Petroleum Resources Research, Institute of Geology and Geophysics, Chinese Academy of Sciences, 100029, Beijing, P. R. China
- Innovation Academy for Earth Science, Chinese Academy of Sciences, 100029, Beijing, P. R. China
- University of Chinese Academy of Sciences, 100049, Beijing, P. R. China
22. Jaegle A, Mehrpour V, Mohsenzadeh Y, Meyer T, Oliva A, Rust N. Population response magnitude variation in inferotemporal cortex predicts image memorability. eLife 2019; 8:e47596. PMID: 31464687; PMCID: PMC6715346; DOI: 10.7554/elife.47596.
Abstract
Most accounts of image and object encoding in inferotemporal cortex (IT) focus on the distinct patterns of spikes that different images evoke across the IT population. By analyzing data collected from IT as monkeys performed a visual memory task, we demonstrate that variation in a complementary coding scheme, the magnitude of the population response, can largely account for how well images will be remembered. To investigate the origin of IT image memorability modulation, we probed convolutional neural network models trained to categorize objects. We found that, like the brain, different natural images evoked different magnitude responses from these networks, and in higher layers, larger magnitude responses were correlated with the images that humans and monkeys find most memorable. Together, these results suggest that variation in IT population response magnitude is a natural consequence of the optimizations required for visual processing, and that this variation has consequences for visual memory.
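The population-magnitude analysis described above can be sketched by summarizing each image's response vector with its norm and correlating that magnitude with a behavioral memorability score. The responses and memorability scores below are random placeholders, not the recorded IT data or the study's exact statistics.

```python
# Sketch: population response magnitude vs. image memorability, on simulated data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, n_units = 200, 150
responses = rng.gamma(2.0, 5.0, size=(n_images, n_units))   # e.g. spike counts or activations
memorability = rng.uniform(size=n_images)                   # e.g. behavioral hit rate per image

magnitude = np.linalg.norm(responses, axis=1)                # one magnitude per image
rho, p = spearmanr(magnitude, memorability)
print(f"magnitude-memorability correlation: rho={rho:.2f}, p={p:.3f}")
```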
Affiliation(s)
- Andrew Jaegle
- Department of Psychology, University of Pennsylvania, Philadelphia, United States
- Vahid Mehrpour
- Department of Psychology, University of Pennsylvania, Philadelphia, United States
- Yalda Mohsenzadeh
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, United States
- Brain and Mind Institute, Western University, London, Canada
- Department of Computer Science, Western University, London, Canada
- Travis Meyer
- Department of Psychology, University of Pennsylvania, Philadelphia, United States
- Aude Oliva
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, United States
- Nicole Rust
- Department of Psychology, University of Pennsylvania, Philadelphia, United States
23. Zhou Z, Firestone C. Humans can decipher adversarial images. Nat Commun 2019; 10:1334. PMID: 30902973; PMCID: PMC6430776; DOI: 10.1038/s41467-019-08931-6.
Abstract
Does the human mind resemble the machine-learning systems that mirror its performance? Convolutional neural networks (CNNs) have achieved human-level benchmarks in classifying novel images. These advances support technologies such as autonomous vehicles and machine diagnosis; but beyond this, they serve as candidate models for human vision itself. However, unlike humans, CNNs are "fooled" by adversarial examples—nonsense patterns that machines recognize as familiar objects, or seemingly irrelevant image perturbations that nevertheless alter the machine's classification. Such bizarre behaviors challenge the promise of these new advances; but do human and machine judgments fundamentally diverge? Here, we show that human and machine classification of adversarial images are robustly related: In 8 experiments on 5 prominent and diverse adversarial imagesets, human subjects correctly anticipated the machine's preferred label over relevant foils—even for images described as "totally unrecognizable to human eyes". Human intuition may be a surprisingly reliable guide to machine (mis)classification—with consequences for minds and machines alike.
Editor's summary: Convolutional Neural Networks (CNNs) have reached human-level benchmarks in classifying images, but they can be "fooled" by adversarial examples that elicit bizarre misclassifications from machines. Here, the authors show how humans can anticipate which objects CNNs will see in adversarial images.
Affiliation(s)
- Zhenglong Zhou
- Department of Psychological & Brain Sciences, Johns Hopkins University, 3400 N Charles St., Baltimore, MD, 21218, USA
- Chaz Firestone
- Department of Psychological & Brain Sciences, Johns Hopkins University, 3400 N Charles St., Baltimore, MD, 21218, USA