1. Hu Y, Mohsenzadeh Y. Neural processing of naturalistic audiovisual events in space and time. Commun Biol 2025; 8:110. PMID: 39843939; PMCID: PMC11754444; DOI: 10.1038/s42003-024-07434-5.
Abstract
Our brain seamlessly integrates distinct sensory information to form a coherent percept. However, the specific brain regions and time courses involved in processing different levels of information during real-world audiovisual events remain largely uninvestigated. To address this, we curated naturalistic videos and recorded functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) data while participants viewed the videos with accompanying sounds. Our findings reveal an early asymmetrical cross-modal interaction, with acoustic information represented in both early visual and auditory regions, whereas visual information was identified only in visual cortices. The visual and auditory features were processed with similar onsets but different temporal dynamics. High-level categorical and semantic information emerged in multisensory association areas later in time, indicating late cross-modal integration and its distinct role in converging conceptual information. Comparing neural representations to a two-branch deep neural network model highlighted the necessity of early cross-modal connections for building a biologically plausible model of audiovisual perception. With EEG-fMRI fusion, we provide a spatiotemporally resolved account of neural activity during the processing of naturalistic audiovisual stimuli.
Affiliations
- Yu Hu: Western Institute for Neuroscience, Western University, London, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Yalda Mohsenzadeh: Western Institute for Neuroscience, Western University, London, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Department of Computer Science, Western University, London, ON, Canada
2. Tovar D, Wilmott J, Wu X, Martin D, Proulx M, Lindberg D, Zhao Y, Mercier O, Guan P. Identifying behavioral correlates to visual discomfort. ACM Trans Graph 2024; 43:1-10. DOI: 10.1145/3687929.
Abstract
Outside of self-report surveys, there are no proven, reliable methods to quantify visual discomfort or visually induced motion sickness symptoms when using head-mounted displays. While valuable tools, self-report surveys suffer from potential biases and low sensitivity due to variability in how respondents may assess and report their experience. Consequently, extreme visual-vestibular conflicts are generally used to induce discomfort symptoms large enough to measure reliably with surveys (e.g., stationary participants riding virtual roller coasters). An emerging area of research is the prediction of discomfort survey results from physiological and behavioral markers. However, the signals derived from experimental paradigms that are explicitly designed to be uncomfortable may not generalize to more naturalistic experiences where comfort is prioritized. In this work we introduce a custom VR headset designed to introduce significant near-eye optical distortion (i.e., pupil swim) to induce visual discomfort during more typical VR experiences. We evaluate visual comfort in our headset while users play the popular VR title Job Simulator and show that eye-tracked dynamic distortion correction improves visual comfort in a multi-session, within-subjects user study. We additionally use representational similarity analysis to highlight changes in head and gaze behavior that are potentially more sensitive to visual discomfort than surveys.
Affiliations
- David Tovar: Reality Labs, Meta, Redmond, USA; Vanderbilt University, Nashville, USA
- James Wilmott: Reality Labs, Meta, Menlo Park, USA
- Xiuyun Wu: Reality Labs, Meta, Redmond, USA
- Daniel Martin: Reality Labs Research, Meta, Redmond, USA; Universidad de Zaragoza, Zaragoza, Spain
- Michael Proulx: Reality Labs Research, Meta, Redmond, USA
- Dave Lindberg: Reality Labs Research, Meta, Redmond, USA
- Yang Zhao: Reality Labs Research, Meta, Redmond, USA
- Olivier Mercier: Reality Labs Research, Meta, Redmond, USA
- Phillip Guan: Reality Labs Research, Meta, Redmond, USA
3. Roads BD, Love BC. The dimensions of dimensionality. Trends Cogn Sci 2024; 28:1118-1131. PMID: 39153897; DOI: 10.1016/j.tics.2024.07.005.
Abstract
Cognitive scientists often infer multidimensional representations from data. Whether the data involve text, neuroimaging, neural networks, or human judgments, researchers frequently infer and analyze latent representational spaces (i.e., embeddings). However, the properties of a latent representation (e.g., prediction performance, interpretability, compactness) depend on the inference procedure, which can vary widely across endeavors. For example, dimensions are not always globally interpretable and the dimensionality of different embeddings may not be readily comparable. Moreover, the dichotomy between multidimensional spaces and purportedly richer representational formats, such as graph representations, is misleading. We review what the different notions of dimension in cognitive science imply for how these latent representations should be used and interpreted.
Affiliations
- Brett D Roads: Department of Experimental Psychology, University College London, London, WC1E, UK
- Bradley C Love: Department of Experimental Psychology, University College London, London, WC1E, UK
4. Ossadtchi A, Semenkov I, Zhuravleva A, Kozunov V, Serikov O, Voloshina E. Representational dissimilarity component analysis (ReDisCA). Neuroimage 2024; 301:120868. PMID: 39343110; DOI: 10.1016/j.neuroimage.2024.120868.
Abstract
The principle of representational similarity analysis (RSA) posits that neural representations reflect the structure of encoded information, allowing exploration of the spatial and temporal organization of brain information processing. Traditional RSA, when applied to EEG or MEG data, faces challenges in accessing activation time series at the brain source level due to modeling complexities and insufficient geometric/anatomical data. To overcome this, we introduce representational dissimilarity component analysis (ReDisCA), a method for estimating spatial-temporal components in EEG or MEG responses aligned with a target representational dissimilarity matrix (RDM). ReDisCA yields informative spatial filters and associated topographies, offering insights into the location of "representationally relevant" sources. Applied to evoked response time series, ReDisCA produces temporal source activation profiles with the desired RDM. Importantly, although ReDisCA does not require inverse modeling, its output is consistent with the EEG and MEG observation equations and can be used as an input to rigorous source localization procedures. Demonstrating ReDisCA's efficacy through simulations and comparisons with conventional methods, we show superior source localization accuracy and apply the method to real EEG and MEG datasets, revealing physiologically plausible representational structures without inverse modeling. ReDisCA adds to the family of inverse-modeling-free methods designed to extract sources with desired properties from EEG or MEG data, such as independent component analysis (Makeig, 1995), spatial spectral decomposition (Nikulin, 2011), and source power comodulation (Dähne, 2014). Extending its utility beyond EEG and MEG analysis, ReDisCA is likely to find application in fMRI data analysis and in the exploration of representational structures emerging in multilayered artificial neural networks.
Affiliations
- Alexei Ossadtchi: Higher School of Economics, Moscow, Russia; LIFT, Life Improvement by Future Technologies Institute, Moscow, Russia; Artificial Intelligence Research Institute, Moscow, Russia
- Ilia Semenkov: Higher School of Economics, Moscow, Russia; Artificial Intelligence Research Institute, Moscow, Russia
- Anna Zhuravleva: Higher School of Economics, Moscow, Russia; Artificial Intelligence Research Institute, Moscow, Russia
- Vladimir Kozunov: MEG Centre, Moscow State University of Psychology and Education, Russia
- Oleg Serikov: AI Initiative, King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
- Ekaterina Voloshina: Higher School of Economics, Moscow, Russia; Artificial Intelligence Research Institute, Moscow, Russia
5. Conwell C, Prince JS, Kay KN, Alvarez GA, Konkle T. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nat Commun 2024; 15:9383. PMID: 39477923; PMCID: PMC11526138; DOI: 10.1038/s41467-024-53147-y.
Abstract
The rapid release of high-performing computer vision models offers new potential to study the impact of different inductive biases on the emergent brain alignment of learned representations. Here, we perform controlled comparisons among a curated set of 224 diverse models to test the impact of specific model properties on visual brain predictivity, a process requiring over 1.8 billion regressions and 50.3 thousand representational similarity analyses. We find that models with qualitatively different architectures (e.g. CNNs versus Transformers) and task objectives (e.g. purely visual contrastive learning versus vision-language alignment) achieve near-equivalent brain predictivity when other factors are held constant. Instead, variation across visual training diets yields the largest, most consistent effect on brain predictivity. Many models achieve similarly high brain predictivity despite clear variation in their underlying representations, suggesting that standard methods used to link models to brains may be too flexible. Broadly, these findings challenge common assumptions about the factors underlying emergent brain alignment and outline how we can leverage controlled model comparison to probe the common computational principles underlying biological and artificial visual systems.
Affiliations
- Colin Conwell: Department of Psychology, Harvard University, Cambridge, MA, USA
- Jacob S Prince: Department of Psychology, Harvard University, Cambridge, MA, USA
- Kendrick N Kay: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN, USA
- George A Alvarez: Department of Psychology, Harvard University, Cambridge, MA, USA
- Talia Konkle: Department of Psychology, Harvard University, Cambridge, MA, USA; Center for Brain Science, Harvard University, Cambridge, MA, USA; Kempner Institute for Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
6. Bretton ZH, Kim H, Banich MT, Lewis-Peacock JA. Suppressing the maintenance of information in working memory alters long-term memory traces. J Cogn Neurosci 2024; 36:2117-2136. PMID: 38940738; PMCID: PMC11383534; DOI: 10.1162/jocn_a_02206.
Abstract
The sensory recruitment hypothesis conceptualizes information in working memory as being activated representations of information in long-term memory. Accordingly, changes made to an item in working memory would be expected to influence its subsequent retention. Here, we tested the hypothesis that suppressing information from working memory, which can reduce short-term access to that information, may also alter its long-term neural representation. We obtained fMRI data (n = 25; 13 female / 12 male participants) while participants completed a working memory removal task with scene images as stimuli, followed by a final surprise recognition test of the examined items. We applied a multivariate pattern analysis to the data to quantify the engagement of suppression on each trial, to track the contents of working memory during suppression, and to assess representational changes afterward. Our analysis confirms previous reports that suppression of information in working memory involves focused attention to target and remove unwanted information. Furthermore, our findings provide new evidence that even a single dose of suppression of an item in working memory can (if engaged with sufficient strength) produce lasting changes in its neural representation, particularly weakening the unique, item-specific features, which leads to forgetting. Our study sheds light on the underlying mechanisms that contribute to the suppression of unwanted thoughts and highlights the dynamic interplay between working memory and long-term memory.
Affiliations
- Hyojeong Kim: University of Texas at Austin; University of Colorado
7. Walbrin J, Sossounov N, Mahdiani M, Vaz I, Almeida J. Fine-grained knowledge about manipulable objects is well-predicted by contrastive language image pre-training. iScience 2024; 27:110297. PMID: 39040066; PMCID: PMC11261149; DOI: 10.1016/j.isci.2024.110297.
Abstract
Object recognition is an important ability that relies on distinguishing between similar objects (e.g., deciding which utensil(s) to use at different stages of meal preparation). Recent work describes the fine-grained organization of knowledge about manipulable objects via the study of the constituent dimensions that are most relevant to human behavior, for example, vision, manipulation, and function-based properties. A logical extension of this work concerns whether or not these dimensions are uniquely human, or can be approximated by deep learning. Here, we show that behavioral dimensions are generally well-predicted by CLIP-ViT - a multimodal network trained on a large and diverse set of image-text pairs. Moreover, this model outperforms comparison networks pre-trained on smaller, image-only datasets. These results demonstrate the impressive capacity of CLIP-ViT to approximate fine-grained object knowledge. We discuss the possible sources of this benefit relative to other models (e.g., multimodal vs. image-only pre-training, dataset size, architecture).
Affiliations
- Jon Walbrin: Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal; CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Nikita Sossounov: Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal; CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Igor Vaz: Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal; CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
- Jorge Almeida: Proaction Laboratory, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal; CINEICC, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal
8. Shoham A, Grosbard ID, Patashnik O, Cohen-Or D, Yovel G. Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nat Hum Behav 2024. PMID: 38332339; DOI: 10.1038/s41562-024-01816-9.
Abstract
Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks that are trained either on images, on text, or by pairing images and text now enable us to disentangle human mental representations into their visual, visual-semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed, and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual-semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
Affiliations
- Adva Shoham: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Idan Daniel Grosbard: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel; The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Or Patashnik: The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Daniel Cohen-Or: The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Galit Yovel: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
9. Jiahui G, Feilong M, Visconti di Oleggio Castello M, Nastase SA, Haxby JV, Gobbini MI. Modeling naturalistic face processing in humans with deep convolutional neural networks. Proc Natl Acad Sci U S A 2023; 120:e2304085120. PMID: 37847731; PMCID: PMC10614847; DOI: 10.1073/pnas.2304085120.
Abstract
Deep convolutional neural networks (DCNNs) trained for face identification can rival and even exceed human-level performance. The ways in which the internal face representations in DCNNs relate to human cognitive representations and brain activity are not well understood. Nearly all previous studies focused on static face image processing with rapid display times and ignored the processing of naturalistic, dynamic information. To address this gap, we developed the largest naturalistic dynamic face stimulus set in human neuroimaging research (700+ naturalistic video clips of unfamiliar faces). We used this naturalistic dataset to compare representational geometries estimated from DCNNs, behavioral responses, and brain responses. We found that DCNN representational geometries were consistent across architectures, cognitive representational geometries were consistent across raters in a behavioral arrangement task, and neural representational geometries in face areas were consistent across brains. Representational geometries in late, fully connected DCNN layers, which are optimized for individuation, were much more weakly correlated with cognitive and neural geometries than were geometries in late-intermediate layers. The late-intermediate face-DCNN layers successfully matched cognitive representational geometries, as measured with a behavioral arrangement task that primarily reflected categorical attributes, and correlated with neural representational geometries in known face-selective topographies. Our study suggests that current DCNNs successfully capture neural cognitive processes for categorical attributes of faces but less accurately capture individuation and dynamic features.
Affiliations
- Guo Jiahui: Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- Ma Feilong: Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- Samuel A. Nastase: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- James V. Haxby: Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755
- M. Ida Gobbini: Department of Medical and Surgical Sciences, University of Bologna, Bologna 40138, Italy; Istituti di Ricovero e Cura a Carattere Scientifico, Istituto delle Scienze Neurologiche di Bologna, Bologna 40139, Italy
10. Zhang Y, Rennig J, Magnotti JF, Beauchamp MS. Multivariate fMRI responses in superior temporal cortex predict visual contributions to, and individual differences in, the intelligibility of noisy speech. Neuroimage 2023; 278:120271. PMID: 37442310; PMCID: PMC10460966; DOI: 10.1016/j.neuroimage.2023.120271.
Abstract
Humans have the unique ability to decode the rapid stream of language elements that constitute speech, even when it is contaminated by noise. Two reliable observations about noisy speech perception are that seeing the face of the talker improves intelligibility and that individuals differ in their ability to perceive noisy speech. We introduce a multivariate BOLD fMRI measure that explains both observations. In two independent fMRI studies, clear and noisy speech was presented in visual, auditory and audiovisual formats to thirty-seven participants who rated intelligibility. An event-related design was used to sort noisy speech trials by their intelligibility. Individual-differences multidimensional scaling was applied to fMRI response patterns in superior temporal cortex, and the dissimilarity between responses to clear speech and noisy (but intelligible) speech was measured. Neural dissimilarity was smaller for audiovisual speech than for auditory-only speech, corresponding to the greater intelligibility of noisy audiovisual speech. Dissimilarity was also smaller in participants with better noisy speech perception, corresponding to individual differences. These relationships held for both single-word and entire-sentence stimuli, suggesting that they were driven by intelligibility rather than by the specific stimuli tested. A neural measure of perceptual intelligibility may aid in the development of strategies for helping those with impaired speech perception.
Affiliations
- Yue Zhang: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
- Johannes Rennig: Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- John F Magnotti: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
11. Boch M, Wagner IC, Karl S, Huber L, Lamm C. Functionally analogous body- and animacy-responsive areas are present in the dog (Canis familiaris) and human occipito-temporal lobe. Commun Biol 2023; 6:645. PMID: 37369804; PMCID: PMC10300132; DOI: 10.1038/s42003-023-05014-7.
Abstract
Comparing the neural correlates of socio-cognitive skills across species provides insights into the evolution of the social brain and has revealed face- and body-sensitive regions in the primate temporal lobe. Although from a different lineage, dogs share convergent visuo-cognitive skills with humans and have a temporal lobe that evolved independently in carnivorans. We investigated the neural correlates of face and body perception in dogs (N = 15) and humans (N = 40) using functional MRI. Combining univariate and multivariate analysis approaches, we found functionally analogous occipito-temporal regions involved in the perception of animate entities and bodies in both species, and face-sensitive regions in humans. Unexpectedly, we also observed neural representations of faces compared with inanimate objects, and of dog compared with human bodies, in dog olfactory regions. These findings shed light on the evolutionary foundations of human and dog social cognition and the predominant role of the temporal lobe.
Affiliations
- Magdalena Boch: Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria; Department of Cognitive Biology, Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Isabella C Wagner: Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria; Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria; Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
- Sabrina Karl: Comparative Cognition, Messerli Research Institute, University of Veterinary Medicine Vienna, Medical University of Vienna and University of Vienna, Vienna, Austria
- Ludwig Huber: Comparative Cognition, Messerli Research Institute, University of Veterinary Medicine Vienna, Medical University of Vienna and University of Vienna, Vienna, Austria
- Claus Lamm: Social, Cognitive and Affective Neuroscience Unit, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria; Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria
12. Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023. PMID: 37253949; DOI: 10.1038/s41583-023-00705-w.
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Affiliations
- Adrien Doerig: Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany; Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- Rowan P Sommers: Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Katja Seeliger: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Blake Richards: Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada; School of Computer Science, McGill University, Montréal, QC, Canada; Mila, Montréal, QC, Canada; Montréal Neurological Institute, Montréal, QC, Canada; Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Konrad P Kording: Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada; Bioengineering, Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Tim C Kietzmann: Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
13. Li Z, Dong Q, Hu B, Wu H. Every individual makes a difference: a trinity derived from linking individual brain morphometry, connectivity and mentalising ability. Hum Brain Mapp 2023; 44:3343-3358. PMID: 37051692; PMCID: PMC10171537; DOI: 10.1002/hbm.26285.
Abstract
Mentalising ability, indexed as the ability to understand others' beliefs, feelings, intentions, thoughts and traits, is a pivotal and fundamental component of human social cognition. However, considering the multifaceted nature of mentalising ability, little research has focused on characterising individual differences in different mentalising components. And even less research has been devoted to investigating how the variance in the structural and functional patterns of the amygdala and hippocampus, two vital subcortical regions of the "social brain", are related to inter-individual variability in mentalising ability. Here, as a first step toward filling these gaps, we exploited inter-subject representational similarity analysis (IS-RSA) to assess relationships between amygdala and hippocampal morphometry (surface-based multivariate morphometry statistics, MMS), connectivity (resting-state functional connectivity, rs-FC) and mentalising ability (interactive mentalisation questionnaire [IMQ] scores) across the participants ( N = 24 $$ N=24 $$ ). In IS-RSA, we proposed a novel pipeline, that is, computing patching and pooling operations-based surface distance (CPP-SD), to obtain a decent representation for high-dimensional MMS data. On this basis, we found significant correlations (i.e., second-order isomorphisms) between these three distinct modalities, indicating that a trinity existed in idiosyncratic patterns of brain morphometry, connectivity and mentalising ability. Notably, a region-related mentalising specificity emerged from these associations: self-self and self-other mentalisation are more related to the hippocampus, while other-self mentalisation shows a closer link with the amygdala. Furthermore, by utilising the dyadic regression analysis, we observed significant interactions such that subject pairs with similar morphometry had even greater mentalising similarity if they were also similar in rs-FC. 
Altogether, we demonstrated the feasibility and illustrated the promise of using IS-RSA to study individual differences, deepening our understanding of how individual brains give rise to their mentalising abilities.
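The second-order (IS-RSA) logic described in this abstract can be sketched in a few lines of NumPy/SciPy: compute pairwise inter-subject distances within each modality, then rank-correlate the two distance structures. This is a minimal illustration on random data; it omits the authors' CPP-SD representation step, and all array shapes and feature names below are invented for the example.

```python
# Minimal IS-RSA sketch (illustrative; not the authors' CPP-SD pipeline):
# correlate subject-by-subject dissimilarity structures across two modalities.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_subjects = 24
morphometry = rng.standard_normal((n_subjects, 100))   # stand-in for MMS features
behaviour = rng.standard_normal((n_subjects, 12))      # stand-in for IMQ scores

# First-order: pairwise inter-subject distances within each modality
d_morph = pdist(morphometry, metric="euclidean")
d_behav = pdist(behaviour, metric="euclidean")

# Second-order isomorphism: rank-correlate the two distance vectors
rho, _ = spearmanr(d_morph, d_behav)
print(f"IS-RSA Spearman rho = {rho:.3f}")
```

In a real analysis the significance of rho would be assessed with a subject-label permutation (Mantel-style) test, since the distance pairs are not independent.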
Collapse
Affiliation(s)
- Zhaoning Li
- Centre for Cognitive and Brain Sciences and Department of Psychology, University of Macau, Taipa, China
| | - Qunxi Dong
- School of Medical Technology, Beijing Institute of Technology, Beijing, China
| | - Bin Hu
- School of Medical Technology, Beijing Institute of Technology, Beijing, China
| | - Haiyan Wu
- Centre for Cognitive and Brain Sciences and Department of Psychology, University of Macau, Taipa, China
| |
Collapse
|
14
|
Jozwik KM, Kietzmann TC, Cichy RM, Kriegeskorte N, Mur M. Deep Neural Networks and Visuo-Semantic Models Explain Complementary Components of Human Ventral-Stream Representational Dynamics. J Neurosci 2023; 43:1731-1741. [PMID: 36759190 PMCID: PMC10010451 DOI: 10.1523/jneurosci.1424-22.2022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 11/08/2022] [Accepted: 12/20/2022] [Indexed: 02/11/2023] Open
Abstract
Deep neural networks (DNNs) are promising models of the cortical computations supporting human object recognition. However, despite their ability to explain a significant portion of variance in neural data, the agreement between models and brain representational dynamics is far from perfect. We address this issue by asking which representational features are currently unaccounted for in neural time series data, estimated for multiple areas of the ventral stream via source-reconstructed magnetoencephalography data acquired in human participants (nine females, six males) during object viewing. We focus on the ability of visuo-semantic models, consisting of human-generated labels of object features and categories, to explain variance beyond the explanatory power of DNNs alone. We report a gradual reversal in the relative importance of DNN versus visuo-semantic features as ventral-stream object representations unfold over space and time. Although lower-level visual areas are better explained by DNN features starting early in time (at 66 ms after stimulus onset), higher-level cortical dynamics are best accounted for by visuo-semantic features starting later in time (at 146 ms after stimulus onset). Among the visuo-semantic features, object parts and basic categories drive the advantage over DNNs. These results show that a significant component of the variance unexplained by DNNs in higher-level cortical dynamics is structured and can be explained by readily nameable aspects of the objects. We conclude that current DNNs fail to fully capture dynamic representations in higher-level human visual cortex and suggest a path toward more accurate models of ventral-stream computations.
Significance Statement: When we view objects such as faces and cars in our visual environment, their neural representations dynamically unfold over time at a millisecond scale. These dynamics reflect the cortical computations that support fast and robust object recognition.
DNNs have emerged as a promising framework for modeling these computations but cannot yet fully account for the neural dynamics. Using magnetoencephalography data acquired in human observers during object viewing, we show that readily nameable aspects of objects, such as 'eye', 'wheel', and 'face', can account for variance in the neural dynamics over and above DNNs. These findings suggest that DNNs and humans may in part rely on different object features for visual recognition and provide guidelines for model improvement.
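The "variance over and above DNNs" claim rests on a standard nested-regression comparison: fit the neural response with DNN features alone, then with DNN plus visuo-semantic features, and attribute the gain in R² to the semantic model. The toy sketch below illustrates that logic on simulated data; the feature matrices, dimensions, and noise level are all invented, and the paper's actual analysis operates on source-space MEG time courses.

```python
# Illustrative variance-partitioning sketch: how much response variance do
# DNN features vs. human-generated (visuo-semantic) features explain,
# alone and combined? (Toy simulated data, not the paper's MEG analysis.)
import numpy as np

def r_squared(X, y):
    """Ordinary least-squares R^2 of y regressed on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 200
dnn_feats = rng.standard_normal((n, 5))
semantic_feats = rng.standard_normal((n, 5))
# Simulated neural response driven by both feature sets plus noise
y = dnn_feats[:, 0] + semantic_feats[:, 0] + 0.5 * rng.standard_normal(n)

r2_dnn = r_squared(dnn_feats, y)
r2_sem = r_squared(semantic_feats, y)
r2_both = r_squared(np.column_stack([dnn_feats, semantic_feats]), y)
unique_sem = r2_both - r2_dnn   # variance only the semantic model explains
print(r2_dnn, r2_sem, r2_both, unique_sem)
```

Because the combined model nests each single model, its in-sample R² can only grow; published analyses therefore use cross-validation or noise-ceiling-corrected estimates rather than raw training R².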
Collapse
Affiliation(s)
- Kamila M Jozwik
- Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
| | - Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, 49069 Osnabrück, Germany
| | - Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, 14195 Berlin, Germany
| | - Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York 10027, United States
| | - Marieke Mur
- Department of Psychology, Western University, London, Ontario N6A 3K7, Canada
- Department of Computer Science, Western University, London, Ontario N6A 3K7, Canada
| |
Collapse
|
15
|
Hebart MN, Contier O, Teichmann L, Rockter AH, Zheng CY, Kidder A, Corriveau A, Vaziri-Pashkam M, Baker CI. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 2023; 12:e82580. [PMID: 36847339 PMCID: PMC10038662 DOI: 10.7554/elife.82580] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Accepted: 02/25/2023] [Indexed: 03/01/2023] Open
Abstract
Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. Here, we present THINGS-data, a multimodal collection of large-scale neuroimaging and behavioral datasets in humans, comprising densely sampled functional MRI and magnetoencephalographic recordings, as well as 4.70 million similarity judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly annotated objects, allowing for testing countless hypotheses at scale while assessing the reproducibility of previous findings. Beyond the unique insights promised by each individual dataset, the multimodality of THINGS-data allows combining datasets for a much broader view into object processing than previously possible. Our analyses demonstrate the high quality of the datasets and provide five examples of hypothesis-driven and data-driven applications. THINGS-data constitutes the core public release of the THINGS initiative (https://things-initiative.org) for bridging the gap between disciplines and the advancement of cognitive neuroscience.
Collapse
Affiliation(s)
- Martin N Hebart
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Department of Medicine, Justus Liebig University Giessen, Giessen, Germany
| | - Oliver Contier
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Max Planck School of Cognition, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Lina Teichmann
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
| | - Adam H Rockter
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
| | - Charles Y Zheng
- Machine Learning Core, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
| | - Alexis Kidder
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
| | - Anna Corriveau
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
| | - Maryam Vaziri-Pashkam
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
| | - Chris I Baker
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, United States
| |
Collapse
|
16
|
Revsine C, Gonzalez-Castillo J, Merriam EP, Bandettini PA, Ramírez FM. A unifying model for discordant and concordant results in human neuroimaging studies of facial viewpoint selectivity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.08.527219. [PMID: 36945636 PMCID: PMC10028835 DOI: 10.1101/2023.02.08.527219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/11/2023]
Abstract
Our ability to recognize faces regardless of viewpoint is a key property of the primate visual system. Traditional theories hold that facial viewpoint is represented by view-selective mechanisms at early visual processing stages and that representations become increasingly tolerant to viewpoint changes in higher-level visual areas. Newer theories, based on single-neuron monkey electrophysiological recordings, suggest an additional intermediate processing stage invariant to mirror-symmetric face views. Consistent with traditional theories, human studies combining neuroimaging and multivariate pattern analysis (MVPA) methods have provided evidence of view-selectivity in early visual cortex. However, contradictory results have been reported in higher-level visual areas concerning the existence in humans of mirror-symmetrically tuned representations. We believe these results reflect low-level stimulus confounds and data analysis choices. To probe for low-level confounds, we analyzed images from two popular face databases. Analyses of mean image luminance and contrast revealed biases across face views described by even polynomials, i.e., mirror-symmetric functions of viewpoint. To explain major trends across human neuroimaging studies of viewpoint selectivity, we constructed a network model that incorporates three biological constraints: cortical magnification, convergent feedforward projections, and interhemispheric connections. Given the identified low-level biases, we show that a gradual increase of interhemispheric connections across network layers is sufficient to replicate findings of mirror-symmetry in high-level processing stages, as well as view-tuning in early processing stages. Data analysis decisions (pattern dissimilarity measure and data recentering) accounted for the variable observation of mirror-symmetry in late processing stages. The model provides a unifying explanation of MVPA studies of viewpoint selectivity.
We also show how common analysis choices can lead to erroneous conclusions.
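The "biases described by even polynomials" finding above has a simple operational form: fit a polynomial to an image statistic as a function of viewpoint angle and compare the weight on even versus odd terms (an even function is mirror-symmetric about the frontal view). The sketch below uses fabricated luminance values and an arbitrary polynomial degree, purely to illustrate the check; it does not reproduce the paper's database analysis.

```python
# Illustrative check for a mirror-symmetric (even) bias: fit a polynomial to
# a toy image statistic (mean luminance) across viewpoint angle and compare
# even vs. odd coefficient energy. All values here are hypothetical.
import numpy as np

angles = np.linspace(-90, 90, 13)        # face views; frontal view = 0 deg
rng = np.random.default_rng(3)
luminance = 0.5 + 0.00005 * angles**2 + 0.01 * rng.standard_normal(13)

# Fit on normalized angle; coefficients are ordered low degree -> high degree
coeffs = np.polynomial.polynomial.polyfit(angles / 90.0, luminance, deg=4)
even_power = np.sum(coeffs[0::2] ** 2)   # degrees 0, 2, 4 (mirror-symmetric)
odd_power = np.sum(coeffs[1::2] ** 2)    # degrees 1, 3 (asymmetric)
print(even_power, odd_power)             # even terms should dominate
```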
Collapse
Affiliation(s)
- Cambria Revsine
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Department of Psychology, University of Chicago, Chicago, IL
| | - Javier Gonzalez-Castillo
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
| | - Elisha P Merriam
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
| | - Peter A Bandettini
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Functional MRI Core, National Institutes of Health, Bethesda, MD
| | - Fernando M Ramírez
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
| |
Collapse
|
17
|
Kob L. Exploring the role of structuralist methodology in the neuroscience of consciousness: a defense and analysis. Neurosci Conscious 2023; 2023:niad011. [PMID: 37205986 PMCID: PMC10191193 DOI: 10.1093/nc/niad011] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Revised: 02/27/2023] [Accepted: 04/13/2023] [Indexed: 05/21/2023] Open
Abstract
Traditional contrastive analysis has been the foundation of consciousness science, but its limitations due to the lack of a reliable method for measuring states of consciousness have prompted the exploration of alternative approaches. Structuralist theories have gained attention as an alternative that focuses on the structural properties of phenomenal experience and seeks to identify their neural encoding via structural similarities between quality spaces and neural state spaces. However, the intertwining of philosophical assumptions about structuralism and structuralist methodology may pose a challenge to those who are skeptical of the former. In this paper, I offer an analysis and defense of structuralism as a methodological approach in consciousness science, which is partly independent of structuralist assumptions on the nature of consciousness. By doing so, I aim to make structuralist methodology more accessible to a broader scientific and philosophical audience. I situate methodological structuralism in the context of questions concerning mental representation, psychophysical measurement, holism, and functional relevance of neural processes. At last, I analyze the relationship between the structural approach and the distinction between conscious and unconscious states.
Collapse
Affiliation(s)
- Lukas Kob
- Corresponding author. Philosophy Department, Otto-von-Guericke University, Zschokkestraße 32, Magdeburg 39104, Germany
| |
Collapse
|
18
|
Prince JS, Charest I, Kurzawski JW, Pyles JA, Tarr MJ, Kay KN. Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife 2022; 11:77599. [PMID: 36444984 PMCID: PMC9708069 DOI: 10.7554/elife.77599] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Accepted: 10/15/2022] [Indexed: 11/30/2022] Open
Abstract
Advances in artificial intelligence have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle, a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses (glmsingle.org). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experiment. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually-responsive cortex in all subjects. Comparable improvements in reliability are also observed in a smaller-scale auditory dataset from the StudyForrest experiment. These improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. We demonstrate that GLMsingle: (i) helps decorrelate response estimates between trials nearby in time; (ii) enhances representational similarity between subjects within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli. 
GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets sampling brain activity across many experimental conditions.
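The third GLMsingle ingredient (ridge regularization of trial-wise betas when trial regressors overlap) can be illustrated with a small NumPy sketch. This is a conceptual toy, not the toolbox's actual implementation: the design matrix, collinearity structure, noise level, and penalty value are all invented, and GLMsingle selects its penalty per voxel via cross-validation rather than fixing it.

```python
# Toy illustration of ridge-regularized single-trial beta estimation when
# closely spaced trials make the design columns collinear.
# (Conceptual sketch only; not GLMsingle's code.)
import numpy as np

rng = np.random.default_rng(2)
n_timepoints, n_trials = 300, 40
X = rng.standard_normal((n_timepoints, n_trials))
X[:, 1:] += 0.9 * X[:, :-1]             # induce collinearity between neighbors
true_betas = rng.standard_normal(n_trials)
y = X @ true_betas + 2.0 * rng.standard_normal(n_timepoints)

# Ordinary least-squares betas (high variance under collinearity)
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge betas: (X'X + lambda I)^-1 X'y trades a little bias for stability
lam = 10.0                               # hypothetical penalty
ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_trials), X.T @ y)

err_ols = np.linalg.norm(ols - true_betas)
err_ridge = np.linalg.norm(ridge - true_betas)
print(err_ols, err_ridge)
```

The actual toolbox (glmsingle.org, MATLAB and Python) wraps this idea together with per-voxel HRF selection and data-derived noise regressors, taking only the time-series data and design matrix as inputs.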
Collapse
Affiliation(s)
- Jacob S Prince
- Department of Psychology, Harvard University, Cambridge, United States
| | - Ian Charest
- Center for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, United Kingdom
- cerebrUM, Département de Psychologie, Université de Montréal, Montréal, Canada
| | - Jan W Kurzawski
- Department of Psychology, New York University, New York, United States
| | - John A Pyles
- Center for Human Neuroscience, Department of Psychology, University of Washington, Seattle, United States
| | - Michael J Tarr
- Department of Psychology, Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Kendrick N Kay
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, United States
| |
Collapse
|