1. Quilty-Dunn J, Porot N, Mandelbaum E. The language-of-thought hypothesis as a working hypothesis in cognitive science. Behav Brain Sci 2023;46:e292. PMID: 37766639. DOI: 10.1017/s0140525x23002431.
Abstract
The target article attempted to draw connections between broad swaths of evidence by noticing a common thread: abstract, symbolic, compositional codes, that is, languages of thought (LoTs). Commentators raised concerns about the evidence and offered fascinating extensions to areas we overlooked. Here we respond and highlight the many specific empirical questions to be answered in the next decade and beyond.
Affiliations
- Jake Quilty-Dunn: Department of Philosophy and Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, St. Louis, MO, USA; sites.google.com/site/jakequiltydunn/
- Nicolas Porot: Africa Institute for Research in Economics and Social Sciences, Mohammed VI Polytechnic University, Ben Guerir, Morocco; nicolasporot.com
- Eric Mandelbaum: Department of Philosophy and Department of Psychology, The Graduate Center & Baruch College, CUNY, New York, NY, USA; ericmandelbaum.com
2. Sanchez-Cesteros O, Rincon M, Bachiller M, Valladares-Rodriguez S. A Long Skip Connection for Enhanced Color Selectivity in CNN Architectures. Sensors (Basel) 2023;23:7582. PMID: 37688036. PMCID: PMC10490730. DOI: 10.3390/s23177582.
Abstract
Some recent studies show that filters in convolutional neural networks (CNNs) have low color selectivity in datasets of natural scenes such as ImageNet. CNNs, bio-inspired by the visual cortex, are characterized by a hierarchical learning structure that appears to gradually transform the representation space. Inspired by the direct connection between the LGN and V4, which allows V4 to handle low-level information closer to the trichromatic input in addition to processed information arriving from V2/V3, we propose adding a long skip connection (LSC) between the first and last blocks of the feature extraction stage so that deeper parts of the network receive information from shallower layers. This type of connection improves classification accuracy by combining simple visual and complex abstract features to create more color-selective ones. We applied this strategy to classic CNN architectures and quantitatively and qualitatively analyzed the improvement in accuracy, focusing on color selectivity. The results show that skip connections generally improve accuracy, but the LSC improves it even more and enhances the color selectivity of the original CNN architectures. As a side result, we propose a new color representation procedure for organizing and filtering feature maps, making their visualization more manageable for qualitative color selectivity analysis.
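For orientation, here is a minimal PyTorch sketch of the LSC idea the abstract describes: output from the first convolutional block is carried forward and fused with the input of the last feature-extraction block, so deep layers also see near-input trichromatic information. Block depths, channel counts, and fusion by concatenation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LongSkipCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.block1 = nn.Sequential(       # shallow block: low-level color/edge features
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.middle = nn.Sequential(       # intermediate blocks: progressively abstract features
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.last_block = nn.Sequential(   # last block receives deep + shallow features
            nn.Conv2d(128 + 32, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        shallow = self.block1(x)
        deep = self.middle(shallow)
        # Long skip connection: downsample the shallow features to the deep
        # spatial resolution and concatenate along the channel axis.
        shallow_ds = nn.functional.adaptive_avg_pool2d(shallow, deep.shape[-2:])
        fused = torch.cat([deep, shallow_ds], dim=1)
        out = self.last_block(fused).flatten(1)
        return self.classifier(out)
```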
Affiliations
- Oscar Sanchez-Cesteros: Department of Artificial Intelligence, National University of Distance Education (UNED), 28040 Madrid, Spain
- Mariano Rincon: Department of Artificial Intelligence, National University of Distance Education (UNED), 28040 Madrid, Spain
- Margarita Bachiller: Department of Artificial Intelligence, National University of Distance Education (UNED), 28040 Madrid, Spain
- Sonia Valladares-Rodriguez: Department of Artificial Intelligence, National University of Distance Education (UNED), 28040 Madrid, Spain; Department of Electronics and Computing, University of Santiago de Compostela (USC), 15705 Santiago de Compostela, Spain
3. Mocz V, Jeong SK, Chun M, Xu Y. Multiple visual objects are represented differently in the human brain and convolutional neural networks. Sci Rep 2023;13:9088. PMID: 37277406. PMCID: PMC10241785. DOI: 10.1038/s41598-023-36029-z. Open access.
Abstract
Objects in the real world usually appear with other objects. To form object representations that are independent of whether other objects are encoded concurrently, the primate brain approximates the response to an object pair by the average of the responses to each constituent object shown alone. This averaging is found at the single-unit level in the slope of response amplitudes of macaque IT neurons to paired versus single objects, and at the population level in fMRI voxel response patterns in human ventral object processing regions (e.g., LO). Here, we compare how the human brain and convolutional neural networks (CNNs) represent paired objects. In human LO, we show that averaging exists both in single fMRI voxels and in voxel population responses. However, in the higher layers of five CNNs pretrained for object classification and varying in architecture, depth, and recurrent processing, the slope distribution across units, and consequently averaging at the population level, deviated significantly from the brain data. Object representations thus interact with each other in CNNs when objects are shown together and differ from when objects are shown individually. Such distortions could significantly limit CNNs' ability to generalize object representations formed in different contexts.
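A toy NumPy sketch of the single-unit averaging test described in the abstract: if pair responses equal the average of the two single-object responses, regressing pair responses on that average yields a slope near 1 (a slope near 2 would instead indicate summation). All data here are simulated and illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_pairs = 100, 40                        # units (neurons/voxels) x object pairs
resp_a = rng.gamma(2.0, 1.0, (n_units, n_pairs))  # response to object A shown alone
resp_b = rng.gamma(2.0, 1.0, (n_units, n_pairs))  # response to object B shown alone
# Simulated pair responses that follow the averaging rule, plus noise.
resp_pair = 0.5 * (resp_a + resp_b) + rng.normal(0, 0.1, (n_units, n_pairs))

slopes = []
for u in range(n_units):
    x = 0.5 * (resp_a[u] + resp_b[u])             # predicted pair response under averaging
    slope, intercept = np.polyfit(x, resp_pair[u], 1)
    slopes.append(slope)

print(f"median slope = {np.median(slopes):.2f} (averaging predicts ~1.0)")
```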
Affiliations
- Viola Mocz: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT 06520, USA
- Su Keun Jeong: Department of Psychology, Chungbuk National University, Cheongju, South Korea
- Marvin Chun: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT 06520, USA; Department of Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA
- Yaoda Xu: Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT 06520, USA
4. Taylor J, Xu Y. Comparing the Dominance of Color and Form Information across the Human Ventral Visual Pathway and Convolutional Neural Networks. J Cogn Neurosci 2023;35:816-840. PMID: 36877074. PMCID: PMC11283826. DOI: 10.1162/jocn_a_01979.
Abstract
Color and form information can be decoded in every region of the human ventral visual hierarchy, and at every layer of many convolutional neural networks (CNNs) trained to recognize objects, but how does the coding strength of these features vary over processing? Here, we characterize both their absolute coding strength (how strongly each feature is represented independent of the other feature) and their relative coding strength (how strongly each feature is encoded relative to the other), which could constrain how well a feature can be read out by downstream regions across variation in the other feature. To quantify relative coding strength, we define a measure called the form dominance index that compares the relative influence of color and form on the representational geometry at each processing stage. We analyze brain and CNN responses to stimuli varying in color and in either a simple form feature, orientation, or a more complex form feature, curvature. While the brain and CNNs largely differ in how the absolute coding strength of color and form varies over processing, comparing them in terms of their relative emphasis on these features reveals a striking similarity: for both the brain and CNNs trained for object recognition (but not untrained CNNs), orientation information is increasingly de-emphasized, and curvature information increasingly emphasized, relative to color information over the course of processing, with corresponding processing stages showing largely similar values of the form dominance index.
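An illustrative sketch of what a form dominance index could look like; the paper's exact formula is not given in the abstract, so the contrast-of-correlations definition below is an assumption. The idea: correlate a processing stage's representational dissimilarity matrix (RDM) with model RDMs capturing pure form and pure color differences, then contrast the two. Positive values mean form dominates the representational geometry; negative values mean color dominates.

```python
import numpy as np
from scipy.stats import spearmanr

def upper_tri(m):
    """Vectorize the upper triangle of a square matrix (excluding the diagonal)."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def form_dominance_index(response_rdm, form_rdm, color_rdm):
    r_form, _ = spearmanr(upper_tri(response_rdm), upper_tri(form_rdm))
    r_color, _ = spearmanr(upper_tri(response_rdm), upper_tri(color_rdm))
    return (r_form - r_color) / (r_form + r_color)

# Toy stimulus set: 4 colors x 4 orientations = 16 conditions.
colors = np.repeat(np.arange(4), 4)
forms = np.tile(np.arange(4), 4)
color_rdm = (colors[:, None] != colors[None, :]).astype(float)
form_rdm = (forms[:, None] != forms[None, :]).astype(float)

# A fake "late stage" geometry that weights form more heavily than color.
response_rdm = 0.8 * form_rdm + 0.2 * color_rdm
print(f"form dominance index = {form_dominance_index(response_rdm, form_rdm, color_rdm):.2f}")
```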
5. Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022;263:119635. PMID: 36116617. PMCID: PMC11283825. DOI: 10.1016/j.neuroimage.2022.119635. Open access.
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, although convolutional neural networks (CNNs) can exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust preservation of the rank order of object responses across feature changes, indicative of a previously unreported functional smoothness in tolerance at the fMRI meso-scale. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes), showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than for non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained on ImageNet, regardless of network architecture, depth, the presence or absence of recurrent processing, or whether a network was pretrained with the original or with stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
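A minimal sketch of two of the three tolerance measures named in the abstract, under the assumption that responses come as (objects x voxels) arrays for two versions of the same objects (e.g., before and after a size change). Rank-order preservation is computed here as a per-voxel Spearman correlation across the change; cross-decoding trains a linear classifier on one version and tests on the other. All data and shapes are simulated and illustrative.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_objects, n_voxels = 8, 50
base = rng.normal(size=(n_objects, n_voxels))                        # original objects
changed = base + rng.normal(scale=0.5, size=(n_objects, n_voxels))   # after feature change

# 1) Rank-order preservation: per voxel, correlate the object response
#    profile before and after the feature change, then average.
rank_pres = np.mean([spearmanr(base[:, v], changed[:, v])[0] for v in range(n_voxels)])

# 2) Cross-decoding: train on one feature condition, test on the other.
#    Each object is a class; we simulate noisy trials around each pattern.
def trials(patterns, n_rep=20, noise=1.0):
    X = np.repeat(patterns, n_rep, axis=0)
    X = X + rng.normal(scale=noise, size=X.shape)
    y = np.repeat(np.arange(len(patterns)), n_rep)
    return X, y

X_train, y_train = trials(base)
X_test, y_test = trials(changed)
clf = LinearSVC(max_iter=10000).fit(X_train, y_train)
print(f"rank-order preservation = {rank_pres:.2f}, "
      f"cross-decoding accuracy = {clf.score(X_test, y_test):.2f}")
```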
Affiliations
- Yaoda Xu: Psychology Department, Yale University, New Haven, CT 06520, USA
6. Tang K, Chin M, Chun M, Xu Y. The contribution of object identity and configuration to scene representation in convolutional neural networks. PLoS One 2022;17:e0270667. PMID: 35763531. PMCID: PMC9239439. DOI: 10.1371/journal.pone.0270667. Open access.
Abstract
Scene perception involves extracting the identities of the objects comprising a scene in conjunction with their configuration (the spatial layout of the objects in the scene). How object identity and configuration information is weighted during scene processing, and how this weighting evolves over the course of processing, however, is not fully understood. Recent developments in convolutional neural networks (CNNs) have demonstrated their aptitude at scene-processing tasks and identified correlations between processing in CNNs and in the human brain. Here we examined four CNN architectures (AlexNet, ResNet-18, ResNet-50, DenseNet-161) and their sensitivity to changes in object and configuration information over the course of scene processing. Despite differences among the four architectures, all CNNs showed a common pattern in their responses to object identity and configuration changes: greater sensitivity to configuration changes in early stages of processing and stronger sensitivity to object identity changes in later stages. This pattern persists regardless of the spatial structure present in the image background, the accuracy of the CNN in classifying the scene, and even the task used to train the CNN. Importantly, CNNs' sensitivity to a configuration change is not the same as their sensitivity to any type of position change, such as that induced by a uniform translation of the objects without a configuration change. These results provide one of the first characterizations of how object identity and configuration information are weighted in CNNs during scene processing.
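A small sketch of the kind of layer-wise sensitivity analysis the abstract describes, using torchvision's pretrained AlexNet as an example. Sensitivity to a change is measured here as 1 minus the Pearson correlation between a layer's activations to the original scene and to the changed scene; the image file names are placeholders, and this simple correlation metric is an assumption rather than the paper's exact measure.

```python
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

# Pretrained AlexNet (weights API assumes torchvision >= 0.13).
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
prep = transforms.Compose([
    transforms.Resize((224, 224)), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def layer_activations(img):
    """Return the flattened activation after each convolutional stage."""
    x = prep(img).unsqueeze(0)
    acts = []
    with torch.no_grad():
        for layer in model.features:
            x = layer(x)
            if isinstance(layer, torch.nn.Conv2d):
                acts.append(x.flatten())
    return acts

def sensitivity(acts_a, acts_b):
    # 1 - Pearson correlation between the two activation patterns, per layer.
    return [1 - torch.corrcoef(torch.stack([a, b]))[0, 1].item()
            for a, b in zip(acts_a, acts_b)]

orig = layer_activations(Image.open("scene_original.png").convert("RGB"))
ident = layer_activations(Image.open("scene_identity_change.png").convert("RGB"))
config = layer_activations(Image.open("scene_config_change.png").convert("RGB"))
print("identity sensitivity per conv layer:", sensitivity(orig, ident))
print("configuration sensitivity per conv layer:", sensitivity(orig, config))
```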
Affiliations
- Kevin Tang: Department of Psychology, Yale University, New Haven, CT, USA
- Matthew Chin: Department of Psychology, Yale University, New Haven, CT, USA
- Marvin Chun: Department of Psychology, Yale University, New Haven, CT, USA
- Yaoda Xu: Department of Psychology, Yale University, New Haven, CT, USA