1. Sueur C, Maitre E, Falck J, Shimada M, Pelé M. Beyond human perception: challenges in AI interpretability of orangutan artwork. Primates 2025; 66:249-258. PMID: 39992583. DOI: 10.1007/s10329-025-01185-5.
Abstract
Drawings serve as a profound medium of expression for both humans and apes, offering unique insights into the cognitive and emotional landscapes of the artists, regardless of their species. This study employs artificial intelligence (AI), specifically Convolutional Neural Networks (CNNs) and the interpretability tool Captum, to analyse non-figurative drawings by Molly, an orangutan. The research utilizes VGG19 and ResNet18 models to decode seasonal nuances in the drawings, achieving notable accuracy in seasonal classification and revealing complex influences beyond human-centric methods. Techniques such as occlusion, integrated gradients, PCA, t-SNE, and Louvain clustering highlight critical areas and elements influencing seasonal recognition, providing deeper insights into the drawings. This approach not only advances the analysis of non-human art but also demonstrates the potential of AI to enrich our understanding of non-human cognitive and emotional expressions, with significant implications for fields like evolutionary anthropology and comparative psychology.
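As a rough illustration of the attribution step described in this abstract, the sketch below runs Captum's IntegratedGradients and Occlusion on a torchvision VGG19; the four-class seasonal head, the input file name, and the untrained classification layer (the authors' fine-tuned weights are not used here) are assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: attribute a CNN's "season" prediction to image regions with Captum.
# The 4-class head is an assumption; in practice fine-tuned weights would be loaded.
import torch
from torchvision import models, transforms
from PIL import Image
from captum.attr import IntegratedGradients, Occlusion

model = models.vgg19(weights="IMAGENET1K_V1")
model.classifier[6] = torch.nn.Linear(4096, 4)      # 4 seasonal classes (assumption)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
x = preprocess(Image.open("drawing.png").convert("RGB")).unsqueeze(0)   # placeholder file

target = model(x).argmax(dim=1).item()              # predicted season index

# Integrated gradients: pixel-wise contributions to the predicted class.
ig_attr = IntegratedGradients(model).attribute(x, target=target, n_steps=50)

# Occlusion: slide a patch over the image and record how the prediction changes.
occ_attr = Occlusion(model).attribute(
    x, target=target, sliding_window_shapes=(3, 32, 32), strides=(3, 16, 16)
)
print(ig_attr.shape, occ_attr.shape)                # both (1, 3, 224, 224)
```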
Affiliation(s)
- Cédric Sueur
- Université de Strasbourg, IPHC, CNRS, UMR 7178, 67000, Strasbourg, France.
- Institut Universitaire de France, 75231, Paris, France.
- Elliot Maitre
- Université de Toulouse - IRIT UMR5505, 31400, Toulouse, France
- Jimmy Falck
- Laboratoire Lorrain de Recherche en Informatique et ses Applications (Loria - CNRS/Université de Lorraine/Inria), Nancy, France
- Masaki Shimada
- Department of Animal Sciences, Teikyo University of Science, 2525 Yatsusawa, Uenohara, Yamanashi 409-0193, Japan
- Marie Pelé
- ANTHROPO-LAB - ETHICS EA 7446, Université Catholique de Lille, 59000, Lille, France
2. Deng Y, Huang Y, Wu H, He D, Qiu W, Jing B, Lv X, Xia W, Li B, Sun Y, Li C, Xie C, Ke L. Establishment of a deep-learning-assisted recurrent nasopharyngeal carcinoma detecting simultaneous tactic (DARNDEST) with high cost-effectiveness based on magnetic resonance images: a multicenter study in an endemic area. Cancer Imaging 2025; 25:39. PMID: 40128777. PMCID: PMC11931764. DOI: 10.1186/s40644-025-00853-5.
Abstract
BACKGROUND To investigate the feasibility of detecting local recurrent nasopharyngeal carcinoma (rNPC) using unenhanced magnetic resonance images (MRI) and optimize a layered management strategy for follow-up with a deep learning model. METHODS Deep learning models based on 3D DenseNet or ResNet frameworks, using a single sequence (T1WI, T2WI, or T1WIC) or a combination of T1WI and T2WI sequences (T1_T2), were developed to detect local rNPC. A deep-learning-assisted recurrent NPC detecting simultaneous tactic (DARNDEST), utilizing DenseNet, was optimized by superimposing the T1WIC model over the T1_T2 model in a specific population. Diagnostic efficacy (accuracy, sensitivity, specificity) and the examination cost of a single MR scan were compared among the conventional method, the T1_T2 model, and DARNDEST using McNemar's Z test. RESULTS No significant differences in overall accuracy, sensitivity, and specificity were found between the T1WIC model and the T1WI, T2WI, or T1_T2 models in both test sets (all P > 0.0167). DARNDEST had higher accuracy and sensitivity but lower specificity than the T1_T2 model in both the internal (accuracy, 85.91% vs. 84.99%; sensitivity, 90.36% vs. 84.26%; specificity, 82.20% vs. 85.59%) and external (accuracy, 86.14% vs. 84.16%; sensitivity, 90.32% vs. 84.95%; specificity, 82.57% vs. 83.49%) test sets. The cost of MR examinations using DARNDEST was $330,724 (internal) and $328,971 (external) for a hypothetical cohort of 1,000 patients, compared with $313,250 for the T1_T2 model and $340,865 for the conventional method. CONCLUSIONS Detecting local rNPC using unenhanced MRI with deep learning is feasible, and DARNDEST-driven follow-up management is efficient and economical.
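The abstract does not spell out the exact triage rule, so the sketch below only illustrates the general idea: a second model is consulted when the unenhanced model is uncertain, and the two strategies are compared with McNemar's test. The uncertainty band, the simulated data, and all thresholds are placeholders.

```python
# Hedged sketch of a two-stage decision rule in the spirit of DARNDEST, plus a
# McNemar comparison of two diagnostic strategies on paired outcomes. The triage
# criterion (run the T1WIC model only when the T1_T2 model is uncertain) is an
# assumption for illustration, not the paper's exact rule.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                 # simulated recurrence labels
p_t1_t2 = np.clip(y_true * 0.7 + rng.normal(0.3, 0.25, 1000), 0, 1)
p_t1wic = np.clip(y_true * 0.8 + rng.normal(0.2, 0.20, 1000), 0, 1)

pred_t1_t2 = (p_t1_t2 >= 0.5).astype(int)

# Stage 2 only for "uncertain" unenhanced predictions (assumed band around 0.5).
uncertain = (p_t1_t2 > 0.3) & (p_t1_t2 < 0.7)
pred_darndest = np.where(uncertain, (p_t1wic >= 0.5).astype(int), pred_t1_t2)

# McNemar's test on paired correct/incorrect outcomes of the two strategies.
a_correct = pred_t1_t2 == y_true
b_correct = pred_darndest == y_true
table = [[np.sum(a_correct & b_correct), np.sum(a_correct & ~b_correct)],
         [np.sum(~a_correct & b_correct), np.sum(~a_correct & ~b_correct)]]
print(mcnemar(table, exact=False, correction=True))
```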
Affiliation(s)
- Yishu Deng
- School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Information Technology Center, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Yingying Huang
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Department of Radiology, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Haijun Wu
- Department of Radiation Oncology, The First People's Hospital of Foshan, Foshan, 528000, China
- Dongxia He
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Department of Radiology, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Wenze Qiu
- Department of Radiation Oncology, Affiliated Cancer Hospital and Institute of Guangzhou Medical University, Guangzhou, 510095, China
- Bingzhong Jing
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Information Technology Center, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Xing Lv
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Department of Nasopharyngeal Carcinoma, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Weixiong Xia
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Department of Nasopharyngeal Carcinoma, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Bin Li
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Information Technology Center, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Ying Sun
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China
- Department of Radiation Oncology, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China
- Chaofeng Li
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China.
- Information Technology Center, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China.
- Precision Medicine Center, Sun Yat-Sen University, Guangzhou, 510060, China.
- Chuanmiao Xie
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China.
- Department of Radiology, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China.
- Liangru Ke
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, 510060, China.
- Department of Radiology, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, China.
3. Kanwisher N. Animal models of the human brain: Successes, limitations, and alternatives. Curr Opin Neurobiol 2025; 90:102969. PMID: 39914250. DOI: 10.1016/j.conb.2024.102969.
Abstract
The last three decades of research in human cognitive neuroscience have given us an initial "parts list" for the human mind in the form of a set of cortical regions with distinct and often very specific functions. But current neuroscientific methods in humans have limited ability to reveal exactly what these regions represent and compute, the causal role of each in behavior, and the interactions among regions that produce real-world cognition. Animal models can help to answer these questions when homologues exist in other species, like the face system in macaques. When homologues do not exist in animals, for example for speech and music perception, and understanding of language or other people's thoughts, intracranial recordings in humans play a central role, along with a new alternative to animal models: artificial neural networks.
Affiliation(s)
- Nancy Kanwisher
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, United States.
4. Gupta P, Dobs K. Human-like face pareidolia emerges in deep neural networks optimized for face and object recognition. PLoS Comput Biol 2025; 21:e1012751. PMID: 39869654. PMCID: PMC11790231. DOI: 10.1371/journal.pcbi.1012751.
Abstract
The human visual system possesses a remarkable ability to detect and process faces across diverse contexts, including the phenomenon of face pareidolia, that is, seeing faces in inanimate objects. Despite extensive research, it remains unclear why the visual system employs such broadly tuned face detection capabilities. We hypothesized that face pareidolia results from the visual system's optimization for recognizing both faces and objects. To test this hypothesis, we used task-optimized deep convolutional neural networks (CNNs) and evaluated their alignment with human behavioral signatures and neural responses, measured via magnetoencephalography (MEG), related to pareidolia processing. Specifically, we trained CNNs on tasks involving combinations of face identification, face detection, object categorization, and object detection. Using representational similarity analysis, we found that CNNs that included object categorization in their training tasks represented pareidolia faces, real faces, and matched objects more similarly to neural responses than those that did not. Although these CNNs showed similar overall alignment with neural data, a closer examination of their internal representations revealed that specific training tasks had distinct effects on how pareidolia faces were represented across layers. Finally, interpretability methods revealed that only a CNN trained for both face identification and object categorization relied on face-like features, such as 'eyes', to classify pareidolia stimuli as faces, mirroring findings in human perception. Our results suggest that human-like face pareidolia may emerge from the visual system's optimization for face identification within the context of generalized object categorization.
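A minimal sketch of the representational similarity analysis (RSA) logic referenced above, assuming a generic ImageNet-trained ResNet-50 and random placeholder stimuli and reference data rather than the study's task-optimized CNNs and MEG recordings:

```python
# Hedged RSA sketch: correlate a model RDM with a reference RDM (e.g., from MEG).
import numpy as np
import torch
from torchvision import models
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

model = models.resnet50(weights="IMAGENET1K_V2")
model.fc = torch.nn.Identity()                      # expose 2048-d penultimate features
model.eval()

stimuli = torch.rand(30, 3, 224, 224)               # pareidolia faces, real faces, objects (placeholder)
with torch.no_grad():
    feats = model(stimuli).numpy()

model_rdm = squareform(pdist(feats, metric="correlation"))       # 30 x 30 dissimilarities
reference_rdm = squareform(pdist(np.random.rand(30, 50)))        # stand-in for an MEG RDM

iu = np.triu_indices(30, k=1)                       # compare upper triangles only
rho, p = spearmanr(model_rdm[iu], reference_rdm[iu])
print(f"model-reference RDM correlation: rho={rho:.3f}, p={p:.3g}")
```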
Affiliation(s)
- Pranjul Gupta
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Katharina Dobs
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Center for Mind, Brain, and Behavior, Universities of Marburg, Giessen and Darmstadt, Marburg, Germany
5. Duyck S, Costantino AI, Bracci S, Op de Beeck H. A computational deep learning investigation of animacy perception in the human brain. Commun Biol 2024; 7:1718. PMID: 39741161. DOI: 10.1038/s42003-024-07415-8.
Abstract
The functional organization of the human object vision pathway distinguishes between animate and inanimate objects. To understand animacy perception, we explore the case of zoomorphic objects resembling animals. While the perception of these objects as animal-like seems obvious to humans, such an "Animal bias" reveals a striking discrepancy between the human brain and deep neural networks (DNNs). We computationally investigated the potential origins of this bias. We successfully induced this bias in DNNs trained explicitly with zoomorphic objects. Alternative training schedules failed to induce an Animal bias. We considered the superordinate distinction between animate and inanimate classes, the sensitivity for faces and bodies, the bias for shape over texture, the role of ecologically valid categories, recurrent connections, and language-informed visual processing. These findings provide computational support that the Animal bias for zoomorphic objects is a unique property of human perception yet can be explained by human learning history.
Affiliation(s)
- Stefanie Duyck
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Andrea I Costantino
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium.
- Stefania Bracci
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy
- Hans Op de Beeck
- Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
6. St-Yves G, Kay K, Naselaris T. Variation in the geometry of concept manifolds across human visual cortex. bioRxiv [Preprint] 2024:2024.11.26.625280. PMID: 39651255. PMCID: PMC11623644. DOI: 10.1101/2024.11.26.625280.
Abstract
Brain activity patterns in high-level visual cortex support accurate linear classification of visual concepts (e.g., objects or scenes). It has long been appreciated that the accuracy of linear classification in any brain area depends on the geometry of its concept manifolds, the sets of brain activity patterns that encode images of a concept. However, it is unclear how the geometry of concept manifolds differs between regions of visual cortex that support accurate classification and those that don't, or how it differs between visual cortex and deep neural networks (DNNs). We estimated geometric properties of concept manifolds that, per a recent theory, directly determine the accuracy of simple "few-shot" linear classifiers. Using a large fMRI dataset, we show that variation in classification accuracy across human visual cortex is driven by variation in a single geometric property: the distance between manifold centers ("geometric Signal"). In contrast, variation in classification accuracy across most DNN layers is driven by an increase in the effective number of manifold dimensions ("Dimensionality"). Despite this difference in the geometric properties that affect few-shot classification performance in the brain and DNNs, we find that Signal and Dimensionality are strongly, negatively correlated: when Signal increases across brain regions or DNN layers, Dimensionality decreases, and vice versa. We conclude that visual cortex and DNNs deploy different geometric strategies for accurate linear classification of concepts, even though both are subject to the same constraint.
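One simple way to compute the two geometric quantities discussed above, sketched here with random placeholder activity patterns; the exact normalizations used in the preprint's few-shot theory may differ, so this is illustrative only.

```python
# Sketch of two manifold-geometry summaries: the distance between concept-manifold
# centers ("Signal") and an effective dimensionality (participation ratio of the
# within-manifold covariance spectrum).
import numpy as np

def manifold_geometry(acts_a: np.ndarray, acts_b: np.ndarray):
    """acts_a, acts_b: (n_exemplars, n_features) activity patterns for two concepts."""
    center_a, center_b = acts_a.mean(0), acts_b.mean(0)
    signal = np.linalg.norm(center_a - center_b)          # distance between centers

    def participation_ratio(acts):
        evals = np.clip(np.linalg.eigvalsh(np.cov(acts, rowvar=False)), 0, None)
        return evals.sum() ** 2 / (evals ** 2).sum()      # effective number of dimensions

    dim = 0.5 * (participation_ratio(acts_a) + participation_ratio(acts_b))
    return signal, dim

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(200, 512))   # e.g., voxel patterns for concept A (placeholder)
b = rng.normal(0.5, 1.0, size=(200, 512))   # concept B, shifted center (placeholder)
print(manifold_geometry(a, b))
```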
7. Lenskjold A, Brejnebøl MW, Rose MH, Gudbergsen H, Chaudhari A, Troelsen A, Moller A, Nybing JU, Boesen M. Artificial intelligence tools trained on human-labeled data reflect human biases: a case study in a large clinical consecutive knee osteoarthritis cohort. Sci Rep 2024; 14:26782. PMID: 39500908. PMCID: PMC11538298. DOI: 10.1038/s41598-024-75752-z.
Abstract
Humans have been shown to have biases when reading medical images, raising questions about whether humans are uniform in their disease gradings. Artificial intelligence (AI) tools trained on human-labeled data may inherit this human non-uniformity. In this study, we used a radiographic knee osteoarthritis external validation dataset of 50 patients and a six-year retrospective consecutive clinical cohort of 8,273 patients. An FDA-approved and CE-marked AI tool was tested for potential non-uniformity in Kellgren-Lawrence grades between the right and left sides of the images. We flipped the images horizontally so that a left knee looked like a right knee and vice versa. According to human review, the AI tool showed non-uniformity with 20-22% disagreements on the external validation dataset and 13.6% on the cohort. However, we found no evidence of a significant difference in accuracy compared with senior radiologists on the external validation dataset, nor of age or sex bias on the cohort. AI non-uniformity can boost the evaluated performance against humans, but image areas with inferior performance should be investigated.
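A sketch of the left/right uniformity check described above; the commercial grading tool is not public, so `grade_knee` is a hypothetical placeholder for it.

```python
# Sketch: grade each radiograph and its horizontal mirror, then count disagreements
# in the Kellgren-Lawrence (KL) grade between the two versions.
from pathlib import Path
from PIL import Image, ImageOps

def grade_knee(image: Image.Image) -> int:
    """Placeholder for the AI tool: returns a KL grade 0-4."""
    raise NotImplementedError

def flip_disagreement_rate(image_dir: str) -> float:
    paths = sorted(Path(image_dir).glob("*.png"))
    disagreements = 0
    for p in paths:
        img = Image.open(p).convert("L")
        grade_original = grade_knee(img)
        grade_mirrored = grade_knee(ImageOps.mirror(img))   # left knee now looks right
        disagreements += int(grade_original != grade_mirrored)
    return disagreements / max(len(paths), 1)

# Example call (directory is a placeholder):
# print(flip_disagreement_rate("knee_radiographs/"))
```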
Affiliation(s)
- Anders Lenskjold
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark.
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark.
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
- Mathias W Brejnebøl
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Martin H Rose
- Center for Surgical Science, Zealand University Hospital, Køge, Denmark
- Henrik Gudbergsen
- The Parker Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Public Health, Center for General Practice, University of Copenhagen, Copenhagen, Denmark
- Akshay Chaudhari
- Department of Radiology, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Anders Troelsen
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Department of Orthopaedic Surgery, Copenhagen University Hospital Hvidovre & CAG, ROAD - Research OsteoArthritis, Hvidovre, Denmark
- Anne Moller
- Department of Public Health, Center for General Practice, University of Copenhagen, Copenhagen, Denmark
- Primary Health Care Research Unit, Region Zealand, Denmark
- Janus U Nybing
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark
- Mikael Boesen
- Department of Radiology, Copenhagen University Hospital Bispebjerg-Frederiksberg, Copenhagen, Denmark
- Radiological Artificial Intelligence Testcenter, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
8. Cherian T, Arun SP. What do we see behind an occluder? Amodal completion of statistical properties in complex objects. Atten Percept Psychophys 2024; 86:2721-2739. PMID: 39461932. DOI: 10.3758/s13414-024-02948-w.
Abstract
When a spiky object is occluded, we expect its spiky features to continue behind the occluder. Although many real-world objects contain complex features, it is unclear how more complex features are amodally completed and whether this process is automatic. To investigate this issue, we created pairs of displays with identical contour edges up to the point of occlusion, but with occluded portions exchanged. We then asked participants to search for oddball targets among distractors and asked whether relations between searches involving occluded displays would match better with relations between searches involving completions that are either globally consistent or inconsistent with the visible portions of these displays. Across two experiments involving simple and complex shapes, search times involving occluded displays matched better with those involving globally consistent compared with inconsistent displays. Analogous analyses on deep networks pretrained for object categorization revealed a similar pattern of results for simple but not complex shapes. Thus, deep networks seem to extrapolate simple occluded contours but not more complex contours. Taken together, our results show that amodal completion in humans is sophisticated and can be based on extrapolating global statistical properties.
Affiliation(s)
- Thomas Cherian
- Centre for Neuroscience, Indian Institute of Science, Bengaluru, 560012, India
- S P Arun
- Centre for Neuroscience, Indian Institute of Science, Bengaluru, 560012, India.
9. Hernández-Cámara P, Daudén-Oliver P, Laparra V, Malo J. Alignment of color discrimination in humans and image segmentation networks. Front Psychol 2024; 15:1415958. PMID: 39507086. PMCID: PMC11538077. DOI: 10.3389/fpsyg.2024.1415958.
Abstract
The experiments allowed by current machine learning models imply a revival of the debate on the causes of specific trends of human visual psychophysics. Machine learning facilitates the exploration of the effect of specific visual goals (such as image segmentation) by different neural architectures in different statistical environments in an unprecedented manner. In this way, (1) the principles behind psychophysical facts such as the non-Euclidean nature of human color discrimination and (2) the emergence of human-like behaviour in artificial systems can be explored in a new light. In this work, we show for the first time that the tolerance or invariance of image segmentation networks for natural images under changes of illuminant in the color space (a sort of insensitivity region around the white) is an ellipsoid oriented similarly to a (human) MacAdam ellipse. This striking similarity between an artificial system and human vision motivates a set of experiments checking the relevance of the statistical environment on the emergence of such insensitivity regions. Results suggest that, in this case, the statistics of the environment may be more relevant than the architecture selected to perform the image segmentation.
10. Kar K, DiCarlo JJ. The Quest for an Integrated Set of Neural Mechanisms Underlying Object Recognition in Primates. Annu Rev Vis Sci 2024; 10:91-121. PMID: 38950431. DOI: 10.1146/annurev-vision-112823-030616.
Abstract
Inferences made about objects via vision, such as rapid and accurate categorization, are core to primate cognition despite the algorithmic challenge posed by varying viewpoints and scenes. Until recently, the brain mechanisms that support these capabilities were deeply mysterious. However, over the past decade, this scientific mystery has been illuminated by the discovery and development of brain-inspired, image-computable, artificial neural network (ANN) systems that rival primates in these behavioral feats. Apart from fundamentally changing the landscape of artificial intelligence, modified versions of these ANN systems are the current leading scientific hypotheses of an integrated set of mechanisms in the primate ventral visual stream that support core object recognition. What separates brain-mapped versions of these systems from prior conceptual models is that they are sensory computable, mechanistic, anatomically referenced, and testable (SMART). In this article, we review and provide perspective on the brain mechanisms addressed by the current leading SMART models. We review their empirical brain and behavioral alignment successes and failures, discuss the next frontiers for an even more accurate mechanistic understanding, and outline the likely applications.
Affiliation(s)
- Kohitij Kar
- Department of Biology, Centre for Vision Research, and Centre for Integrative and Applied Neuroscience, York University, Toronto, Ontario, Canada
- James J DiCarlo
- Department of Brain and Cognitive Sciences, MIT Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
11. Zhang Q, Zhang Y, Liu N, Sun X. Understanding of facial features in face perception: insights from deep convolutional neural networks. Front Comput Neurosci 2024; 18:1209082. PMID: 38655070. PMCID: PMC11035738. DOI: 10.3389/fncom.2024.1209082.
Abstract
Introduction Face recognition has been a longstanding subject of interest in the fields of cognitive neuroscience and computer vision research. One key focus has been to understand the relative importance of different facial features in identifying individuals. Previous studies in humans have demonstrated the crucial role of eyebrows in face recognition, potentially even surpassing the importance of the eyes. However, eyebrows are not only vital for face recognition but also play a significant role in recognizing facial expressions and intentions, which might occur simultaneously and influence the face recognition process. Methods To address these challenges, our current study aimed to leverage the power of deep convolutional neural networks (DCNNs), an artificial face recognition system, which can be specifically tailored for face recognition tasks. In this study, we investigated the relative importance of various facial features in face recognition by selectively blocking feature information from the input to the DCNN. Additionally, we conducted experiments in which we systematically blurred the information related to eyebrows to varying degrees. Results Our findings aligned with previous human research, revealing that eyebrows are the most critical feature for face recognition, followed by eyes, mouth, and nose, in that order. The results demonstrated that the presence of eyebrows was more crucial than their specific high-frequency details, such as edges and textures, compared to other facial features, where the details also played a significant role. Furthermore, unlike for other facial features, the activation maps indicated that the significance of the eyebrow areas could not be readily adjusted to compensate for the absence of eyebrow information. This finding explains why masking eyebrows led to more significant deficits in face recognition performance. Additionally, we observed a synergistic relationship among facial features, providing evidence for holistic processing of faces within the DCNN. Discussion Overall, our study sheds light on the underlying mechanisms of face recognition and underscores the potential of using DCNNs as valuable tools for further exploration in this field.
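A hedged sketch of the feature-blocking manipulation: occlude an assumed eyebrow region before classification and compare accuracy with the intact input. The network, the identity label set, and the bounding box are placeholders, not the study's configuration.

```python
# Sketch: zero out a rectangular face region and measure the drop in top-1 accuracy.
import torch
from torchvision import models

def mask_region(images: torch.Tensor, box=(60, 40, 100, 170)) -> torch.Tensor:
    """images: (N, 3, 224, 224); box = (top, left, bottom, right) in pixels."""
    out = images.clone()
    t, l, b, r = box
    out[:, :, t:b, l:r] = 0.0            # occlude the assumed eyebrow region
    return out

model = models.resnet50(weights=None, num_classes=500).eval()   # stand-in identity classifier

@torch.no_grad()
def top1_accuracy(images, labels):
    return (model(images).argmax(1) == labels).float().mean().item()

faces = torch.rand(8, 3, 224, 224)       # aligned face crops (placeholder)
ids = torch.randint(0, 500, (8,))
print("intact:", top1_accuracy(faces, ids), "masked:", top1_accuracy(mask_region(faces), ids))
```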
Affiliation(s)
- Qianqian Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Yueyi Zhang
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Ning Liu
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xiaoyan Sun
- MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
12. Vallée R, Gomez T, Bourreille A, Normand N, Mouchère H, Coutrot A. Influence of training and expertise on deep neural network attention and human attention during a medical image classification task. J Vis 2024; 24:6. PMID: 38587421. PMCID: PMC11008746. DOI: 10.1167/jov.24.4.6.
Abstract
In many different domains, experts can make complex decisions after glancing very briefly at an image. However, the perceptual mechanisms underlying expert performance are still largely unknown. Recently, several machine learning algorithms have been shown to outperform human experts in specific tasks. But these algorithms often behave as black boxes and their information processing pipeline remains unknown. This lack of transparency and interpretability is highly problematic in applications involving human lives, such as health care. One way to "open the black box" is to compute an artificial attention map from the model, which highlights the pixels of the input image that contributed the most to the model decision. In this work, we directly compare human visual attention to machine visual attention when performing the same visual task. We have designed a medical diagnosis task involving the detection of lesions in small bowel endoscopic images. We collected eye movements from novices and gastroenterologist experts while they classified medical images according to their relevance for Crohn's disease diagnosis. We trained three state-of-the-art deep learning models on our carefully labeled dataset. Both humans and machines performed the same task. We extracted artificial attention with six different post hoc methods. We show that the model attention maps are significantly closer to human expert attention maps than to novices', especially for pathological images. As the model gets trained and its performance gets closer to the human experts, the similarity between model and human attention increases. Through the understanding of the similarities between the visual decision-making process of human experts and deep neural networks, we hope to inform both the training of new doctors and the architecture of new algorithms.
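A minimal sketch of comparing a post hoc model attention map with a human fixation map; Grad-CAM stands in for the six attribution methods used in the paper, and the classifier and the human map are placeholders.

```python
# Sketch: compute a Grad-CAM attention map and correlate it with a human fixation map.
import numpy as np
import torch
from torchvision import models
from captum.attr import LayerGradCam, LayerAttribution
from scipy.stats import pearsonr

model = models.resnet34(weights="IMAGENET1K_V1").eval()   # stand-in for the lesion classifier
gradcam = LayerGradCam(model, model.layer4)

image = torch.rand(1, 3, 224, 224)                         # endoscopic frame (placeholder)
target = model(image).argmax(1).item()
attr = gradcam.attribute(image, target=target)
model_map = LayerAttribution.interpolate(attr, (224, 224)).squeeze().detach().numpy()

human_map = np.random.rand(224, 224)                       # fixation-density map (placeholder)
r, _ = pearsonr(model_map.ravel(), human_map.ravel())
print(f"model-human attention correlation: r={r:.3f}")
```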
Affiliation(s)
- Rémi Vallée
- Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
- Tristan Gomez
- Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
- Arnaud Bourreille
- CHU Nantes, Institut des Maladies de l'Appareil Digestif, CIC Inserm 1413, Université de Nantes, Nantes, France
- Nicolas Normand
- Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
- Harold Mouchère
- Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
- Antoine Coutrot
- Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
- Univ Lyon, CNRS, INSA Lyon, UCBL, LIRIS, UMR5205, Lyon, France
13. Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. bioRxiv [Preprint] 2024:2023.04.16.537079. PMID: 37163104. PMCID: PMC10168209. DOI: 10.1101/2023.04.16.537079.
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that (1) in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images and (2) lesioning these neurons by setting their output to 0 or enhancing these neurons by increasing their gain led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, FL, USA
14. Liu P, Bo K, Ding M, Fang R. Emergence of Emotion Selectivity in Deep Neural Networks Trained to Recognize Visual Objects. PLoS Comput Biol 2024; 20:e1011943. PMID: 38547053. PMCID: PMC10977720. DOI: 10.1371/journal.pcbi.1011943.
Abstract
Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex pre-trained on ImageNet with two datasets of affective images. Our results show that in all layers of the CNN models, there were artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and lesioning these neurons by setting their output to zero or enhancing these neurons by increasing their gain led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have the intrinsic ability to represent the affective significance of visual input and suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
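A sketch of the lesioning manipulation described in this abstract, implemented with a forward hook on a torchvision VGG16; the "emotion-selective" unit indices are dummies that would come from a prior selectivity analysis.

```python
# Sketch: zero out selected channels in one conv layer and compare predictions with
# and without the lesion.
import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").eval()
lesioned_channels = [3, 17, 42]                  # placeholder "selective" unit indices

def lesion_hook(module, inputs, output):
    output[:, lesioned_channels] = 0.0           # silence the selected feature maps
    return output

layer = model.features[28]                       # a late conv layer in VGG16
handle = layer.register_forward_hook(lesion_hook)

x = torch.rand(4, 3, 224, 224)                   # affective images (placeholder)
with torch.no_grad():
    lesioned_logits = model(x)
handle.remove()                                  # restore the intact network
with torch.no_grad():
    intact_logits = model(x)
print((lesioned_logits - intact_logits).abs().mean())
```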
Affiliation(s)
- Peng Liu
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Ke Bo
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Mingzhou Ding
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Ruogu Fang
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, Florida, United States of America
15. Nara S, Kaiser D. Integrative processing in artificial and biological vision predicts the perceived beauty of natural images. Sci Adv 2024; 10:eadi9294. PMID: 38427730. PMCID: PMC10906925. DOI: 10.1126/sciadv.adi9294.
Abstract
Previous research shows that the beauty of natural images is already determined during perceptual analysis. However, it is unclear which perceptual computations give rise to the perception of beauty. Here, we tested whether perceived beauty is predicted by spatial integration across an image, a perceptual computation that reduces processing demands by aggregating image parts into more efficient representations of the whole. We quantified integrative processing in an artificial deep neural network model, where the degree of integration was determined by the amount of deviation between activations for the whole image and its constituent parts. This quantification of integration predicted beauty ratings for natural images across four studies with different stimuli and designs. In a complementary functional magnetic resonance imaging study, we show that integrative processing in human visual cortex similarly predicts perceived beauty. Together, our results establish integration as a computational principle that facilitates perceptual analysis and thereby mediates the perception of beauty.
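A sketch of the integration measure described above: compare a network's response to a whole image with the combined responses to its halves. The layer, the top/bottom split, and the cosine-distance deviation are simplifying assumptions, not the study's exact quantification.

```python
# Sketch: deviation between whole-image activations and averaged part activations.
import torch
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1").eval()

@torch.no_grad()
def integration_score(image: torch.Tensor) -> float:
    """image: (1, 3, 224, 224). Larger values mean the whole is less like its parts."""
    top, bottom = image.clone(), image.clone()
    top[:, :, 112:, :] = 0.0                      # keep only the top half
    bottom[:, :, :112, :] = 0.0                   # keep only the bottom half

    whole = model.features(image).flatten()       # late convolutional feature maps
    parts = 0.5 * (model.features(top) + model.features(bottom)).flatten()
    return 1.0 - torch.nn.functional.cosine_similarity(whole, parts, dim=0).item()

print(integration_score(torch.rand(1, 3, 224, 224)))   # placeholder natural image
```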
Affiliation(s)
- Sanjeev Nara
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, Gießen, Germany
- Daniel Kaiser
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, Gießen, Germany
- Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus Liebig University Gießen, Marburg, Germany
16. Zhang H, Yoshida S, Li Z. Brain-like illusion produced by Skye's Oblique Grating in deep neural networks. PLoS One 2024; 19:e0299083. PMID: 38394261. PMCID: PMC10889903. DOI: 10.1371/journal.pone.0299083.
Abstract
The analogy between the brain and deep neural networks (DNNs) has sparked interest in neuroscience. Although DNNs have limitations, they remain valuable for modeling specific brain characteristics. This study used Skye's Oblique Grating illusion to assess DNNs' relevance to brain neural networks. We collected data on human perceptual responses to a series of visual illusions. This data was then used to assess how DNN responses to these illusions paralleled or differed from human behavior. We performed two analyses: (1) we trained DNNs to perform horizontal vs. non-horizontal classification on images with bars tilted different degrees (non-illusory images) and tested them on images with horizontal bars with different illusory strengths measured by human behavior (illusory images), finding that DNNs showed human-like illusions; (2) we performed representational similarity analysis to assess whether illusory representation existed in different layers within DNNs, finding that DNNs showed illusion-like responses to illusory images. The representational similarity between real tilted images and illusory images was calculated, which showed the highest values in the early layers and decreased layer-by-layer. Our findings suggest that DNNs could serve as potential models for explaining the mechanism of visual illusions in the human brain, particularly those that may originate in early visual areas like the primary visual cortex (V1). While promising, further research is necessary to understand the nuanced differences between DNNs and human visual pathways.
Affiliation(s)
- Hongtao Zhang
- Graduate School of Engineering, Kochi University of Technology, Kami, Kochi, Japan
- Shinichi Yoshida
- School of Information, Kochi University of Technology, Kami, Kochi, Japan
- Zhen Li
- Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University, Shenzhen, China
- Department of Engineering, Shenzhen MSU-BIT University, Shenzhen, China
17. Shoham A, Grosbard ID, Patashnik O, Cohen-Or D, Yovel G. Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nat Hum Behav 2024. PMID: 38332339. DOI: 10.1038/s41562-024-01816-9.
Abstract
Mental representations of familiar categories are composed of visual and semantic information. Disentangling the contributions of visual and semantic information in humans is challenging because they are intermixed in mental representations. Deep neural networks that are trained on images, on text, or on paired images and text now enable us to disentangle human mental representations into their visual, visual-semantic and semantic components. Here we used these deep neural networks to uncover the content of human mental representations of familiar faces and objects when they are viewed or recalled from memory. The results show a larger visual than semantic contribution when images are viewed and a reversed pattern when they are recalled. We further reveal a previously unknown unique contribution of an integrated visual-semantic representation in both perception and memory. We propose a new framework in which visual and semantic information contribute independently and interactively to mental representations in perception and memory.
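A sketch of the decomposition logic: regress human dissimilarities on visual, semantic, and visual-semantic model dissimilarities and read off each contribution from the fitted weights. All arrays here are random placeholders rather than the study's models or judgments.

```python
# Sketch: linear decomposition of human similarity data into model-based components.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_pairs = 300                                   # pairs of familiar faces/objects (placeholder)
visual_rdm = rng.random(n_pairs)                # e.g., image-trained DNN dissimilarities
semantic_rdm = rng.random(n_pairs)              # e.g., text-trained model dissimilarities
visual_semantic_rdm = rng.random(n_pairs)       # e.g., image-text model dissimilarities
human_rdm = 0.5 * visual_rdm + 0.3 * semantic_rdm + rng.normal(0, 0.1, n_pairs)

X = np.column_stack([visual_rdm, semantic_rdm, visual_semantic_rdm])
reg = LinearRegression().fit(X, human_rdm)
print(dict(zip(["visual", "semantic", "visual-semantic"], reg.coef_.round(3))))
```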
Affiliation(s)
- Adva Shoham
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
- Idan Daniel Grosbard
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Or Patashnik
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Daniel Cohen-Or
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Galit Yovel
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel.
18. Yovel G, Abudarham N. Why psychologists should embrace rather than abandon DNNs. Behav Brain Sci 2023; 46:e414. PMID: 38054326. DOI: 10.1017/s0140525x2300167x.
Abstract
Deep neural networks (DNNs) are powerful computational models, which generate complex, high-level representations that were missing in previous models of human cognition. By studying these high-level representations, psychologists can now gain new insights into the nature and origin of human high-level vision, which was not possible with traditional handcrafted models. Abandoning DNNs would be a huge oversight for psychological sciences.
Affiliation(s)
- Galit Yovel
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; https://people.socsci.tau.ac.il/mu/galityovel/
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Naphtali Abudarham
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; https://people.socsci.tau.ac.il/mu/galityovel/
19. Slagter HA. Perceptual learning in humans: An active, top-down-guided process. Behav Brain Sci 2023; 46:e406. PMID: 38054288. DOI: 10.1017/s0140525x23001644.
Abstract
Deep neural network (DNN) models of human-like vision are typically built by feeding visual images as training data to a blank-slate DNN. However, the literature on human perception and perceptual learning suggests that developing DNNs that truly model human vision requires a shift in approach in which perception is not treated as a largely bottom-up process, but as an active, top-down-guided process.
Affiliation(s)
- Heleen A Slagter
- Department of Cognitive Psychology, Institute for Brain and Behavior Amsterdam, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; https://research.vu.nl/en/persons/heleen-slagter
20. Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. PMID: 38091351. PMCID: PMC10718467. DOI: 10.1371/journal.pbio.3002366.
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
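A sketch of the standard voxelwise encoding-model evaluation this literature relies on, with ridge regression and cross-validated correlation; the arrays are random placeholders for a model stage's activations and measured fMRI responses.

```python
# Sketch: ridge-regress model features onto voxel responses and score held-out
# prediction quality by the correlation between measured and predicted responses.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
activations = rng.normal(size=(165, 1024))    # model-stage features for 165 sounds (placeholder)
voxels = rng.normal(size=(165, 300))          # fMRI responses for 300 voxels (placeholder)

scores = np.zeros(voxels.shape[1])
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(activations):
    reg = RidgeCV(alphas=np.logspace(-2, 5, 15)).fit(activations[train], voxels[train])
    pred = reg.predict(activations[test])
    for v in range(voxels.shape[1]):
        scores[v] += np.corrcoef(pred[:, v], voxels[test, v])[0, 1] / 5   # average over folds

print("median voxel prediction r:", np.median(scores))
```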
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Jenelle Feather
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Dana Boebinger
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
- University of Rochester Medical Center, Rochester, New York, United States of America
- Josh H. McDermott
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
21. Lee MJ, DiCarlo JJ. How well do rudimentary plasticity rules predict adult visual object learning? PLoS Comput Biol 2023; 19:e1011713. PMID: 38079444. PMCID: PMC10754461. DOI: 10.1371/journal.pcbi.1011713.
Abstract
A core problem in visual object learning is using a finite number of images of a new object to accurately identify that object in future, novel images. One longstanding, conceptual hypothesis asserts that this core problem is solved by adult brains through two connected mechanisms: 1) the re-representation of incoming retinal images as points in a fixed, multidimensional neural space, and 2) the optimization of linear decision boundaries in that space, via simple plasticity rules applied to a single downstream layer. Though this scheme is biologically plausible, the extent to which it explains learning behavior in humans has been unclear, in part because of a historical lack of image-computable models of the putative neural space, and in part because of a lack of measurements of human learning behaviors in difficult, naturalistic settings. Here, we addressed these gaps by 1) drawing from contemporary, image-computable models of the primate ventral visual stream to create a large set of testable learning models (n = 2,408 models), and 2) using online psychophysics to measure human learning trajectories over a varied set of tasks involving novel 3D objects (n = 371,000 trials), which we then used to develop (and publicly release) empirical benchmarks for comparing learning models to humans. We evaluated each learning model on these benchmarks, and found those based on deep, high-level representations from neural networks were surprisingly aligned with human behavior. While no tested model explained the entirety of replicable human behavior, these results establish that rudimentary plasticity rules, when combined with appropriate visual representations, have high explanatory power in predicting human behavior with respect to this core object learning problem.
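A sketch of the conceptual scheme tested above: a frozen ImageNet-trained network supplies the fixed representation, and only a linear readout is fit on a few examples, standing in for a simple downstream plasticity rule. Images are random placeholders for renders of novel 3D objects.

```python
# Sketch: frozen deep features plus a linear readout for few-shot object learning.
import torch
from torchvision import models
from sklearn.linear_model import SGDClassifier

backbone = models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = torch.nn.Identity()                   # fixed, multidimensional "neural space"
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor):
    return backbone(images).numpy()

# Few-shot training set: 5 examples per novel object; test on new (placeholder) renders.
train_imgs, train_labels = torch.rand(10, 3, 224, 224), [0] * 5 + [1] * 5
test_imgs, test_labels = torch.rand(20, 3, 224, 224), [0] * 10 + [1] * 10

readout = SGDClassifier(loss="log_loss", max_iter=1000)   # simple single-layer learner
readout.fit(embed(train_imgs), train_labels)
print("held-out accuracy:", readout.score(embed(test_imgs), test_labels))
```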
Affiliation(s)
- Michael J. Lee
- Department of Brain and Cognitive Sciences, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds and Machines, MIT, Cambridge, Massachusetts, United States of America
- James J. DiCarlo
- Department of Brain and Cognitive Sciences, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds and Machines, MIT, Cambridge, Massachusetts, United States of America
- McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
22. Jennifer SS, Shamim MH, Reza AW, Siddique N. Sickle cell disease classification using deep learning. Heliyon 2023; 9:e22203. PMID: 38045118. PMCID: PMC10692811. DOI: 10.1016/j.heliyon.2023.e22203.
Abstract
This paper presents a transfer and deep learning based approach to the classification of Sickle Cell Disease (SCD). Five transfer learning models such as ResNet-50, AlexNet, MobileNet, VGG-16 and VGG-19, and a sequential convolutional neural network (CNN) have been implemented for SCD classification. ErythrocytesIDB dataset has been used for training and testing the models. In order to make up for the data insufficiency of the erythrocytesIDB dataset, advanced image augmentation techniques are employed to ensure the robustness of the dataset, enhance dataset diversity and improve the accuracy of the models. An ablation experiment using Random Forest and Support Vector Machine (SVM) classifiers along with various hyperparameter tweaks was carried out to determine the contribution of different model elements to their predictive accuracy. A rigorous statistical analysis was carried out for evaluation, and to further assess the models' robustness, an adversarial attack test was conducted. The experimental results demonstrate compelling performance across all models. After performing the statistical tests, it was observed that MobileNet showed a significant improvement (p = 0.0229), while other models (ResNet-50, AlexNet, VGG-16, VGG-19) did not (p > 0.05). Notably, the ResNet-50 model achieves remarkable precision, recall, and F1-score values of 100% for circular, elongated, and other cell shapes when tested on a smaller dataset. The AlexNet model achieves a balanced precision (98%) and recall (99%) for circular and elongated shapes. Meanwhile, the other models showcase competitive performance.
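A sketch of the transfer-learning recipe in the spirit of this paper: an ImageNet-pretrained ResNet-50 with a new 3-way head (circular / elongated / other), a frozen backbone, and basic augmentation. The dataset path and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Sketch: fine-tune only the classification head of a pretrained ResNet-50.
import torch
from torch import nn
from torchvision import datasets, models, transforms

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(20),                 # simple augmentation for a small dataset
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("erythrocytes/train", transform=train_tf)  # assumed layout
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():
    p.requires_grad = False                        # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 3)      # circular, elongated, other

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```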
Collapse
Affiliation(s)
- Sanjeda Sara Jennifer
- Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh
| | - Mahbub Hasan Shamim
- Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh
| | - Ahmed Wasif Reza
- Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh
| | - Nazmul Siddique
- School of Computing, Engineering and Intelligent Systems, Ulster University, UK
| |
Collapse
|
23
|
van Dyck LE, Gruber WR. Modeling Biological Face Recognition with Deep Convolutional Neural Networks. J Cogn Neurosci 2023; 35:1521-1537. [PMID: 37584587 DOI: 10.1162/jocn_a_02040] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/17/2023]
Abstract
Deep convolutional neural networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground, and recent efforts have started to transfer this achievement to research on biological face recognition. In this regard, face detection can be investigated by comparing face-selective biological neurons and brain areas to artificial neurons and model layers. Similarly, face identification can be examined by comparing in vivo and in silico multidimensional "face spaces." In this review, we summarize the first studies that use DCNNs to model biological face recognition. On the basis of a broad spectrum of behavioral and computational evidence, we conclude that DCNNs are useful models that closely resemble the general hierarchical organization of face recognition in the ventral visual pathway and the core face network. In two exemplary spotlights, we emphasize the unique scientific contributions of these models. First, studies on face detection in DCNNs indicate that elementary face selectivity emerges automatically through feedforward processing even in the absence of visual experience. Second, studies on face identification in DCNNs suggest that identity-specific experience and generative mechanisms facilitate this particular challenge. Taken together, as this novel modeling approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), it may be suited to inform long-standing debates on the substrates of biological face recognition.
Collapse
|
24
|
Dobs K, Yuan J, Martinez J, Kanwisher N. Behavioral signatures of face perception emerge in deep neural networks optimized for face recognition. Proc Natl Acad Sci U S A 2023; 120:e2220642120. [PMID: 37523537 PMCID: PMC10410721 DOI: 10.1073/pnas.2220642120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 06/08/2023] [Indexed: 08/02/2023] Open
Abstract
Human face recognition is highly accurate and exhibits a number of distinctive and well-documented behavioral "signatures" such as the use of a characteristic representational space, the disproportionate performance cost when stimuli are presented upside down, and the drop in accuracy for faces from races the participant is less familiar with. These and other phenomena have long been taken as evidence that face recognition is "special". But why does human face perception exhibit these properties in the first place? Here, we use deep convolutional neural networks (CNNs) to test the hypothesis that all of these signatures of human face perception result from optimization for the task of face recognition. Indeed, as predicted by this hypothesis, these phenomena are all found in CNNs trained on face recognition, but not in CNNs trained on object recognition, even when additionally trained to detect faces while matching the amount of face experience. To test whether these signatures are in principle specific to faces, we optimized a CNN on car discrimination and tested it on upright and inverted car images. As we found for face perception, the car-trained network showed a drop in performance for inverted vs. upright cars. Similarly, CNNs trained on inverted faces produced an inverted face inversion effect. These findings show that the behavioral signatures of human face perception reflect and are well explained as the result of optimization for the task of face recognition, and that the nature of the computations underlying this task may not be so special after all.
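The inversion effect reported here can be quantified very simply: compare a trained classifier's accuracy on upright test images with its accuracy on the same images flipped upside down. The helper below sketches that measurement; the network `net` and the dataset in the commented usage are placeholders, not the study's materials.

# Sketch of measuring an inversion effect: upright minus inverted accuracy.
import torch

def accuracy(model, loader, invert=False):
    """Top-1 accuracy, optionally on vertically flipped (inverted) images."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            if invert:
                x = torch.flip(x, dims=[2])    # flip along the height axis of NCHW input
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# Assumed usage with a face- or car-trained classifier `net` and an identity test set:
# loader = torch.utils.data.DataLoader(test_dataset, batch_size=64)
# inversion_effect = accuracy(net, loader) - accuracy(net, loader, invert=True)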
Collapse
Affiliation(s)
- Katharina Dobs
- Department of Psychology, Justus Liebig University Giessen, Giessen 35394, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Marburg 35302, Germany
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Joanne Yuan
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Julio Martinez
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Psychology, Stanford University, Stanford, CA 94305
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
| |
Collapse
|
25
|
Towler A, Dunn JD, Castro Martínez S, Moreton R, Eklöf F, Ruifrok A, Kemp RI, White D. Diverse types of expertise in facial recognition. Sci Rep 2023; 13:11396. [PMID: 37452069 PMCID: PMC10349110 DOI: 10.1038/s41598-023-28632-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Accepted: 01/20/2023] [Indexed: 07/18/2023] Open
Abstract
Facial recognition errors can jeopardize national security, criminal justice, public safety and civil rights. Here, we compare the most accurate humans and facial recognition technology in a detailed lab-based evaluation and international proficiency test for forensic scientists involving 27 forensic departments from 14 countries. We find striking cognitive and perceptual diversity between naturally skilled super-recognizers, trained forensic examiners and deep neural networks, despite them achieving equivalent accuracy. Clear differences emerged in super-recognizers' and forensic examiners' perceptual processing, errors, and response patterns: super-recognizers were fast, biased to respond 'same person' and misidentified people with extreme confidence, whereas forensic examiners were slow, unbiased and strategically avoided misidentification errors. Further, these human experts and deep neural networks disagreed on the similarity of faces, pointing to differences in their representations of faces. Our findings therefore reveal multiple types of facial recognition expertise, with each type lending itself to particular facial recognition roles in operational settings. Finally, we show that harnessing the diversity between individual experts provides a robust method of maximizing facial recognition accuracy. This can be achieved either via collaboration between experts in forensic laboratories, or most promisingly, by statistical fusion of match scores provided by different types of expert.
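The score-fusion idea mentioned in the final sentence can be illustrated in a few lines: standardize each expert's match scores to a common scale, then average them per face pair. The numbers below are invented for illustration and are not data from the study.

# Sketch of statistical fusion of match scores from different expert types.
import numpy as np

def fuse(score_matrix):
    """score_matrix: experts x face-pairs array of raw match scores."""
    z = (score_matrix - score_matrix.mean(axis=1, keepdims=True)) \
        / score_matrix.std(axis=1, keepdims=True)        # put every expert on one scale
    return z.mean(axis=0)                                 # fused score per face pair

scores = np.array([[6.0, 2.0, 5.0],      # super-recognizer ratings (illustrative)
                   [0.9, -1.2, 0.4],     # forensic examiner, different scale
                   [0.71, 0.12, 0.55]])  # deep network similarity scores
same_person = fuse(scores) > 0           # call "same person" when the fused score is positive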
Collapse
Affiliation(s)
- Alice Towler
- School of Psychology, University of New South Wales, Sydney, 2052, Australia.
- School of Psychology, The University of Queensland, Brisbane, 4072, Australia.
| | - James D Dunn
- School of Psychology, University of New South Wales, Sydney, 2052, Australia
| | - Sergio Castro Martínez
- Sección Técnicas Identificativas, Comisaría General de Policía Científica, 28039, Madrid, Spain
| | - Reuben Moreton
- School of Psychology, The Open University, Milton Keynes, MK7 6AA, UK
| | - Fredrick Eklöf
- Forensic Imaging Biometrics, Information Technology Section, National Forensic Centre, Swedish Police Authority, 581 94, Linköping, Sweden
| | - Arnout Ruifrok
- Forensic Biometrics, Netherlands Forensic Institute, 2497 GB, The Hague, The Netherlands
| | - Richard I Kemp
- School of Psychology, University of New South Wales, Sydney, 2052, Australia
| | - David White
- School of Psychology, University of New South Wales, Sydney, 2052, Australia
| |
Collapse
|
26
|
Mocz V, Jeong SK, Chun M, Xu Y. Multiple visual objects are represented differently in the human brain and convolutional neural networks. Sci Rep 2023; 13:9088. [PMID: 37277406 PMCID: PMC10241785 DOI: 10.1038/s41598-023-36029-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2023] [Accepted: 05/27/2023] [Indexed: 06/07/2023] Open
Abstract
Objects in the real world usually appear with other objects. To form object representations independent of whether or not other objects are encoded concurrently, in the primate brain, responses to an object pair are well approximated by the average responses to each constituent object shown alone. This is found at the single unit level in the slope of response amplitudes of macaque IT neurons to paired and single objects, and at the population level in fMRI voxel response patterns in human ventral object processing regions (e.g., LO). Here, we compare how the human brain and convolutional neural networks (CNNs) represent paired objects. In human LO, we show that averaging exists in both single fMRI voxels and voxel population responses. However, in the higher layers of five CNNs pretrained for object classification varying in architecture, depth and recurrent processing, slope distribution across units and, consequently, averaging at the population level both deviated significantly from the brain data. Object representations thus interact with each other in CNNs when objects are shown together and differ from when objects are shown individually. Such distortions could significantly limit CNNs' ability to generalize object representations formed in different contexts.
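The averaging test described here reduces to a regression: predict each unit's (or voxel's) response to an object pair from the average of its responses to the two objects shown alone, and check whether the slope is near 1. The sketch below uses synthetic arrays as stand-ins for fMRI or CNN responses.

# Sketch of the pair-averaging analysis with synthetic response data.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_pairs = 100, 45
resp_a = rng.normal(size=(n_units, n_pairs))          # responses to object A shown alone
resp_b = rng.normal(size=(n_units, n_pairs))          # responses to object B shown alone
resp_pair = 0.5 * (resp_a + resp_b) + 0.1 * rng.normal(size=(n_units, n_pairs))

slopes = []
for u in range(n_units):
    predicted = 0.5 * (resp_a[u] + resp_b[u])         # prediction under perfect averaging
    slope, intercept = np.polyfit(predicted, resp_pair[u], 1)
    slopes.append(slope)
print("mean slope across units:", round(float(np.mean(slopes)), 2))  # ~1 if the whole equals the average of its parts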
Collapse
Affiliation(s)
- Viola Mocz
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT, 06520, USA
| | - Su Keun Jeong
- Department of Psychology, Chungbuk National University, Cheongju, South Korea
| | - Marvin Chun
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT, 06520, USA
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, 06520, USA
| | - Yaoda Xu
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, 2 Hillhouse Ave, New Haven, CT, 06520, USA.
| |
Collapse
|
27
|
Doerig A, Sommers RP, Seeliger K, Richards B, Ismael J, Lindsay GW, Kording KP, Konkle T, van Gerven MAJ, Kriegeskorte N, Kietzmann TC. The neuroconnectionist research programme. Nat Rev Neurosci 2023:10.1038/s41583-023-00705-w. [PMID: 37253949 DOI: 10.1038/s41583-023-00705-w] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/21/2023] [Indexed: 06/01/2023]
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call 'neuroconnectionism'. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain.
Collapse
Affiliation(s)
- Adrien Doerig
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
| | - Rowan P Sommers
- Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Katja Seeliger
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Blake Richards
- Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- School of Computer Science, McGill University, Montréal, QC, Canada
- Mila, Montréal, QC, Canada
- Montréal Neurological Institute, Montréal, QC, Canada
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
| | | | | | - Konrad P Kording
- Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
- Bioengineering, Neuroscience, University of Pennsylvania, Pennsylvania, PA, USA
| | | | | | | | - Tim C Kietzmann
- Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
| |
Collapse
|
28
|
Yovel G, Grosbard I, Abudarham N. Deep learning models challenge the prevailing assumption that face-like effects for objects of expertise support domain-general mechanisms. Proc Biol Sci 2023; 290:20230093. [PMID: 37161322 PMCID: PMC10170201 DOI: 10.1098/rspb.2023.0093] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Accepted: 04/04/2023] [Indexed: 05/11/2023] Open
Abstract
The question of whether task performance is best achieved by domain-specific or domain-general processing mechanisms is fundamental for both artificial and biological systems. This question has generated a fierce debate in the study of expert object recognition. Because humans are experts in face recognition, face-like neural and cognitive effects for objects of expertise were considered support for domain-general mechanisms. However, effects of domain, experience and level of categorization are confounded in human studies, which may lead to erroneous inferences. To overcome these limitations, we trained deep learning algorithms on different domains (objects, faces, birds) and levels of categorization (basic, sub-ordinate, individual), matched for amount of experience. Like humans, the models generated a larger inversion effect for faces than for objects. Importantly, a face-like inversion effect was found for individual-based categorization of non-faces (birds) but only in a network specialized for that domain. Thus, contrary to prevalent assumptions, face-like effects for objects of expertise do not support domain-general mechanisms but may originate from domain-specific mechanisms. More generally, we show how deep learning algorithms can be used to dissociate factors that are inherently confounded in the natural environment of biological organisms to test hypotheses about their isolated contributions to cognition and behaviour.
Collapse
Affiliation(s)
- Galit Yovel
- School of Psychological Sciences, Tel Aviv University, Tel Aviv 69987, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69987, Israel
| | - Idan Grosbard
- School of Psychological Sciences, Tel Aviv University, Tel Aviv 69987, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69987, Israel
| | - Naphtali Abudarham
- School of Psychological Sciences, Tel Aviv University, Tel Aviv 69987, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69987, Israel
| |
Collapse
|
29
|
Baker N, Garrigan P, Phillips A, Kellman PJ. Configural relations in humans and deep convolutional neural networks. Front Artif Intell 2023; 5:961595. [PMID: 36937367 PMCID: PMC10014814 DOI: 10.3389/frai.2022.961595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2022] [Accepted: 12/23/2022] [Indexed: 03/05/2023] Open
Abstract
Deep convolutional neural networks (DCNNs) have attracted considerable interest as useful devices and as possible windows into understanding perception and cognition in biological systems. In earlier work, we showed that DCNNs differ dramatically from human perceivers in that they have no sensitivity to global object shape. Here, we investigated whether those findings are symptomatic of broader limitations of DCNNs regarding the use of relations. We tested learning and generalization of DCNNs (AlexNet and ResNet-50) for several relations involving objects. One involved classifying two shapes in an otherwise empty field as same or different. Another involved enclosure. Every display contained a closed figure among contour noise fragments and one dot; correct responding depended on whether the dot was inside or outside the figure. The third relation we tested involved a classification that depended on which of two polygons had more sides. One polygon always contained a dot, and correct classification of each display depended on whether the polygon with the dot had a greater number of sides. We used DCNNs that had been trained on the ImageNet database, and we used both restricted and unrestricted transfer learning (connection weights at all layers could change with training). For the same-different experiment, there was little restricted transfer learning (82.2%). Generalization tests showed near chance performance for new shapes. Results for enclosure were at chance for restricted transfer learning and somewhat better for unrestricted (74%). Generalization with two new kinds of shapes showed reduced but above-chance performance (≈66%). Follow-up studies indicated that the networks did not access the enclosure relation in their responses. For the relation of more or fewer sides of polygons, DCNNs showed successful learning with polygons having 3-5 sides under unrestricted transfer learning, but showed chance performance in generalization tests with polygons having 6-10 sides. Experiments with human observers showed learning from relatively few examples of all of the relations tested and complete generalization of relational learning to new stimuli. These results using several different relations suggest that DCNNs have crucial limitations that derive from their lack of computations involving abstraction and relational processing of the sort that are fundamental in human perception.
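The distinction between restricted and unrestricted transfer learning used throughout this abstract can be made concrete: in the restricted case only a newly added readout is trained on the relational task, while in the unrestricted case connection weights at all layers may change. The AlexNet-based sketch below is an assumed, minimal rendering of that setup, not the authors' training code.

# Sketch of restricted vs. unrestricted transfer learning for a 2-way
# relational task such as same/different.
import torch.nn as nn
from torchvision import models

def build_transfer_model(restricted, n_classes=2):
    net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    if restricted:
        for p in net.parameters():
            p.requires_grad = False                   # freeze the ImageNet-trained features
    net.classifier[6] = nn.Linear(4096, n_classes)    # fresh readout, always trainable
    return net

restricted_net = build_transfer_model(restricted=True)     # only the readout can learn
unrestricted_net = build_transfer_model(restricted=False)  # weights at all layers can change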
Collapse
Affiliation(s)
- Nicholas Baker
- Department of Psychology, Loyola University Chicago, Chicago, IL, United States
| | - Patrick Garrigan
- Department of Psychology, Saint Joseph's University, Philadelphia, PA, United States
| | - Austin Phillips
- UCLA Human Perception Laboratory, Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
| | - Philip J. Kellman
- UCLA Human Perception Laboratory, Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
| |
Collapse
|
30
|
Mocz V, Jeong SK, Chun M, Xu Y. Representing Multiple Visual Objects in the Human Brain and Convolutional Neural Networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.28.530472. [PMID: 36909506 PMCID: PMC10002658 DOI: 10.1101/2023.02.28.530472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2023]
Abstract
Objects in the real world often appear with other objects. To recover the identity of an object whether or not other objects are encoded concurrently, in primate object-processing regions, neural responses to an object pair have been shown to be well approximated by the average responses to each constituent object shown alone, indicating the whole is equal to the average of its parts. This is present at the single unit level in the slope of response amplitudes of macaque IT neurons to paired and single objects, and at the population level in response patterns of fMRI voxels in human ventral object processing regions (e.g., LO). Here we show that averaging exists in both single fMRI voxels and voxel population responses in human LO, with better averaging in single voxels leading to better averaging in fMRI response patterns, demonstrating a close correspondence of averaging at the fMRI unit and population levels. To understand whether a similar averaging mechanism exists in convolutional neural networks (CNNs) pretrained for object classification, we examined five CNNs with varying architecture, depth and the presence/absence of recurrent processing. We observed averaging at the CNN unit level but rarely at the population level, and in most cases the CNN unit response distribution did not resemble human LO or macaque IT responses. The whole is thus not equal to the average of its parts in CNNs, potentially rendering the individual objects in a pair less accessible in CNNs during visual processing than they are in the human brain.
Collapse
Affiliation(s)
- Viola Mocz
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA
| | - Su Keun Jeong
- Department of Psychology, Chungbuk National University, South Korea
| | - Marvin Chun
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA
- Department of Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA
| | - Yaoda Xu
- Visual Cognitive Neuroscience Lab, Department of Psychology, Yale University, New Haven, CT 06520, USA
| |
Collapse
|
31
|
Kanwisher N, Khosla M, Dobs K. Using artificial neural networks to ask 'why' questions of minds and brains. Trends Neurosci 2023; 46:240-254. [PMID: 36658072 DOI: 10.1016/j.tins.2022.12.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Revised: 11/29/2022] [Accepted: 12/22/2022] [Indexed: 01/19/2023]
Abstract
Neuroscientists have long characterized the properties and functions of the nervous system, and are increasingly succeeding in answering how brains perform the tasks they do. But the question of why brains work the way they do is asked less often. The new ability to optimize artificial neural networks (ANNs) for performance on human-like tasks now enables us to approach these 'why' questions by asking when the properties of networks optimized for a given task mirror the behavioral and neural characteristics of humans performing the same task. Here we highlight the recent success of this strategy in explaining why the visual and auditory systems work the way they do, at both behavioral and neural levels.
Collapse
Affiliation(s)
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Meenakshi Khosla
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Katharina Dobs
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University, Giessen, Germany.
| |
Collapse
|
32
|
Kirubeswaran OR, Storrs KR. Inconsistent illusory motion in predictive coding deep neural networks. Vision Res 2023; 206:108195. [PMID: 36801664 DOI: 10.1016/j.visres.2023.108195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 01/31/2023] [Accepted: 01/31/2023] [Indexed: 02/19/2023]
Abstract
Why do we perceive illusory motion in some static images? Several accounts point to eye movements, response latencies to different image elements, or interactions between image patterns and motion energy detectors. Recently, PredNet, a recurrent deep neural network (DNN) based on predictive coding principles, was reported to reproduce the "Rotating Snakes" illusion, suggesting a role for predictive coding. We begin by replicating this finding, then use a series of "in silico" psychophysics and electrophysiology experiments to examine whether PredNet behaves consistently with human observers and non-human primate neural data. A pretrained PredNet predicted illusory motion for all subcomponents of the Rotating Snakes pattern, consistent with human observers. However, we found no simple response delays in internal units, unlike evidence from electrophysiological data. PredNet's detection of motion in gradients seemed to depend on contrast, whereas in humans it depends predominantly on luminance. Finally, we examined the robustness of the illusion across ten PredNets of identical architecture, retrained on the same video data. There was large variation across network instances in whether they reproduced the Rotating Snakes illusion, and in what motion, if any, they predicted for simplified variants. Unlike human observers, no network predicted motion for greyscale variants of the Rotating Snakes pattern. Our results sound a cautionary note: even when a DNN successfully reproduces some idiosyncrasy of human vision, more detailed investigation can reveal inconsistencies between humans and the network, and between different instances of the same network. These inconsistencies suggest that predictive coding does not reliably give rise to human-like illusory motion.
Collapse
Affiliation(s)
| | - Katherine R Storrs
- Department of Experimental Psychology, Justus Liebig University Giessen, Germany; Centre for Mind, Brain and Behaviour (CMBB), University of Marburg and Justus Liebig University Giessen, Germany; School of Psychology, University of Auckland, New Zealand
| |
Collapse
|
33
|
Beltzung B, Pelé M, Renoult JP, Sueur C. Deep learning for studying drawing behavior: A review. Front Psychol 2023; 14:992541. [PMID: 36844320 PMCID: PMC9945213 DOI: 10.3389/fpsyg.2023.992541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Accepted: 01/17/2023] [Indexed: 02/11/2023] Open
Abstract
In recent years, computer science has made major advances in understanding drawing behavior. Artificial intelligence, and more precisely deep learning, has displayed unprecedented performance in the automatic recognition and classification of large databases of sketches and drawings collected through touchpad devices. Although deep learning can perform these tasks with high accuracy, how the algorithms perform them remains largely unexplored. Improving the interpretability of deep neural networks is a very active research area, with promising recent advances in understanding human cognition. Deep learning thus offers a powerful framework to study drawing behavior and the underlying cognitive processes, particularly in children and non-human animals, about whom knowledge is incomplete. In this literature review, we first explore the history of deep learning as applied to the study of drawing along with the main discoveries in this area, while proposing open challenges. Second, multiple ideas are discussed to understand the inherent structure of deep learning models. A non-exhaustive list of drawing datasets relevant to deep learning approaches is further provided. Finally, the potential benefits of coupling deep learning with comparative cultural analyses are discussed.
Collapse
Affiliation(s)
- Benjamin Beltzung
- CNRS, IPHC UMR, Université de Strasbourg, Strasbourg, France
| | - Marie Pelé
- ANTHROPO LAB – ETHICS EA 7446, Université Catholique de Lille, Lille, France
| | | | - Cédric Sueur
- CNRS, IPHC UMR, Université de Strasbourg, Strasbourg, France
- Institut Universitaire de France, Paris, France
| |
Collapse
|
34
|
Kenneweg P, Stallmann D, Hammer B. Novel transfer learning schemes based on Siamese networks and synthetic data. Neural Comput Appl 2022; 35:8423-8436. [PMID: 36568475 PMCID: PMC9757634 DOI: 10.1007/s00521-022-08115-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Accepted: 11/23/2022] [Indexed: 12/23/2022]
Abstract
Transfer learning schemes based on deep networks trained on huge image corpora offer state-of-the-art technologies in computer vision. Supervised and semi-supervised approaches are efficient and work well with comparatively small data sets. Yet such applications are currently restricted to domains where suitable deep network models are readily available. In this contribution, we address an important application area in biotechnology, the automatic analysis of CHO-K1 suspension growth in microfluidic single-cell cultivation, where data characteristics are very dissimilar to existing domains and trained deep networks cannot easily be adapted by classical transfer learning. We propose a novel transfer learning scheme that expands a recently introduced Twin-VAE architecture, trained on real and synthetic data, and adapt its specialized training procedure to the transfer learning setting. In this domain, few or no labels typically exist and annotations are costly. We investigate a transfer learning strategy that incorporates simultaneous retraining on natural and synthetic data, using an invariant shared representation and suitable target variables, while learning to handle unseen data from a different microscopy technology. We show that this variation of the Twin-VAE architecture outperforms the state-of-the-art transfer learning methodology in image processing as well as classical image processing techniques, and that this advantage persists even with strongly shortened training times, leading to satisfactory results in this domain. The source code is available at https://github.com/dstallmann/transfer_learning_twinvae; it is cross-platform, open-source and free (MIT licensed) software. The data sets are available at https://pub.uni-bielefeld.de/record/2960030.
Collapse
Affiliation(s)
- Philip Kenneweg
- Machine Learning Group, Bielefeld University, Bielefeld, Germany
| | - Dominik Stallmann
- Machine Learning Group, Bielefeld University, Bielefeld, Germany
| | - Barbara Hammer
- Machine Learning Group, Bielefeld University, Bielefeld, Germany
| |
Collapse
|
35
|
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586 DOI: 10.1017/s0140525x22002813] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Collapse
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
| | - Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK ; https://jeffbowers.blogs.bristol.ac.uk/
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
| | - John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
| |
Collapse
|
36
|
Xu Y, Vaziri-Pashkam M. Understanding transformation tolerant visual object representations in the human brain and convolutional neural networks. Neuroimage 2022; 263:119635. [PMID: 36116617 PMCID: PMC11283825 DOI: 10.1016/j.neuroimage.2022.119635] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 09/12/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022] Open
Abstract
Forming transformation-tolerant object representations is critical to high-level primate vision. Despite its significance, many details of tolerance in the human brain remain unknown. Likewise, despite the ability of convolutional neural networks (CNNs) to exhibit human-like object categorization performance, whether CNNs form tolerance similar to that of the human brain is unknown. Here we provide the first comprehensive documentation and comparison of three tolerance measures in the human brain and CNNs. We measured fMRI responses from human ventral visual areas to real-world objects across both Euclidean and non-Euclidean feature changes. In single fMRI voxels in higher visual areas, we observed robust object response rank-order preservation across feature changes. This is indicative of functional smoothness in tolerance at the fMRI meso-scale level that has never been reported before. At the voxel population level, we found highly consistent object representational structure across feature changes towards the end of ventral processing. Rank-order preservation, consistency, and a third tolerance measure, cross-decoding success (i.e., a linear classifier's ability to generalize performance across feature changes) showed an overall tight coupling. These tolerance measures were in general lower for Euclidean than non-Euclidean feature changes in lower visual areas, but increased over the course of ventral processing for all feature changes. These characteristics of tolerance, however, were absent in eight CNNs pretrained with ImageNet images with varying network architecture, depth, the presence/absence of recurrent processing, or whether a network was pretrained with the original or stylized ImageNet images that encouraged shape processing. CNNs do not appear to develop the same kind of tolerance as the human brain over the course of visual processing.
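Of the three tolerance measures compared here, cross-decoding is the most directly algorithmic: train a linear classifier on object identity from responses in one image condition and test it on responses to the same objects after a feature change. The sketch below uses synthetic response matrices as placeholders for fMRI voxel or CNN unit data.

# Sketch of cross-decoding as a tolerance measure, with synthetic data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_objects, n_repeats, n_units = 8, 20, 300
labels = np.repeat(np.arange(n_objects), n_repeats)
prototypes = rng.normal(size=(n_objects, n_units))
resp_original = prototypes[labels] + 0.5 * rng.normal(size=(labels.size, n_units))
resp_changed = prototypes[labels] + 0.5 * rng.normal(size=(labels.size, n_units))  # e.g., after a size change

clf = LinearSVC(max_iter=5000).fit(resp_original, labels)   # train within one condition
within = clf.score(resp_original, labels)                   # (optimistic) within-condition accuracy
cross = clf.score(resp_changed, labels)                     # generalization across the feature change
print(f"cross-decoding accuracy: {cross:.2f} (within-condition: {within:.2f})")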
Collapse
Affiliation(s)
- Yaoda Xu
- Psychology Department, Yale University, New Haven, CT 06520, USA.
| | | |
Collapse
|
37
|
Beltzung B, Pelé M, Renoult JP, Shimada M, Sueur C. Using Artificial Intelligence to Analyze Non-Human Drawings: A First Step with Orangutan Productions. Animals (Basel) 2022; 12:2761. [PMID: 36290146 PMCID: PMC9597765 DOI: 10.3390/ani12202761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Revised: 10/06/2022] [Accepted: 10/11/2022] [Indexed: 11/07/2022] Open
Abstract
Drawings have been widely used as a window to the mind; as such, they can reveal some aspects of the cognitive and emotional worlds of other animals that can produce them. The study of non-human drawings, however, is limited by human perception, which can bias the methodology and interpretation of the results. Artificial intelligence can circumvent this issue by allowing automated, objective selection of features used to analyze drawings. In this study, we use artificial intelligence to investigate seasonal variations in drawings made by Molly, a female orangutan who produced more than 1299 drawings between 2006 and 2011 at the Tama Zoological Park in Japan. We train the VGG19 model to first classify the drawings according to the season in which they are produced. The results show that deep learning is able to identify subtle but significant seasonal variations in Molly's drawings, with a classification accuracy of 41.6%. We use VGG19 to investigate the features that influence this seasonal variation. We analyze separate features, both simple and complex, related to color and patterning, and to drawing content and style. Content and style classification show maximum performance for moderately complex, highly complex, and holistic features, respectively. We also show that both color and patterning drive seasonal variation, with the latter being more important than the former. This study demonstrates how deep learning can be used to objectively analyze non-figurative drawings and calls for applications to non-primate species and scribbles made by human toddlers.
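One simple way to ask which kinds of features carry the seasonal signal, in the spirit of the analysis described here, is to read activations out of a pretrained VGG19 at several depths (early layers roughly capture colour and local patterning, late layers more holistic content) and fit a linear probe per depth. This is only an assumed illustration; the study's actual pipeline (a fine-tuned VGG19 plus dedicated interpretability analyses) is not reproduced here, and the layer indices, paths and variable names are choices of convenience.

# Sketch: season decoding from VGG19 activations at different depths.
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression

vgg_features = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
probe_layers = {3: "early (colour/edges)", 17: "intermediate", 35: "late (holistic)"}

def pooled_activations(images):
    """images: N x 3 x 224 x 224 tensor -> channel-pooled activations per probed layer."""
    feats = {}
    x = images
    with torch.no_grad():
        for i, layer in enumerate(vgg_features):
            x = layer(x)
            if i in probe_layers:
                feats[i] = x.mean(dim=(2, 3)).numpy()   # global average pooling per channel
    return feats

# Assumed usage, given drawing tensors and integer season labels:
# feats = pooled_activations(drawing_batch)
# for i, name in probe_layers.items():
#     acc = LogisticRegression(max_iter=1000).fit(feats[i], seasons).score(feats[i], seasons)
#     print(name, acc)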
Collapse
Affiliation(s)
- Benjamin Beltzung
- IPHC, University of Strasbourg, CNRS, UMR 7178, 67000 Strasbourg, France
| | - Marie Pelé
- ANTHROPO-LAB, ETHICS EA 7446, Université Catholique de Lille, 59000 Lille, France
| | - Julien P. Renoult
- CEFE, University of Montpellier, CNRS, EPHE, IRD, 34293 Montpellier, France
| | - Masaki Shimada
- Department of Animal Sciences, Teikyo University of Science, 2525, Yatsusawa, Uenohara 409-0193, Yamanashi, Japan
| | - Cédric Sueur
- IPHC, University of Strasbourg, CNRS, UMR 7178, 67000 Strasbourg, France
- University Institute of France, 75231 Paris, France
| |
Collapse
|
38
|
Utsumi A. A test of indirect grounding of abstract concepts using multimodal distributional semantics. Front Psychol 2022; 13:906181. [PMID: 36267060 PMCID: PMC9577286 DOI: 10.3389/fpsyg.2022.906181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
How are abstract concepts grounded in perceptual experiences for shaping human conceptual knowledge? Recent studies on abstract concepts emphasizing the role of language have argued that abstract concepts are grounded indirectly in perceptual experiences and language (or words) functions as a bridge between abstract concepts and perceptual experiences. However, this “indirect grounding” view remains largely speculative and has hardly been supported directly by empirical evidence. In this paper, therefore, we test the indirect grounding view by means of multimodal distributional semantics, in which the meaning of a word (i.e., a concept) is represented as the combination of textual and visual vectors. The newly devised multimodal distributional semantic model incorporates the indirect grounding view by computing the visual vector of an abstract word through the visual vectors of concrete words semantically related to that abstract word. An evaluation experiment is conducted in which conceptual representation is predicted from multimodal vectors using a multilayer feed-forward neural network. The analysis of prediction performance demonstrates that the indirect grounding model achieves significantly better performance in predicting human conceptual representation of abstract words than other models that mimic competing views on abstract concepts, especially than the direct grounding model in which the visual vectors of abstract words are computed directly from the images of abstract concepts. This result lends some plausibility to the indirect grounding view as a cognitive mechanism of grounding abstract concepts.
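The indirect-grounding computation described here can be sketched compactly: the visual vector of an abstract word is built as a similarity-weighted average of the visual vectors of the concrete words closest to it in the textual embedding space. The function below assumes two placeholder dictionaries of precomputed vectors and is not the paper's exact model.

# Sketch of indirect grounding with multimodal distributional vectors.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def indirect_visual_vector(abstract_word, text_vecs, visual_vecs, k=5):
    """text_vecs: word -> textual vector (all words);
    visual_vecs: word -> visual vector (concrete words only)."""
    target = text_vecs[abstract_word]
    sims = {w: cosine(target, text_vecs[w]) for w in visual_vecs}   # similarity to concrete words
    nearest = sorted(sims, key=sims.get, reverse=True)[:k]
    weights = np.array([max(sims[w], 0.0) for w in nearest])
    stacked = np.stack([visual_vecs[w] for w in nearest])
    return weights @ stacked / weights.sum()    # similarity-weighted average of visual vectors

# Assumed usage: indirect_visual_vector("justice", text_vecs, visual_vecs)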
Collapse
|
39
|
Janini D, Hamblin C, Deza A, Konkle T. General object-based features account for letter perception. PLoS Comput Biol 2022; 18:e1010522. [PMID: 36155642 PMCID: PMC9536565 DOI: 10.1371/journal.pcbi.1010522] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 10/06/2022] [Accepted: 08/29/2022] [Indexed: 11/30/2022] Open
Abstract
After years of experience, humans become experts at perceiving letters. Is this visual capacity attained by learning specialized letter features, or by reusing general visual features previously learned in service of object categorization? To explore this question, we first measured the perceptual similarity of letters in two behavioral tasks, visual search and letter categorization. Then, we trained deep convolutional neural networks on either 26-way letter categorization or 1000-way object categorization, as a way to operationalize possible specialized letter features and general object-based features, respectively. We found that the general object-based features more robustly correlated with the perceptual similarity of letters. We then operationalized additional forms of experience-dependent letter specialization by altering object-trained networks with varied forms of letter training; however, none of these forms of letter specialization improved the match to human behavior. Thus, our findings reveal that it is not necessary to appeal to specialized letter representations to account for perceptual similarity of letters. Instead, we argue that it is more likely that the perception of letters depends on domain-general visual features. For over a century, scientists have conducted behavioral experiments to investigate how the visual system recognizes letters, but it has proven difficult to propose a model of the feature space underlying this capacity. Here we leveraged recent advances in machine learning to model a wide variety of features ranging from specialized letter features to general object-based features. Across two large-scale behavioral experiments we find that general object-based features account well for letter perception, and that adding letter specialization did not improve the correspondence to human behavior. It is plausible that the ability to recognize letters largely relies on general visual features unaltered by letter learning.
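The model-to-behaviour comparison at the heart of this study is a representational-similarity-style correlation: compute pairwise letter distances in a network's feature space and rank-correlate them with behaviourally measured letter similarities. The helper below sketches that step; the feature matrix and behavioural matrix are placeholders to be supplied by the reader.

# Sketch: correlate CNN-derived letter similarity with behavioural similarity.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def model_behaviour_correlation(letter_features, behav_similarity):
    """letter_features: 26 x D array of network activations (one row per letter);
    behav_similarity: 26 x 26 behavioural similarity matrix."""
    model_dists = pdist(letter_features, metric="correlation")   # condensed pairwise distances
    upper = np.triu_indices(behav_similarity.shape[0], k=1)      # matching pair order
    behav_dists = 1.0 - behav_similarity[upper]
    rho, _ = spearmanr(model_dists, behav_dists)
    return rho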
Collapse
Affiliation(s)
- Daniel Janini
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Chris Hamblin
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Arturo Deza
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Talia Konkle
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
| |
Collapse
|
40
|
Gomez-Villa A, Martín A, Vazquez-Corral J, Bertalmío M, Malo J. On the synthesis of visual illusions using deep generative models. J Vis 2022; 22:2. [PMID: 35833884 PMCID: PMC9290318 DOI: 10.1167/jov.22.8.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Visual illusions expand our understanding of the visual system by imposing constraints on the models in two different ways: i) visual illusions for humans should induce equivalent illusions in the model, and ii) illusions synthesized from the model should be compelling for human viewers too. These constraints are alternative strategies to find good vision models. Following the first research strategy, recent studies have shown that artificial neural network architectures also have human-like illusory percepts when stimulated with classical hand-crafted stimuli designed to fool humans. In this work we focus on the second (less explored) strategy: we propose a framework to synthesize new visual illusions using the optimization abilities of current automatic differentiation techniques. The proposed framework can be used with classical vision models as well as with more recent artificial neural network architectures. This framework, validated by psychophysical experiments, can be used to study the difference between a vision model and actual human perception and to optimize the vision model to decrease this difference.
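The synthesis-by-optimization idea can be sketched generically: treat the stimulus as a trainable tensor and use automatic differentiation to push a differentiable vision model's "percept" of a physically fixed test patch away from its true value. The toy model below (a single convolution plus a sigmoid) is only a stand-in for a real perceptual model, and all sizes and hyperparameters are assumptions rather than the paper's settings.

# Generic sketch of illusion synthesis by gradient-based stimulus optimization.
import torch
import torch.nn as nn

vision_model = nn.Sequential(nn.Conv2d(1, 1, kernel_size=9, padding=4), nn.Sigmoid())  # toy stand-in

img = torch.rand(1, 1, 64, 64, requires_grad=True)          # candidate surround stimulus
patch = (slice(None), slice(None), slice(24, 40), slice(24, 40))
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    stimulus = img.clone()
    stimulus[patch] = 0.5                                    # test patch physically fixed at mid grey
    percept = vision_model(stimulus)[patch].mean()           # model's "appearance" of that patch
    loss = -(percept - 0.5).abs()                            # push the percept away from the physical value
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        img.clamp_(0, 1)                                     # keep the stimulus displayable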
Collapse
Affiliation(s)
- Alex Gomez-Villa
- Computer Vision Center, Universitat Autónoma de Barcelona, Barcelona, Spain
| | - Adrián Martín
- Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain
| | - Javier Vazquez-Corral
- Computer Science Department, Universitat Autónoma de Barcelona and Computer Vision Center, Barcelona, Spain
| | | | - Jesús Malo
- Image Processing Lab, Faculty of Physics, Universitat de Valéncia, Spain
| |
Collapse
|
41
|
Sp A. Trailblazers in Neuroscience: Using compositionality to understand how parts combine in whole objects. Eur J Neurosci 2022; 56:4378-4392. [PMID: 35760552 PMCID: PMC10084036 DOI: 10.1111/ejn.15746] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2022] [Revised: 06/09/2022] [Accepted: 06/16/2022] [Indexed: 11/27/2022]
Abstract
A fundamental question for any visual system is whether its image representation can be understood in terms of its components. Decomposing any image into components is challenging because there are many possible decompositions with no common dictionary, and enumerating them leads to a combinatorial explosion. Even in perception, many objects are readily seen as containing parts, but there are many exceptions. These exceptions include objects that are not perceived as containing parts, properties like symmetry that cannot be localized to any single part, and also special categories like words and faces whose perception is widely believed to be holistic. Here, I describe a novel approach we have used to address these issues and evaluate compositionality at the behavioral and neural levels. The key design principle is to create a large number of objects by combining a small number of pre-defined components in all possible ways. This allows for building component-based models that explain whole objects using a combination of these components. Importantly, any systematic error in model fits can be used to detect the presence of emergent or holistic properties. Using this approach, we have found that whole object representations are surprisingly predictable from their components, that some components are preferred to others in perception, and that emergent properties can be discovered or explained using compositional models. Thus, compositionality is a powerful approach for understanding how whole objects relate to their parts.
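The component-based modelling strategy described here amounts to a linear fit: predict each whole-object measurement from the parts it contains and inspect the residuals for systematic, emergent structure. The sketch below is a generic rendering with assumed input shapes, not the lab's specific models.

# Sketch of a component-based (part-sum) model fit.
import numpy as np

def fit_part_model(design, responses):
    """design: objects x parts indicator matrix (which parts each object contains);
    responses: one measurement per whole object (e.g., dissimilarity or neural response)."""
    coefs, *_ = np.linalg.lstsq(design, responses, rcond=None)  # estimated per-part contributions
    predicted = design @ coefs
    residuals = responses - predicted      # systematic residuals hint at emergent or holistic properties
    r_squared = 1.0 - residuals.var() / responses.var()
    return coefs, r_squared, residuals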
Collapse
Affiliation(s)
- Arun Sp
- Centre for Neuroscience, Indian Institute of Science Bangalore
| |
Collapse
|
42
|
Taylor AH, Bastos APM, Brown RL, Allen C. The signature-testing approach to mapping biological and artificial intelligences. Trends Cogn Sci 2022; 26:738-750. [PMID: 35773138 DOI: 10.1016/j.tics.2022.06.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2021] [Revised: 05/24/2022] [Accepted: 06/06/2022] [Indexed: 12/26/2022]
Abstract
Making inferences from behaviour to cognition is problematic due to a many-to-one mapping problem, in which any one behaviour can be generated by multiple possible cognitive processes. Attempts to cross this inferential gap when comparing human intelligence to that of animals or machines can generate great debate. Here, we discuss the challenges of making comparisons using 'success-testing' approaches and call attention to an alternate experimental framework, the 'signature-testing' approach. Signature testing places the search for information-processing errors, biases, and other patterns centre stage, rather than focussing predominantly on problem-solving success. We highlight current research on both biological and artificial intelligence that fits within this framework and is creating proactive research programs that make strong inferences about the similarities and differences between the content of human, animal, and machine minds.
Collapse
Affiliation(s)
- Alex H Taylor
- School of Psychology, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand.
| | - Amalia P M Bastos
- School of Psychology, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand; Department of Cognitive Science, University of California, San Diego, CA, USA
| | - Rachael L Brown
- School of Philosophy, Australian National University, Canberra, ACT 2600, Australia
| | - Colin Allen
- Department of History and Philosophy of Science, University of Pittsburgh, 1101 Cathedral of Learning, Pittsburgh, PA 15260, USA
| |
Collapse
|
43
|
|
44
|
Joint encoding of facial identity, orientation, gaze, and expression in the middle dorsal face area. Proc Natl Acad Sci U S A 2021; 118:2108283118. [PMID: 34385326 DOI: 10.1073/pnas.2108283118] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
The last two decades have established that a network of face-selective areas in the temporal lobe of macaque monkeys supports the visual processing of faces. Each area within the network contains a large fraction of face-selective cells. And each area encodes facial identity and head orientation differently. A recent brain-imaging study discovered an area outside of this network selective for naturalistic facial motion, the middle dorsal (MD) face area. This finding offers the opportunity to determine whether coding principles revealed inside the core network would generalize to face areas outside the core network. We investigated the encoding of static faces and objects, facial identity, and head orientation, dimensions which had been studied in multiple areas of the core face-processing network before, as well as facial expressions and gaze. We found that MD populations form a face-selective cluster with a degree of selectivity comparable to that of areas in the core face-processing network. MD encodes facial identity robustly across changes in head orientation and expression, it encodes head orientation robustly against changes in identity and expression, and it encodes expression robustly across changes in identity and head orientation. These three dimensions are encoded in a separable manner. Furthermore, MD also encodes the direction of gaze in addition to head orientation. Thus, MD encodes both structural properties (identity) and changeable ones (expression and gaze) and thus provides information about another animal's direction of attention (head orientation and gaze). MD contains a heterogeneous population of cells that establish a multidimensional code for faces.
Collapse
|