1
Funke CM, Borowski J, Stosio K, Brendel W, Wallis TSA, Bethge M. Five points to check when comparing visual perception in humans and machines. J Vis 2021; 21:16. PMID: 33724362; PMCID: PMC7980041; DOI: 10.1167/jov.21.3.16. Received 04/21/2020; accepted 12/02/2020. Open access.
Abstract
With the rise of machines to human-level performance in complex recognition tasks, a growing amount of work is directed toward comparing information processing in humans and machines. These studies are an exciting chance to learn about one system by studying the other. Here, we propose ideas on how to design, conduct, and interpret experiments such that they adequately support the investigation of mechanisms when comparing human and machine perception. We demonstrate and apply these ideas through three case studies. The first case study shows how human bias can affect the interpretation of results and that several analytic tools can help to overcome this human reference point. In the second case study, we highlight the difference between necessary and sufficient mechanisms in visual reasoning tasks. Thereby, we show that contrary to previous suggestions, feedback mechanisms might not be necessary for the tasks in question. The third case study highlights the importance of aligning experimental conditions. We find that a previously observed difference in object recognition does not hold when adapting the experiment to make conditions more equitable between humans and machines. In presenting a checklist for comparative studies of visual reasoning in humans and machines, we hope to highlight how to overcome potential pitfalls in design and inference.
Affiliation(s)
- Karolina Stosio
- University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen and Berlin, Germany
- Volkswagen Group Machine Learning Research Lab, Munich, Germany
- Wieland Brendel
- University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen and Berlin, Germany
- Werner Reichardt Centre for Integrative Neuroscience, Tübingen, Germany
- Thomas S A Wallis
- University of Tübingen, Tübingen, Germany
- Present address: Amazon.com, Tübingen
- Matthias Bethge
- University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, Tübingen and Berlin, Germany
- Werner Reichardt Centre for Integrative Neuroscience, Tübingen, Germany
2
Susi G, Antón-Toro LF, Maestú F, Pereda E, Mirasso C. nMNSD-A Spiking Neuron-Based Classifier That Combines Weight-Adjustment and Delay-Shift. Front Neurosci 2021; 15:582608. PMID: 33679293; PMCID: PMC7933525; DOI: 10.3389/fnins.2021.582608. Received 11/12/2020; accepted 01/15/2021. Open access.
Abstract
The recent “multi-neuronal spike sequence detector” (MNSD) architecture integrates weight- and delay-adjustment methods by combining heterosynaptic plasticity with the neurocomputational feature of spike latency, representing a new opportunity to understand the mechanisms underlying biological learning. Unfortunately, the range of problems to which this topology can be applied is limited by the low cardinality of the parallel spike trains it can process and by the lack of a visualization mechanism for understanding its internal operation. We present here the nMNSD structure, a generalization of the MNSD to any number of inputs. We introduce the mathematical framework of the structure together with the “trapezoid method,” a simplified method for analyzing the recognition mechanism that the nMNSD performs in response to a specific input parallel spike train. We apply the nMNSD to a classification problem previously addressed with the classical MNSD by the same authors, showing the new possibilities the nMNSD opens and the associated improvement in classification performance. Finally, we benchmark the nMNSD on the classification of static inputs (the MNIST database), obtaining state-of-the-art accuracy together with advantages in time and energy efficiency compared to similar classification methods.
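The spike-latency feature the abstract relies on can be illustrated with a minimal time-to-first-spike encoder for static inputs such as MNIST pixel intensities: stronger inputs fire earlier. This is only a sketch of the general idea; the function name, the linear mapping, and `t_max` are illustrative assumptions, not the nMNSD's actual parameterization.

```python
import numpy as np

def latency_encode(pixels, t_max=100.0):
    """Time-to-first-spike encoding: brighter pixels spike earlier.

    Hypothetical illustration of the 'spike latency' feature; the
    linear intensity-to-latency mapping and t_max are assumptions.
    """
    x = np.asarray(pixels, dtype=float)
    x = x / max(x.max(), 1e-9)   # normalize intensities to [0, 1]
    return (1.0 - x) * t_max     # intensity 1 -> latency 0, intensity 0 -> t_max

# A bright pixel spikes at t=0, a dark one only at t_max:
times = latency_encode([0.0, 0.5, 1.0])
```

A downstream detector can then classify by checking whether the resulting parallel spike train matches a learned pattern of delays.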
Affiliation(s)
- Gianluca Susi
- UPM-UCM Laboratory of Cognitive and Computational Neuroscience, Centro de Tecnologia Biomedica, Madrid, Spain; Departamento de Psicología Experimental, Facultad de Psicología, Universidad Complutense de Madrid, Madrid, Spain; Department of Civil Engineering and Computer Science, University of Rome "Tor Vergata", Rome, Italy
- Luis F Antón-Toro
- UPM-UCM Laboratory of Cognitive and Computational Neuroscience, Centro de Tecnologia Biomedica, Madrid, Spain; Departamento de Psicología Experimental, Facultad de Psicología, Universidad Complutense de Madrid, Madrid, Spain
- Fernando Maestú
- UPM-UCM Laboratory of Cognitive and Computational Neuroscience, Centro de Tecnologia Biomedica, Madrid, Spain; Departamento de Psicología Experimental, Facultad de Psicología, Universidad Complutense de Madrid, Madrid, Spain; CIBER-BBN: Networking Research Center on Bioengineering, Biomaterials and Nanomedicine, Madrid, Spain
- Ernesto Pereda
- UPM-UCM Laboratory of Cognitive and Computational Neuroscience, Centro de Tecnologia Biomedica, Madrid, Spain; Departamento de Ingeniería Industrial & IUNE & ITB, Universidad de La Laguna, Tenerife, Spain
- Claudio Mirasso
- Instituto de Física Interdisciplinar y Sistemas Complejos (IFISC, UIB-CSIC), Palma de Mallorca, Spain
3
Ben-Yosef G, Kreiman G, Ullman S. Minimal videos: Trade-off between spatial and temporal information in human and machine vision. Cognition 2020; 201:104263. PMID: 32325309; PMCID: PMC7330814; DOI: 10.1016/j.cognition.2020.104263. Received 09/25/2019; revised 03/03/2020; accepted 03/05/2020.
Abstract
Objects and their parts can be visually recognized from purely spatial or purely temporal information but the mechanisms integrating space and time are poorly understood. Here we show that visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by identifying minimal videos: these are short and tiny video clips in which objects, parts, and actions can be reliably recognized, but any reduction in either space or time makes them unrecognizable. Human recognition in minimal videos is invariably accompanied by full interpretation of the internal components of the video. State-of-the-art deep convolutional networks for dynamic recognition cannot replicate human behavior in these configurations. The gap between human and machine vision demonstrated here is due to critical mechanisms for full spatiotemporal interpretation that are lacking in current computational models.
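The minimality criterion described above (recognizable, but unrecognizable after any single reduction in space or time) can be sketched as a simple predicate. The function and parameter names below are illustrative assumptions: `recognizable` stands in for the measured human recognition rate, and `reductions` for operations such as cropping, downscaling, or dropping a frame.

```python
def is_minimal(clip, reductions, recognizable, threshold=0.5):
    """A clip is 'minimal' if it is itself recognizable but every
    single reduction (smaller crop, coarser resolution, fewer
    frames) pushes recognition below threshold.

    Hypothetical sketch of the paper's minimality criterion;
    all names and the 0.5 threshold are illustrative.
    """
    if recognizable(clip) < threshold:
        return False  # not recognizable to begin with
    # Minimal only if *every* one-step reduction destroys recognition.
    return all(recognizable(reduce(clip)) < threshold
               for reduce in reductions)
```

In practice, minimal videos would be found by applying such reductions repeatedly to a recognizable clip and stopping at the first configuration where this predicate holds.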
Affiliation(s)
- Guy Ben-Yosef
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
- Gabriel Kreiman
- Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Shimon Ullman
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA