1
Jiao Y, Hang H, Merel J, Kanso E. Sensing flow gradients is necessary for learning autonomous underwater navigation. Nat Commun 2025; 16:3044. PMID: 40155622; PMCID: PMC11953274; DOI: 10.1038/s41467-025-58125-6.
Abstract
Aquatic animals are much better at underwater navigation than robotic vehicles. Robots face major challenges in deep water because of their limited access to global positioning signals and flow maps. These limitations, and the changing nature of water currents, support the use of reinforcement learning approaches, where the navigator learns through trial-and-error interactions with the flow environment. But is it feasible to learn underwater navigation in the agent's Umwelt, without any land references? Here, we tasked an artificial swimmer with learning to reach a specific destination in unsteady flows by relying solely on egocentric observations, collected through on-board flow sensors in the agent's body frame, with no reference to a geocentric inertial frame. We found that while sensing local flow velocities is sufficient for geocentric navigation, successful egocentric navigation requires additional information about local flow gradients. Importantly, egocentric navigation strategies obey rotational symmetry and are more robust in unfamiliar conditions and flows not experienced during training. Our work expands underwater robot-centric learning, helps explain why aquatic organisms have arrays of flow sensors that detect gradients, and provides physics-based guidelines for transfer learning of learned policies to unfamiliar and diverse flow environments.
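To make the distinction between geocentric and egocentric observations concrete, the sketch below builds the two observation vectors for a planar swimmer (a minimal illustration with assumed names and 2D shapes, not the authors' code): the egocentric version exposes only the local flow velocity and its gradient rotated into the body frame, while the geocentric version exposes absolute position and goal coordinates.

```python
import numpy as np

def world_to_body(theta):
    """Rotation matrix taking lab-frame (geocentric) vectors into the body frame of a swimmer heading at angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def egocentric_observation(u_local, grad_u_local, theta):
    """Egocentric observation: local flow velocity and velocity-gradient tensor,
    both expressed in the body frame; no absolute position or heading is exposed."""
    R = world_to_body(theta)
    u_body = R @ np.asarray(u_local)                 # flow velocity as on-board sensors would see it
    grad_body = R @ np.asarray(grad_u_local) @ R.T   # gradient tensor du_i/dx_j rotated into the body frame
    return np.concatenate([u_body, grad_body.ravel()])

def geocentric_observation(position, u_local, goal):
    """Geocentric observation: absolute position, lab-frame flow velocity, and goal location."""
    return np.concatenate([position, u_local, goal])
```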
Affiliation(s)
- Yusheng Jiao
- Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, CA, USA
- Haotian Hang
- Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, CA, USA
- Eva Kanso
- Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, CA, USA.
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA.
2
Patel H, Garrido Portilla V, Shneidman AV, Movilli J, Alvarenga J, Dupré C, Aizenberg M, Murthy VN, Tropsha A, Aizenberg J. Design principles from natural olfaction for electronic noses. Adv Sci (Weinh) 2025; 12:e2412669. PMID: 39835449; PMCID: PMC11948017; DOI: 10.1002/advs.202412669.
Abstract
Natural olfactory systems possess remarkable sensitivity and precision beyond what is currently achievable by engineered gas sensors. Unlike their artificial counterparts, noses are capable of distinguishing scents associated with mixtures of volatile molecules in complex, typically fluctuating environments and can adapt to changes. This perspective examines the multifaceted biological principles that give olfactory systems their discriminatory prowess, and how these ideas can be ported to the design of electronic noses for substantial improvements in performance across metrics such as sensitivity and the ability to speciate chemical mixtures. The topics examined herein include the fluid dynamics of odorants in natural channels; specificity and kinetics of odorant interactions with olfactory receptors and mucus linings; complex signal processing that spatiotemporally encodes physicochemical properties of odorants; active sampling techniques, like biological sniffing and nose repositioning; biological priming; and molecular chaperoning. Each of these components of natural olfactory systems is systematically examined with respect to how it has been, or can be, applied to electronic noses. While not all artificial sensors can employ these strategies simultaneously, integrating a subset of bioinspired principles can address issues like sensitivity, drift, and poor selectivity, offering advancements in many sectors such as environmental monitoring, industrial safety, and disease diagnostics.
Affiliation(s)
- Haritosh Patel
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134, USA
- Vicente Garrido Portilla
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134, USA
- Anna V. Shneidman
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134, USA
- Jacopo Movilli
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134, USA
- Department of Chemical Sciences, University of Padova, Padova 35131, Italy
- Jack Alvarenga
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134, USA
- Christophe Dupré
- Department of Molecular & Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Michael Aizenberg
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134, USA
- Venkatesh N. Murthy
- Department of Molecular & Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Kempner Institute, Harvard University, Boston, MA 02134, USA
- Alexander Tropsha
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Joanna Aizenberg
- Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134, USA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
3
Siliciano AF, Minni S, Morton C, Dowell CK, Eghbali NB, Rhee JY, Abbott L, Ruta V. A vector-based strategy for olfactory navigation in Drosophila. bioRxiv 2025:2025.02.15.638426. PMID: 39990408; PMCID: PMC11844514; DOI: 10.1101/2025.02.15.638426.
Abstract
Odors serve as essential cues for navigation. Although tracking an odor plume has been modeled as a reflexive process, it remains unclear whether animals can use memories of their past odor encounters to infer the spatial structure of their chemical environment or their location within it. Here we developed a virtual-reality olfactory paradigm that allows head-fixed Drosophila to navigate structured chemical landscapes, offering insight into how memory mechanisms shape their navigational strategies. We found that flies track an appetitive odor corridor by following its boundary, alternating between rapid counterturns to exit the plume and directed returns to its edge. Using a combination of behavioral modeling, functional calcium imaging, and neural perturbations, we demonstrate that this 'edge-tracking' strategy relies on vector-based computations within the Drosophila central complex in which flies store and dynamically update memories of the direction to return them to the plume's boundary. Consistent with this, we find that FC2 neurons within the fan-shaped body, which encode a fly's navigational goal, signal the direction back to the odor boundary when flies are outside the plume. Together, our studies suggest that flies leverage the plume's boundary as a dynamic landmark to guide their navigation, analogous to the memory-based strategies other insects use for long-distance migration or homing to their nests. Plume tracking thus uses components of a conserved navigational toolkit, enabling flies to use memory mechanisms to navigate through a complex shifting chemical landscape.
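The vector-based computation described here can be caricatured in a few lines: an agent that counterturns while inside the odor and, once outside, path-integrates its own displacement so that a stored goal vector keeps pointing back at the last boundary crossing. The sketch below is a schematic toy model (all names and the fixed counterturn are assumptions, not the authors' model of FC2 dynamics).

```python
import numpy as np

class EdgeTracker:
    """Toy vector-memory edge tracker: store and update a goal vector pointing back to the plume boundary."""

    def __init__(self):
        self.goal_vec = np.zeros(2)   # vector from the agent back to the boundary (world frame)
        self.was_inside = False

    def step(self, inside_plume, heading, displacement):
        """Return a desired heading, given the odor state and the world-frame displacement since the last call."""
        # Path integration: the agent's own movement shifts the remembered boundary location.
        self.goal_vec = self.goal_vec - np.asarray(displacement)

        if self.was_inside and not inside_plume:
            # Just crossed the boundary outward: the boundary lies roughly one step behind.
            self.goal_vec = -np.asarray(displacement)
        self.was_inside = inside_plume

        if inside_plume:
            # Rapid counterturn: head back roughly the way we came to exit the corridor.
            return heading + np.pi
        # Outside the plume: steer along the stored goal vector, i.e. a directed return to the edge.
        return float(np.arctan2(self.goal_vec[1], self.goal_vec[0]))
```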
Affiliation(s)
- Andrew F. Siliciano
- These authors contributed equally to this work
- Laboratory of Neurophysiology and Behavior and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Sun Minni
- These authors contributed equally to this work
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Kavli Institute for Brain Science, Department of Neuroscience, Columbia University, New York, NY, USA
- Chad Morton
- These authors contributed equally to this work
- Laboratory of Neurophysiology and Behavior and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Charles K. Dowell
- Laboratory of Neurophysiology and Behavior and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Noelle B. Eghbali
- Laboratory of Neurophysiology and Behavior and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- Juliana Y. Rhee
- Laboratory of Neurophysiology and Behavior and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
- L.F. Abbott
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Kavli Institute for Brain Science, Department of Neuroscience, Columbia University, New York, NY, USA
- Vanessa Ruta
- Laboratory of Neurophysiology and Behavior and Howard Hughes Medical Institute, The Rockefeller University, New York, NY, USA
4
Abe ETT, Brunton BW. TiDHy: Timescale Demixing via Hypernetworks to learn simultaneous dynamics from mixed observations. bioRxiv 2025:2025.01.28.635316. PMID: 39974964; PMCID: PMC11838317; DOI: 10.1101/2025.01.28.635316.
Abstract
Neural activity and behavior arise from multiple concurrent time-varying systems, including neuromodulation, neural state, and history; however, most current approaches model these data as one set of dynamics with a single timescale. Here we develop Timescale Demixing via Hypernetworks (TiDHy) as a new computational method to model spatiotemporal data, decomposing them into multiple simultaneous latent dynamical systems that may span orders-of-magnitude different timescales. Specifically, we train a hypernetwork to dynamically reweigh linear combinations of latent dynamics. This approach enables accurate data reconstruction, converges to true latent dynamics, and captures multiple timescales of variation. We first demonstrate that TiDHy can demix dynamics and timescales from synthetic data comprising multiple independent switching linear dynamical systems, even when the observations are mixed. Next, with a simulated locomotion behavior dataset, we show that TiDHy accurately captures both the fast dynamics of movement kinematics and the slow dynamics of changing terrains. Finally, in an open-source multi-animal social behavior dataset, we show that the keypoint trajectory dynamics extracted with TiDHy can be used to accurately identify social behaviors of multiple mice. Taken together, TiDHy is a powerful new algorithm for demixing simultaneous latent dynamical systems with applications to diverse computational domains.
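The core TiDHy idea, as summarized above, is a hypernetwork that state-dependently reweights a bank of linear latent dynamics spanning very different timescales. The snippet below is a bare-bones numerical sketch of that update rule (random untrained weights, illustrative sizes and timescales; not the released implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 3, 4                                     # number of latent systems, latent dimension
timescales = [2.0, 20.0, 200.0]                 # fast, medium, slow decay times (in steps)
A_bank = np.stack([np.exp(-1.0 / tau) * np.eye(D) for tau in timescales])

# Stand-in "hypernetwork": an untrained linear map from the latent state to mixing weights.
W_hyper = rng.normal(scale=0.1, size=(K, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(z):
    """One update: mix the bank of linear dynamics with state-dependent weights from the hypernetwork."""
    w = softmax(W_hyper @ z)                    # mixing weights over the K systems
    A_t = np.tensordot(w, A_bank, axes=1)       # effective dynamics matrix at this step
    return A_t @ z

z = rng.normal(size=D)
for _ in range(100):
    z = step(z)                                 # roll the mixed dynamics forward
```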
Affiliation(s)
- Elliott T. T. Abe
- Biology Department, University of Washington, Seattle, Washington, USA
- eScience Institute, University of Washington, Seattle, Washington, USA
- Computational Neuroscience Center, University of Washington, Seattle, Washington, USA
- Bingni W. Brunton
- Biology Department, University of Washington, Seattle, Washington, USA
- eScience Institute, University of Washington, Seattle, Washington, USA
- Computational Neuroscience Center, University of Washington, Seattle, Washington, USA
5
Rando M, James M, Verri A, Rosasco L, Seminara A. Q-learning with temporal memory to navigate turbulence. arXiv 2025:arXiv:2404.17495v2. PMID: 38711433; PMCID: PMC11071615.
Abstract
We consider the problem of olfactory searches in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor. We ask whether navigation to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent odors. An optimal memory exists which ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly crosswind casting, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting minor parameter tuning may be sufficient to adapt to different environments.
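The agent described here is close to textbook tabular Q-learning, with the state being a short temporal memory of odor detections. The sketch below shows that state/update structure (illustrative memory horizon, learning parameters, and action set; the turbulent plume environment itself is left abstract).

```python
import numpy as np

rng = np.random.default_rng(0)

M = 10                                   # memory horizon: steps since last detection (assumed value)
ACTIONS = ["upwind", "downwind", "left", "right"]
Q = np.zeros((M + 1, len(ACTIONS)))      # one row per olfactory-memory state

alpha, gamma, eps = 0.1, 0.99, 0.1       # learning rate, discount, exploration (assumed values)

def next_memory_state(state, detected):
    """Reset the memory on an odor hit; otherwise count up, saturating at M ('outside the plume')."""
    return 0 if detected else min(state + 1, M)

def choose(state):
    """Epsilon-greedy action selection over the discrete memory states."""
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update."""
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])
```

Once trained against a plume simulator, the saturated row Q[M] plays the role of the learned recovery strategy that the agent falls back on after it has been outside the plume longer than its memory.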
Affiliation(s)
- Marco Rando
- MaLGa, Department of computer science, bioengineering, robotics and systems engineering, University of Genova, Genova, Italy
- Martin James
- MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genoa, Italy
- Alessandro Verri
- MaLGa, Department of computer science, bioengineering, robotics and systems engineering, University of Genova, Genova, Italy
- Lorenzo Rosasco
- MaLGa, Department of computer science, bioengineering, robotics and systems engineering, University of Genova, Genova, Italy
- Agnese Seminara
- MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genoa, Italy
6
Sun X, Mangan M, Peng J, Yue S. I2Bot: an open-source tool for multi-modal and embodied simulation of insect navigation. J R Soc Interface 2025; 22:20240586. PMID: 39837486; PMCID: PMC11750368; DOI: 10.1098/rsif.2024.0586.
Abstract
Achieving a comprehensive understanding of animal intelligence demands an integrative approach that acknowledges the interplay between an organism's brain, body and environment. Insects, despite their limited computational resources, demonstrate remarkable abilities in navigation. Existing computational models often fall short in faithfully replicating the morphology of real insects and their interactions with the environment, hindering validation and practical application in robotics. To address these gaps, we present I2Bot, a novel simulation tool based on the morphological characteristics of real insects. This tool empowers robotic models with dynamic sensory capabilities, realistic modelling of insect morphology, physical dynamics and sensory capacity. By integrating gait controllers and computational models into I2Bot, we have implemented classical embodied navigation behaviours and revealed some fundamental navigation principles. By open-sourcing I2Bot, we aim to accelerate the understanding of insect intelligence and foster advances in the development of autonomous robotic systems.
Affiliation(s)
- Xuelong Sun
- Machine Life and Intelligence Research Center, Guangzhou University, Guangzhou, People’s Republic of China
- School of Mathematics and Information Science, Guangzhou University, Guangzhou, People’s Republic of China
- Michael Mangan
- Department of Computer Science, Sheffield Robotics, University of Sheffield, Sheffield, UK
- Jigen Peng
- Machine Life and Intelligence Research Center, Guangzhou University, Guangzhou, People’s Republic of China
- School of Mathematics and Information Science, Guangzhou University, Guangzhou, People’s Republic of China
- Shigang Yue
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
7
Basu J, Nagel K. Neural circuits for goal-directed navigation across species. Trends Neurosci 2024; 47:904-917. PMID: 39393938; PMCID: PMC11563880; DOI: 10.1016/j.tins.2024.09.005.
Abstract
Across species, navigation is crucial for finding both resources and shelter. In vertebrates, the hippocampus supports memory-guided goal-directed navigation, whereas in arthropods the central complex supports similar functions. A growing literature is revealing similarities and differences in the organization and function of these brain regions. We review current knowledge about how each structure supports goal-directed navigation by building internal representations of the position or orientation of an animal in space, and of the location or direction of potential goals. We describe input pathways to each structure - medial and lateral entorhinal cortex in vertebrates, and columnar and tangential neurons in insects - that primarily encode spatial and non-spatial information, respectively. Finally, we highlight similarities and differences in spatial encoding across clades and suggest experimental approaches to compare coding principles and behavioral capabilities across species. Such a comparative approach can provide new insights into the neural basis of spatial navigation and neural computation.
Affiliation(s)
- Jayeeta Basu
- Neuroscience Institute, New York University Langone Health, New York, NY 10016, USA; Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA; Center for Neural Science, New York University, New York, NY 10003, USA.
- Katherine Nagel
- Neuroscience Institute, New York University Langone Health, New York, NY 10016, USA; Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016, USA; Center for Neural Science, New York University, New York, NY 10003, USA.
8
Sunil A, Pedroncini O, Schaefer AT, Ackels T. How do mammals convert dynamic odor information into neural maps for landscape navigation? PLoS Biol 2024; 22:e3002908. PMID: 39571004; PMCID: PMC11581409; DOI: 10.1371/journal.pbio.3002908.
Abstract
Odors are transported by seemingly chaotic plumes, whose spatiotemporal structure contains rich information about space, with olfaction serving as a gateway for obtaining and processing this spatial information. Beyond tracking odors, olfaction provides localization and chemical communication cues for detecting conspecifics and predators, and linking external environments to internal cognitive maps. In this Essay, we discuss recent physiological, behavioral, and methodological advancements in mammalian olfactory research to present our current understanding of how olfaction can be used to navigate the environment. We also examine potential neural mechanisms that might convert dynamic olfactory inputs into environmental maps along this axis. Finally, we consider technological applications of odor dynamics for developing bio-inspired sensor technologies, robotics, and computational models. By shedding light on the principles underlying the processing of odor dynamics, olfactory research will pave the way for innovative solutions that bridge the gap between biology and technology, enriching our understanding of the natural world.
Affiliation(s)
- Anantu Sunil
- Sensory Dynamics and Behaviour Lab, Institute of Experimental Epileptology and Cognition Research, University of Bonn Medical Center, Bonn, Germany
- Olivia Pedroncini
- Sensory Circuits and Neurotechnology Laboratory, Francis Crick Institute, London, United Kingdom
- Andreas T. Schaefer
- Sensory Circuits and Neurotechnology Laboratory, Francis Crick Institute, London, United Kingdom
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, United Kingdom
- Tobias Ackels
- Sensory Dynamics and Behaviour Lab, Institute of Experimental Epileptology and Cognition Research, University of Bonn Medical Center, Bonn, Germany
9
Stupski SD, van Breugel F. Wind gates olfaction-driven search states in free flight. Curr Biol 2024; 34:4397-4411.e6. PMID: 39067453; PMCID: PMC11461137; DOI: 10.1016/j.cub.2024.07.009.
Abstract
For organisms tracking a chemical cue to its source, the motion of their surrounding fluid provides crucial information for success. Swimming and flying animals engaged in olfaction-driven search often start by turning into the direction of an oncoming wind or water current. However, it is unclear how organisms adjust their strategies when directional cues are absent or unreliable, as is often the case in nature. Here, we use the genetic toolkit of Drosophila melanogaster to develop an optogenetic paradigm to deliver temporally precise "virtual" olfactory experiences for free-flying animals in either laminar wind or still air. We first confirm that in laminar wind flies turn upwind. Furthermore, we show that they achieve this using a rapid (∼100 ms) turn, implying that flies estimate the ambient wind direction prior to "surging" upwind. In still air, flies adopt a remarkably stereotyped "sink and circle" search state characterized by ∼60° turns at 3-4 Hz, biased in a consistent direction. Together, our results show that Drosophila melanogaster assesses the presence and direction of ambient wind prior to deploying a distinct search strategy. In both laminar wind and still air, immediately after odor onset, flies decelerate and often perform a rapid turn. Both maneuvers are consistent with predictions from recent control theoretic analyses for how insects may estimate properties of wind while in flight. We suggest that flies may use their deceleration and "anemometric" turn as active sensing maneuvers to rapidly gauge properties of their wind environment before initiating a proximal or upwind search routine.
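The gating logic described in this abstract amounts to a small state machine: decelerate and turn on odor onset, then branch on whether a reliable wind direction is available. A schematic version (threshold, angles, and names are assumptions, not fitted values) is sketched below.

```python
import math

def on_odor_onset(wind_speed, wind_direction, heading, wind_threshold=0.1):
    """Pick a search mode after an odor hit: upwind surge in laminar wind, 'sink and circle' in still air.
    Angles are in radians; wind_direction is the direction the wind blows toward."""
    decelerate = True   # both modes begin with a deceleration

    if wind_speed > wind_threshold:
        # Anemometric turn: rotate to face upwind, then surge.
        upwind = wind_direction + math.pi
        turn = (upwind - heading + math.pi) % (2 * math.pi) - math.pi   # wrapped turn command
        return "upwind_surge", decelerate, turn

    # Still air: stereotyped ~60 degree saccades (repeated at 3-4 Hz with a consistent bias).
    return "sink_and_circle", decelerate, math.radians(60)
```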
Affiliation(s)
- S David Stupski
- Integrative Neuroscience Program, University of Nevada, Reno, 1664 N. Virginia St., Reno, NV 89557, USA; Ecology Evolution and Conservation Biology Program, University of Nevada, Reno, 1664 N. Virginia St., Reno, NV 89557, USA; Department of Mechanical Engineering, University of Nevada, Reno, 1664 N. Virginia St., Reno, NV 89557, USA
- Floris van Breugel
- Integrative Neuroscience Program, University of Nevada, Reno, 1664 N. Virginia St., Reno, NV 89557, USA; Ecology Evolution and Conservation Biology Program, University of Nevada, Reno, 1664 N. Virginia St., Reno, NV 89557, USA; Department of Mechanical Engineering, University of Nevada, Reno, 1664 N. Virginia St., Reno, NV 89557, USA.
10
Gunnarson P, Dabiri JO. Fish-inspired tracking of underwater turbulent plumes. Bioinspir Biomim 2024; 19:056024. PMID: 39163889; DOI: 10.1088/1748-3190/ad7181.
Abstract
Autonomous ocean-exploring vehicles have begun to take advantage of onboard sensor measurements of water properties such as salinity and temperature to locate oceanic features in real time. Such targeted sampling strategies enable more rapid study of ocean environments by actively steering towards areas of high scientific value. Inspired by the ability of aquatic animals to navigate via flow sensing, this work investigates hydrodynamic cues for accomplishing targeted sampling using a palm-sized robotic swimmer. As a proof-of-concept analogy for tracking hydrothermal vent plumes in the ocean, the robot is tasked with locating the center of turbulent jet flows in a 13,000-liter water tank using data from onboard pressure sensors. To learn a navigation strategy, we first implemented reinforcement learning (RL) on a simulated version of the robot navigating in proximity to turbulent jets. After training, the RL algorithm discovered an effective strategy for locating the jets by following transverse velocity gradients sensed by pressure sensors located on opposite sides of the robot. When implemented on the physical robot, this gradient-following strategy enabled the robot to successfully locate the turbulent plumes at more than twice the rate of random searching. Additionally, we found that navigation performance improved as the distance between the pressure sensors increased, which can inform the design of distributed flow sensors in ocean robots. Our results demonstrate the effectiveness and limits of flow-based navigation for autonomously locating hydrodynamic features of interest.
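The learned strategy reported above reduces, in essence, to following a transverse velocity gradient estimated from two laterally separated sensors. A minimal sketch of that control law is below (sensor names, the finite-difference proxy, and the gain are assumptions, not the trained policy).

```python
def transverse_gradient(v_left, v_right, sensor_separation):
    """Finite-difference estimate of the transverse velocity gradient from sensors on opposite sides.
    Larger separation gives a better-conditioned estimate, consistent with the trend reported above."""
    return (v_left - v_right) / sensor_separation

def yaw_command(v_left, v_right, sensor_separation, gain=1.0):
    """Steer toward the side sensing stronger flow, i.e. up the transverse gradient toward the jet center."""
    return gain * transverse_gradient(v_left, v_right, sensor_separation)
```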
Affiliation(s)
- Peter Gunnarson
- Graduate Aerospace Laboratories, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, United States of America
- John O Dabiri
- Graduate Aerospace Laboratories, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, United States of America
- Mechanical and Civil Engineering, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, United States of America
11
Boccardo F, Pierre-Louis O. Reinforcement learning with thermal fluctuations at the nanoscale. Phys Rev E 2024; 110:L023301. PMID: 39294981; DOI: 10.1103/PhysRevE.110.L023301.
Abstract
Reinforcement learning offers a framework for learning to choose actions in order to control a system. However, at small scales, Brownian fluctuations limit the control of nanomachine actuation or nanonavigation and of the molecular machinery of life. We analyze this regime using the general framework of Markov decision processes. We show that at the nanoscale, while optimal control actions should bring an improvement proportional to the small ratio of the applied force times a length scale over the temperature, the learned improvement is smaller and proportional to the square of this small ratio. Consequently, the efficiency of learning, which compares the learned improvement to the theoretical optimal improvement, drops to zero. Nevertheless, these limitations can be circumvented by using actions learned at a lower temperature. These results are illustrated with simulations of the control of the shape of small particle clusters.
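Written out with assumed notation (f for the applied force, a for the length scale, k_B T for the thermal energy), the scalings stated above take the schematic form:

```latex
\[
  \epsilon \equiv \frac{f\,a}{k_B T} \ll 1, \qquad
  \Delta_{\text{optimal}} \sim \epsilon, \qquad
  \Delta_{\text{learned}} \sim \epsilon^{2}, \qquad
  \eta \equiv \frac{\Delta_{\text{learned}}}{\Delta_{\text{optimal}}} \sim \epsilon \;\longrightarrow\; 0 ,
\]
```

so the learning efficiency vanishes as thermal fluctuations dominate, consistent with the abstract's remedy of reusing actions learned at a lower temperature, where the ratio is larger.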
12
Aldarondo D, Merel J, Marshall JD, Hasenclever L, Klibaite U, Gellis A, Tassa Y, Wayne G, Botvinick M, Ölveczky BP. A virtual rodent predicts the structure of neural activity across behaviours. Nature 2024; 632:594-602. PMID: 38862024; PMCID: PMC12080270; DOI: 10.1038/s41586-024-07633-4.
Abstract
Animals have exquisite control of their bodies, allowing them to perform a diverse range of behaviours. How such control is implemented by the brain, however, remains unclear. Advancing our understanding requires models that can relate principles of control to the structure of neural activity in behaving animals. Here, to facilitate this, we built a 'virtual rodent', in which an artificial neural network actuates a biomechanically realistic model of the rat in a physics simulator. We used deep reinforcement learning to train the virtual agent to imitate the behaviour of freely moving rats, thus allowing us to compare neural activity recorded in real rats to the network activity of a virtual rodent mimicking their behaviour. We found that neural activity in the sensorimotor striatum and motor cortex was better predicted by the virtual rodent's network activity than by any features of the real rat's movements, consistent with both regions implementing inverse dynamics. Furthermore, the network's latent variability predicted the structure of neural variability across behaviours and afforded robustness in a way consistent with the minimal intervention principle of optimal feedback control. These results demonstrate how physical simulation of biomechanically realistic virtual animals can help interpret the structure of neural activity across behaviour and relate it to theoretical principles of motor control.
Affiliation(s)
- Diego Aldarondo
- Department of Organismic and Evolutionary Biology and Center for Brain Science, Harvard University, Cambridge, MA, USA.
- Fauna Robotics, New York, NY, USA.
- Josh Merel
- DeepMind, Google, London, UK
- Fauna Robotics, New York, NY, USA
- Jesse D Marshall
- Department of Organismic and Evolutionary Biology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Reality Labs, Meta, New York, NY, USA
- Ugne Klibaite
- Department of Organismic and Evolutionary Biology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Amanda Gellis
- Department of Organismic and Evolutionary Biology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Matthew Botvinick
- DeepMind, Google, London, UK
- Gatsby Computational Neuroscience Unit, University College London, London, UK
- Bence P Ölveczky
- Department of Organismic and Evolutionary Biology and Center for Brain Science, Harvard University, Cambridge, MA, USA.
13
Zhang R, Pitkow X, Angelaki DE. Inductive biases of neural network modularity in spatial navigation. Sci Adv 2024; 10:eadk1256. PMID: 39028809; PMCID: PMC11259174; DOI: 10.1126/sciadv.adk1256.
Abstract
The brain may have evolved a modular architecture for daily tasks, with circuits featuring functionally specialized modules that match the task structure. We hypothesize that this architecture enables better learning and generalization than architectures with less specialized modules. To test this, we trained reinforcement learning agents with various neural architectures on a naturalistic navigation task. We found that the modular agent, with an architecture that segregates computations of state representation, value, and action into specialized modules, achieved better learning and generalization. Its learned state representation combines prediction and observation, weighted by their relative uncertainty, akin to recursive Bayesian estimation. This agent's behavior also resembles macaques' behavior more closely. Our results shed light on the possible rationale for the brain's modularity and suggest that artificial systems can use this insight from neuroscience to improve learning and generalization in natural tasks.
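The modular architecture described above can be caricatured as an agent whose state-representation, value, and action computations live in separate parameter groups, with the state estimator blending a dynamics prediction against the new observation. The sketch below is an untrained schematic (all sizes, the fixed blending gain, and names are assumptions, not the authors' implementation).

```python
import numpy as np

class ModularAgent:
    """Schematic modular actor-critic: separate state-representation, value, and action modules."""

    def __init__(self, obs_dim, state_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        # State-representation module: fuses a dynamics prediction with the observation.
        self.W_pred = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.W_obs = rng.normal(scale=0.1, size=(state_dim, obs_dim))
        # Value module (critic) and action module (actor), kept separate.
        self.w_value = rng.normal(scale=0.1, size=state_dim)
        self.W_act = rng.normal(scale=0.1, size=(act_dim, state_dim))

    def state_update(self, s_prev, obs, obs_gain=0.5):
        """Blend prediction and observation; obs_gain stands in for the relative-uncertainty
        weight that a recursive Bayesian estimator would set adaptively."""
        prediction = np.tanh(self.W_pred @ s_prev)
        correction = np.tanh(self.W_obs @ np.asarray(obs))
        return (1.0 - obs_gain) * prediction + obs_gain * correction

    def value(self, s):
        return float(self.w_value @ s)

    def action(self, s):
        return np.tanh(self.W_act @ s)
```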
Affiliation(s)
- Ruiyi Zhang
- Tandon School of Engineering, New York University, New York, NY, USA
- Xaq Pitkow
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
- Dora E. Angelaki
- Tandon School of Engineering, New York University, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
14
Stupski SD, van Breugel F. Wind gates olfaction driven search states in free flight. bioRxiv 2024:2023.11.30.569086. PMID: 38076971; PMCID: PMC10705368; DOI: 10.1101/2023.11.30.569086.
Abstract
For organisms tracking a chemical cue to its source, the motion of their surrounding fluid provides crucial information for success. Swimming and flying animals engaged in olfaction-driven search often start by turning into the direction of an oncoming wind or water current. However, it is unclear how organisms adjust their strategies when directional cues are absent or unreliable, as is often the case in nature. Here, we use the genetic toolkit of Drosophila melanogaster to develop an optogenetic paradigm to deliver temporally precise "virtual" olfactory experiences for free-flying animals in either laminar wind or still air. We first confirm that in laminar wind flies turn upwind. Furthermore, we show that they achieve this using a rapid (∼100 ms) turn, implying that flies estimate the ambient wind direction prior to "surging" upwind. In still air, flies adopt a remarkably stereotyped "sink and circle" search state characterized by ∼60° turns at 3-4 Hz, biased in a consistent direction. Together, our results show that Drosophila melanogaster assesses the presence and direction of ambient wind prior to deploying a distinct search strategy. In both laminar wind and still air, immediately after odor onset, flies decelerate and often perform a rapid turn. Both maneuvers are consistent with predictions from recent control theoretic analyses for how insects may estimate properties of wind while in flight. We suggest that flies may use their deceleration and "anemometric" turn as active sensing maneuvers to rapidly gauge properties of their wind environment before initiating a proximal or upwind search routine.
15
Alonso A, Kirkegaard JB. Learning optimal integration of spatial and temporal information in noisy chemotaxis. PNAS Nexus 2024; 3:pgae235. PMID: 38952456; PMCID: PMC11216223; DOI: 10.1093/pnasnexus/pgae235.
Abstract
We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether there is a discontinuous switch between optimal strategies or a continuous transition. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming, in the transition region, both the constrained variants and models that explicitly integrate spatial and temporal information. Finally, by utilizing the attribution method of integrated gradients, we show that the policy relies on a nontrivial combination of spatially and temporally derived gradient information in a ratio that varies dynamically during the chemotactic trajectories.
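The two limiting strategies compared above have simple closed forms: a spatial estimate from the instantaneous concentration difference across the cell body, and a temporal estimate from the concentration change along the cell's own path. The sketch below blends them with a single fixed weight for illustration (in the study the weighting is produced by a trained recurrent policy and varies along the trajectory).

```python
def spatial_gradient_estimate(c_front, c_back, cell_size):
    """Spatial strategy: concentration difference measured across the cell body."""
    return (c_front - c_back) / cell_size

def temporal_gradient_estimate(c_now, c_prev, dt, speed):
    """Temporal strategy: concentration change along the swimming path, converted to a gradient."""
    return (c_now - c_prev) / (dt * speed)

def combined_gradient(c_front, c_back, c_now, c_prev, cell_size, dt, speed, w_spatial=0.5):
    """Fixed-weight blend of the two estimates; w_spatial in [0, 1] is a hand-set placeholder."""
    g_s = spatial_gradient_estimate(c_front, c_back, cell_size)
    g_t = temporal_gradient_estimate(c_now, c_prev, dt, speed)
    return w_spatial * g_s + (1.0 - w_spatial) * g_t
```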
Affiliation(s)
- Albert Alonso
- Niels Bohr Institute, University of Copenhagen, Copenhagen 2100, Denmark
- Julius B Kirkegaard
- Niels Bohr Institute, University of Copenhagen, Copenhagen 2100, Denmark
- Department of Computer Science, University of Copenhagen, Copenhagen 2100, Denmark
16
Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. PMID: 37695776; PMCID: PMC10513382; DOI: 10.1371/journal.pcbi.1011067.
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
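The model class described here, an RNN trained to predict value directly from observations, can be sketched with a minimal recurrent value estimator driven by temporal-difference errors (assumed sizes and learning-rule details; only the linear readout is updated here, whereas the full model is trained by backpropagation through time).

```python
import numpy as np

class RNNValue:
    """Minimal recurrent value estimator updated with TD (reward prediction) errors."""

    def __init__(self, obs_dim, hidden_dim=32, lr=1e-3, gamma=0.98, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.W_o = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))
        self.w_v = rng.normal(scale=0.1, size=hidden_dim)
        self.lr, self.gamma = lr, gamma
        self.h = np.zeros(hidden_dim)

    def step(self, obs, reward):
        """Advance the recurrent state on one observation and apply a TD update to the value readout."""
        h_prev = self.h.copy()
        v_prev = float(self.w_v @ h_prev)
        self.h = np.tanh(self.W_h @ h_prev + self.W_o @ np.asarray(obs))
        v_next = float(self.w_v @ self.h)
        delta = reward + self.gamma * v_next - v_prev   # reward prediction error
        self.w_v += self.lr * delta * h_prev            # update only the value readout
        return v_next, delta
```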
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, USA
- Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, United States of America
- Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
17
Loisy A, Heinonen RA. Deep reinforcement learning for the olfactory search POMDP: a quantitative benchmark. Eur Phys J E Soft Matter 2023; 46:17. PMID: 36939979; DOI: 10.1140/epje/s10189-023-00277-8.
Abstract
The olfactory search POMDP (partially observable Markov decision process) is a sequential decision-making problem designed to mimic the task faced by insects searching for a source of odor in turbulence, and its solutions have applications to sniffer robots. As exact solutions are out of reach, the challenge consists in finding the best possible approximate solutions while keeping the computational cost reasonable. We provide a quantitative benchmark of a solver based on deep reinforcement learning against traditional POMDP approximate solvers. We show that deep reinforcement learning is a competitive alternative to standard methods, in particular for generating lightweight policies suitable for robots.
Affiliation(s)
- Aurore Loisy
- Aix Marseille Univ, CNRS, Centrale Marseille, IRPHE, Marseille, France.
- Robin A Heinonen
- Department of Physics and INFN, University of Rome "Tor Vergata", Via della Ricerca Scientifica 1, 00133, Rome, Italy.