1. Scott KJ, Speers LJ, Bilkey DK. Maternal immune activation alters bout structure of rat 50-kHz ultrasonic vocalizations. Behav Brain Res 2025;488:115596. [PMID: 40252701] [DOI: 10.1016/j.bbr.2025.115596]
Abstract
Dysfunctional sequencing of behaviour and cognition is observed in schizophrenia across multiple domains, including communication. We examined whether maternal immune activation (MIA), a risk factor for schizophrenia, disrupts the sequential organization of ultrasonic vocalizations (USVs) in a rat model. We analysed the structure of bursts of 50-kHz USVs (bouts) in two independent datasets (paired-rat: 19 control, 18 MIA; reward paradigm: 18 control, 20 MIA) using a Damerau-Levenshtein analysis with a k-fold cross-validation procedure. MIA animals showed greater variability in their bout sequences in both datasets, with lower Levenshtein similarity index (LSI) scores than control animals. Notably, MIA set median sequences were more similar to control bout sequences than to their own group's sequences, suggesting a breakdown in sequential organization. We also found altered 50-kHz USV transitional preferences in MIA animals in a reward context. While sequence structure was altered, basic call production and call-type distribution remained largely intact across groups. These findings indicate that MIA affects the organization of vocal sequences at the bout level while largely preserving basic vocalization patterns. This work extends our understanding of the effects of maternal infection during pregnancy, and how this can lead to altered communication sequences that are relevant to schizophrenia risk.
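The sequence comparison at the heart of this analysis is straightforward to prototype. Below is a minimal sketch, assuming bouts are encoded as strings of call-type symbols; the toy data, symbol names, and normalization are illustrative and not taken from the paper. It computes a Damerau-Levenshtein (optimal string alignment) distance, a normalized similarity index, and a group's set median sequence.

```python
# Minimal sketch of bout-sequence comparison (illustrative, not the authors' code).
from itertools import product

def damerau_levenshtein(a: str, b: str) -> int:
    """Edit distance allowing insertions, deletions, substitutions, transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i, j in product(range(1, len(a) + 1), range(1, len(b) + 1)):
        cost = 0 if a[i - 1] == b[j - 1] else 1
        d[i][j] = min(d[i - 1][j] + 1,         # deletion
                      d[i][j - 1] + 1,         # insertion
                      d[i - 1][j - 1] + cost)  # substitution
        if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
            d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def similarity(a: str, b: str) -> float:
    """Levenshtein similarity index in [0, 1]; 1.0 means identical bouts."""
    return 1.0 if not a and not b else 1.0 - damerau_levenshtein(a, b) / max(len(a), len(b))

def set_median(bouts):
    """Bout with the smallest summed distance to all bouts in the group."""
    return min(bouts, key=lambda s: sum(damerau_levenshtein(s, t) for t in bouts))

# Toy bouts: each letter stands for one 50-kHz call type (e.g., T = trill, F = flat).
mia = ["FTTF", "TFFT", "FFTT", "TTFT"]
pairs = [(mia[i], mia[j]) for i in range(len(mia)) for j in range(i + 1, len(mia))]
print("set median:", set_median(mia))
print("mean within-group LSI:", sum(similarity(a, b) for a, b in pairs) / len(pairs))
```

Higher within-group variability shows up directly as a lower mean LSI, which is the direction of the group difference reported above.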
Affiliation(s)
- K Jack Scott
- Department of Psychology, University of Otago, New Zealand
- Lucinda J Speers
- Department of Psychology, University of Otago, New Zealand; Grenoble Institut des Neurosciences, Inserm, France
- David K Bilkey
- Department of Psychology, University of Otago, New Zealand
2. Vengrovski G, Hulsey-Vincent MR, Bemrose MA, Gardner TJ. TweetyBERT: Automated parsing of birdsong through self-supervised machine learning. bioRxiv 2025:2025.04.09.648029. [PMID: 40291648] [PMCID: PMC12027336] [DOI: 10.1101/2025.04.09.648029]
Abstract
Deep neural networks can be trained to parse animal vocalizations, serving to identify the units of communication and to annotate sequences of vocalizations for subsequent statistical analysis. However, current methods rely on human-labelled data for training, and parsing animal vocalizations in a fully unsupervised manner remains an open problem. Addressing this challenge, we introduce TweetyBERT, a self-supervised transformer neural network developed for the analysis of birdsong. The model is trained to predict masked or hidden fragments of audio, and is never exposed to human supervision or labels. Applied to canary song, TweetyBERT autonomously learns the behavioral units of song such as notes, syllables, and phrases, capturing intricate acoustic and temporal patterns. Developing self-supervised models tailored to animal communication in this way should significantly accelerate the analysis of unlabeled vocal data.
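The training objective is compact enough to sketch. The following is an illustrative masked-prediction setup in PyTorch; the layer sizes, masking rate, and mel-spectrogram input are assumptions, not the published TweetyBERT configuration. Random time frames are hidden and the transformer is trained to reconstruct only those frames.

```python
# Illustrative masked-prediction training step (not the TweetyBERT architecture).
import torch
import torch.nn as nn

class MaskedSpectrogramModel(nn.Module):
    def __init__(self, n_mels=128, d_model=256, n_layers=4, n_heads=8, max_len=1024):
        super().__init__()
        self.embed = nn.Linear(n_mels, d_model)            # one token per time frame
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decode = nn.Linear(d_model, n_mels)           # reconstruct hidden frames

    def forward(self, spec, mask):
        # spec: (batch, time, n_mels); mask: (batch, time) bool, True = frame hidden
        x = self.embed(spec)
        x[mask] = self.mask_token                          # replace hidden frames
        x = x + self.pos[:, : x.size(1)]
        return self.decode(self.encoder(x))

model = MaskedSpectrogramModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
spec = torch.rand(8, 200, 128)                             # batch of song spectrograms
mask = torch.rand(8, 200) < 0.3                            # hide ~30% of time frames
loss = ((model(spec, mask) - spec)[mask] ** 2).mean()      # score only hidden frames
loss.backward(); opt.step()
```

Because the loss is computed only on hidden frames, the model must internalize the song's acoustic and temporal regularities to do well, without ever seeing a human label.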
3. Koch TMI, Marks ES, Roberts TF. AVN: A Deep Learning Approach for the Analysis of Birdsong. bioRxiv 2024:2024.05.10.593561. [PMID: 39229184] [PMCID: PMC11370480] [DOI: 10.1101/2024.05.10.593561]
Abstract
Deep learning tools for behavior analysis have enabled important new insights and discoveries in neuroscience. Yet they often trade interpretability and generalizability for performance, making it difficult to quantitatively compare phenotypes across datasets and research groups. We developed a novel deep learning-based behavior analysis pipeline, Avian Vocalization Network (AVN), for the learned vocalizations of the most extensively studied vocal learning model species, the zebra finch. AVN annotates songs with high accuracy across multiple animal colonies without any additional training data and generates a comprehensive set of interpretable features describing the syntax, timing, and acoustic properties of song. We use this feature set to compare song phenotypes across multiple research groups and experiments, and to predict a bird's stage in song development. Additionally, we developed a novel method to measure song imitation that requires no additional training data for new comparisons or recording environments, and that outperforms existing similarity scoring methods in its sensitivity and agreement with expert human judgements of song similarity. These tools are available through the open-source AVN python package and graphical application, making them accessible to researchers without prior coding experience. Altogether, this behavior analysis toolkit stands to facilitate and accelerate the study of vocal behavior by enabling a standardized mapping of phenotypes and learning outcomes, helping scientists better link behavior to the underlying neural processes.
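As an example of the kind of interpretable syntax feature such a pipeline can expose, the sketch below computes a syllable transition matrix and its mean entropy from a table of annotations. This is a generic illustration; the column name and syllable labels are invented, and it does not use the AVN package's actual API.

```python
# Generic syntax features from syllable annotations (not AVN's API).
import numpy as np
import pandas as pd

# Invented annotation table: one row per syllable, in singing order.
annotations = pd.DataFrame({"syllable": list("abbcababccab")})

def transition_matrix(labels):
    states = sorted(set(labels))
    counts = pd.DataFrame(0.0, index=states, columns=states)
    for prev, nxt in zip(labels[:-1], labels[1:]):
        counts.loc[prev, nxt] += 1
    return counts.div(counts.sum(axis=1), axis=0)      # row-normalized probabilities

def transition_entropy(tm):
    p = tm.to_numpy()
    logp = np.log2(p, out=np.zeros_like(p), where=p > 0)
    return float(np.nanmean(-(p * logp).sum(axis=1)))  # bits; lower = more stereotyped

tm = transition_matrix(annotations["syllable"].tolist())
print(tm.round(2))
print("mean transition entropy:", round(transition_entropy(tm), 2))
```

Stereotyped adult song yields a low-entropy, nearly deterministic matrix, whereas juvenile or degraded song spreads probability mass across many transitions.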
Affiliation(s)
- Therese M I Koch
- Department of Neuroscience, UT Southwestern Medical Center, Dallas TX, USA
- Ethan S Marks
- Department of Neuroscience, UT Southwestern Medical Center, Dallas TX, USA
- Todd F Roberts
- Department of Neuroscience, UT Southwestern Medical Center, Dallas TX, USA
4. Provost KL, Yang J, Carstens BC. The impacts of fine-tuning, phylogenetic distance, and sample size on big-data bioacoustics. PLoS One 2022;17:e0278522. [PMID: 36477744] [PMCID: PMC9728902] [DOI: 10.1371/journal.pone.0278522]
Abstract
Vocalizations in animals, particularly birds, are critically important behaviors that influence reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, automated extraction of data from these recordings has only recently been made feasible by artificial intelligence methods, and the accuracy of different automation strategies and feature sets has yet to be systematically evaluated. Here, we use a recently published machine learning framework to extract syllables from ten bird species whose phylogenetic relatedness ranges from 1 to 85 million years of divergence, to test how phylogenetic distance influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance between taxa increases. However, we also find that applying models trained on multiple distantly related species can raise overall accuracy to levels near those obtained by training and testing on the same species. When planning big-data bioacoustics studies, sample design must balance maximizing sample size against minimizing human labor without sacrificing accuracy.
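A quick way to examine the relationship reported here is to regress per-taxon segmentation accuracy on phylogenetic distance to the training species. The numbers below are made up for illustration; they are not the paper's data.

```python
# Illustrative regression of accuracy on phylogenetic distance (synthetic values).
import numpy as np
from scipy import stats

distance_my = np.array([0, 1, 5, 12, 25, 40, 60, 85])    # divergence, million years
f1_score = np.array([0.94, 0.91, 0.86, 0.80, 0.72, 0.66, 0.58, 0.51])

slope, intercept, r, p, se = stats.linregress(distance_my, f1_score)
print(f"F1 ~ {intercept:.2f} + {slope:.4f} * distance (r^2 = {r**2:.2f}, p = {p:.1e})")
# A negative slope reproduces the qualitative finding: performance decays as the
# phylogenetic distance between training and test taxa grows.
```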
Affiliation(s)
- Kaiya L. Provost
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
- Jiaying Yang
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
- Bryan C. Carstens
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
5. Cohen Y, Engel TA, Langdon C, Lindsay GW, Ott T, Peters MAK, Shine JM, Breton-Provencher V, Ramaswamy S. Recent Advances at the Interface of Neuroscience and Artificial Neural Networks. J Neurosci 2022;42:8514-8523. [PMID: 36351830] [PMCID: PMC9665920] [DOI: 10.1523/jneurosci.1503-22.2022]
Abstract
Biological neural networks adapt and learn in diverse behavioral contexts. Artificial neural networks (ANNs) have exploited biological properties to solve complex problems, but despite their effectiveness on specific tasks, ANNs have yet to realize the flexibility and adaptability of biological cognition. This review highlights recent advances in computational and experimental research that further our understanding of biological and artificial intelligence. In particular, we discuss critical mechanisms from the cellular, systems, and cognitive neuroscience fields that have contributed to refining the architecture and training algorithms of ANNs. Additionally, we discuss how recent work has used ANNs to understand complex neuronal correlates of cognition and to process high-throughput behavioral data.
Affiliation(s)
- Yarden Cohen
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, 76100, Israel
- Tatiana A Engel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724
- Grace W Lindsay
- Department of Psychology, Center for Data Science, New York University, New York, NY 10003
- Torben Ott
- Bernstein Center for Computational Neuroscience Berlin, Institute of Biology, Humboldt University of Berlin, 10117, Berlin, Germany
- Megan A K Peters
- Department of Cognitive Sciences, University of California-Irvine, Irvine, CA 92697
- James M Shine
- Brain and Mind Centre, University of Sydney, Sydney, NSW 2006, Australia
- Srikanth Ramaswamy
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom
6. Sahu PK, Campbell KA, Oprea A, Phillmore LS, Sturdy CB. Comparing methodologies for classification of zebra finch distance calls. J Acoust Soc Am 2022;151:3305. [PMID: 35649952] [DOI: 10.1121/10.0011401]
Abstract
Bioacoustic analysis has been used for a variety of purposes, including classifying vocalizations for biodiversity monitoring and understanding the mechanisms of cognitive processes. A wide range of statistical methods, including various automated methods, have been used to successfully classify vocalizations by species, sex, geography, and individual. Identifying which acoustic features drive classification is required to predict the features necessary for discrimination in the real world. Here, we used several classification techniques, namely discriminant function analyses (DFAs), support vector machines (SVMs), and artificial neural networks (ANNs), for sex-based classification of zebra finch (Taeniopygia guttata) distance calls using acoustic features measured from spectrograms. All three methods (DFAs, SVMs, and ANNs) correctly classified the calls to the respective sex-based categories with high accuracy (92-96%). Frequency modulation of ascending frequency, total duration, and end frequency of the distance call were the most predictive features underlying this classification in all of our models. Our results corroborate evidence for the importance of total call duration and frequency modulation in the classification of male and female distance calls. Moreover, we provide a methodological approach for bioacoustic classification problems using multiple statistical analyses.
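The three-classifier comparison is easy to reproduce in outline with scikit-learn, using LDA as a stand-in for the discriminant function analysis, an SVM, and a small MLP. The feature table below is synthetic; real inputs would be acoustic measurements taken from spectrograms of male and female distance calls.

```python
# Sketch of a DFA/SVM/ANN comparison on synthetic acoustic features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, n)                    # 0 = female, 1 = male
X = np.column_stack([
    rng.normal(180 + 40 * sex, 15, n),         # total duration (ms)
    rng.normal(3.0 - 0.8 * sex, 0.4, n),       # end frequency (kHz)
    rng.normal(0.5 + 0.3 * sex, 0.2, n),       # frequency-modulation index
])

for name, clf in [("DFA (LDA)", LinearDiscriminantAnalysis()),
                  ("SVM", SVC(kernel="rbf")),
                  ("ANN (MLP)", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000))]:
    # Scaling matters for the SVM and MLP; it is harmless for LDA.
    scores = cross_val_score(make_pipeline(StandardScaler(), clf), X, sex, cv=5)
    print(f"{name}: {scores.mean():.2%} +/- {scores.std():.2%}")
```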
Affiliation(s)
- Prateek K Sahu
- Department of Psychology, University of Alberta, Edmonton, Alberta T6G 2R3, Canada
- Kimberley A Campbell
- Department of Psychology, University of Alberta, Edmonton, Alberta T6G 2R3, Canada
- Alexandra Oprea
- Department of Psychology and Neuroscience, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
- Leslie S Phillmore
- Department of Psychology and Neuroscience, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
- Christopher B Sturdy
- Department of Psychology, University of Alberta, Edmonton, Alberta T6G 2R3, Canada
7. Linhart P, Mahamoud-Issa M, Stowell D, Blumstein DT. The potential for acoustic individual identification in mammals. Mamm Biol 2022. [DOI: 10.1007/s42991-021-00222-2]
8. Stowell D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ 2022;10:e13152. [PMID: 35341043] [PMCID: PMC8944344] [DOI: 10.7717/peerj.13152]
Abstract
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain valuable evidence about animal behaviours, populations and ecosystems. They are studied in bioacoustics and ecoacoustics, with signal processing and analysis an important component. Computational bioacoustics has accelerated in recent decades due to the growth of affordable digital sound recording devices, and to huge progress in informatics such as big data, signal processing and machine learning. Methods are inherited from the wider field of deep learning, including speech and image processing. However, the tasks, demands and data characteristics are often different from those addressed in speech or music analysis. There remain unsolved problems, and tasks for which evidence is surely present in many acoustic signals, but not yet realised. In this paper I perform a review of the state of the art in deep learning for computational bioacoustics, aiming to clarify key concepts and identify and analyse knowledge gaps. Based on this, I offer a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should aim to address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.
Affiliation(s)
- Dan Stowell
- Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
- Naturalis Biodiversity Center, Leiden, The Netherlands
9. Cohen Y, Nicholson DA, Sanchioni A, Mallaber EK, Skidanova V, Gardner TJ. Automated annotation of birdsong with a neural network that segments spectrograms. eLife 2022;11:e63853. [PMID: 35050849] [PMCID: PMC8860439] [DOI: 10.7554/elife.63853]
Abstract
Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllables. We show that TweetyNet mitigates limitations of methods that rely on segmented audio. We also show that TweetyNet performs well across multiple individuals from two species of songbirds, Bengalese finches and canaries. Lastly, we demonstrate that using TweetyNet we can accurately annotate very large datasets containing multiple days of song, and that these predicted annotations replicate key findings from behavioral studies. In addition, we provide open-source software to assist other researchers, and a large dataset of annotated canary song that can serve as a benchmark. We conclude that TweetyNet makes it possible to address a wide range of new questions about birdsong.
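The central idea, mapping each spectrogram time frame to a syllable label and then recovering segments from runs of identical labels, can be sketched in a few lines. The code below is a schematic frame classifier under assumed layer sizes and label set, not the published TweetyNet model.

```python
# Schematic per-frame labeler plus label-run segmentation (not TweetyNet itself).
import torch
import torch.nn as nn

class FrameLabeler(nn.Module):
    def __init__(self, n_mels=128, n_classes=11):    # e.g. 10 syllables + silence
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 5, padding=2), nn.ReLU(),
            nn.MaxPool2d((4, 1)),                    # pool frequency, keep time
        )
        self.rnn = nn.LSTM(32 * (n_mels // 4), 64, bidirectional=True, batch_first=True)
        self.out = nn.Linear(128, n_classes)

    def forward(self, spec):                         # spec: (batch, n_mels, time)
        x = self.conv(spec.unsqueeze(1))             # (batch, 32, n_mels//4, time)
        x = x.flatten(1, 2).transpose(1, 2)          # (batch, time, features)
        return self.out(self.rnn(x)[0])              # per-frame class logits

def labels_to_segments(frame_labels, silence=0):
    """Collapse runs of identical frame labels into (onset, offset, label) tuples."""
    segments, start = [], 0
    for t in range(1, len(frame_labels) + 1):
        if t == len(frame_labels) or frame_labels[t] != frame_labels[start]:
            if frame_labels[start] != silence:
                segments.append((start, t, int(frame_labels[start])))
            start = t
    return segments

logits = FrameLabeler()(torch.rand(1, 128, 500))     # one 500-frame spectrogram
print(labels_to_segments(logits.argmax(-1)[0].tolist()))
```

Working frame by frame is what removes the requirement that the audio be cleanly pre-segmented into syllables before classification.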
Affiliation(s)
- Yarden Cohen
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, Israel
- Alexa Sanchioni
- Department of Biology, Boston University, Boston, United States
- Timothy J Gardner
- Phil and Penny Knight Campus for Accelerating Scientific Impact, University of Oregon, Eugene, United States
10. Steinfath E, Palacios-Muñoz A, Rottschäfer JR, Yuezak D, Clemens J. Fast and accurate annotation of acoustic signals with deep neural networks. eLife 2021;10:e68837. [PMID: 34723794] [PMCID: PMC8560090] [DOI: 10.7554/elife.68837]
Abstract
Acoustic signals serve communication within and across species throughout the animal kingdom. Studying the genetics, evolution, and neurobiology of acoustic communication requires annotating acoustic signals: segmenting and identifying individual acoustic elements like syllables or sound pulses. To be useful, annotations need to be accurate, robust to noise, and fast. Here we introduce Deep Audio Segmenter (DAS), a method that annotates acoustic signals across species based on a deep-learning-derived hierarchical representation of sound. We demonstrate the accuracy, robustness, and speed of DAS using acoustic signals with diverse characteristics from insects, birds, and mammals. DAS comes with a graphical user interface for annotating song, training the network, and generating and proofreading annotations. The method can be trained to annotate signals from new species with little manual annotation and can be combined with unsupervised methods to discover novel signal types. DAS annotates song with high throughput and low latency, enabling experimental interventions in real time. Overall, DAS is a universal, versatile, and accessible tool for annotating acoustic communication signals.
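Annotating raw audio directly, as DAS does, can be illustrated with a stack of dilated 1-D convolutions that labels every audio sample. This is a generic temporal-convolution sketch under assumed sizes and class names, not the published DAS network.

```python
# Generic dilated-convolution sample labeler (illustrative, not the DAS model).
import torch
import torch.nn as nn

class SampleLabeler(nn.Module):
    def __init__(self, n_classes=3, channels=64, n_blocks=5):  # e.g. noise/pulse/sine
        super().__init__()
        blocks = [nn.Conv1d(1, channels, 3, padding=1), nn.ReLU()]
        for i in range(n_blocks):
            d = 2 ** i                                 # dilation doubles per block,
            blocks += [nn.Conv1d(channels, channels, 3, padding=d, dilation=d),
                       nn.ReLU()]                      # growing the receptive field
        self.body = nn.Sequential(*blocks)
        self.head = nn.Conv1d(channels, n_classes, 1)  # per-sample class logits

    def forward(self, audio):                          # audio: (batch, samples)
        return self.head(self.body(audio.unsqueeze(1)))

model = SampleLabeler()
audio = torch.randn(2, 10_000)                         # a batch of raw waveforms
labels = model(audio).argmax(1)                        # (2, 10000) predicted classes
print(labels.shape)
```

Because the network is purely convolutional, it can be run on short buffers as they arrive, which is what makes low-latency, real-time annotation feasible.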
Affiliation(s)
- Elsa Steinfath
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Adrian Palacios-Muñoz
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Julian R Rottschäfer
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Deniz Yuezak
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Jan Clemens
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- Bernstein Center for Computational Neuroscience, Göttingen, Germany
11. Sainburg T, Thielk M, Gentner TQ. Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires. PLoS Comput Biol 2020;16:e1008228. [PMID: 33057332] [PMCID: PMC7591061] [DOI: 10.1371/journal.pcbi.1008228]
Abstract
Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior, and even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low-dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations both as discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.
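The core pipeline, projecting syllable spectrograms into a learned low-dimensional space, can be sketched with UMAP. Synthetic "syllables" stand in for real recordings below; in practice each row would be a flattened spectrogram of one segmented vocal element.

```python
# Latent projection of (synthetic) syllable spectrograms with UMAP.
import numpy as np
import umap  # pip install umap-learn

rng = np.random.default_rng(1)
# 300 fake syllables of two types: (n_syllables, freq_bins * time_bins)
type_a = rng.normal(0.0, 1.0, (150, 32 * 16))
type_b = rng.normal(0.8, 1.0, (150, 32 * 16))
spectrograms = np.vstack([type_a, type_b])

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(spectrograms)
print(embedding.shape)  # (300, 2): one point per syllable, ready to plot or cluster
```

Clusters in the embedding can then be treated as discrete categories for sequence analysis, or the continuous coordinates can be analyzed directly, mirroring the two modes described in the abstract.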
Affiliation(s)
- Tim Sainburg
- Department of Psychology, University of California, San Diego, La Jolla, CA, USA
- Center for Academic Research & Training in Anthropogeny, University of California, San Diego, La Jolla, CA, USA
- Marvin Thielk
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Timothy Q. Gentner
- Department of Psychology, University of California, San Diego, La Jolla, CA, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Neurobiology Section, Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Kavli Institute for Brain and Mind, University of California, San Diego, La Jolla, CA, USA
12.
Affiliation(s)
- Panu Somervuo
- Department of Biosciences, University of Helsinki, Helsinki, Finland
13. Koumura T, Okanoya K. Distributed representation of discrete sequential vocalization in the Bengalese finch (Lonchura striata var. domestica). Bioacoustics 2019. [DOI: 10.1080/09524622.2019.1607558]
Affiliation(s)
- Takuya Koumura
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- Kazuo Okanoya
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan