1. Brickson L, Zhang L, Vollrath F, Douglas-Hamilton I, Titus AJ. Elephants and algorithms: a review of the current and future role of AI in elephant monitoring. J R Soc Interface 2023; 20:20230367. PMID: 37963556; PMCID: PMC10645515; DOI: 10.1098/rsif.2023.0367.
Abstract
Artificial intelligence (AI) and machine learning (ML) present revolutionary opportunities to enhance our understanding of animal behaviour and conservation strategies. Using elephants, a crucial species in Africa and Asia's protected areas, as our focal point, we delve into the role of AI and ML in their conservation. Given the increasing amounts of data gathered from a variety of sensors like cameras, microphones, geophones, drones and satellites, the challenge lies in managing and interpreting this vast data. New AI and ML techniques offer solutions to streamline this process, helping us extract vital information that might otherwise be overlooked. This paper focuses on the different AI-driven monitoring methods and their potential for improving elephant conservation. Collaborative efforts between AI experts and ecological researchers are essential in leveraging these innovative technologies for enhanced wildlife conservation, setting a precedent for numerous other species.
Affiliation(s)
- Fritz Vollrath: Save the Elephants, Nairobi, Kenya; Department of Biology, University of Oxford, Oxford, UK
- Alexander J. Titus: Colossal Biosciences, Dallas, TX, USA; Information Sciences Institute, University of Southern California, Los Angeles, USA
2. Savagian A, Riehl C. Group chorusing as an intragroup signal in the greater ani, a communally breeding bird. Ethology 2022. DOI: 10.1111/eth.13345.
Affiliation(s)
- Amanda Savagian: Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
- Christina Riehl: Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
3. Clark FE, Dunn JC. From Soundwave to Soundscape: A Guide to Acoustic Research in Captive Animal Environments. Front Vet Sci 2022; 9:889117. PMID: 35782565; PMCID: PMC9244380; DOI: 10.3389/fvets.2022.889117.
Abstract
Sound is a complex feature of all environments, but captive animals' soundscapes (acoustic scenes) have been studied far less than those of wild animals. Furthermore, research across farms, laboratories, pet shelters, and zoos tends to focus on just one aspect of environmental sound measurement: its pressure level or intensity (in decibels). We review the state of the art of captive animal acoustic research and contrast this to the wild, highlighting new opportunities for the former to learn from the latter. We begin with a primer on sound, aimed at captive researchers and animal caregivers with an interest (rather than specific expertise) in acoustics. Then, we summarize animal acoustic research broadly split into measuring sound from animals, or their environment. We guide readers from soundwave to soundscape and through the burgeoning field of conservation technology, which offers new methods to capture multiple features of complex, gestalt soundscapes. Our review ends with suggestions for future research, and a practical guide to sound measurement in captive environments.
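The decibel scale mentioned above is a logarithmic ratio against a reference pressure (20 µPa for airborne sound). As a minimal illustration of that primer material, not code from the paper itself:

```python
import numpy as np

# Standard reference pressure for sound in air.
P_REF = 20e-6  # pascals

def spl_db(pressure_pa):
    """Convert an RMS sound pressure in pascals to dB SPL."""
    return 20.0 * np.log10(np.asarray(pressure_pa) / P_REF)

# A pressure ten times the reference corresponds to roughly 20 dB SPL.
print(spl_db(2e-4))
```

Because the scale is logarithmic, every tenfold increase in pressure adds 20 dB, which is why a single intensity number compresses much of a soundscape's structure.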
Affiliation(s)
- Fay E. Clark (correspondence): Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, Cambridge, United Kingdom; School of Psychological Science, Faculty of Life Sciences, University of Bristol, Bristol, United Kingdom
- Jacob C. Dunn: Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, Cambridge, United Kingdom; Biological Anthropology, Department of Archaeology, University of Cambridge, Cambridge, United Kingdom; Department of Cognitive Biology, University of Vienna, Vienna, Austria
4. Trapanotto M, Nanni L, Brahnam S, Guo X. Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations. J Imaging 2022; 8:96. PMID: 35448223; PMCID: PMC9029749; DOI: 10.3390/jimaging8040096.
Abstract
The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size of collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting; one way to handle small datasets with deep learning methods is transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, spectrogram, and Mel spectrogram, along with several new ones, such as VGGish and Stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate, and shown to surpass previous classification attempts on this dataset: the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including, for the first time, the LM spectrogram and Stockwell representations. All source code for this study is available on GitHub.
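The Mel spectrogram is one of the image-like representations compared above. A minimal numpy sketch of the general technique (a Hann-windowed short-time power spectrum projected through a triangular mel filterbank) is given below; all parameters are illustrative and this is not the authors' pipeline, which also covers the LM spectrogram and Stockwell variants:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centres evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, window each frame, take the power spectrum,
    # then project onto the mel filterbank.
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)

# A one-second 440 Hz tone yields a (40, 61) "image" of the kind a
# pretrained CNN can be fine-tuned on.
t = np.arange(16000) / 16000.0
S = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(S.shape)
```

In the transfer-learning setting described above, such a 2-D representation is resized to the pretrained network's input shape and only the final layers are retrained on the small labeled set.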
Affiliation(s)
- Martino Trapanotto: Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
- Loris Nanni: Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
- Sheryl Brahnam (correspondence; Tel.: +1-417-873-9979): Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA
- Xiang Guo: Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA
5.
6. Lattenkamp EZ, Hörpel SG, Mengede J, Firzlaff U. A researcher's guide to the comparative assessment of vocal production learning. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200237. PMID: 34482725; DOI: 10.1098/rstb.2020.0237.
Abstract
Vocal production learning (VPL) is the capacity to learn to produce new vocalizations, which is a rare ability in the animal kingdom and thus far has only been identified in a handful of mammalian taxa and three groups of birds. Over the last few decades, approaches to the demonstration of VPL have varied among taxa, sound production systems and functions. These discrepancies strongly impede direct comparisons between studies. In the light of the growing number of experimental studies reporting VPL, the need for comparability is becoming more and more pressing. The comparative evaluation of VPL across studies would be facilitated by unified and generalized reporting standards, which would allow a better positioning of species on any proposed VPL continuum. In this paper, we specifically highlight five factors influencing the comparability of VPL assessments: (i) comparison to an acoustic baseline, (ii) comprehensive reporting of acoustic parameters, (iii) extended reporting of training conditions and durations, (iv) investigating VPL function via behavioural, perception-based experiments and (v) validation of findings on a neuronal level. These guidelines emphasize the importance of comparability between studies in order to unify the field of vocal learning. This article is part of the theme issue 'Vocal learning in animals and humans'.
Affiliation(s)
- Ella Z Lattenkamp: Division of Neurobiology, Department of Biology II, LMU Munich, Germany; Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Stephen G Hörpel: Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Department of Animal Sciences, Chair of Zoology, TU Munich, Germany
- Janine Mengede: Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Uwe Firzlaff: Department of Animal Sciences, Chair of Zoology, TU Munich, Germany
7.

8.
9. Wierucka K, Henley MD, Mumby HS. Acoustic cues to individuality in wild male adult African savannah elephants (Loxodonta africana). PeerJ 2021; 9:e10736. PMID: 33552734; PMCID: PMC7831363; DOI: 10.7717/peerj.10736.
Abstract
The ability to recognize conspecifics plays a pivotal role in animal communication systems. It is especially important for establishing and maintaining associations among individuals of social, long-lived species, such as elephants. While research on female elephant sociality and communication is prevalent, until recently male elephants have been considered far less social than females. This resulted in a dearth of information about their communication and recognition abilities. With new knowledge about the intricacies of the male elephant social structure come questions regarding the communication basis that allows for social bonds to be established and maintained. By analyzing the acoustic parameters of social rumbles recorded over 1.5 years from wild, mature, male African savanna elephants (Loxodonta africana) we expand current knowledge about the information encoded within these vocalizations and their potential to facilitate individual recognition. We showed that social rumbles are individually distinct and stable over time and therefore provide an acoustic basis for individual recognition. Furthermore, our results revealed that different frequency parameters contribute to individual differences of these vocalizations.
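A common way to quantify whether an acoustic parameter is individually distinctive, in the sense described above, is the potential for individual coding (PIC): the coefficient of variation between individuals divided by the mean within-individual coefficient of variation. The sketch below uses invented rumble fundamental-frequency values for hypothetical bulls, not data from this study:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical fundamental-frequency measurements (Hz) of social rumbles
# from three invented bulls, eight calls each.
calls = {
    "bull_A": rng.normal(14.0, 0.4, 8),
    "bull_B": rng.normal(17.0, 0.4, 8),
    "bull_C": rng.normal(20.0, 0.4, 8),
}

def pic(groups):
    """Potential for individual coding: CV of the individual means
    divided by the mean within-individual CV. Values above 1 suggest
    the parameter varies more between than within individuals."""
    means = np.array([g.mean() for g in groups])
    cv_between = means.std(ddof=1) / means.mean()
    cv_within = np.mean([g.std(ddof=1) / g.mean() for g in groups])
    return cv_between / cv_within

print(pic(list(calls.values())))
```

A parameter with PIC well above 1, as here, would be a plausible carrier of identity cues; parameters near or below 1 would not support individual recognition.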
Affiliation(s)
- Kaja Wierucka: School of Biological Sciences, University of Hong Kong, Hong Kong
- Michelle D Henley: Applied Ecosystem and Conservation Research Unit, University of South Africa, Johannesburg, South Africa; Elephants Alive, Hoedspruit, South Africa
- Hannah S Mumby: School of Biological Sciences, University of Hong Kong, Hong Kong; Department of Zoology, University of Cambridge, Cambridge, UK; Centre for African Ecology, School of Animal, Plant and Environmental Sciences, University of Witwatersrand, Johannesburg, South Africa
10. Wijers M, Trethowan P, Du Preez B, Chamaillé-Jammes S, Loveridge AJ, Macdonald DW, Markham A. Vocal discrimination of African lions and its potential for collar-free tracking. Bioacoustics 2020. DOI: 10.1080/09524622.2020.1829050.
Affiliation(s)
- Matthew Wijers: Wildlife Conservation Research Unit, Recanati-Kaplan Centre, Department of Zoology, University of Oxford, Oxford, UK
- Paul Trethowan: Wildlife Conservation Research Unit, Recanati-Kaplan Centre, Department of Zoology, University of Oxford, Oxford, UK
- Byron Du Preez: Wildlife Conservation Research Unit, Recanati-Kaplan Centre, Department of Zoology, University of Oxford, Oxford, UK
- Simon Chamaillé-Jammes: CEFE, CNRS, University Montpellier, University Paul Valéry Montpellier, EPHE, IRD, Montpellier, France; Mammal Research Institute, Department of Zoology and Entomology, University of Pretoria, Pretoria, South Africa
- Andrew J. Loveridge: Wildlife Conservation Research Unit, Recanati-Kaplan Centre, Department of Zoology, University of Oxford, Oxford, UK
- David W. Macdonald: Wildlife Conservation Research Unit, Recanati-Kaplan Centre, Department of Zoology, University of Oxford, Oxford, UK
- Andrew Markham: Department of Computer Science, University of Oxford, Oxford, UK
11. Ravikumar S, Vinod D, Ramesh G, Pulari SR, Mathi S. A layered approach to detect elephants in live surveillance video streams using convolution neural networks. Journal of Intelligent & Fuzzy Systems 2020. DOI: 10.3233/jifs-179710.
Affiliation(s)
- Sourav Ravikumar, Dayanand Vinod, Gowtham Ramesh, Sini Raj Pulari, Senthilkumar Mathi: Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India
12. Clink DJ, Tasirin JS, Klinck H. Vocal individuality and rhythm in male and female duet contributions of a nonhuman primate. Curr Zool 2020; 66:173-186. PMID: 32440276; PMCID: PMC7233616; DOI: 10.1093/cz/zoz035.
Abstract
Duetting, the stereotyped, repeated, and often coordinated vocal exchange between two individuals, arose independently multiple times in the order Primates. Across primate species there is substantial variation in the timing, degree of overlap, and sex-specificity of duet contributions. There is increasing evidence that primates can modify the timing of their duet contributions relative to their partner, and this vocal flexibility may have been an important precursor to the evolution of human language. Here, we present the results of a fine-scale analysis of Gursky's spectral tarsier (Tarsius spectrumgurskyae) duet phrases recorded in North Sulawesi, Indonesia. Specifically, we aimed to investigate individual-level variation in the female and male contributions to the duet, quantify individual- and pair-level differences in duet timing, and measure the temporal precision of duetting individuals relative to their partners. Using support vector machines, we classified female duet phrases to the correct individual with 80% accuracy, whereas our classification accuracy for males was lower, at 64%. Females were more variable than males in the timing between notes. All tarsier phrases exhibited some degree of overlap between callers, and tarsiers showed high temporal precision in their note output relative to their partners. We provide evidence that duetting tarsiers can modify their note output relative to their duetting partner; these results support the idea that flexibility in vocal exchanges, a precursor to human language, evolved early in the primate lineage, long before the emergence of modern humans.
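Support vector machine classification of caller identity, of the kind reported above, can be sketched with scikit-learn. The two-dimensional "phrase features" and caller labels below are invented for illustration; they are not the tarsier data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical phrase features (e.g., a note-rate and a peak-frequency
# measure) for two invented female callers; each caller's phrases
# cluster around individual-specific means.
f1 = rng.normal(loc=[0.40, 120.0], scale=[0.05, 5.0], size=(50, 2))
f2 = rng.normal(loc=[0.60, 140.0], scale=[0.05, 5.0], size=(50, 2))
X = np.vstack([f1, f2])
y = np.array([0] * 50 + [1] * 50)  # caller identity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)
# Standardize the features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)
```

Held-out accuracy is the figure of merit reported in the abstract; in practice leave-one-out or grouped cross-validation is preferred when each individual contributes few phrases.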
Affiliation(s)
- Dena J Clink: Bioacoustics Research Program, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14850, USA
- Johny S Tasirin: Faculty of Agriculture, Sam Ratulangi University, Manado, Indonesia
- Holger Klinck: Bioacoustics Research Program, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14850, USA
13. Bianco MJ, Gerstoft P, Traer J, Ozanich E, Roch MA, Gannot S, Deledalle CA. Machine learning in acoustics: Theory and applications. J Acoust Soc Am 2019; 146:3590. PMID: 31795641; DOI: 10.1121/1.5133944.
Abstract
Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of techniques, which are often based in statistics, for automatically detecting and utilizing patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features and desired labels or actions, or between features themselves. With large volumes of training data, ML can discover models describing complex acoustic phenomena such as human speech and reverberation. ML in acoustics is rapidly developing with compelling results and significant future promise. We first introduce ML, then highlight ML developments in four acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, and environmental sounds in everyday scenes.
Affiliation(s)
- Michael J Bianco: Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
- Peter Gerstoft: Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
- James Traer: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Emma Ozanich: Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
- Marie A Roch: Department of Computer Science, San Diego State University, San Diego, California 92182, USA
- Sharon Gannot: Faculty of Engineering, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Charles-Alban Deledalle: Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, California 92093, USA
14. Bergler C, Schröter H, Cheng RX, Barth V, Weber M, Nöth E, Hofer H, Maier A. ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning. Sci Rep 2019; 9:10997. PMID: 31358873; PMCID: PMC6662697; DOI: 10.1038/s41598-019-47335-w.
Abstract
Large bioacoustic archives of wild animals are an important source for identifying reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication in non-human animals. A main challenge is that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis - particularly for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository - the Orchive - comprising roughly 19,000 hours of killer whale underwater recordings. Automated segmentation of the entire Orchive (about 2.2 years of audio) took approximately 8 days and achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables automated annotation of large bioacoustic databases to extract killer whale sounds, which are essential for the subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
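Precision (PPV) and AUC, the two figures of merit reported above, can be computed from detector labels and scores. The toy values below are purely illustrative, not the Orchive results:

```python
import numpy as np

def precision(y_true, y_pred):
    """Positive predictive value: TP / (TP + FP)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fp)

def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    random positive outscores a random negative (Mann-Whitney form)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy ground truth and detector scores for six segments.
y = np.array([1, 1, 1, 0, 0, 0])
s = np.array([0.9, 0.8, 0.4, 0.5, 0.2, 0.1])

p = precision(y, (s >= 0.5).astype(int))  # 2 of 3 detections are true
a = auc(y, s)
print(p, a)
```

AUC is threshold-free, whereas precision depends on the chosen decision threshold (0.5 here), which is why both are reported for the detector.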
Affiliation(s)
- Christian Bergler: Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany
- Hendrik Schröter: Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany
- Rachael Xi Cheng: Department of Ecological Dynamics, Leibniz Institute for Zoo and Wildlife Research (IZW) in the Forschungsverbund Berlin e.V., Alfred-Kowalke-Straße 17, 10315 Berlin, Germany
- Volker Barth: Anthro-Media, Nansenstr. 19, 12047 Berlin, Germany
- Elmar Nöth: Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany
- Heribert Hofer: Department of Ecological Dynamics, Leibniz Institute for Zoo and Wildlife Research (IZW) in the Forschungsverbund Berlin e.V., Alfred-Kowalke-Straße 17, 10315 Berlin, Germany; Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustrasse 3, 14195 Berlin, Germany; Department of Veterinary Medicine, Freie Universität Berlin, Oertzenweg 19b, 14195 Berlin, Germany
- Andreas Maier: Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany
15. Morita T, Koda H. Superregular grammars do not provide additional explanatory power but allow for a compact analysis of animal song. R Soc Open Sci 2019; 6:190139. PMID: 31417719; PMCID: PMC6689648; DOI: 10.1098/rsos.190139.
Abstract
A pervasive belief with regard to the differences between human language and animal vocal sequences (song) is that they belong to different classes of computational complexity, with animal song belonging to regular languages, whereas human language is superregular. This argument, however, lacks empirical evidence since superregular analyses of animal song are understudied. The goal of this paper is to perform a superregular analysis of animal song, using data from gibbons as a case study, and demonstrate that a superregular analysis can be effectively used with non-human data. A key finding is that a superregular analysis does not increase explanatory power but rather provides for compact analysis: fewer grammatical rules are necessary once superregularity is allowed. This pattern is analogous to a previous computational analysis of human language, and accordingly, the null hypothesis, that human language and animal song are governed by the same type of grammatical systems, cannot be rejected.
Affiliation(s)
- T. Morita: Primate Research Institute, Kyoto University, 41-2 Kanrin, Inuyama, Aichi 484-8506, Japan
- H. Koda: Primate Research Institute, Kyoto University, 41-2 Kanrin, Inuyama, Aichi 484-8506, Japan
16. Seasonal Variation of Captive Meagre Acoustic Signalling: A Manual and Automatic Recognition Approach. Fishes 2019; 4:28. DOI: 10.3390/fishes4020028.
Abstract
Many species rely on acoustic communication to fulfil functions such as advertisement and the mediation of social interactions (e.g., agonistic, mating). Fish calls can therefore be an important source of information, e.g., to recognize reproductive periods or to assess fish welfare, and should be considered a potential non-intrusive tool in aquaculture management. Assessing fish acoustic activity, however, often requires long sound recordings, and automatic methods are invaluable for detecting and extracting the relevant biological information from them. Here we present a study characterizing meagre (Argyrosomus regius) acoustic activity during social contexts in captivity, using an automatic pattern-recognition methodology based on the Hidden Markov Model. Calls produced by meagre during the breeding season showed a richer repertoire than previously reported. Besides the dense choruses composed of grunts already known for this species, meagre emitted successive series of isolated pulses, audible as 'knocks'. Grunts with a variable number of pulses were also registered. The overall acoustic activity was concurrent with the number of spawning events, and diel call rhythms exhibited a peak of calling activity from 15:00 to midnight. In addition, grunt acoustic parameters varied significantly across the reproductive season. These results open the possibility of using meagre vocal activity to predict breeding and approaching spawning periods in aquaculture management.
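Hidden Markov Model recognition of the kind used above rests on decoding the most likely hidden state sequence from noisy observations. The toy Viterbi sketch below uses invented noise/call states and binarized frame-energy observations; the probabilities and state design are illustrative, not the authors' model:

```python
import numpy as np

# Two hidden states: 0 = background noise, 1 = grunt/call.
# Observations: 0 = low-energy frame, 1 = high-energy frame.
start = np.log([0.8, 0.2])
trans = np.log([[0.9, 0.1],   # noise tends to stay noise
                [0.2, 0.8]])  # calls persist across frames
emit = np.log([[0.9, 0.1],    # noise emits mostly low-energy frames
               [0.2, 0.8]])   # calls emit mostly high-energy frames

def viterbi(obs):
    """Return the most likely hidden state sequence for obs."""
    v = start + emit[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = v[:, None] + trans      # scores[i, j]: from i to j
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + emit[:, o]
    path = [int(v.argmax())]
    for b in reversed(back):             # trace best predecessors
        path.append(int(b[path[-1]]))
    return path[::-1]

states = viterbi([0, 0, 1, 1, 1, 0, 0])
print(states)
```

Runs of decoded call states then delimit candidate grunts, whose counts and timing can be aggregated into the diel activity rhythms described above.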
17.
18. Zhang YJ, Huang JF, Gong N, Ling ZH, Hu Y. Automatic detection and classification of marmoset vocalizations using deep and recurrent neural networks. J Acoust Soc Am 2018; 144:478. PMID: 30075670; DOI: 10.1121/1.5047743.
Abstract
This paper investigates methods to detect and classify marmoset vocalizations automatically, using a large data set of marmoset vocalizations and deep learning techniques. For vocalization detection, neural-network-based methods, including a deep neural network (DNN) and a recurrent neural network with long short-term memory units, are designed and compared against a conventional rule-based detection method. For vocalization classification, three algorithms are compared: a support vector machine (SVM), a DNN, and long short-term memory recurrent neural networks (LSTM-RNNs). A 1500-min audio data set containing recordings from four pairs of marmoset twins, with manual annotations, is employed for the experiments. Two test sets are built according to whether the test samples are produced by the marmosets in the training set (test set I) or not (test set II). Experimental results show that the LSTM-RNN-based detection method outperformed the others, achieving frame error rates of 0.92% and 1.67% on the two test sets. Furthermore, the deep learning models obtained higher classification accuracy than the SVM model: 95.60% and 91.67% on the two test sets, respectively.
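The frame error rate used above to score detection is simply the fraction of frames whose hypothesized call/no-call label disagrees with the manual annotation. A minimal sketch with toy labels (not the marmoset data):

```python
def frame_error_rate(ref, hyp):
    """Fraction of frames where the hypothesized labels disagree with
    the reference annotation; both sequences are per-frame labels."""
    assert len(ref) == len(hyp), "sequences must be frame-aligned"
    return sum(r != h for r, h in zip(ref, hyp)) / len(ref)

# Toy per-frame call (1) / no-call (0) labels for eight frames.
reference  = [0, 0, 1, 1, 1, 0, 0, 1]
hypothesis = [0, 1, 1, 1, 0, 0, 0, 1]
print(frame_error_rate(reference, hypothesis))  # 2 of 8 frames differ
```

Because it is computed per frame rather than per event, this metric penalizes boundary errors as well as outright missed or spurious calls.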
Affiliation(s)
- Ya-Jie Zhang: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, 443 Huangshan Road, Hefei 230027, China
- Jun-Feng Huang: Institute of Neuroscience, State Key Laboratory of Neuroscience, Chinese Academy of Sciences (CAS) Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, CAS, 320 Yueyang Road, Shanghai 200031, China
- Neng Gong: Institute of Neuroscience, State Key Laboratory of Neuroscience, Chinese Academy of Sciences (CAS) Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, CAS, 320 Yueyang Road, Shanghai 200031, China
- Zhen-Hua Ling: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, 443 Huangshan Road, Hefei 230027, China
- Yu Hu: National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, 443 Huangshan Road, Hefei 230027, China
19. Everyday bat vocalizations contain information about emitter, addressee, context, and behavior. Sci Rep 2016; 6:39419. PMID: 28005079; PMCID: PMC5178335; DOI: 10.1038/srep39419.
Abstract
Animal vocal communication is often diverse and structured. Yet, the information concealed in animal vocalizations remains elusive. Several studies have shown that animal calls convey information about their emitter and the context. Often, these studies focus on specific types of calls, as it is rarely possible to probe an entire vocal repertoire at once. In this study, we continuously monitored Egyptian fruit bats for months, recording audio and video around-the-clock. We analyzed almost 15,000 vocalizations, which accompanied the everyday interactions of the bats, and were all directed toward specific individuals, rather than broadcast. We found that bat vocalizations carry ample information about the identity of the emitter, the context of the call, the behavioral response to the call, and even the call’s addressee. Our results underline the importance of studying the mundane, pairwise, directed, vocal interactions of animals.
20. Fedurek P, Zuberbühler K, Dahl CD. Sequential information in a great ape utterance. Sci Rep 2016; 6:38226. PMID: 27910886; PMCID: PMC5133612; DOI: 10.1038/srep38226.
Abstract
Birdsong is a prime example of acoustically sophisticated vocal behaviour, but its complexity has evolved mainly through sexual selection to attract mates and repel sexual rivals. In contrast, non-human primate calls often mediate complex social interactions, but are generally regarded as acoustically simple. Here, we examine arguably the most complex call in great ape vocal communication, the chimpanzee (Pan troglodytes schweinfurthii) 'pant hoot'. This signal consists of four acoustically distinct phases: introduction, build-up, climax and let-down. We applied state-of-the-art Support Vector Machines (SVM) methodology to pant hoots produced by wild male chimpanzees of Budongo Forest, Uganda. We found that caller identity was apparent in all four phases, but most strongly in the low-amplitude introduction and high-amplitude climax phases. Age was mainly correlated with the low-amplitude introduction and build-up phases, dominance rank (i.e. social status) with the high-amplitude climax phase, and context (reflecting activity of the caller) with the low-amplitude let-down phase. We conclude that the complex acoustic structure of chimpanzee pant hoots is linked to a range of socially relevant information in the different phases of the call, reflecting the complex nature of chimpanzee social lives.
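The pant-hoot analysis above used Support Vector Machines to read caller attributes out of acoustic measurements. A minimal sketch of that general workflow, with invented feature values and caller labels standing in for the study's data (the three "acoustic features" and the sklearn pipeline below are illustrative assumptions, not the authors' code):

```python
import numpy as np
from sklearn.svm import SVC

# Two hypothetical "callers" with different mean acoustic-feature profiles
# (e.g. per-phase spectral measurements); values are synthetic stand-ins.
rng = np.random.default_rng(0)
n_per_caller = 20
caller_a = rng.normal(loc=[200.0, 0.5, 1.2], scale=0.3, size=(n_per_caller, 3))
caller_b = rng.normal(loc=[250.0, 0.8, 0.9], scale=0.3, size=(n_per_caller, 3))
X = np.vstack([caller_a, caller_b])
y = np.array([0] * n_per_caller + [1] * n_per_caller)

clf = SVC(kernel="rbf", gamma="scale")  # standard RBF-kernel SVM
clf.fit(X, y)
accuracy = clf.score(X, y)  # training accuracy on well-separated callers
```

With clearly separated caller profiles, such a classifier recovers the caller labels; the study's contribution lies in showing which call phases carry which attributes, not in the classifier itself.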
Affiliation(s)
- Pawel Fedurek
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
- Budongo Conservation Field Station, Masindi, Uganda
- Max Planck Institute for Evolutionary Anthropology, Department of Primatology, Leipzig, Germany
- Klaus Zuberbühler
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
- Budongo Conservation Field Station, Masindi, Uganda
- School of Psychology and Neuroscience, University of St Andrews, Scotland, UK
- Christoph D. Dahl
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
21
Kaewtip K, Alwan A, O'Reilly C, Taylor CE. A robust automatic birdsong phrase classification: A template-based approach. J Acoust Soc Am 2016; 140:3691. [PMID: 27908084; DOI: 10.1121/1.4966592]
Abstract
Automatic phrase detection systems for bird sounds are useful in several applications, as they reduce the need for manual annotation. However, bird phrase detection is challenging due to limited training data and background noise. Training data may be limited because recordings are scarce or because some phrases are rare. Background noise interference arises from the intrinsic nature of the recording environment, such as wind or other animals. This paper presents a different approach to birdsong phrase classification using template-based techniques suitable even for limited training data and noisy environments. The algorithm utilizes dynamic time warping (DTW) and prominent (high-energy) time-frequency regions of training spectrograms to derive templates. The performance of the proposed algorithm is compared with traditional DTW and hidden Markov model (HMM) methods under several training and test conditions. DTW works well when the data are limited, while HMMs do better when more data are available, yet both suffer when the background noise is severe. The proposed algorithm outperforms DTW and HMMs in most training and testing conditions, usually by a high margin when the background noise level is high. The innovation of this work is that the proposed algorithm is robust to both limited training data and background noise.
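The template-matching approach above rests on dynamic time warping, which aligns two feature sequences that may differ in tempo. A minimal sketch of the core DTW recurrence over 1-D feature sequences (the paper operates on time-frequency regions of spectrograms, so this illustrates only the alignment step, not the authors' full algorithm):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative alignment cost
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path may repeat elements, a sequence aligned against a slower rendition of itself, e.g. `dtw_distance([1, 2, 3], [1, 2, 2, 3])`, still has zero cost, which is exactly the tempo-invariance the phrase classifier exploits.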
Affiliation(s)
- Kantapon Kaewtip
- Department of Electrical Engineering, University of California, Los Angeles, 56-125B Engineering IV Building, Box 951594, Los Angeles, California 90095, USA
- Abeer Alwan
- Department of Electrical Engineering, University of California, Los Angeles, 56-125B Engineering IV Building, Box 951594, Los Angeles, California 90095, USA
- Colm O'Reilly
- Sigmedia, Department of Electronic and Electrical Engineering, Trinity College, Dublin, Ireland
- Charles E Taylor
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 621 Charles Young Drive South, Los Angeles, California 90095, USA
22
Spillmann B, van Schaik CP, Setia TM, Sadjadi SO. Who shall I say is calling? Validation of a caller recognition procedure in Bornean flanged male orangutan (Pongo pygmaeus wurmbii) long calls. Bioacoustics 2016. [DOI: 10.1080/09524622.2016.1216802]
Affiliation(s)
- Brigitte Spillmann
- Anthropological Institute and Museum, University of Zurich, Zurich, Switzerland
- Carel P. van Schaik
- Anthropological Institute and Museum, University of Zurich, Zurich, Switzerland
23
McCordic JA, Root-Gutteridge H, Cusano DA, Denes SL, Parks SE. Calls of North Atlantic right whales Eubalaena glacialis contain information on individual identity and age class. Endanger Species Res 2016. [DOI: 10.3354/esr00735]
24
Vieira M, Fonseca PJ, Amorim MCP, Teixeira CJC. Call recognition and individual identification of fish vocalizations based on automatic speech recognition: An example with the Lusitanian toadfish. J Acoust Soc Am 2015; 138:3941-3950. [PMID: 26723348; DOI: 10.1121/1.4936858]
Abstract
The study of acoustic communication in animals often requires not only the recognition of species-specific acoustic signals but also the identification of individual subjects, all in a complex acoustic background. Moreover, when very long recordings are to be analyzed, automatic recognition and identification processes are invaluable tools for extracting the relevant biological information. A pattern recognition methodology based on hidden Markov models is presented, inspired by successful results obtained with the most widely known and complex acoustic communication signal: human speech. This methodology was applied here for the first time to the detection and recognition of fish acoustic signals, specifically in a stream of round-the-clock recordings of Lusitanian toadfish (Halobatrachus didactylus) in their natural estuarine habitat. The results show that this methodology is able not only to detect the mating sounds (boatwhistles) but also to identify individual male toadfish, reaching an identification rate of ca. 95%. The method also proved to be a powerful tool for assessing signal durations in large data sets. However, the system failed to recognize other sound types.
Affiliation(s)
- Manuel Vieira
- Departamento de Biologia Animal and cE3c - Centre for Ecology, Evolution and Environmental Changes, Faculdade de Ciências, Universidade de Lisboa, Bloco C2, Campo Grande, 1749-016 Lisboa, Portugal
- Paulo J Fonseca
- Departamento de Biologia Animal and cE3c - Centre for Ecology, Evolution and Environmental Changes, Faculdade de Ciências, Universidade de Lisboa, Bloco C2, Campo Grande, 1749-016 Lisboa, Portugal
- M Clara P Amorim
- MARE-Marine and Environmental Sciences Centre, ISPA-Instituto Universitário, Rua Jardim do Tabaco 34, 1149-041 Lisboa, Portugal
- Carlos J C Teixeira
- Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Bloco C6, Campo Grande, 1749-016 Lisboa, Portugal
25
Ptacek L, Machlica L, Linhart P, Jaska P, Muller L. Automatic recognition of bird individuals on an open set using as-is recordings. Bioacoustics 2015. [DOI: 10.1080/09524622.2015.1089524]
26
Scheifele PM, Johnson MT, Fry M, Hamel B, Laclede K. Vocal classification of vocalizations of a pair of Asian small-clawed otters to determine stress. J Acoust Soc Am 2015; 138:EL105-EL109. [PMID: 26233050; DOI: 10.1121/1.4922768]
Abstract
Asian small-clawed otters (Aonyx cinerea) are a small, protected but threatened freshwater species. They are gregarious and live in monogamous pairs for their lifetimes, communicating via scent and acoustic vocalizations. This study utilized a hidden Markov model (HMM) to classify stress versus non-stress calls from a sibling pair under professional care. Vocalizations were expertly annotated by keepers into seven contextual categories. Four of these (aggression, separation anxiety, pain, and prefeeding) were identified as stressful contexts, and three (feeding, training, and play) as non-stressful contexts. The vocalizations were segmented, manually categorized into broad call types, and analyzed to determine signal-to-noise ratios. From this information, vocalizations from the most common contextual categories were used in HMM-based automatic classification experiments covering individual identification, stress versus non-stress classification, and individual context classification. Results indicate that both individual identity and stress versus non-stress were distinguishable, with accuracies above 90%, but that individual contexts within the stress category were not easily separable.
Affiliation(s)
- Peter M Scheifele
- FETCHLAB, Department of Audiology, University of Cincinnati, 3202 Eden Avenue, Cincinnati, Ohio 45267, USA
- Michael T Johnson
- Electrical and Computer Engineering Department, Marquette University, 1515 West Wisconsin Avenue, Milwaukee, Wisconsin 53233, USA
- Michelle Fry
- Newport Aquarium, 1 Aquarium Way, Newport, Kentucky 41071, USA
- Benjamin Hamel
- Electrical and Computer Engineering Department, Marquette University, 1515 West Wisconsin Avenue, Milwaukee, Wisconsin 53233, USA
- Kathryn Laclede
- FETCHLAB, Department of Audiology, University of Cincinnati, 3202 Eden Avenue, Cincinnati, Ohio 45267, USA
27
Oosthuizen DJJ, Hanekom JJ. Fuzzy information transmission analysis for continuous speech features. J Acoust Soc Am 2015; 137:1983-1994. [PMID: 25920849; DOI: 10.1121/1.4916198]
Abstract
Feature information transmission analysis (FITA) estimates information transmitted by an acoustic feature by assigning tokens to categories according to the feature under investigation and comparing within-category to between-category confusions. FITA was initially developed for categorical features (e.g., voicing) for which the category assignments arise from the feature definition. When used with continuous features (e.g., formants), it may happen that pairs of tokens in different categories are more similar than pairs of tokens in the same category. The estimated transmitted information may be sensitive to category boundary location and the selected number of categories. This paper proposes a fuzzy approach to FITA that provides a smoother transition between categories and compares its sensitivity to grouping parameters with that of the traditional approach. The fuzzy FITA was found to be sufficiently robust to boundary location to allow automation of category boundary selection. Traditional and fuzzy FITA were found to be sensitive to the number of categories. This is inherent to the mechanism of isolating a feature by dividing tokens into categories, so that transmitted information values calculated using different numbers of categories should not be compared. Four categories are recommended for continuous features when twelve tokens are used.
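FITA-style analyses start from a stimulus-by-response confusion matrix and estimate how much information the category grouping transmits. A minimal sketch of that underlying computation, the classic transmitted-information (mutual-information) estimate from counts; the fuzzy category weighting the paper proposes is not reproduced here:

```python
import numpy as np

def transmitted_information(confusion):
    """Bits of information transmitted, estimated from a stimulus-by-response
    confusion matrix of counts (the quantity FITA-style analyses build on)."""
    p = np.asarray(confusion, dtype=float)
    p /= p.sum()                       # joint probabilities p(stimulus, response)
    px = p.sum(axis=1, keepdims=True)  # stimulus marginals
    py = p.sum(axis=0, keepdims=True)  # response marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p * np.log2(p / (px * py))
    return float(np.nansum(terms))     # zero-count cells contribute nothing

# Perfect transmission between two equiprobable categories carries 1 bit;
# a uniformly confused matrix carries 0 bits.
perfect = [[10, 0], [0, 10]]
chance = [[5, 5], [5, 5]]
```

The sensitivity the abstract describes enters through how tokens of a continuous feature are binned into the matrix's rows, not through this formula itself.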
Affiliation(s)
- Dirk J J Oosthuizen
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, University Road, Pretoria 0002, South Africa
- Johan J Hanekom
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, University Road, Pretoria 0002, South Africa
28
Response of red deer stags (Cervus elaphus) to playback of harsh versus common roars. Naturwissenschaften 2014; 101:851-4. [PMID: 25119193; DOI: 10.1007/s00114-014-1217-8]
Abstract
Red deer stags (Cervus elaphus) give two distinct types of roars during the breeding season, the "common roar" and the "harsh roar." Harsh roars are more frequent in contexts of intense competition and are characterized by a set of features that increase their perceptual salience, suggesting that they signal heightened arousal. While common roars have been shown to encode size information and to mediate both male competition and female choice, to our knowledge the specific function of harsh roars during male competition has not yet been studied. Here, we investigate the hypothesis that the specific structure of male harsh roars signals high arousal to competitors. We contrast the behavioral responses of free-ranging, harem-holding stags to the playback of harsh roars from an unfamiliar competitor with their responses to the playback of common roars from the same animal. We show that males react less strongly to sequences of harsh roars than to sequences of common roars, possibly because they are reluctant to escalate conflicts with highly motivated and threatening unfamiliar males in the absence of visual information. While future work should investigate the response of stags to harsh roars from familiar opponents, our observations remain consistent with the hypothesis that harsh roars signal motivation during male competition, and they illustrate how intrasexual selection can contribute to the diversification of male vocal signals.
29
Viljoen JJ, Ganswindt A, Reynecke C, Stoeger AS, Langbauer WR. Vocal stress associated with a translocation of a family herd of African elephants (Loxodonta africana) in the Kruger National Park, South Africa. Bioacoustics 2014. [DOI: 10.1080/09524622.2014.906320]
Affiliation(s)
- Jozua Jakobus Viljoen
- Department of Nature Conservation, Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa
- Andre Ganswindt
- Department of Zoology and Entomology, University of Pretoria, Private Bag X20 Hatfield, Pretoria 0028, South Africa
- Christopher Reynecke
- Department of Nature Conservation, Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa
- Angela S. Stoeger
- Department of Cognitive Biology, University of Vienna, Althanstrasse 14, 1090 Vienna, Austria
- William Richard Langbauer
- Department of Science & Conservation, Pittsburgh Zoo & PPG Aquarium, One Wild Place, Pittsburgh, PA 15206, USA
30
Röper KM, Scheumann M, Wiechert AB, Nathan S, Goossens B, Owren MJ, Zimmermann E. Vocal acoustics in the endangered proboscis monkey (Nasalis larvatus). Am J Primatol 2013; 76:192-201. [PMID: 24123122; DOI: 10.1002/ajp.22221]
Abstract
The endangered proboscis monkey (Nasalis larvatus) is a sexually highly dimorphic Old World primate endemic to the island of Borneo. Previous studies focused mainly on its ecology and behavior, but knowledge of its vocalizations is limited. The present study provides quantified information on vocal rate and on the vocal acoustics of the prominent calls of this species. We audio-recorded vocal behavior of 10 groups over two 4-month periods at the Lower Kinabatangan Wildlife Sanctuary in Sabah, Borneo. We observed monkeys and recorded calls in evening and morning sessions at sleeping trees along riverbanks. We found no differences in the vocal rate between evening and morning observation sessions. Based on multiparametric analysis, we identified acoustic features of the four common call-types "shrieks," "honks," "roars," and "brays." "Chorus" events were also noted in which multiple callers produced a mix of vocalizations. The four call-types were distinguishable based on a combination of fundamental frequency variation, call duration, and degree of voicing. Three of the call-types can be considered as "loud calls" and are therefore deemed promising candidates for non-invasive, vocalization-based monitoring of proboscis monkeys for conservation purposes.
Affiliation(s)
- K M Röper
- Institute of Zoology, University of Veterinary Medicine Hannover, Hannover, Germany
31
Mielke A, Zuberbühler K. A method for automated individual, species and call type recognition in free-ranging animals. Anim Behav 2013. [DOI: 10.1016/j.anbehav.2013.04.017]
32
Effective and accurate discrimination of individual dairy cattle through acoustic sensing. Appl Anim Behav Sci 2013. [DOI: 10.1016/j.applanim.2013.03.008]
33
Ji A, Johnson MT, Walsh EJ, McGee J, Armstrong DL. Discrimination of individual tigers (Panthera tigris) from long distance roars. J Acoust Soc Am 2013; 133:1762-1769. [PMID: 23464045; DOI: 10.1121/1.4789936]
Abstract
This paper investigates the extent of tiger (Panthera tigris) vocal individuality through both qualitative and quantitative approaches using long distance roars from six individual tigers at Omaha's Henry Doorly Zoo in Omaha, NE. The framework for comparison across individuals includes statistical and discriminant function analysis across whole vocalization measures and statistical pattern classification using a hidden Markov model (HMM) with frame-based spectral features comprised of Greenwood frequency cepstral coefficients. Individual discrimination accuracy is evaluated as a function of spectral model complexity, represented by the number of mixtures in the underlying Gaussian mixture model (GMM), and temporal model complexity, represented by the number of sequential states in the HMM. Results indicate that the temporal pattern of the vocalization is the most significant factor in accurate discrimination. Overall baseline discrimination accuracy for this data set is about 70% using high level features without complex spectral or temporal models. Accuracy increases to about 80% when more complex spectral models (multiple mixture GMMs) are incorporated, and increases to a final accuracy of 90% when more detailed temporal models (10-state HMMs) are used. Classification accuracy is stable across a relatively wide range of configurations in terms of spectral and temporal model resolution.
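The GMM side of the framework above can be pictured as training one Gaussian mixture per animal on frame-level spectral features and assigning a test call to whichever model scores it highest. A hedged sketch with synthetic two-dimensional "roar frames" standing in for the study's cepstral features (none of the data or model sizes below come from the paper):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic frame-level features for two hypothetical tigers
rng = np.random.default_rng(42)
frames_tiger_a = rng.normal([0.0, 0.0], 0.5, size=(200, 2))
frames_tiger_b = rng.normal([3.0, 3.0], 0.5, size=(200, 2))

# One mixture model per individual, fit on that individual's frames
models = {}
for name, frames in [("A", frames_tiger_a), ("B", frames_tiger_b)]:
    gmm = GaussianMixture(n_components=2, random_state=0)
    models[name] = gmm.fit(frames)

# Score an unseen call (frames drawn from tiger A's distribution) under
# each model and pick the highest mean log-likelihood
test_call = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
scores = {name: m.score(test_call) for name, m in models.items()}
predicted = max(scores, key=scores.get)
```

Raising `n_components` corresponds to the spectral-complexity axis the abstract describes; adding HMM states on top of these mixtures adds the temporal modeling that gave the largest accuracy gains.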
Affiliation(s)
- An Ji
- Department of Electrical and Computer Engineering, Marquette University, 1515 West Wisconsin Avenue, Milwaukee, Wisconsin 53233, USA
34
Garland EC, Noad MJ, Goldizen AW, Lilley MS, Rekdahl ML, Garrigue C, Constantine R, Daeschler Hauser N, Poole MM, Robbins J. Quantifying humpback whale song sequences to understand the dynamics of song exchange at the ocean basin scale. J Acoust Soc Am 2013; 133:560-9. [PMID: 23297927; DOI: 10.1121/1.4770232]
Abstract
Humpback whales have a continually evolving vocal sexual display, or "song," that appears to undergo both evolutionary and "revolutionary" change. All males within a population adhere to the current content and arrangement of the song. Populations within an ocean basin share similarities in their songs; this sharing is complex as multiple variations of the song (song types) may be present within a region at any one time. To quantitatively investigate the similarity of song types, songs were compared at both the individual singer and population level using the Levenshtein distance technique and cluster analysis. The highly stereotyped sequences of themes from the songs of 211 individuals from populations within the western and central South Pacific region from 1998 through 2008 were grouped together based on the percentage of song similarity, and compared to qualitatively assigned song types. The analysis produced clusters of highly similar songs that agreed with previous qualitative assignments. Each cluster contained songs from multiple populations and years, confirming the eastward spread of song types and their progressive evolution through the study region. Quantifying song similarity and exchange will assist in understanding broader song dynamics and contribute to the use of vocal displays as population identifiers.
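The Levenshtein distance technique referenced above treats each song as a sequence of theme labels and counts the edits needed to turn one sequence into another. A minimal sketch (theme sequences shown as strings; normalizing by the longer sequence's length is one plausible convention for percent similarity, not necessarily the study's exact formula):

```python
def levenshtein(a, b):
    """Minimum number of theme insertions, deletions, and substitutions
    needed to turn song sequence a into song sequence b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def song_similarity(a, b):
    """Percent similarity between two theme sequences: Levenshtein distance
    normalized by the longer sequence's length."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))
```

Pairwise similarities computed this way form the matrix that a cluster analysis can then group into song types, mirroring the paper's quantitative pipeline.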
Affiliation(s)
- Ellen C Garland
- Cetacean Ecology and Acoustics Laboratory, School of Veterinary Science, University of Queensland, Gatton, Queensland 4343, Australia
35
Soltis J, Wilson RP, Douglas-Hamilton I, Vollrath F, King LE, Savage A. Accelerometers in collars identify behavioral states in captive African elephants Loxodonta africana. Endanger Species Res 2012. [DOI: 10.3354/esr00452]
36
Cheng J, Xie B, Lin C, Ji L. A comparative study in birds: call-type-independent species and individual recognition using four machine-learning methods and two acoustic features. Bioacoustics 2012. [DOI: 10.1080/09524622.2012.669664]
37
Zhang B, Cheng J, Han Y, Ji L, Shi F. An acoustic system for the individual recognition of insects. J Acoust Soc Am 2012; 131:2859-2865. [PMID: 22501064; DOI: 10.1121/1.3692236]
Abstract
Research into acoustic recognition systems for insects has focused on species identification rather than individual identification. In this paper, the feasibility of applying pattern recognition techniques to construct an acoustic system capable of automatic individual recognition for insects is investigated analytically and experimentally across two species of Orthoptera. Mel-frequency cepstral coefficients serve as the acoustic feature, and Gaussian mixture models were selected as the classification models. The performance of the proposed acoustic system is promising and displays high accuracy. The results suggest that the acoustic feature and classifier method developed here have potential for individual animal recognition and can be applied to other species of interest.
Affiliation(s)
- Bo Zhang
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Beijing 100101, People's Republic of China
38
Blumstein DT, Mennill DJ, Clemins P, Girod L, Yao K, Patricelli G, Deppe JL, Krakauer AH, Clark C, Cortopassi KA, Hanser SF, McCowan B, Ali AM, Kirschel ANG. Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations and prospectus. J Appl Ecol 2011. [DOI: 10.1111/j.1365-2664.2011.01993.x]
39
Giret N, Roy P, Albert A, Pachet F, Kreutzer M, Bovet D. Finding good acoustic features for parrot vocalizations: the feature generation approach. J Acoust Soc Am 2011; 129:1089-99. [PMID: 21361465; DOI: 10.1121/1.3531953]
Abstract
A crucial step in understanding the vocal behavior of birds is being able to classify calls in the repertoire into meaningful types. Methods developed to this aim are limited either by human subjectivity or by methodological issues. The present study investigated whether a feature generation system could categorize the vocalizations of a bird species automatically and effectively. This procedure was applied to vocalizations of African gray parrots, known for their capacity to reproduce almost any sound in their environment. Outcomes of the feature generation approach agreed well with the much more labor-intensive classifications of a human expert working from spectrographic representations, while clearly outperforming other automated methods. The method brings significant improvements in precision over commonly used bioacoustical analyses. As such, it enlarges the scope of automated, acoustics-based sound classification.
Affiliation(s)
- Nicolas Giret
- Laboratoire d'Ethologie et Cognition Comparées, Université Paris Ouest Nanterre La Défense, 200 avenue de la République, 92000 Nanterre, France
40
Abstract
Research on vocal communication in African elephants has increased in recent years, both in the wild and in captivity, providing an opportunity to present a comprehensive review of research related to their vocal behavior. Current data indicate that the vocal repertoire consists of perhaps nine acoustically distinct call types, "rumbles" being the most common and acoustically variable. Large vocal production anatomy is responsible for the low-frequency nature of rumbles, with fundamental frequencies in the infrasonic range. Additionally, resonant frequencies of rumbles implicate the trunk in addition to the oral cavity in shaping the acoustic structure of rumbles. Long-distance communication is thought possible because low-frequency sounds propagate more faithfully than high-frequency sounds, and elephants respond to rumbles at distances of up to 2.5 km. Elephant ear anatomy appears designed for detecting low frequencies, and experiments demonstrate that elephants can detect infrasonic tones and discriminate small frequency differences. Two vocal communication functions in the African elephant now have reasonable empirical support. First, closely bonded but spatially separated females engage in rumble exchanges, or "contact calls," that function to coordinate movement or reunite animals. Second, both males and females produce "mate attraction" rumbles that may advertise reproductive states to the opposite sex. Additionally, there is evidence that the structural variation in rumbles reflects the individual identity, reproductive state, and emotional state of callers. Growth in knowledge about the communication system of the African elephant has occurred from a rich combination of research on wild elephants in national parks and captive elephants in zoological parks.
Affiliation(s)
- Joseph Soltis
- Education and Science, Disney's Animal Kingdom, Bay Lake, Florida, USA
41
Pozzi L, Gamba M, Giacoma C. The use of Artificial Neural Networks to classify primate vocalizations: A pilot study on black lemurs. Am J Primatol 2010; 72:337-48. [PMID: 20034021; DOI: 10.1002/ajp.20786]
Abstract
The identification of the vocal repertoire of a species is a crucial prerequisite for a correct interpretation of animal behavior. Artificial Neural Networks (ANNs) have been widely used in the behavioral sciences and are today considered a valuable classification tool for reducing subjectivity and allowing replicable results across studies. To date, however, no studies have applied this tool to nonhuman primate vocalizations. Here, we apply ANNs for the first time to discriminate the vocal repertoire of a primate species, Eulemur macaco macaco. We designed an automatic procedure to extract both spectral and temporal features from signals, and performed a comparative analysis between a supervised Multilayer Perceptron and two statistical approaches commonly used in primatology (Discriminant Function Analysis and Cluster Analysis) in order to explore the pros and cons of these methods in bioacoustic classification. Our results show that ANNs were able to recognize all seven vocal categories previously described (92.5-95.6%) and performed better than either statistical approach (76.1-88.4%). ANNs can therefore provide an effective and robust method for automatic classification in primates as well, suggesting that neural models represent a valuable tool for a better understanding of primate vocal communication. The use of neural networks to identify primate vocalizations and the further development of this approach in studying primate communication are discussed.
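Structurally, a supervised Multilayer Perceptron like the one evaluated above is an affine-nonlinearity-affine-softmax pipeline mapping a feature vector to category probabilities. A minimal forward-pass sketch with invented shapes and random placeholder weights (a real classifier would learn `w1, b1, w2, b2` from labeled calls; only the seven-category output width echoes the study):

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer perceptron: affine -> tanh -> affine -> softmax.
    Returns a probability distribution over call categories for one input."""
    h = np.tanh(x @ w1 + b1)           # hidden-layer activations
    logits = h @ w2 + b2
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Illustrative shapes only: 12 acoustic features in, 7 call categories out;
# weights are random placeholders, not trained values.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(12, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 7)), np.zeros(7)
probs = mlp_forward(rng.normal(size=12), w1, b1, w2, b2)
```

Training adjusts the weights so the softmax output concentrates on the correct category, which is what allowed the network to outperform the linear Discriminant Function Analysis.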
Affiliation(s)
- Luca Pozzi
- Dipartimento di Biologia Animale e dell'Uomo, Università di Torino, Italy
42
Adi K, Johnson MT, Osiejuk TS. Acoustic censusing using automatic vocalization classification and identity recognition. J Acoust Soc Am 2010; 127:874-883. [PMID: 20136210; DOI: 10.1121/1.3273887]
Abstract
This paper presents an advanced method to acoustically assess animal abundance. The framework combines supervised classification (song-type and individual identity recognition), unsupervised classification (individual identity clustering), and the mark-recapture model of abundance estimation. The underlying algorithm is based on clustering using hidden Markov models (HMMs) and Gaussian mixture models (GMMs) similar to methods used in the speech recognition community for tasks such as speaker identification and clustering. Initial experiments using a Norwegian ortolan bunting (Emberiza hortulana) data set show the feasibility and effectiveness of the approach. Individually distinct acoustic features have been observed in a wide range of animal species, and this combined with the widespread success of speaker identification and verification methods for human speech suggests that robust automatic identification of individuals from their vocalizations is attainable. Only a few studies, however, have yet attempted to use individual acoustic distinctiveness to directly assess population density and structure. The approach introduced here offers a direct mechanism for using individual vocal variability to create simpler and more accurate population assessment tools in vocally active species.
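The mark-recapture component above treats acoustically identified individuals as "marked" animals. The simplest such estimator, the Lincoln-Petersen index, illustrates the idea; the paper's framework generalizes this with clustering-based identity assignment, so the exact model used there may differ, and the session counts below are invented:

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Classic Lincoln-Petersen abundance estimate N = M * C / R, where
    individuals identified by voice play the role of marked animals."""
    if recaptured == 0:
        raise ValueError("need at least one recapture to estimate abundance")
    return marked_first * caught_second / recaptured

# e.g. 15 individuals identified in a first recording session, 12 in a
# second, 9 of them heard in both (illustrative numbers only)
estimate = lincoln_petersen(15, 12, 9)  # -> 20.0
```

The estimator's appeal here is that "recapture" requires no physical handling: re-identifying a voice across sessions is enough.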
Collapse
Affiliation(s)
- Kuntoro Adi
- Santa Dharma University, Mrican, Yogyakarta 55002, Indonesia
| | | | | |
Collapse
|
43
|
Mouy X, Bahoura M, Simard Y. Automatic recognition of fin and blue whale calls for real-time monitoring in the St. Lawrence. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:2918-2928. [PMID: 20000904 DOI: 10.1121/1.3257588] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Monitoring blue and fin whales summering in the St. Lawrence Estuary with passive acoustics requires call recognition algorithms that can cope with the heavy shipping noise of the St. Lawrence Seaway and with multipath propagation characteristics that generate overlapping copies of the calls. In this paper, the performance of three time-frequency methods aimed at such automatic detection and classification is tested on more than 2000 calls and compared at several levels of signal-to-noise ratio using typical recordings collected in this area. For all methods, image processing techniques are used to reduce the noise in the spectrogram. The first approach consists of matching the spectrogram with binary time-frequency templates of the calls (coincidence of spectrograms). The second approach is based on the extraction of the frequency contours of the calls and their classification using dynamic time warping (DTW) and vector quantization (VQ) algorithms. The coincidence of spectrograms was the fastest method and performed best for blue whale A and B calls. VQ detected more 20 Hz fin whale calls, but with a higher false alarm rate. DTW and VQ performed best on the more variable blue whale D calls.
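The "coincidence of spectrograms" idea, matching a binary time-frequency template against a binarized spectrogram, can be sketched as a normalized 2-D correlation. The template shape and spectrogram below are synthetic toys, not the St. Lawrence recordings, and the detection threshold would need tuning on real data.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(2)

# Binary time-frequency template of a downswept call (toy 8x12 patch):
# one active cell per time frame, descending in frequency.
template = np.zeros((8, 12))
for t in range(12):
    template[7 - t * 8 // 12, t] = 1.0

# Synthetic binarized spectrogram (freq bins x frames): sparse noise
# plus one embedded call at frequency bin 10, frame 30.
spec = (rng.random((40, 100)) < 0.05).astype(float)
spec[10:18, 30:42] = np.maximum(spec[10:18, 30:42], template)

# Coincidence score: correlation of spectrogram and template, normalized
# by the template's active-cell count, so a perfect match scores 1.0.
score = correlate2d(spec, template, mode="valid") / template.sum()
freq_bin, frame = np.unravel_index(np.argmax(score), score.shape)
```

A detection is declared wherever the score exceeds a threshold; here the maximum falls exactly at the embedded call's position with a perfect score.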
Collapse
Affiliation(s)
- Xavier Mouy
- Marine Sciences Institute, University of Quebec at Rimouski, 310 Allee des Ursulines, Rimouski, Quebec G5L-3A1, Canada.
| | | | | |
Collapse
|
44
|
A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models. ALGORITHMS 2009. [DOI: 10.3390/a2041410] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
45
|
Nair S, Balakrishnan R, Seelamantula CS, Sukumar R. Vocalizations of wild Asian elephants (Elephas maximus): structural classification and social context. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:2768-2778. [PMID: 19894852 DOI: 10.1121/1.3224717] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Elephants use vocalizations for both long and short distance communication. Whereas the acoustic repertoire of the African elephant (Loxodonta africana) has been extensively studied in its savannah habitat, very little is known about the structure and social context of the vocalizations of the Asian elephant (Elephas maximus), which is mostly found in forests. In this study, the vocal repertoire of wild Asian elephants in southern India was examined. The calls could be classified into four mutually exclusive categories, namely, trumpets, chirps, roars, and rumbles, based on quantitative analyses of their spectral and temporal features. One of the call types, the rumble, exhibited high structural diversity, particularly in the direction and extent of frequency modulation of calls. Juveniles produced three of the four call types, including trumpets, roars, and rumbles, in the context of play and distress. Adults produced trumpets and roars in the context of disturbance, aggression, and play. Chirps were typically produced in situations of confusion and alarm. Rumbles were used for contact calling within and among herds, by matriarchs to assemble the herd, in close-range social interactions, and during disturbance and aggression. Spectral and temporal features of the four call types were similar between Asian and African elephants.
Collapse
Affiliation(s)
- Smita Nair
- Centre for Ecological Sciences, Indian Institute of Science, Bangalore, Karnataka 560012, India
| | | | | | | |
Collapse
|
46
|
Brown JC, Smaragdis P. Hidden Markov and Gaussian mixture models for automatic call classification. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:EL221-EL224. [PMID: 19507925 DOI: 10.1121/1.3124659] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
Automatic methods of classification of animal sounds offer many advantages, including speed and consistency in processing massive quantities of data. Calculations have been carried out on a set of 75 calls of Northern Resident killer whales, previously classified perceptually (human classification) into seven call types, using hidden Markov models (HMMs) and Gaussian mixture models (GMMs). Neither of these methods has been used previously for classification of marine mammal call types. With cepstral coefficients as features, both HMMs and GMMs give over 90% agreement with the perceptual classification, with the HMM exceeding 95% in some cases.
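The GMM variant of this setup can be sketched as one mixture per call type, classifying each call by maximum log-likelihood over its frames. The cepstral-like features below are synthetic stand-ins for real MFCCs of killer whale calls, and the component count is an arbitrary assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Synthetic stand-ins for per-frame cepstral coefficients of three call types.
def make_calls(center, n_calls, frames=30, dim=6):
    return [center + rng.normal(0.0, 1.0, size=(frames, dim))
            for _ in range(n_calls)]

centers = rng.normal(0.0, 4.0, size=(3, 6))
train = {k: make_calls(c, n_calls=20) for k, c in enumerate(centers)}
held_out = {k: make_calls(c, n_calls=10) for k, c in enumerate(centers)}

# One GMM per call type, trained on that type's feature frames.
models = {k: GaussianMixture(n_components=2, random_state=3).fit(np.vstack(calls))
          for k, calls in train.items()}

def classify(call):
    # Assign the call type whose GMM gives the highest log-likelihood
    # (score() returns mean per-frame log-likelihood; the argmax is the same).
    return max(models, key=lambda k: models[k].score(call))

correct = sum(classify(call) == k
              for k, calls in held_out.items() for call in calls)
accuracy = correct / 30
```

An HMM classifier differs only in that each model also captures the temporal ordering of frames within a call, which is what lifts agreement above 95% in some cases.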
Collapse
Affiliation(s)
- Judith C Brown
- Physics Department, Wellesley College, Wellesley, Massachusetts 02481, USA.
| | | |
Collapse
|
47
|
|
48
|
A new perspective on acoustic individual recognition in animals with limited call sharing or changing repertoires. Anim Behav 2008. [DOI: 10.1016/j.anbehav.2007.11.003] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
49
|
Tao J, Johnson MT, Osiejuk TS. Acoustic model adaptation for ortolan bunting (Emberiza hortulana L.) song-type classification. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 123:1582-1590. [PMID: 18345846 DOI: 10.1121/1.2837487] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
Automatic systems for vocalization classification often require fairly large amounts of data on which to train models. However, animal vocalization data collection and transcription is a difficult and time-consuming task, so that it is expensive to create large data sets. One natural solution to this problem is the use of acoustic adaptation methods. Such methods, common in human speech recognition systems, create initial models trained on speaker independent data, then use small amounts of adaptation data to build individual-specific models. Since, as in human speech, individual vocal variability is a significant source of variation in bioacoustic data, acoustic model adaptation is naturally suited to classification in this domain as well. To demonstrate and evaluate the effectiveness of this approach, this paper presents the application of maximum likelihood linear regression adaptation to ortolan bunting (Emberiza hortulana L.) song-type classification. Classification accuracies for the adapted system are computed as a function of the amount of adaptation data and compared to caller-independent and caller-dependent systems. The experimental results indicate that given the same amount of data, supervised adaptation significantly outperforms both caller-independent and caller-dependent systems.
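MLLR estimates a linear transform of the Gaussian means from a small amount of adaptation data. The sketch below is heavily simplified and hypothetical: it reduces the transform to a single bias vector under shared spherical covariances, and a nearest-mean classifier stands in for the full HMM system; the paper itself applies proper MLLR within HMM-based song-type models.

```python
import numpy as np

rng = np.random.default_rng(4)

# Caller-independent class means for three song types (synthetic features).
class_means = rng.normal(0.0, 5.0, size=(3, 4))

def classify(x, means):
    # Nearest-mean decision rule stands in for the full HMM recognizer.
    return int(np.argmin(np.linalg.norm(means - x, axis=1)))

# A new caller shifts every feature by an unknown offset (caller bias).
caller_offset = np.array([2.0, -1.5, 1.0, -2.0])

# Small labelled adaptation set: 3 calls per song type from the new caller.
adapt_X = np.vstack([m + caller_offset + rng.normal(0.0, 0.3, size=(3, 4))
                     for m in class_means])
adapt_y = np.repeat([0, 1, 2], 3)

# Simplified MLLR: the maximum-likelihood bias under shared spherical
# covariances is just the mean residual of the adaptation data.
bias = (adapt_X - class_means[adapt_y]).mean(axis=0)
adapted_means = class_means + bias

# Evaluate on held-out calls from the same caller.
test_X = np.vstack([m + caller_offset + rng.normal(0.0, 0.3, size=(20, 4))
                    for m in class_means])
test_y = np.repeat([0, 1, 2], 20)
acc_unadapted = np.mean([classify(x, class_means) == y
                         for x, y in zip(test_X, test_y)])
acc_adapted = np.mean([classify(x, adapted_means) == y
                       for x, y in zip(test_X, test_y)])
```

Full MLLR generalizes the bias to an affine transform of the means, shared across Gaussians, which is why a handful of adaptation calls suffices: only the transform's parameters are re-estimated, not every model.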
Collapse
Affiliation(s)
- Jidong Tao
- Speech and Signal Processing Laboratory, Marquette University, PO Box 1881, Milwaukee, Wisconsin 53233-1881, USA.
| | | | | |
Collapse
|
50
|
Rickwood P, Taylor A. Methods for automatically analyzing humpback song units. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 123:1763-1772. [PMID: 18345864 DOI: 10.1121/1.2836748] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
This paper presents mathematical techniques for automatically extracting and analyzing bioacoustic signals. Automatic techniques are described for isolation of target signals from background noise, extraction of features from target signals, and unsupervised classification (clustering) of the target signals based on these features. The only user-provided input, other than raw sound, is an initial set of signal processing and control parameters. Of particular note is that the number of signal categories is determined automatically. The techniques, applied to hydrophone recordings of humpback whales (Megaptera novaeangliae), produce promising initial results, suggesting that they may be of use not only for humpbacks but also in other bioacoustic settings where automated analysis is desirable.
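Determining the number of signal categories automatically can be sketched as model selection over candidate clusterings; the version below scores Gaussian mixtures by BIC, which is only one of several ways to pick the cluster count, and the song-unit feature vectors are synthetic stand-ins for features extracted from hydrophone recordings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)

# Synthetic feature vectors for isolated song units: 4 true unit categories,
# 50 units each, 3 features per unit.
true_k, dim = 4, 3
centers = rng.normal(0.0, 6.0, size=(true_k, dim))
X = np.vstack([c + rng.normal(0.0, 0.7, size=(50, dim)) for c in centers])

# Fit mixtures over a range of component counts and keep the lowest BIC:
# the number of categories is thus chosen by the data, not the user.
candidates = list(range(1, 9))
bics = [GaussianMixture(n_components=k, n_init=5, random_state=5).fit(X).bic(X)
        for k in candidates]
best_k = candidates[int(np.argmin(bics))]

# Final clustering with the selected number of categories.
labels = GaussianMixture(n_components=best_k, n_init=5,
                         random_state=5).fit_predict(X)
```

With well-separated categories the BIC minimum recovers the true count; on noisy field recordings the choice is less clean, which is why the segmentation and noise-reduction steps before clustering matter.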
Collapse
Affiliation(s)
- Peter Rickwood
- Building 6, University of Technology, Sydney, PO Box 123, Broadway, NSW 2007, Sydney, Australia.
| | | |
Collapse
|