1
Lamothe C, Obliger-Debouche M, Best P, Trapeau R, Ravel S, Artières T, Marxer R, Belin P. A large annotated dataset of vocalizations by common marmosets. Sci Data 2025; 12:782. PMID: 40360502. DOI: 10.1038/s41597-025-04951-8.
Abstract
Non-human primates, our closest relatives, use a wide range of complex vocal signals for communication within their species. Previous research on marmoset (Callithrix jacchus) vocalizations has been limited by sampling rates that do not cover the species' full hearing range and by labeling insufficient for advanced analyses using Deep Neural Networks (DNNs). Here, we provide a database of common marmoset vocalizations, continuously recorded at a sampling rate of 96 kHz in an animal holding facility simultaneously housing ~20 marmosets in three cages. The dataset comprises more than 800,000 files, amounting to 253 hours of data collected over 40 months. Each recording lasts a few seconds and captures the marmosets' social vocalizations, encompassing their entire known vocal repertoire during the experimental period. Around 215,000 calls are annotated with the vocalization type. We offer a trained classifier to assist future investigations. Finally, we validated the dataset by sampling 700 representative recordings and cross-examining them with four experts.
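Before training a classifier on such annotated calls, one would typically summarize the label distribution; a minimal sketch (the (recording, call type) layout is a hypothetical stand-in, not the dataset's actual schema):

```python
from collections import Counter

def summarize_annotations(annotations):
    # annotations: iterable of (recording_id, call_type) pairs; this layout
    # is a hypothetical stand-in for the dataset's real annotation schema.
    counts = Counter(call_type for _, call_type in annotations)
    total = sum(counts.values())
    # Relative frequencies reveal class imbalance before classifier training.
    freqs = {ct: n / total for ct, n in counts.items()}
    return counts, freqs

# Toy example with made-up recording IDs and call-type labels
toy = [("r1", "phee"), ("r2", "twitter"), ("r3", "phee"), ("r4", "trill")]
counts, freqs = summarize_annotations(toy)
```

With ~215,000 labeled calls, a summary like this is the usual first check for which vocalization types dominate the training data.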
Affiliation(s)
- Charly Lamothe
- La Timone Neuroscience Institute UMR 7289, CNRS, Aix-Marseille University, Marseille, France
- Laboratoire d'Informatique et Systèmes UMR 7020, CNRS, Aix-Marseille University, Marseille, France
- Manon Obliger-Debouche
- La Timone Neuroscience Institute UMR 7289, CNRS, Aix-Marseille University, Marseille, France
- Paul Best
- Laboratoire d'Informatique et Systèmes UMR 7020, CNRS, Aix-Marseille University, Marseille, France
- Régis Trapeau
- La Timone Neuroscience Institute UMR 7289, CNRS, Aix-Marseille University, Marseille, France
- Sabrina Ravel
- La Timone Neuroscience Institute UMR 7289, CNRS, Aix-Marseille University, Marseille, France
- Thierry Artières
- Laboratoire d'Informatique et Systèmes UMR 7020, CNRS, Aix-Marseille University, Marseille, France
- Ricard Marxer
- Laboratoire d'Informatique et Systèmes UMR 7020, CNRS, Aix-Marseille University, Marseille, France
- Pascal Belin
- La Timone Neuroscience Institute UMR 7289, CNRS, Aix-Marseille University, Marseille, France
2
Bosshard AB, Burkart JM, Merlo P, Cathcart C, Townsend SW, Bickel B. Beyond bigrams: call sequencing in the common marmoset (Callithrix jacchus) vocal system. R Soc Open Sci 2024; 11:240218. PMID: 39507993. PMCID: PMC11537759. DOI: 10.1098/rsos.240218.
Abstract
Over the last two decades, an emerging body of research has demonstrated that non-human animals can combine context-specific calls into larger sequences. These structures have frequently been compared with language's syntax, whereby linguistic units are combined to form larger structures, and leveraged to argue that syntax might not be unique to language. Currently, however, the overwhelming majority of documented call combinations are simple sequences of just two calls, which differ dramatically from the open-ended hierarchical structuring of syntax found in language. We revisit this issue by taking a whole-repertoire approach to investigate combinatoriality in common marmosets (Callithrix jacchus). We use Markov chain models to quantify the vocal sequences produced by marmosets, providing evidence for structures beyond the bigram, including three-call combinations and even sequences of up to eight or nine calls. Our analyses of these longer vocal sequences suggest further internal organization, including some amount of recombination, nestedness and non-adjacent dependencies. We argue that data-driven, whole-repertoire analyses are fundamental to uncovering the combinatorial complexity of non-human animal communication and will further facilitate meaningful comparisons with language's combinatoriality.
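The core of a first-order Markov analysis of call sequences is estimating transition probabilities between call types; a minimal sketch, with purely illustrative call labels rather than the study's actual repertoire:

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    # First-order (bigram) Markov estimates over call-type labels:
    # P(b | a) = count(a -> b) / count(a -> anything).
    bigrams = Counter()
    for seq in sequences:
        bigrams.update(zip(seq, seq[1:]))
    totals = defaultdict(int)
    for (first, _), n in bigrams.items():
        totals[first] += n
    return {(a, b): n / totals[a] for (a, b), n in bigrams.items()}

# Illustrative toy sequences, not actual marmoset data
seqs = [["phee", "trill", "phee"], ["phee", "trill", "twitter"]]
probs = transition_probabilities(seqs)
```

Structures beyond the bigram (three-call and longer combinations) would be probed analogously by counting higher-order n-grams and comparing their frequencies against what the first-order model predicts.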
Affiliation(s)
- Alexandra B. Bosshard
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Zurich, Switzerland
- Department of Evolutionary Anthropology, University of Zurich, Zurich, Switzerland
- Judith M. Burkart
- Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Zurich, Switzerland
- Department of Evolutionary Anthropology, University of Zurich, Zurich, Switzerland
- Paola Merlo
- Department of Linguistics, University of Geneva, Geneva, Switzerland
- Idiap Research Institute, Martigny, Switzerland
- Chundra Cathcart
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Zurich, Switzerland
- Simon W. Townsend
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Zurich, Switzerland
- Department of Evolutionary Anthropology, University of Zurich, Zurich, Switzerland
- Balthasar Bickel
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Zurich, Switzerland
3
Cauzinille J, Favre B, Marxer R, Rey A. Applying machine learning to primate bioacoustics: Review and perspectives. Am J Primatol 2024; 86:e23666. PMID: 39120066. DOI: 10.1002/ajp.23666.
Abstract
This paper provides a comprehensive review of the use of computational bioacoustics and of signal and speech processing techniques in the analysis of primate vocal communication. We explore the potential of machine learning and deep learning methods, from simple supervised algorithms to more recent self-supervised models, for processing and analyzing the large data sets produced by the emergence of passive acoustic monitoring approaches. In addition, we discuss the importance of automated primate vocalization analysis in tackling essential questions on animal communication and highlight the role of comparative linguistics in bioacoustic research. We also examine the challenges associated with data collection and annotation and provide insights into potential solutions. Overall, this review covers common and innovative applications of machine learning for primate vocal communication analysis and outlines opportunities for future research in this rapidly developing field.
Affiliation(s)
- Jules Cauzinille
- LIS, CNRS, Aix-Marseille University, Marseille, France
- CRPN, CNRS, Aix-Marseille University, Marseille, France
- ILCB, Aix-Marseille University, Marseille, France
- Benoit Favre
- LIS, CNRS, Aix-Marseille University, Marseille, France
- ILCB, Aix-Marseille University, Marseille, France
- Ricard Marxer
- ILCB, Aix-Marseille University, Marseille, France
- LIS, CNRS, Université de Toulon, Toulon, France
- Arnaud Rey
- CRPN, CNRS, Aix-Marseille University, Marseille, France
- ILCB, Aix-Marseille University, Marseille, France
4
Erb WM, Ross W, Kazanecki H, Mitra Setia T, Madhusudhana S, Clink DJ. Vocal complexity in the long calls of Bornean orangutans. PeerJ 2024; 12:e17320. PMID: 38766489. PMCID: PMC11100477. DOI: 10.7717/peerj.17320.
Abstract
Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we used machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
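Inter-observer reliability of the kind evaluated here is commonly quantified with Cohen's kappa, which corrects raw agreement for chance; a minimal sketch for two observers, using made-up pulse-type labels rather than the study's categories:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Agreement between two observers' labels, corrected for the
    # agreement expected by chance from each observer's label frequencies.
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    cats = set(ca) | set(cb)
    expected = sum((ca[c] / n) * (cb[c] / n) for c in cats)
    return (observed - expected) / (1 - expected)

# Hypothetical pulse labels from two observers over six pulses
a = ["roar", "sigh", "roar", "intro", "roar", "sigh"]
b = ["roar", "sigh", "sigh", "intro", "roar", "roar"]
kappa = cohens_kappa(a, b)
```

For three observers, as in the study, one would average pairwise kappas or use Fleiss' kappa; low values are what would indicate that pulse types are not reliably discrete to human ears.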
Affiliation(s)
- Wendy M. Erb
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Department of Anthropology, Rutgers, The State University of New Jersey, New Brunswick, United States of America
- Whitney Ross
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Haley Kazanecki
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Tatang Mitra Setia
- Primate Research Center, Universitas Nasional Jakarta, Jakarta, Indonesia
- Department of Biology, Faculty of Biology and Agriculture, Universitas Nasional Jakarta, Jakarta, Indonesia
- Shyam Madhusudhana
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Centre for Marine Science and Technology, Curtin University, Perth, Australia
- Dena J. Clink
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
5
Batist CH, Razafindraibe MN, Randriamanantena F, Baden AL. Bioacoustic characterization of the black-and-white ruffed lemur (Varecia variegata) vocal repertoire. Primates 2023; 64:621-635. PMID: 37584832. DOI: 10.1007/s10329-023-01083-8.
Abstract
Ruffed lemurs (Varecia spp.) exhibit a unique suite of behavioral traits compared to other lemur species, which includes their fluid fission-fusion social dynamics, communal rearing of parked litters, and pronounced frugivory in their humid rainforest habitats. Given these traits, and the dense rainforests they inhabit, vocal communication may be key to maintaining social cohesion, coordinating infant care, and/or defending their high-quality food resources. Indeed, they are known for their raucous 'roar-shriek' calls. However, there has been surprisingly little research on vocal communication in Varecia species and only two previously published repertoires, both of which were qualitative descriptions of their calls. In this study, we quantitatively examined the vocal repertoire of wild black-and-white ruffed lemurs (Varecia variegata) at Mangevo, Ranomafana National Park, Madagascar. We characterized 11 call types using 33 bioacoustic parameters related to frequency, duration, tonality, and composition. We also used discriminant function analysis and hierarchical clustering to quantitatively and objectively classify call types within the black-and-white ruffed lemur vocal repertoire. The repertoire consists of both monosyllabic and multisyllabic calls that are individually given or emitted in contagious choruses. Eight of the 11 calls were also used in combination or in larger multi-call sequences. The discriminant function analysis correctly assigned call types with 87% success, though this varied greatly by call type (1-65%). Hierarchical clustering identified 3-4 robust clusters, indicating low clustering structure in the data and suggesting that V. variegata exhibits a graded vocal repertoire. Future work should consider the environmental and behavioral contexts in which calls are used to better understand the function of these call types and combinations.
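The hierarchical clustering step described above can be illustrated with a naive single-linkage sketch on toy feature vectors; a real analysis would use the 33 bioacoustic parameters and a standard linkage implementation (e.g. from SciPy) rather than this quadratic-time toy:

```python
def single_linkage(points, n_clusters):
    # Naive agglomerative clustering: repeatedly merge the two clusters
    # whose closest pair of members is nearest (single linkage).
    clusters = [[i] for i in range(len(points))]

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

# Toy 2-D "acoustic parameter" vectors forming two separated groups
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
clusters = single_linkage(pts, 2)
```

Finding only a few robust clusters for eleven call types, as reported here, is the pattern expected of a graded rather than discrete repertoire.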
Affiliation(s)
- C H Batist
- Department of Anthropology, The Graduate Center of the City University of New York, 365 5th Avenue, New York, NY 10016, USA
- New York Consortium in Evolutionary Primatology, New York, NY, USA
- Rainforest Connection (RFCx), Katy, TX, USA
- M N Razafindraibe
- Department of Animal Biology, University of Antananarivo, Antananarivo, Madagascar
- Institut International de Science Sociale, Antananarivo, Madagascar
- A L Baden
- Department of Anthropology, The Graduate Center of the City University of New York, 365 5th Avenue, New York, NY 10016, USA
- New York Consortium in Evolutionary Primatology, New York, NY, USA
- Department of Anthropology, Hunter College of the City University of New York, New York, NY, USA
6
Grijseels DM, Prendergast BJ, Gorman JC, Miller CT. The neurobiology of vocal communication in marmosets. Ann N Y Acad Sci 2023; 1528:13-28. PMID: 37615212. PMCID: PMC10592205. DOI: 10.1111/nyas.15057.
Abstract
An increasingly popular animal model for studying the neural basis of social behavior, cognition, and communication is the common marmoset (Callithrix jacchus). Interest in this New World primate across neuroscience is now being driven by its proclivity for prosociality across its repertoire, high volubility, and rapid development, as well as its amenability to naturalistic testing paradigms and to freely moving neural recording and imaging technologies. Together, these characteristics set marmosets up to be a powerful model of the primate social brain in the years to come. Here, we focus on vocal communication because it is the area that has made the most progress and best illustrates the prodigious potential of this species. We review the current state of the field with a focus on the various brain areas and networks involved in vocal perception and production, comparing findings from marmosets to those from other animals, including humans.
Affiliation(s)
- Dori M Grijseels
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Brendan J Prendergast
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Julia C Gorman
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, California, USA
- Cory T Miller
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, California, USA
7
Arnaud V, Pellegrino F, Keenan S, St-Gelais X, Mathevon N, Levréro F, Coupé C. Improving the workflow to crack Small, Unbalanced, Noisy, but Genuine (SUNG) datasets in bioacoustics: The case of bonobo calls. PLoS Comput Biol 2023; 19:e1010325. PMID: 37053268. PMCID: PMC10129004. DOI: 10.1371/journal.pcbi.1010325.
Abstract
Despite the accumulation of data and studies, deciphering animal vocal communication remains challenging. In most cases, researchers must deal with the sparse recordings composing Small, Unbalanced, Noisy, but Genuine (SUNG) datasets: a limited number of recordings, most often noisy, and unbalanced across individuals or vocalization categories. SUNG datasets therefore offer a valuable but inevitably distorted vision of communication systems, and adopting best practices in their analysis is essential to extract the available information effectively and draw reliable conclusions. Here we show that the most recent advances in machine learning applied to a SUNG dataset succeed in unraveling the complex vocal repertoire of the bonobo, and we propose a workflow that can be effective with other animal species. We implement acoustic parameterization in three feature spaces and run a Supervised Uniform Manifold Approximation and Projection (S-UMAP) to evaluate how call types and individual signatures cluster in the bonobo acoustic space. We then implement three classification algorithms (Support Vector Machine, xgboost, neural networks) and their combination to explore the structure and variability of bonobo calls, as well as the robustness of the individual signature they encode. We underscore how classification performance is affected by the feature set and identify the most informative features. In addition, we highlight the need to address data leakage in the evaluation of classification performance to avoid misleading interpretations. Our results lead to several practical recommendations that generalize to other animal communication systems.
To improve the reliability and replicability of vocal communication studies with SUNG datasets, we thus recommend: i) comparing several acoustic parameterizations; ii) visualizing the dataset with supervised UMAP to examine the species' acoustic space; iii) adopting Support Vector Machines as the baseline classification approach; and iv) explicitly evaluating data leakage and, where needed, implementing a mitigation strategy.
Affiliation(s)
- Vincent Arnaud
- Département des arts, des lettres et du langage, Université du Québec à Chicoutimi, Chicoutimi, Canada
- Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- François Pellegrino
- Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- Sumir Keenan
- ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Xavier St-Gelais
- Département des arts, des lettres et du langage, Université du Québec à Chicoutimi, Chicoutimi, Canada
- Nicolas Mathevon
- ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Florence Levréro
- ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Christophe Coupé
- Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- Department of Linguistics, The University of Hong Kong, Hong Kong, China
8
Lawson J, Rizos G, Jasinghe D, Whitworth A, Schuller B, Banks-Leite C. Automated acoustic detection of Geoffroy's spider monkey highlights tipping points of human disturbance. Proc Biol Sci 2023; 290:20222473. PMID: 36919432. PMCID: PMC10015327. DOI: 10.1098/rspb.2022.2473.
Abstract
As more land is altered by human activity and more species are put at risk of extinction, it is essential that we understand the requirements for conserving threatened species across human-modified landscapes. Owing to their rarity and often sparse distributions, threatened species can be difficult to study, and efficient methods to sample them across wide temporal and spatial scales have been lacking. Passive acoustic monitoring (PAM) is increasingly recognized as an efficient method for collecting data on vocal species; however, the development of the automated species detectors required to analyse large amounts of acoustic data is not keeping pace. Here, we collected 35,805 h of acoustic data across 341 sites in a region of over 1000 km² to show that PAM, together with a newly developed automated detector, can successfully detect the endangered Geoffroy's spider monkey (Ateles geoffroyi). This allowed us to show that Geoffroy's spider monkey was absent below a threshold of 80% forest cover and within 1 km of primary paved roads, and occurred equally in old growth and secondary forests. We discuss how this methodology circumvents many of the existing issues in traditional sampling methods and can be highly successful in the study of rare or threatened vocal species. Our results provide tools and knowledge for setting targets and developing conservation strategies for the protection of Geoffroy's spider monkey.
Affiliation(s)
- Jenna Lawson
- Grantham Institute, Imperial College London, UK
- Department of Life Sciences, Imperial College London, UK
- George Rizos
- GLAM - Group on Language, Audio, & Music, Imperial College London, UK
- Dui Jasinghe
- Department of Life Sciences, Imperial College London, UK
- Andrew Whitworth
- Osa Conservation, Conservation Science Team, Washington, DC 20005, USA
- Institute of Biodiversity, Animal Health, and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, Scotland, UK
- Department of Biology, Center for Energy, Environment, and Sustainability, Wake Forest University, Winston-Salem, NC 27109, USA
- Björn Schuller
- GLAM - Group on Language, Audio, & Music, Imperial College London, UK
- EIHW - Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
9
da Silva DS, Nascimento CS, Jagatheesaperumal SK, de Albuquerque VHC. Mammogram Image Enhancement Techniques for Online Breast Cancer Detection and Diagnosis. Sensors (Basel) 2022; 22:8818. PMID: 36433415. PMCID: PMC9697415. DOI: 10.3390/s22228818.
Abstract
Breast cancer has the highest incidence and mortality of all cancers affecting women worldwide. The adoption of modern technologies that assist medical diagnosis, accelerating and automating the process while reducing its subjectivity, is therefore of paramount importance for efficient treatment. This work proposes a robust platform to compare and evaluate strategies for enhancing breast ultrasound images, benchmarking them against state-of-the-art techniques by classifying images as benign, malignant or normal. Investigations were performed on a dataset containing a total of 780 images, divided into benign, malignant and normal classes. A data augmentation technique was used to scale up the corpus of images available in the chosen dataset. Novel image enhancement techniques were applied, and Multilayer Perceptron, k-Nearest Neighbor and Support Vector Machine algorithms were used for classification. The conducted experiments showed that the bilateral filter together with the SVM classifier achieved the best result for the classification of breast cancer, with an overall accuracy of 96.69% and an accuracy of 95.11% for the detection of malignant nodules. The application of image enhancement methods can thus help detect breast cancer at a much earlier stage and with better detection accuracy.
Affiliation(s)
- Daniel S. da Silva
- Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza 60455-970, CE, Brazil
- Caio S. Nascimento
- Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza 60455-970, CE, Brazil
- Senthil K. Jagatheesaperumal
- Department of Electronics and Communication Engineering, Mepco Schlenk Engineering College, Sivakasi 626005, TN, India
10
Romero-Mujalli D, Bergmann T, Zimmermann A, Scheumann M. Utilizing DeepSqueak for automatic detection and classification of mammalian vocalizations: a case study on primate vocalizations. Sci Rep 2021; 11:24463. PMID: 34961788. PMCID: PMC8712519. DOI: 10.1038/s41598-021-03941-1.
Abstract
Bioacoustic analyses of animal vocalizations are predominantly accomplished through manual scanning, a highly subjective and time-consuming process. Validated automated analyses are therefore needed that are usable for a variety of animal species and easy to handle by non-programming specialists. This study tested and validated whether DeepSqueak, a user-friendly software tool developed for rodent ultrasonic vocalizations, can be generalized to automate the detection/segmentation, clustering and classification of the high-frequency/ultrasonic vocalizations of a primate species. Our validation procedure showed that the trained detectors for vocalizations of the gray mouse lemur (Microcebus murinus) can deal with different call types, individual variation and different recording quality. Implementing additional filters drastically reduced noise signals (4225 events) and call fragments (637 events), resulting in 91% correct detections (Ntotal = 3040). Additionally, the detectors could be used to detect the vocalizations of an evolutionarily closely related species, the Goodman's mouse lemur (M. lehilahytsara). An integrated supervised classifier assigned 93% of the 2683 calls correctly to the respective call type, and the unsupervised clustering model grouped the calls into clusters matching the published human-made categories. This study shows that DeepSqueak can be successfully utilized to detect, cluster and classify high-frequency/ultrasonic vocalizations of taxa other than rodents, and suggests a validation procedure usable for evaluating further bioacoustics software.
Affiliation(s)
- Daniel Romero-Mujalli
- Institute of Zoology, University of Veterinary Medicine Hannover, Bünteweg 17, 30559 Hannover, Germany
- Tjard Bergmann
- Institute of Zoology, University of Veterinary Medicine Hannover, Bünteweg 17, 30559 Hannover, Germany
- Marina Scheumann
- Institute of Zoology, University of Veterinary Medicine Hannover, Bünteweg 17, 30559 Hannover, Germany
11
Jia X, Cao Y, O'Connor D, Zhu J, Tsang DCW, Zou B, Hou D. Mapping soil pollution by using drone image recognition and machine learning at an arsenic-contaminated agricultural field. Environ Pollut 2021; 270:116281. PMID: 33348140. DOI: 10.1016/j.envpol.2020.116281.
Abstract
Mapping soil contamination enables the delineation of areas where protection measures are needed. Traditional soil sampling on a grid pattern followed by chemical analysis and geostatistical interpolation methods (GIMs), such as Kriging interpolation, can be costly, slow and poorly suited to highly heterogeneous soil environments. Here we propose a novel method to map soil contamination by combining high-resolution aerial imaging (HRAI) with machine learning algorithms. To support model establishment and validation, 1068 soil samples were collected from an arsenic (As) contaminated area in Zhongxiang, Hubei province, China. The average arsenic concentration was 39.88 mg/kg (SD = 213.70 mg/kg), with individual sample points determined as low risk (66.9%), medium risk (29.4%) or high risk (3.7%). Identified features were then extracted from a HRAI image of the study area. Four machine learning algorithms were developed to predict As risk levels: (i) support vector machine (SVM), (ii) multi-layer perceptron (MLP), (iii) random forest (RF), and (iv) extreme random forest (ERF). Among these, we found that the ERF algorithm performed best overall and that its prediction performance was generally better than that of traditional Kriging interpolation. The accuracy of ERF in test area 1 reached 0.87, outperforming RF (0.81), MLP (0.78) and SVM (0.77). The F1-score of ERF for discerning high-risk points in test area 1 was as high as 0.8. The complexity of the distribution of points with different risk levels was a decisive factor in model prediction ability. Identified features in the study area associated with fertilizer factories contributed most to the ERF model. This study demonstrates that HRAI combined with machine learning has good potential to predict As soil risk levels.
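The per-class F1-score used here to evaluate high-risk predictions is the harmonic mean of precision and recall for that class; a minimal sketch with toy risk labels (not the study's data):

```python
def f1_for_class(y_true, y_pred, target):
    # Precision/recall/F1 for a single class, e.g. the "high" risk level.
    tp = sum(t == target and p == target for t, p in zip(y_true, y_pred))
    fp = sum(t != target and p == target for t, p in zip(y_true, y_pred))
    fn = sum(t == target and p != target for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy ground-truth and predicted risk levels
y_true = ["low", "high", "high", "medium", "high"]
y_pred = ["low", "high", "medium", "medium", "high"]
f1 = f1_for_class(y_true, y_pred, "high")
```

Per-class F1 matters here because high-risk points are only 3.7% of samples, so overall accuracy alone would reward a model that never predicts the rare class.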
Affiliation(s)
- Xiyue Jia
- School of Environment, Tsinghua University, Beijing 100084, China
- Yining Cao
- School of Environment, Tsinghua University, Beijing 100084, China; School of Information, University of Michigan, Ann Arbor 48104, United States
- David O'Connor
- School of Real Estate and Land Management, Royal Agricultural University, Cirencester, GL7 1RS, United Kingdom
- Jin Zhu
- School of Environment, Tsinghua University, Beijing 100084, China
- Daniel C W Tsang
- Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, China
- Bin Zou
- School of Geosciences and Info-Physics, Central South University, Changsha, Hunan, China
- Deyi Hou
- School of Environment, Tsinghua University, Beijing 100084, China
12
Clink DJ, Klinck H. Unsupervised acoustic classification of individual gibbon females and the implications for passive acoustic monitoring. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13520] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Affiliation(s)
- Dena J. Clink
- Center for Conservation Bioacoustics, Cornell Laboratory of Ornithology, Cornell University, Ithaca, NY, USA
- Holger Klinck
- Center for Conservation Bioacoustics, Cornell Laboratory of Ornithology, Cornell University, Ithaca, NY, USA
13
Dezecache G, Zuberbühler K, Davila-Ross M, Dahl CD. A machine learning approach to infant distress calls and maternal behaviour of wild chimpanzees. Anim Cogn 2020; 24:443-455. [PMID: 33094407 DOI: 10.1007/s10071-020-01437-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2020] [Revised: 09/09/2020] [Accepted: 09/26/2020] [Indexed: 12/28/2022]
Abstract
Distress calls are an acoustically variable group of vocalizations ubiquitous in mammals and other animals. Their presumed function is to recruit help, but there has been much debate over whether the nature of the disturbance can be inferred from the acoustics of distress calls. We used machine learning to analyse episodes of distress calling by wild infant chimpanzees. We extracted exemplars from those episodes and examined them in relation to the external event that triggered them and the infant's distance to its mother. We then tested whether the acoustic variants were associated with particular maternal responses. Our results suggest that, although infant chimpanzee distress calls are highly graded, they can convey information about discrete problems experienced by the infant and about distance to the mother, which in turn may help guide maternal parenting decisions. The extent to which mothers rely on acoustic cues alone, versus integrating other contextual and visual information, to decide whether to intervene should be the focus of future research.
Affiliation(s)
- Guillaume Dezecache
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland; Budongo Conservation Field Station, Masindi, Uganda; Department of Psychology, University of Portsmouth, Portsmouth, England, UK; Université Clermont Auvergne, CNRS, LAPSCO, Clermont-Ferrand, France
- Klaus Zuberbühler
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland; Budongo Conservation Field Station, Masindi, Uganda; School of Psychology and Neuroscience, University of St Andrews, St Andrews, Scotland, UK
- Marina Davila-Ross
- Department of Psychology, University of Portsmouth, Portsmouth, England, UK
- Christoph D Dahl
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland; Graduate Institute of Mind, Brain and Consciousness, Taipei Medical University, Taipei, Taiwan; Brain and Consciousness Research Center, Taipei Medical University, Shuang-Ho Hospital, New Taipei City, Taiwan
14
Knight EC, Sólymos P, Scott C, Bayne EM. Validation prediction: a flexible protocol to increase efficiency of automated acoustic processing for wildlife research. Ecol Appl 2020; 30:e02140. [PMID: 32335994 DOI: 10.1002/eap.2140] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/11/2019] [Revised: 12/18/2019] [Accepted: 03/19/2020] [Indexed: 06/11/2023]
Abstract
Automated recognition is increasingly used to extract species detections from audio recordings; however, the time required to manually review each detection can be prohibitive. We developed a flexible protocol called "validation prediction" that uses machine learning to predict whether recognizer detections are true or false positives and can be applied to any recognizer type, ecological application, or analytical approach. Validation prediction uses a predictable relationship between recognizer score and the energy of an acoustic signal but can also incorporate any other ecological or spectral predictors (e.g., time of day, dominant frequency) that will help separate true from false-positive recognizer detections. First, we documented the relationship between recognizer score and the energy of an acoustic signal for two different recognizer algorithm types (hidden Markov models and convolutional neural networks). Next, we demonstrated our protocol using a case study of two species, the Common Nighthawk (Chordeiles minor) and Ovenbird (Seiurus aurocapilla). We reduced the number of detections that required validation by 75.7% and 42.9%, respectively, while retaining at least 98% of the true-positive detections. Validation prediction substantially improves the efficiency of using automated recognition on acoustic data sets. Our method can be of use to wildlife monitoring and research programs and will facilitate using automated recognition to mine bioacoustic data sets.
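The core idea of validation prediction, as the abstract describes it, is to fit a model that predicts whether each recognizer detection is a true positive from its score and simple covariates such as signal energy, then hand-review only the uncertain band. A minimal sketch with synthetic data (the paper's own predictors, models, and thresholds differ):

```python
# Sketch of "validation prediction": predict true/false-positive status of
# recognizer detections from score + signal energy; review only the
# uncertain middle band. Data and the 0.1/0.9 thresholds are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
score = rng.uniform(0, 1, n)    # recognizer score per detection
energy = rng.uniform(0, 1, n)   # relative energy of the detected signal
# Synthetic ground truth: high score + high energy tends to be a true positive.
p_true = 1 / (1 + np.exp(-(6 * score + 3 * energy - 5)))
is_true = rng.random(n) < p_true

X = np.column_stack([score, energy])
model = LogisticRegression().fit(X, is_true)
p_hat = model.predict_proba(X)[:, 1]

# Confident predictions are auto-accepted or auto-rejected;
# only the middle band still needs manual validation.
needs_review = (p_hat > 0.1) & (p_hat < 0.9)
print(f"fraction needing manual review: {needs_review.mean():.2f}")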
Affiliation(s)
- Elly C Knight
- Department of Biological Sciences, CW405 Biological Sciences Centre, University of Alberta, Edmonton, Alberta, Canada
- Péter Sólymos
- Department of Biological Sciences, CW405 Biological Sciences Centre, University of Alberta, Edmonton, Alberta, Canada
- Chris Scott
- Bishon House, Bishopstone, HR4 7HZ, Herefordshire, UK
- Erin M Bayne
- Department of Biological Sciences, CW405 Biological Sciences Centre, University of Alberta, Edmonton, Alberta, Canada
15
Clink DJ, Tasirin JS, Klinck H. Vocal individuality and rhythm in male and female duet contributions of a nonhuman primate. Curr Zool 2020; 66:173-186. [PMID: 32440276 PMCID: PMC7233616 DOI: 10.1093/cz/zoz035] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2019] [Accepted: 06/13/2019] [Indexed: 12/02/2022] Open
Abstract
Duetting, the stereotyped, repeated, and often coordinated exchange of vocalizations between two individuals, arose independently multiple times in the order Primates. Across primate species, there is substantial variation in the timing, degree of overlap, and sex-specificity of duet contributions. There is increasing evidence that primates can modify the timing of their duet contributions relative to their partner, and this vocal flexibility may have been an important precursor to the evolution of human language. Here, we present the results of a fine-scale analysis of Gursky's spectral tarsier (Tarsius spectrumgurskyae) duet phrases recorded in North Sulawesi, Indonesia. Specifically, we aimed to investigate individual-level variation in the female and male contributions to the duet, quantify individual- and pair-level differences in duet timing, and measure the temporal precision of duetting individuals relative to their partner. Using support vector machines, we classified female duet phrases to the correct individual with 80% accuracy, whereas our classification accuracy for males was lower, at 64%. Females were more variable than males in the timing between notes. All tarsier phrases exhibited some degree of overlap between callers, and tarsiers exhibited high temporal precision in their note output relative to their partners. We provide evidence that duetting tarsiers can modify their note output relative to their duetting partner; these results support the idea that flexibility in vocal exchanges, a possible precursor to human language, evolved early in the primate lineage, long before the emergence of modern humans.
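The individual-classification step reported in the abstract, assigning duet phrases to callers with a support vector machine, can be sketched as follows. The "acoustic features" here are randomly generated stand-ins for the measured duet features, and the number of individuals and phrases is invented for illustration.

```python
# Sketch: assigning duet phrases to individual callers with an SVM.
# Features are synthetic stand-ins; each individual gets its own centroid.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_individuals, phrases_each, n_features = 8, 30, 10
centroids = rng.normal(0.0, 1.0, (n_individuals, n_features))
X = np.vstack([c + rng.normal(0.0, 0.7, (phrases_each, n_features))
               for c in centroids])
y = np.repeat(np.arange(n_individuals), phrases_each)

# Scale features, then classify with an RBF-kernel SVM, scored by 5-fold CV.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```

With well-separated synthetic centroids the accuracy sits far above the 1/8 chance level; on real phrases, accuracy reflects how individually distinctive the acoustic features actually are.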
Affiliation(s)
- Dena J Clink
- Bioacoustics Research Program, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA
- Johny S Tasirin
- Faculty of Agriculture, Sam Ratulangi University, Manado, Indonesia
- Holger Klinck
- Bioacoustics Research Program, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA
16
Zhang K, Liu T, Liu M, Li A, Xiao Y, Metzner W, Liu Y. Comparing context-dependent call sequences employing machine learning methods: an indication of syntactic structure of greater horseshoe bats. J Exp Biol 2019; 222:jeb214072. [PMID: 31753908 DOI: 10.1242/jeb.214072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Accepted: 11/14/2019] [Indexed: 12/20/2022]
Abstract
For the analysis of vocal syntax, accurate classification of call sequence structures in different behavioural contexts is essential. However, an effective, automated method for classifying call sequences from large numbers of recorded sound files has been lacking. Here, we employed three machine learning algorithms (logistic regression, support vector machine, and decision trees) to classify call sequences of social vocalizations of greater horseshoe bats (Rhinolophus ferrumequinum) in aggressive and distress contexts. All three algorithms achieved highly accurate classification rates (logistic regression 98%, support vector machine 97%, and decision trees 96%). The algorithms also identified three of the most important features for the classification: the transitions between adjacent syllables, the probability of occurrence of each syllable at each position in a sequence, and the overall characteristics of a sequence. Statistical analysis supported the classifications made by the algorithms. The study provides an efficient method for data mining of call sequences and points to the possibility of quantifying linguistic parameters in animal communication. It suggests the presence of song-like syntax in the social vocalizations emitted within a non-breeding context in a bat species.
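Syllable-transition features of the kind the abstract names can drive a context classifier along these lines. The syllable inventory, Markov transition probabilities, and contexts below are invented for illustration; only the featurization idea (count adjacent-syllable bigrams, then classify) follows the abstract.

```python
# Sketch: classify call sequences by context using syllable-transition
# (bigram) counts, one of the feature types reported as informative.
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
syllables = ["A", "B", "C"]
bigrams = ["".join(p) for p in product(syllables, repeat=2)]

def simulate(trans, length=20):
    """Generate a syllable sequence from a first-order Markov chain."""
    seq = [str(rng.choice(syllables))]
    for _ in range(length - 1):
        seq.append(str(rng.choice(syllables, p=trans[seq[-1]])))
    return seq

def featurize(seq):
    """Count each adjacent-syllable transition in the sequence."""
    counts = {b: 0 for b in bigrams}
    for a, b in zip(seq, seq[1:]):
        counts[a + b] += 1
    return [counts[b] for b in bigrams]

# Two contexts with different transition structure (e.g. aggression vs distress).
aggr = {"A": [0.7, 0.2, 0.1], "B": [0.1, 0.7, 0.2], "C": [0.2, 0.1, 0.7]}
dist = {"A": [0.1, 0.6, 0.3], "B": [0.5, 0.1, 0.4], "C": [0.3, 0.5, 0.2]}

X = [featurize(simulate(t)) for t in [aggr] * 100 + [dist] * 100]
y = [0] * 100 + [1] * 100
acc = LogisticRegression(max_iter=1000).fit(X, y).score(X, y)
print(f"training accuracy: {acc:.2f}")
```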
Affiliation(s)
- Kangkang Zhang
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, No. 2555 Jingyue Street, Changchun 130117, China
- Tong Liu
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, No. 2555 Jingyue Street, Changchun 130117, China
- Muxun Liu
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, No. 2555 Jingyue Street, Changchun 130117, China
- Aoqiang Li
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, No. 2555 Jingyue Street, Changchun 130117, China
- Yanhong Xiao
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, No. 2555 Jingyue Street, Changchun 130117, China
- Walter Metzner
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Ying Liu
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, No. 2555 Jingyue Street, Changchun 130117, China
17
Oikarinen T, Srinivasan K, Meisner O, Hyman JB, Parmar S, Fanucci-Kiss A, Desimone R, Landman R, Feng G. Deep convolutional network for animal sound classification and source attribution using dual audio recordings. J Acoust Soc Am 2019; 145:654. [PMID: 30823820 PMCID: PMC6786887 DOI: 10.1121/1.5087827] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/04/2018] [Revised: 12/28/2018] [Accepted: 01/02/2019] [Indexed: 06/09/2023]
Abstract
This paper introduces an end-to-end feedforward convolutional neural network that reliably classifies the source and type of animal calls in a noisy environment using two streams of audio data, after being trained on a dataset of modest size with imperfect labels. The data consist of audio recordings from captive marmoset monkeys housed in pairs, with several other cages nearby. A single pass through the network classifies both the call type and which animal made it, using raw spectrogram images as input. The network vastly increases data analysis capacity for researchers studying marmoset vocalizations and allows data collection in the home cage with group-housed animals.
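The input representation described here, raw spectrogram images computed from the audio streams, can be produced along the following lines with scipy. The sampling rate, window length, and synthetic "call" are illustrative assumptions, not the paper's settings.

```python
# Sketch: turning a raw audio stream into a log-spectrogram "image" of the
# kind fed to a convolutional network. All parameter values are illustrative.
import numpy as np
from scipy.signal import spectrogram

fs = 48_000                                    # sampling rate, Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
# Synthetic "call": an upward frequency sweep plus background noise.
audio = np.sin(2 * np.pi * (4000 + 3000 * t) * t) + 0.1 * np.random.randn(t.size)

f, times, Sxx = spectrogram(audio, fs=fs, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(Sxx + 1e-12)          # dB scale, floored to avoid log(0)
print(log_spec.shape)                          # (frequency bins, time frames)
```

The resulting 2-D array is what a CNN would consume; with two microphone streams, the two spectrograms can simply be stacked as input channels.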
Affiliation(s)
- Tuomas Oikarinen
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Karthik Srinivasan
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Olivia Meisner
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Julia B Hyman
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Shivangi Parmar
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Adrian Fanucci-Kiss
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Robert Desimone
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Rogier Landman
- Stanley Center, Broad Institute, 57 Ames Street, Cambridge, Massachusetts 02139, USA
- Guoping Feng
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
18
Zhang YJ, Huang JF, Gong N, Ling ZH, Hu Y. Automatic detection and classification of marmoset vocalizations using deep and recurrent neural networks. J Acoust Soc Am 2018; 144:478. [PMID: 30075670 DOI: 10.1121/1.5047743] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2018] [Accepted: 07/05/2018] [Indexed: 06/08/2023]
Abstract
This paper investigates methods to detect and classify marmoset vocalizations automatically using a large dataset of marmoset vocalizations and deep learning techniques. For vocalization detection, neural network-based methods, including a deep neural network (DNN) and a recurrent neural network with long short-term memory units, are designed and compared against a conventional rule-based detection method. For vocalization classification, three algorithms are compared: a support vector machine (SVM), a DNN, and long short-term memory recurrent neural networks (LSTM-RNNs). A 1500-min audio dataset containing recordings from four pairs of marmoset twins, with manual annotations, is employed for the experiments. Two test sets are built according to whether the test samples were produced by marmosets in the training set (test set I) or not (test set II). Experimental results show that the LSTM-RNN-based detection method outperformed the others, achieving frame error rates of 0.92% and 1.67% on the two test sets. Furthermore, the deep learning models obtained higher classification accuracy than the SVM model, reaching 95.60% and 91.67% on the two test sets, respectively.
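The frame error rate used to score the detectors in this abstract is simply the fraction of audio frames whose predicted call/no-call label disagrees with the annotation. A minimal sketch, with a simple energy threshold standing in for the LSTM-RNN detector and synthetic frame labels:

```python
# Sketch: frame-level vocalization detection and the frame error rate metric.
# An energy-threshold detector stands in for the LSTM-RNN; data are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n_frames = 1000
labels = np.zeros(n_frames, dtype=bool)
labels[200:350] = True                         # ground-truth call frames
labels[600:700] = True

# Per-frame energy: call frames are louder than background, plus noise.
energy = rng.normal(0.2, 0.05, n_frames)
energy[labels] += 0.5

predictions = energy > 0.45                    # threshold detector (assumed)
frame_error_rate = np.mean(predictions != labels)
print(f"frame error rate: {frame_error_rate:.2%}")
```

Because the synthetic call and background energies are well separated here, the error rate is near zero; the paper's 0.92% and 1.67% figures come from real recordings where the separation is far less clean.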
Affiliation(s)
- Ya-Jie Zhang
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, 443 Huangshan Road, Hefei 230027, China
- Jun-Feng Huang
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Chinese Academy of Sciences (CAS) Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, CAS, 320 Yueyang Road, Shanghai 200031, China
- Neng Gong
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Chinese Academy of Sciences (CAS) Key Laboratory of Primate Neurobiology, Shanghai Institutes for Biological Sciences, CAS, 320 Yueyang Road, Shanghai 200031, China
- Zhen-Hua Ling
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, 443 Huangshan Road, Hefei 230027, China
- Yu Hu
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, 443 Huangshan Road, Hefei 230027, China
19
A "voice patch" system in the primate brain for processing vocal information? Hear Res 2018; 366:65-74. [PMID: 29776691 DOI: 10.1016/j.heares.2018.04.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/05/2018] [Revised: 04/14/2018] [Accepted: 04/25/2018] [Indexed: 12/13/2022]
Abstract
We review behavioural and neural evidence for the processing of information contained in conspecific vocalizations (CVs) in three primate species: humans, macaques, and marmosets. We focus on abilities that are present and ecologically relevant in all three species: the detection of and sensitivity to CVs, and the processing of identity cues in CVs. Current evidence, although fragmentary, supports the notion of a "voice patch system" in the primate brain analogous to the face patch system of visual cortex: a series of discrete, interconnected cortical areas supporting increasingly abstract representations of the vocal input. A central question concerns the degree to which the voice patch system is conserved in evolution. We outline challenges that arise and suggest potential avenues for comparing the organization of the voice patch system across primate brains.
20
Clink DJ, Crofoot MC, Marshall AJ. Application of a semi-automated vocal fingerprinting approach to monitor Bornean gibbon females in an experimentally fragmented landscape in Sabah, Malaysia. Bioacoustics 2018. [DOI: 10.1080/09524622.2018.1426042] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Affiliation(s)
- Dena J. Clink
- Department of Anthropology, University of California, Davis, Davis, CA, USA
- Margaret C. Crofoot
- Department of Anthropology, University of California, Davis, Davis, CA, USA; Smithsonian Tropical Research Institute, Balboa Ancon, Republic of Panama
- Andrew J. Marshall
- Department of Anthropology, Program in the Environment, and School for Natural Resources and Environment, University of Michigan, Ann Arbor, MI, USA
21
Charrier I, Marchesseau S, Dendrinos P, Tounta E, Karamanlidis AA. Individual signatures in the vocal repertoire of the endangered Mediterranean monk seal: new perspectives for population monitoring. Endanger Species Res 2017. [DOI: 10.3354/esr00829] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open