1
Abstract
Excellent performance has been demonstrated in applying modern information technology to challenging agricultural production processes, especially the use of artificial intelligence methods to improve modern production environments. However, most existing work uses visual methods, training models to extract image features of animals and analyze their behavior, which may not be truly intelligent. Because vocal animals transmit information through their calls, information obtained directly from the grunts of pigs is more useful for understanding their behavior and emotional state, which is important for monitoring and predicting the health conditions and abnormal behavior of pigs. We propose a sound classification model called TransformerCNN, which combines the spatial feature representation of CNNs with the sequence encoding of Transformers to form a powerful global feature perception and local feature extraction capability. Through detailed qualitative and quantitative evaluations, and by comparing state-of-the-art traditional animal sound recognition methods with deep learning methods, we demonstrate the advantages of our approach for classifying domestic pig sounds. The model achieved 96.05% accuracy, 98.37% AUC and 90.52% recall on domestic pig sound recognition, all higher than the comparison models. In addition, it shows good robustness and generalization, with low variation in performance across different input features.
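To make the hybrid design concrete, below is a minimal PyTorch sketch of a CNN front end feeding a Transformer encoder, of the kind the abstract describes. All layer sizes, the four-class output, and the log-Mel input shape are illustrative assumptions, not the authors' TransformerCNN implementation.

```python
# Minimal sketch of a CNN + Transformer hybrid audio classifier.
# Hyperparameters and layer choices are illustrative assumptions,
# not the authors' TransformerCNN.
import torch
import torch.nn as nn

class CNNTransformerClassifier(nn.Module):
    def __init__(self, n_mels=64, n_classes=4, d_model=128):
        super().__init__()
        # CNN front end: local time-frequency feature extraction.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.proj = nn.Linear(64 * (n_mels // 4), d_model)
        # Transformer encoder: global context across time frames.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                        # x: (batch, 1, n_mels, time)
        h = self.cnn(x)                          # (batch, 64, n_mels//4, time//4)
        h = h.permute(0, 3, 1, 2).flatten(2)     # (batch, time//4, 64 * n_mels//4)
        h = self.encoder(self.proj(h))           # (batch, time//4, d_model)
        return self.head(h.mean(dim=1))          # pool over time, then classify

logits = CNNTransformerClassifier()(torch.randn(2, 1, 64, 128))
print(logits.shape)  # torch.Size([2, 4])
```

The convolutional stage captures local time-frequency patterns, while the self-attention layers let every time frame attend to the whole call, which is the global/local division of labor the abstract attributes to the model.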
2
Trapanotto M, Nanni L, Brahnam S, Guo X. Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations. J Imaging 2022; 8:96. [PMID: 35448223 PMCID: PMC9029749 DOI: 10.3390/jimaging8040096]
Abstract
The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field of inquiry have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size of collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting, and one way to handle small datasets with deep learning methods is to use transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, spectrogram, and Mel spectrogram, along with several new ones, such as VGGish and Stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate and shown to surpass previous classification attempts on this dataset; the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including for the first time the LM spectrogram and Stockwell representations. All source code for this study is available on GitHub.
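As a concrete illustration of this transfer-learning setup, below is a short PyTorch/torchvision sketch that retrains only the classification head of a pretrained ResNet50 on spectrogram images. The class count, input shape, and training settings are assumptions for illustration, not the study's exact protocol.

```python
# Sketch of transfer learning: a pretrained CNN retrained on spectrogram
# images of individual vocalizations. Class count and training settings
# are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

n_lions = 5  # five individual male lions in the roar dataset

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in model.parameters():          # freeze the pretrained backbone...
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, n_lions)  # ...retrain only the head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# spectrograms: (batch, 3, 224, 224) images; labels: individual IDs 0..4
spectrograms = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, n_lions, (8,))
loss = criterion(model(spectrograms), labels)
loss.backward()
optimizer.step()
```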
Affiliation(s)
- Martino Trapanotto
- Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
- Loris Nanni
- Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
- Sheryl Brahnam
- Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA
- Correspondence: Tel.: +1-417-873-9979
- Xiang Guo
- Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA
3
Madhusudhana S, Shiu Y, Klinck H, Fleishman E, Liu X, Nosal EM, Helble T, Cholewiak D, Gillespie D, Širović A, Roch MA. Improve automatic detection of animal call sequences with temporal context. J R Soc Interface 2021; 18:20210297. [PMID: 34283944 PMCID: PMC8292017 DOI: 10.1098/rsif.2021.0297]
Abstract
Many animals rely on long-form communication, in the form of songs, for vital functions such as mate attraction and territorial defence. We explored the prospect of improving automatic recognition performance by using the temporal context inherent in song. The ability to accurately detect sequences of calls has implications for conservation and biological studies. We show that the performance of a convolutional neural network (CNN), designed to detect song notes (calls) in short-duration audio segments, can be improved by combining it with a recurrent network designed to process sequences of learned representations from the CNN on a longer time scale. The combined system of independently trained CNN and long short-term memory (LSTM) network models exploits the temporal patterns between song notes. We demonstrate the technique using recordings of fin whale (Balaenoptera physalus) songs, which comprise patterned sequences of characteristic notes. We evaluated several variants of the CNN + LSTM network. Relative to the baseline CNN model, the CNN + LSTM models reduced performance variance, offering a 9–17% increase in area under the precision–recall curve and a 9–18% increase in peak F1-scores. These results show that the inclusion of temporal information may offer a valuable pathway for improving the automatic recognition and transcription of wildlife recordings.
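The CNN + LSTM composition described here can be sketched as follows: a CNN embeds each short spectrogram segment, and an LSTM consumes the sequence of embeddings to exploit the temporal patterns between song notes. All shapes and sizes below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the CNN + LSTM pattern: a CNN embeds short spectrogram segments,
# an LSTM models longer-range structure across the sequence of embeddings.
import torch
import torch.nn as nn

class SegmentCNN(nn.Module):
    """Embed one short spectrogram segment."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
    def forward(self, x):
        return self.net(x)

class CallSequenceDetector(nn.Module):
    """LSTM over per-segment CNN embeddings; per-segment call probability."""
    def __init__(self, embed_dim=64, hidden=64):
        super().__init__()
        self.cnn = SegmentCNN(embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, segs):              # segs: (batch, T, 1, mels, frames)
        b, t = segs.shape[:2]
        emb = self.cnn(segs.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(emb)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (batch, T)

probs = CallSequenceDetector()(torch.randn(2, 10, 1, 64, 32))
print(probs.shape)  # torch.Size([2, 10])
```

Training the two stages independently, as the paper does, means the CNN can be fit on single segments while the LSTM is fit afterwards on sequences of its frozen embeddings.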
Affiliation(s)
- Shyam Madhusudhana
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
- Yu Shiu
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
- Holger Klinck
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA; Marine Mammal Institute, Department of Fisheries, Wildlife, and Conservation Sciences, Oregon State University, Corvallis, OR, USA
- Erica Fleishman
- College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, USA
- Xiaobai Liu
- Department of Computer Science, San Diego State University, San Diego, CA, USA
- Eva-Marie Nosal
- Department of Ocean and Resources Engineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Tyler Helble
- US Navy, Naval Information Warfare Center Pacific, San Diego, CA, USA
- Danielle Cholewiak
- Northeast Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Woods Hole, MA, USA
- Douglas Gillespie
- Sea Mammal Research Unit, Scottish Oceans Institute, University of St Andrews, St Andrews, UK
- Ana Širović
- Marine Biology Department, Texas A&M University at Galveston, Galveston, TX, USA
- Marie A Roch
- Department of Computer Science, San Diego State University, San Diego, CA, USA
4
Ekpezu AO, Wiafe I, Katsriku F, Yaokumah W. Using deep learning for acoustic event classification: The case of natural disasters. J Acoust Soc Am 2021; 149:2926. [PMID: 33940915 DOI: 10.1121/10.0004771]
Abstract
This study proposes a sound classification model for natural disasters. Deep learning techniques, a convolutional neural network (CNN) and long short-term memory (LSTM), were used to train two individual classifiers. The study was conducted using a dataset acquired online and truncated at 0.1 s to obtain a total of 12,937 sound segments. The results indicated that acoustic signals are effective for classifying natural disasters using machine learning techniques, and the classifiers serve as an effective alternative approach to disaster classification. The CNN model obtained a classification accuracy of 99.96%, whereas the LSTM obtained an accuracy of 99.90%. The corresponding misclassification rates (0.04% and 0.10%, respectively) suggest fewer classification errors than in existing studies. Future studies may investigate how to implement such classifiers for the early detection of natural disasters in real time.
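A minimal sketch of the preprocessing step described above: slicing a recording into fixed 0.1 s segments and extracting MFCC features for a downstream CNN or LSTM classifier. The file name, sample rate, and feature settings are illustrative assumptions, not the study's pipeline.

```python
# Slice audio into fixed 0.1 s segments and compute MFCCs per segment.
# Settings below are illustrative assumptions.
import librosa
import numpy as np

def segment_audio(path, seg_dur=0.1, sr=22050, n_mfcc=20):
    """Return an array of MFCC matrices, one per 0.1 s segment."""
    y, sr = librosa.load(path, sr=sr)
    seg_len = int(seg_dur * sr)
    feats = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = y[start:start + seg_len]
        feats.append(librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc))
    return np.stack(feats)            # (n_segments, n_mfcc, frames)

features = segment_audio("flood_recording.wav")  # hypothetical file name
```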
Affiliation(s)
- Akon O Ekpezu
- Department of Computer Science, University of Ghana, Post Office Box 163, Legon, Accra, Ghana
- Isaac Wiafe
- Department of Computer Science, University of Ghana, Post Office Box 163, Legon, Accra, Ghana
- Ferdinand Katsriku
- Department of Computer Science, University of Ghana, Post Office Box 163, Legon, Accra, Ghana
- Winfred Yaokumah
- Department of Computer Science, University of Ghana, Post Office Box 163, Legon, Accra, Ghana
5
Shiu Y, Palmer KJ, Roch MA, Fleishman E, Liu X, Nosal EM, Helble T, Cholewiak D, Gillespie D, Klinck H. Deep neural networks for automated detection of marine mammal species. Sci Rep 2020; 10:607. [PMID: 31953462 PMCID: PMC6969184 DOI: 10.1038/s41598-020-57549-y]
Abstract
Deep neural networks have advanced the field of detection and classification and allowed for effective identification of signals in challenging data sets. Numerous time-critical conservation needs may benefit from these methods. We developed and empirically studied a variety of deep neural networks to detect the vocalizations of endangered North Atlantic right whales (Eubalaena glacialis). We compared the performance of these deep architectures to that of traditional detection algorithms for the primary vocalization produced by this species, the upcall. We show that deep-learning architectures are capable of producing false-positive rates that are orders of magnitude lower than alternative algorithms while substantially increasing the ability to detect calls. We demonstrate that a deep neural network trained with recordings from a single geographic region recorded over a span of days is capable of generalizing well to data from multiple years and across the species’ range, and that the low false positives make the output of the algorithm amenable to quality control for verification. The deep neural networks we developed are relatively easy to implement with existing software, and may provide new insights applicable to the conservation of endangered species.
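Since the comparison above turns on trading false positives against missed calls, here is a small sketch of sweeping a detector's score threshold to measure that trade-off. The scores and labels are synthetic stand-ins, not the study's detector output.

```python
# Sweep a detection threshold and report false-positive rate vs. recall.
# Synthetic scores/labels stand in for real detector output.
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)                 # 1 = upcall present
scores = labels * 0.4 + rng.random(1000) * 0.7    # fake detector scores

for thr in (0.5, 0.7, 0.9):
    pred = scores >= thr
    fpr = (pred & (labels == 0)).sum() / max((labels == 0).sum(), 1)
    recall = (pred & (labels == 1)).sum() / max((labels == 1).sum(), 1)
    print(f"thr={thr:.1f}  FPR={fpr:.3f}  recall={recall:.3f}")
```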
Affiliation(s)
- Yu Shiu
- Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA
- K J Palmer
- Department of Computer Science, San Diego State University, San Diego, CA, 92182, USA
- Marie A Roch
- Department of Computer Science, San Diego State University, San Diego, CA, 92182, USA
- Erica Fleishman
- Department of Fish, Wildlife and Conservation Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Xiaobai Liu
- Department of Computer Science, San Diego State University, San Diego, CA, 92182, USA
- Eva-Marie Nosal
- Department of Ocean and Resources Engineering, University of Hawai'i at Mānoa, Honolulu, HI, 96822, USA
- Tyler Helble
- US Navy, Space and Naval Warfare Systems Command, System Center Pacific, San Diego, CA, 92152, USA
- Danielle Cholewiak
- Northeast Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Woods Hole, MA, 02543, USA
- Douglas Gillespie
- Sea Mammal Research Unit, Scottish Oceans Institute, University of St. Andrews, St Andrews, Fife, KY16 8LB, Scotland
- Holger Klinck
- Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA
6
Oikarinen T, Srinivasan K, Meisner O, Hyman JB, Parmar S, Fanucci-Kiss A, Desimone R, Landman R, Feng G. Deep convolutional network for animal sound classification and source attribution using dual audio recordings. J Acoust Soc Am 2019; 145:654. [PMID: 30823820 PMCID: PMC6786887 DOI: 10.1121/1.5087827]
Abstract
This paper introduces an end-to-end feedforward convolutional neural network that reliably classifies the source and type of animal calls in a noisy environment, using two streams of audio data, after being trained on a dataset of modest size with imperfect labels. The data consist of audio recordings from captive marmoset monkeys housed in pairs, with several other cages nearby. The network classifies both the call type and which animal made it in a single pass through a single network, using raw spectrogram images as input. It vastly increases data analysis capacity for researchers studying marmoset vocalizations and allows data collection in the home cage with group-housed animals.
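A minimal sketch of the dual-recording idea: spectrograms from the two audio streams enter a shared CNN as separate channels, and two output heads predict the call type and which animal produced it. Layer sizes and class counts are illustrative assumptions, not the paper's architecture.

```python
# Dual-stream CNN sketch: two spectrograms as input channels, two heads
# for call type and caller identity. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DualStreamCallNet(nn.Module):
    def __init__(self, n_call_types=10, n_animals=2):
        super().__init__()
        self.backbone = nn.Sequential(        # two spectrograms as 2 channels
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.call_head = nn.Linear(64, n_call_types)   # what was vocalized
        self.source_head = nn.Linear(64, n_animals)    # which animal made it

    def forward(self, spec_a, spec_b):
        h = self.backbone(torch.cat([spec_a, spec_b], dim=1))
        return self.call_head(h), self.source_head(h)

a, b = torch.randn(4, 1, 64, 128), torch.randn(4, 1, 64, 128)
call_logits, source_logits = DualStreamCallNet()(a, b)
print(call_logits.shape, source_logits.shape)  # (4, 10) (4, 2)
```

Sharing the backbone across both prediction tasks is one plausible way to realize the paper's "single pass through a single network" for call type and source attribution together.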
Affiliation(s)
- Tuomas Oikarinen
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Karthik Srinivasan
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Olivia Meisner
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Julia B Hyman
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Shivangi Parmar
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Adrian Fanucci-Kiss
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Robert Desimone
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA
- Rogier Landman
- Stanley Center, Broad Institute, 57 Ames Street, Cambridge, Massachusetts 02139, USA
- Guoping Feng
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, Massachusetts 02139, USA