1. Dong H, Wu H, Yang G, Zhang J, Wan K. A multi-branch convolutional neural network for snoring detection based on audio. Comput Methods Biomech Biomed Engin 2024:1-12. PMID: 38372231. DOI: 10.1080/10255842.2024.2317438.
Abstract
Obstructive sleep apnea (OSA) is associated with various health complications, and snoring is a prominent characteristic of the disorder. The search for a simple and effective method of detecting snoring has therefore long been an important topic in sleep medicine. Because audio recordings are easy to acquire, identifying snoring from sound offers a particularly convenient approach. The objective of this study was to develop a convolutional neural network (CNN) that classifies snoring and non-snoring events from audio. Mel-frequency cepstral coefficients (MFCCs) were used to extract features from the raw recordings during preprocessing. To capture multi-scale features in the frequency domain of the sound sources, the study proposes a multi-branch convolutional neural network (MBCNN) for classification. The network uses asymmetric convolutional kernels to acquire additional information, and one-hot encoded labels to mitigate label effects. The network was evaluated on a publicly available dataset of 1,000 sound samples, on which the MBCNN achieved a snoring-detection accuracy of 99.5%. Combining multi-scale features with the MBCNN substantially improves audio-based snoring classification.
Affiliation(s)
- Hao Dong: School of Computer Science, Zhongyuan University of Technology, Henan, China; School of Computing and Artificial Intelligence, Huanghuai University, Henan, China
- Haitao Wu: School of Computing and Artificial Intelligence, Huanghuai University, Henan, China; Henan Key Laboratory of Smart Lighting, Henan, China
- Guan Yang: School of Computer Science, Zhongyuan University of Technology, Henan, China
- Junming Zhang: School of Computing and Artificial Intelligence, Huanghuai University, Henan, China; Henan Key Laboratory of Smart Lighting, Henan, China; Henan Joint International Research Laboratory of Behavior Optimization Control for Smart Robots, Henan, China; Zhumadian Artificial Intelligence and Medical Engineering Technical Research Centre, Henan, China
- Keqin Wan: School of Computing and Artificial Intelligence, Huanghuai University, Henan, China
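The MFCC front end this entry relies on is a standard pipeline (frame, window, power spectrum, mel filterbank, DCT). As a rough numpy-only sketch of that preprocessing step: all parameter values here (16 kHz rate, 26 mel bands, 13 coefficients) are common defaults, not the paper's reported settings, and the network itself is not reproduced.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Compute MFCCs from a mono signal (illustrative, numpy-only)."""
    # Frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fb.T + 1e-10)
    # DCT-II (unnormalized) to decorrelate; keep the first n_ceps coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1) / (2 * n_mels)))
    return log_mel @ dct.T

# Toy input: one second of a 440 Hz tone
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # one 13-coefficient vector per frame
```

Each row of `feats` is the feature vector for one audio frame; a CNN such as the MBCNN would consume the resulting time-by-coefficient matrix as its input image.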
2. Martin VP, Rouas JL, Micoulaud-Franchi JA, Philip P, Krajewski J. How to Design a Relevant Corpus for Sleepiness Detection Through Voice? Front Digit Health 2021;3:686068. PMID: 34713156. PMCID: PMC8521834. DOI: 10.3389/fdgth.2021.686068.
Abstract
This article presents research on detecting pathologies that affect speech through automatic analysis. Voice processing has been used to evaluate several conditions, such as Parkinson's disease, Alzheimer's disease, and depression. While some studies report results that seem sufficient for clinical application, this is not the case for sleepiness detection: even two international challenges and the recent advent of deep learning techniques have not changed the situation. This article explores the hypothesis that the modest average performance of automatic processing is rooted in the design of the corpora. To this end, we first discuss and refine the concept of sleepiness in relation to ground-truth labels. Second, we present an in-depth study of four corpora, bringing to light the methodological choices that were made and the biases they may have induced. Finally, in light of this information, we propose guidelines for the design of new corpora.
Affiliation(s)
- Vincent P. Martin: Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, CNRS–UMR 5800, Bordeaux INP, Talence, France
- Jean-Luc Rouas: Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, CNRS–UMR 5800, Bordeaux INP, Talence, France
- Pierre Philip: Sommeil, Addiction et Neuropsychiatrie, University of Bordeaux, CNRS–USR 3413, CHU Pellegrin, Bordeaux, France
- Jarek Krajewski: Engineering Psychology, Rhenish University of Applied Science, Cologne, Germany
3. A novel MFCC-NN learning model for voice communication through Li-Fi for motion control of a robotic vehicle. Soft Comput 2019. DOI: 10.1007/s00500-019-04118-9.
4. Continuous Driver's Gaze Zone Estimation Using RGB-D Camera. Sensors 2019;19:1287. PMID: 30875740. PMCID: PMC6471141. DOI: 10.3390/s19061287.
Abstract
The driver's gaze zone is an indicator of attention and plays an important role in driver-activity monitoring. Because of poor initialization of the point-cloud transformation, gaze-zone systems that use RGB-D cameras and the ICP (Iterative Closest Point) algorithm do not work well under prolonged head motion. In this work, a continuous driver gaze-zone estimation system for real-world driving is proposed, combining multi-zone ICP-based head-pose tracking with appearance-based gaze estimation. To initialize and update the coarse transformation for ICP, a particle filter with auxiliary sampling is employed for head-state tracking, which accelerates the iterative convergence of ICP. Multiple templates for the different gaze zones are applied to balance ICP template revision under large head movements. For the RGB information, an appearance-based gaze estimation method with two-stage neighbor selection is used, treating gaze prediction as a combination of neighbor query (in the head-pose and eye-image feature space) and linear regression (from the eye-image feature space to the gaze-angle space). Experimental results show that the proposed method outperforms the baselines on gaze estimation and provides stable head-pose tracking for driver-behavior analysis in real-world driving scenarios.
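The two-stage idea in this abstract (neighbor query in feature space, then regression from features to gaze angles) can be sketched roughly as follows. The local linear map, the feature dimensions, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_local_linear_gaze(feat_db, gaze_db, query, k=20):
    """Stage 1: query the k nearest neighbors in feature space.
    Stage 2: fit a local linear map from features to gaze angles
    on those neighbors and apply it to the query."""
    d = np.linalg.norm(feat_db - query, axis=1)
    idx = np.argsort(d)[:k]                          # neighbor query
    A = np.hstack([feat_db[idx], np.ones((k, 1))])   # add bias column
    W, *_ = np.linalg.lstsq(A, gaze_db[idx], rcond=None)
    return np.append(query, 1.0) @ W                 # (yaw, pitch) estimate

# Toy demo: gaze angles are an (unknown) linear function of the features
F = rng.normal(size=(500, 6))              # head-pose + eye-appearance features
G = F @ rng.normal(size=(6, 2)) * 0.5      # ground-truth yaw, pitch
est = knn_local_linear_gaze(F, G, F[0])
print(est, G[0])
```

Because the toy relationship is exactly linear, the local regression recovers the true angles; on real eye images the neighborhood merely makes the linear approximation locally valid.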
5. Schuller B, Weninger F, Zhang Y, Ringeval F, Batliner A, Steidl S, Eyben F, Marchi E, Vinciarelli A, Scherer K, Chetouani M, Mortillaro M. Affective and behavioural computing: Lessons learnt from the First Computational Paralinguistics Challenge. Comput Speech Lang 2019. DOI: 10.1016/j.csl.2018.02.004.
6. Wang Y, Zhao T, Ding X, Peng J, Bian J, Fu X. Learning a gaze estimator with neighbor selection from large-scale synthetic eye images. Knowl Based Syst 2018. DOI: 10.1016/j.knosys.2017.10.010.
7. Random Deep Belief Networks for Recognizing Emotions from Speech Signals. Comput Intell Neurosci 2017;2017:1945630. PMID: 28356908. PMCID: PMC5357547. DOI: 10.1155/2017/1945630.
Abstract
Human emotions can now be recognized from speech signals using machine-learning methods; however, recognition accuracy in real applications remains low because conventional features lack rich representational ability. Deep belief networks (DBNs) can automatically discover multiple levels of representation in speech signals. To exploit this advantage, this paper presents an ensemble method based on random deep belief networks (RDBN) for speech emotion recognition. It first extracts low-level features from the input speech signal and uses them to construct many random subspaces. Each random subspace is fed to a DBN, which yields higher-level features that a classifier maps to an emotion label. The labels produced by all ensemble members are then fused by majority voting to decide the final emotion label for the input signal. Experimental results on benchmark speech-emotion databases show that RDBN achieves better accuracy than the compared methods.
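The random-subspace-plus-majority-vote structure described in this abstract can be illustrated in a few lines. A nearest-centroid classifier stands in for the per-subspace DBN so the sketch stays self-contained; the member count, subspace size, and toy data are likewise assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

class NearestCentroid:
    """Stand-in base learner (the paper trains a DBN per subspace)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]

def random_subspace_ensemble(X, y, X_test, n_members=11, subspace=0.5):
    """Train one learner per random feature subspace; fuse by majority vote."""
    n_feat = X.shape[1]
    k = max(1, int(subspace * n_feat))
    votes = []
    for _ in range(n_members):
        idx = rng.choice(n_feat, size=k, replace=False)   # random subspace
        clf = NearestCentroid().fit(X[:, idx], y)
        votes.append(clf.predict(X_test[:, idx]))
    votes = np.stack(votes)                               # (n_members, n_test)
    # Majority vote across ensemble members decides the final label
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy demo: two well-separated classes in 10-D feature space
X0 = rng.normal(0, 1, (50, 10)); X1 = rng.normal(5, 1, (50, 10))
X = np.vstack([X0, X1]); y = np.array([0] * 50 + [1] * 50)
pred = random_subspace_ensemble(X, y, X)
print((pred == y).mean())
```

The diversity of the ensemble comes entirely from the random feature subspaces; swapping the centroid learner for a DBN recovers the RDBN scheme.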
9. Álvarez A, Sierra B, Arruti A, López-Gil JM, Garay-Vitoria N. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech. Sensors 2015;16:21. PMID: 26712757. PMCID: PMC4732054. DOI: 10.3390/s16010021.
Abstract
In this paper, a new supervised classification paradigm called classifier subset selection for stacked generalization (CSS stacking) is presented for speech emotion recognition. The approach improves the bi-level multi-classifier system known as stacked generalization by integrating an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset of standard base classifiers. The good performance of the proposed paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using specific standard base classifiers and a total of 123 spectral, quality, and prosodic features computed with in-house feature-extraction algorithms. These initial CSS stacking classifiers were compared with other multi-classifier systems and with the standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of acoustic parameters (the extended Geneva Minimalistic Acoustic Parameter Set, eGeMAPS) and standard classifiers, employing the best meta-classifier from the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database, comparing single, standard stacking, and CSS stacking systems with the same parametrization in the second phase. All classifications were performed at the categorical level, covering the six primary emotions plus neutral.
Affiliation(s)
- Aitor Álvarez: Vicomtech-IK4, Human Speech and Language Technologies Department, Paseo Mikeletegi 57, Parque Científico y Tecnológico de Gipuzkoa, 20009 Donostia-San Sebastián, Spain
- Basilio Sierra: University of the Basque Country (UPV/EHU), Paseo de Manuel Lardizabal 1, 20018 Donostia-San Sebastián, Spain
- Andoni Arruti: University of the Basque Country (UPV/EHU), Paseo de Manuel Lardizabal 1, 20018 Donostia-San Sebastián, Spain
- Juan-Miguel López-Gil: University of the Basque Country (UPV/EHU), Paseo de Manuel Lardizabal 1, 20018 Donostia-San Sebastián, Spain
- Nestor Garay-Vitoria: University of the Basque Country (UPV/EHU), Paseo de Manuel Lardizabal 1, 20018 Donostia-San Sebastián, Spain
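As a rough sketch of the subset-selection idea in this entry: the snippet below searches classifier subsets exhaustively (a small stand-in for the paper's EDA) and fuses level-0 predictions by majority vote (a stand-in for the trained meta-classifier). The base learners, feature views, and toy data are all illustrative assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def centroid_fit(X, y):
    cls = np.unique(y)
    return cls, np.stack([X[y == c].mean(axis=0) for c in cls])

def centroid_predict(model, X):
    cls, cen = model
    return cls[np.linalg.norm(X[:, None] - cen[None], axis=2).argmin(axis=1)]

def stacked_vote(views, X_tr, y_tr, X_va):
    """Level 0: one classifier per feature view; level 1 stand-in:
    majority vote over the selected subset."""
    votes = np.stack([centroid_predict(centroid_fit(X_tr[:, v], y_tr), X_va[:, v])
                      for v in views])
    return np.array([np.bincount(c).argmax() for c in votes.T])

def select_subset(pool, X_tr, y_tr, X_va, y_va, size=3):
    """Exhaustive search over classifier subsets by validation accuracy."""
    best, best_acc = None, -1.0
    for subset in combinations(pool, size):
        acc = (stacked_vote(subset, X_tr, y_tr, X_va) == y_va).mean()
        if acc > best_acc:
            best, best_acc = list(subset), acc
    return best, best_acc

# Toy demo: only the first 4 of 12 features carry class information
X = rng.normal(size=(200, 12))
y = (X[:, :4].sum(axis=1) > 0).astype(int)
X[:, :4] += 2.0 * y[:, None]               # shift informative features per class
pool = [rng.choice(12, 6, replace=False) for _ in range(6)]  # 6 feature views
subset, acc = select_subset(pool, X[:100], y[:100], X[100:], y[100:])
print(acc)
```

An EDA replaces the exhaustive loop with a sampled search over subset-membership distributions, which is what makes the selection tractable for larger classifier pools.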
10. An Emotion Detection System Based on Multi Least Squares Twin Support Vector Machine. 2014. DOI: 10.1155/2014/282659.
Abstract
Post-traumatic stress disorder (PTSD), bipolar manic disorder (BMD), obsessive-compulsive disorder (OCD), depression, and suicide are major problems in civilian and military life, and changes in emotion are implicated in such conditions. It is therefore essential to develop a robust and reliable emotion-detection system suitable for real-world applications. Beyond healthcare, the importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken-language interfaces in human-computer interaction. Detecting emotion in speech can be applied in a variety of situations to allocate limited human resources to the clients with the highest levels of distress or need, such as in automated call centers or nursing homes. In this paper, a novel multi least squares twin support vector machine classifier is used to detect seven emotions: anger, happiness, sadness, anxiety, disgust, panic, and neutral. The experimental results indicate better performance than existing approaches and suggest that the proposed system may be used for screening of mental status.
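The binary least squares twin SVM underlying this entry reduces training to two small linear systems, one per non-parallel hyperplane, with a nearest-plane decision rule. The sketch below is the standard linear binary formulation, not the paper's multi-class construction; penalty parameters and toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def lstsvm_fit(A, B, c1=1.0, c2=1.0):
    """Fit the two hyperplanes of a binary least squares twin SVM.
    Each plane stays close to its own class and pushes the other
    class away, with squared (least-squares) penalties."""
    e_a, e_b = np.ones((len(A), 1)), np.ones((len(B), 1))
    H = np.hstack([A, e_a])   # class +1 points, augmented with a bias column
    G = np.hstack([B, e_b])   # class -1 points
    # Plane 1: min (1/2)||H u||^2 + (c1/2)||G u + e||^2
    u1 = -np.linalg.solve(H.T @ H / c1 + G.T @ G, G.T @ e_b).ravel()
    # Plane 2: min (1/2)||G u||^2 + (c2/2)||H u - e||^2
    u2 = np.linalg.solve(G.T @ G / c2 + H.T @ H, H.T @ e_a).ravel()
    return u1, u2

def lstsvm_predict(u1, u2, X):
    Xa = np.hstack([X, np.ones((len(X), 1))])
    d1 = np.abs(Xa @ u1) / np.linalg.norm(u1[:-1])   # distance to plane 1
    d2 = np.abs(Xa @ u2) / np.linalg.norm(u2[:-1])   # distance to plane 2
    return np.where(d1 <= d2, 1, -1)                 # nearer plane wins

# Toy demo: two Gaussian blobs
A = rng.normal([2, 2], 0.5, (60, 2))     # class +1
B = rng.normal([-2, -2], 0.5, (60, 2))   # class -1
u1, u2 = lstsvm_fit(A, B)
pred = lstsvm_predict(u1, u2, np.vstack([A, B]))
print((pred == np.array([1] * 60 + [-1] * 60)).mean())
```

Because both objectives are unconstrained quadratics, no quadratic programming is needed; a multi-class system stacks several such binary machines (e.g. one per emotion pair or per emotion versus the rest).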