1
Yagnavajjula MK, Alku P, Rao KS, Mitra P. Detection of Neurogenic Voice Disorders Using the Fisher Vector Representation of Cepstral Features. J Voice 2025; 39:757-763. [PMID: 36424242 DOI: 10.1016/j.jvoice.2022.10.016]
Abstract
Neurogenic voice disorders (NVDs) are caused by damage or malfunction of the central or peripheral nervous system that controls vocal fold movement. In this paper, we investigate the potential of the Fisher vector (FV) encoding in automatic detection of people with NVDs. FVs are used to convert features from frame level (local descriptors) to utterance level (global descriptors). At the frame level, we extract two popular cepstral representations, namely, Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction cepstral coefficients (PLPCCs), from acoustic voice signals. In addition, the MFCC features are also extracted from every frame of the glottal source signal computed using a glottal inverse filtering (GIF) technique. The global descriptors derived from the local descriptors are used to train a support vector machine (SVM) classifier. Experiments are conducted using voice signals from 80 healthy speakers and 80 patients with NVDs (40 with spasmodic dysphonia (SD) and 40 with recurrent laryngeal nerve palsy (RLNP)) taken from the Saarbruecken voice disorder (SVD) database. The overall results indicate that the use of the FV encoding leads to better identification of people with NVDs, compared to the de facto temporal encoding. Furthermore, the SVM trained using the combination of FVs derived from the cepstral and glottal features provides the overall best detection performance.
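A minimal sketch of the frame-to-utterance pipeline described above, assuming librosa, NumPy and scikit-learn; the GMM size, MFCC order and the names wav_paths and labels are illustrative placeholders, and only the mean-gradient part of the Fisher vector is shown (the full encoding also uses covariance gradients and the glottal-source stream).

```python
# Sketch: frame-level MFCCs -> Fisher vector (mean gradients only) -> SVM.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def frame_mfccs(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    # Frames as rows: (n_frames, n_mfcc)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def fisher_vector(frames, gmm):
    """Gradient of the GMM log-likelihood with respect to the component means."""
    post = gmm.predict_proba(frames)                     # (T, K) posteriors
    T, _ = post.shape
    diff = frames[:, None, :] - gmm.means_[None, :, :]   # (T, K, D)
    diff /= np.sqrt(gmm.covariances_)[None, :, :]        # diagonal covariances
    fv = (post[:, :, None] * diff).sum(axis=0)           # (K, D)
    fv /= (T * np.sqrt(gmm.weights_)[:, None])
    return fv.ravel()

# wav_paths / labels are placeholders for the utterance list and its NVD labels.
all_frames = [frame_mfccs(p) for p in wav_paths]
gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(np.vstack(all_frames))
X = np.array([fisher_vector(f, gmm) for f in all_frames])
clf = SVC(kernel="linear").fit(X, labels)
```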
Affiliation(s)
- Madhu Keerthana Yagnavajjula: Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India; Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland.
- Paavo Alku: Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland.
- Krothapalli Sreenivasa Rao: Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India.
- Pabitra Mitra: Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India.
2
Tirronen S, Kadiri SR, Alku P. The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection. J Voice 2024; 38:975-982. [PMID: 35490081 DOI: 10.1016/j.jvoice.2022.03.021]
Abstract
Automatic voice pathology detection is a research topic that has gained increasing interest recently. Although methods based on deep learning are becoming popular, the classical pipeline systems based on a two-stage architecture consisting of a feature extraction stage and a classifier stage are still widely used. In these classical detection systems, frame-wise computation of mel-frequency cepstral coefficients (MFCCs) is the most popular feature extraction method. However, no systematic study has been conducted to investigate the effect of the MFCC frame length on automatic voice pathology detection. In this work, we studied the effect of the MFCC frame length in voice pathology detection using three disorders (hyperkinetic dysphonia, hypokinetic dysphonia and reflux laryngitis) from the Saarbrücken Voice Disorders (SVD) database. The detection performance was compared between speaker-dependent and speaker-independent scenarios as well as between speaking-task-dependent and speaking-task-independent scenarios. The Support Vector Machine, which is the most widely used classifier in the study area, was used as the classifier. The results show that the detection accuracy depended on the MFCC frame length in all the scenarios studied. The best detection accuracy was obtained by using an MFCC frame length of 500 ms with a shift of 5 ms.
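The frame-length dependence studied here comes down to the analysis window and hop used when computing MFCCs. A hedged librosa sketch with the 500 ms / 5 ms setting reported as best; the file path is a placeholder.

```python
# Sketch: MFCCs with a long analysis frame (500 ms) and a short shift (5 ms).
import librosa

y, sr = librosa.load("voice_sample.wav", sr=None)    # placeholder path
frame = int(0.500 * sr)                               # 500 ms analysis window
hop = int(0.005 * sr)                                 # 5 ms frame shift
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=frame, win_length=frame, hop_length=hop)
print(mfcc.shape)                                     # (13, n_frames)
```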
Affiliation(s)
- Saska Tirronen: Department of Signal Processing and Acoustics, Aalto University, Finland.
- Paavo Alku: Department of Signal Processing and Acoustics, Aalto University, Finland.
3
Kuo HC, Hsieh YP, Tseng HH, Wang CT, Fang SH, Tsao Y. Toward Real-World Voice Disorder Classification. IEEE Trans Biomed Eng 2023; 70:2922-2932. [PMID: 37099463 DOI: 10.1109/tbme.2023.3270532]
Abstract
OBJECTIVE Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Thus, automatic classification systems at home are desirable for people who lack access to clinical disease assessments. However, the performance of such systems may be weakened due to the constrained resources and domain mismatch between the clinical data and noisy real-world data. METHODS This study develops a compact and domain-robust voice disorder classification system to classify utterances as healthy, neoplasm, or benign structural disease. Our proposed system utilizes a feature extractor model composed of factorized convolutional neural networks and subsequently deploys domain adversarial training to reconcile the domain mismatch by extracting domain-invariant features. RESULTS The results show that the unweighted average recall in the noisy real-world domain improved by 13% and remained at 80% in the clinic domain with only slight degradation. The domain mismatch was effectively eliminated. Moreover, the proposed system reduced the usage of both memory and computation by over 73.9%. CONCLUSION By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources. The promising results confirm that the proposed system can significantly reduce resource consumption and improve classification accuracy by considering the domain mismatch. SIGNIFICANCE To the best of our knowledge, this is the first study that jointly considers real-world model compression and noise-robustness issues in voice disorder classification. The proposed system is intended for application to embedded systems with limited resources.
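Domain-adversarial training of the kind described relies on a gradient-reversal operation placed between the feature extractor and the domain classifier. A minimal TensorFlow sketch of that single operation; the surrounding factorized CNN and training loop are omitted, and the class name is illustrative.

```python
# Sketch: gradient reversal layer as used in domain-adversarial training.
import tensorflow as tf

@tf.custom_gradient
def gradient_reverse(x):
    def grad(dy):
        # Features pass forward unchanged; the gradient is flipped for the domain branch.
        return -dy
    return tf.identity(x), grad

class GradientReversal(tf.keras.layers.Layer):
    def call(self, inputs):
        return gradient_reverse(inputs)

# Usage: shared features -> GradientReversal() -> domain classifier head,
# while the disease classifier head consumes the same features directly.
```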
4
Zhang T, Liu X, Liu G, Shao Y. PVR-AFM: A Pathological Voice Repair System based on Non-linear Structure. J Voice 2023; 37:648-662. [PMID: 37717981 DOI: 10.1016/j.jvoice.2021.05.010]
Abstract
OBJECTIVE Speech signal processing has become an important technique to ensure that voice interaction systems communicate accurately with the user by improving the clarity or intelligibility of speech signals. However, most existing works focus only on processing the voices of average speakers and ignore the communication needs of individuals suffering from voice disorders, including voice-related professionals, older people, and smokers. To meet this demand, it is essential to design a non-invasive repair system that processes pathological voices. METHODS In this paper, we propose a repair system for multiple polyp vowels, such as /a/, /i/ and /u/. We utilize a non-linear model based on an amplitude-modulation (AM) and frequency-modulation (FM) structure to extract the pitch and formants of pathological voice. To address pitch breaks and instability, we provide a pitch extraction algorithm that ensures pitch stability and avoids pitch-doubling errors caused by unstable low-frequency components. Furthermore, we design a formant reconstruction mechanism, which can effectively determine the frequency and bandwidth to accomplish formant repair. RESULTS Spectrum observation and objective indicators show that the system performs well in improving the intelligibility of pathological speech.
Affiliation(s)
- Tao Zhang: School of Electrical and Information Engineering, Tianjin University, Tianjin, China, 300072.
- Xiaonan Liu: School of Electrical and Information Engineering, Tianjin University, Tianjin, China, 300072.
- Ganjun Liu: School of Electrical and Information Engineering, Tianjin University, Tianjin, China, 300072.
- Yangyang Shao: School of Electrical and Information Engineering, Tianjin University, Tianjin, China, 300072.
5
Frassineti L, Calà F, Sforza E, Onesimo R, Leoni C, Lanatà A, Zampino G, Manfredi C. Quantitative acoustical analysis of genetic syndromes in the number listing task. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104887]
6
Warule P, Mishra SP, Deb S, Krajewski J. Sinusoidal model-based diagnosis of the common cold from the speech signal. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104653]
7
Chen Z, Zhu P, Qiu W, Guo J, Li Y. Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework. INTERNATIONAL JOURNAL OF LANGUAGE & COMMUNICATION DISORDERS 2023; 58:279-294. [PMID: 36117378 DOI: 10.1111/1460-6984.12783]
Abstract
BACKGROUND Auditory-perceptual assessment of voice is a subjective procedure. Artificial intelligence with deep learning (DL) may improve the consistency and accessibility of this task. It is unclear how a DL model performs on different acoustic features. AIMS To develop a generalizable DL framework for identifying dysphonia using a multidimensional acoustic feature. METHODS & PROCEDURES Recordings of sustained phonations of /a/ and /i/ were retrospectively collected from a clinical database. Subjects contained 238 dysphonic and 223 vocally healthy speakers of Chinese Mandarin. All audio clips were split into multiple 1.5-s segments and normalized to the same loudness level. Mel frequency cepstral coefficients and mel-spectrogram were extracted from these standardized segments. Each set of features was used in a convolutional neural network (CNN) to perform a binary classification task. The best feature was obtained through a five-fold cross-validation on a random selection of 80% data. The resultant DL framework was tested on the remaining 20% data and a public German voice database. The performance of the DL framework was compared with those of two baseline machine-learning models. OUTCOMES & RESULTS The mel-spectrogram yielded the best model performance, with a mean area under the receiver operating characteristic curve of 0.972 and an accuracy of 92% in classifying audio segments. The resultant DL framework significantly outperformed both baseline models in detecting dysphonic subjects on both test sets. The best outcomes were achieved when classifications were made based on all segments of both vowels, with 95% accuracy, 92% recall, 98% precision and 98% specificity on the Chinese test set, and 92%, 95%, 90% and 89%, respectively, on the German set. CONCLUSIONS & IMPLICATIONS This study demonstrates the feasibility of DL for automatic detection of dysphonia. The mel-spectrogram is a preferred acoustic feature for the task. This framework may be used for vocal health screening and facilitate automatic perceptual evaluation of voice in the era of big data. WHAT THIS PAPER ADDS What is already known on this subject Auditory-perceptual assessment is the current gold standard in clinical evaluation of voice quality, but its value may be limited by the rater's reliability and accessibility. DL is a new method of artificial intelligence that can overcome these disadvantages and promote automatic voice assessment. This study explored the feasibility of a DL approach for automatic detection of dysphonia, along with a quantitative comparison of two common sets of acoustic features. What this study adds to existing knowledge A CNN model is excellent at decoding multidimensional acoustic features, outperforming the baseline parameter-based models in identifying dysphonic voices. The first 13 mel-frequency cepstral coefficients (MFCCs) are sufficient for this task. The mel-spectrogram results in greater performance, indicating the acoustic features are presented in a more favourable way than the MFCCs to the CNN model. What are the potential or actual clinical implications of this work? DL is a feasible method for the detection of dysphonia. The current DL framework may be used for remote vocal health screening or documenting voice recovery after treatment. In future, DL models may potentially be used to perform auditory-perceptual tasks in an automatic, efficient, reliable and low-cost manner.
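A hedged sketch of the mel-spectrogram front end plus a small binary CNN of the kind this abstract describes, assuming librosa and tf.keras; the segment length, mel resolution, network depth and file names are illustrative rather than the authors' exact configuration.

```python
# Sketch: 1.5-s segment -> log-mel spectrogram -> small CNN for dysphonic vs. healthy.
import numpy as np
import librosa
import tensorflow as tf

def log_mel(path, sr=16000, seg_dur=1.5, n_mels=64):
    y, _ = librosa.load(path, sr=sr, duration=seg_dur)
    y = np.pad(y, (0, max(0, int(sr * seg_dur) - len(y))))      # fixed-length segment
    m = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(m)[..., np.newaxis]               # (n_mels, frames, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, None, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),               # dysphonic vs. healthy
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# X = np.stack([log_mel(p) for p in segment_paths])  # placeholder list of segment files
```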
Affiliation(s)
- Zhen Chen: Department of Rehabilitation Sciences, East China Normal University, Shanghai, China; Department of Otolaryngology-Head & Neck Surgery, Eye, Ear, Nose and Throat Hospital, Fudan University, Shanghai, China.
- Peixi Zhu: Hilderbrand Department of Petroleum and Geosystems Engineering, University of Texas at Austin, Austin, TX, USA.
- Wei Qiu: Hangzhou Chenqing Heye Technology Co., Ltd, Hangzhou, Zhejiang, China.
- Jiajie Guo: State Key Laboratory of Digital Manufacturing Equipment and Technology, School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, China.
- Yike Li: Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA.
8
Shrivas A, Deshpande S, Gidaye G, Nirmal J, Ezzine K, Frikha M, Desai K, Shinde S, Oza AD, Burduhos-Nergis DD, Burduhos-Nergis DP. Employing Energy and Statistical Features for Automatic Diagnosis of Voice Disorders. Diagnostics (Basel) 2022; 12:diagnostics12112758. [PMID: 36428819 PMCID: PMC9689977 DOI: 10.3390/diagnostics12112758]
Abstract
The presence of laryngeal disease affects vocal fold dynamics and thus causes changes in pitch, loudness, and other characteristics of the human voice. Many frameworks based on the acoustic analysis of speech signals have been created in recent years; however, they are evaluated on just one or two corpora and are not independent of voice illnesses and human bias. In this article, a unified wavelet-based paradigm for evaluating voice diseases is presented. This approach is independent of voice diseases, human bias, or dialect. The vocal folds' dynamics are impacted by the voice disorder, and this further modifies the sound source. Therefore, inverse filtering is used to capture the modified voice source. Furthermore, fundamental-frequency-independent statistical and energy metrics are derived from each spectral sub-band to characterize the retrieved voice source. Speech recordings of the sustained vowel /a/ were collected from four different datasets in German, Spanish, English, and Arabic to run several intra- and inter-dataset experiments. The classifiers' achieved performance indicators show that energy and statistical features uncover vital information on a variety of clinical voices, and therefore the suggested approach can be used as a complementary means for the automatic medical assessment of voice diseases.
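A minimal sketch of per-sub-band energy and statistical descriptors of the kind described, assuming PyWavelets and SciPy; the wavelet family, decomposition depth, and the use of a raw signal in place of the inverse-filtered glottal source are simplifications.

```python
# Sketch: wavelet decomposition -> energy and statistical features per sub-band.
import numpy as np
import pywt
from scipy.stats import skew, kurtosis

def subband_features(signal, wavelet="db4", level=5):
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [cA_n, cD_n, ..., cD_1]
    feats = []
    for band in coeffs:
        feats += [
            np.sum(band ** 2),   # sub-band energy
            np.mean(band),
            np.std(band),
            skew(band),
            kurtosis(band),
        ]
    return np.array(feats)

# Example with a synthetic stand-in for the (inverse-filtered) voice source:
x = np.random.randn(16000)
print(subband_features(x).shape)
```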
Affiliation(s)
- Avinash Shrivas (corresponding author; Tel.: +91-9819261821): Department of Computer Science & Technology, Degree College of Physical Education, Sant Gadge Baba Amravati University, Amravati 444605, India.
- Shrinivas Deshpande: Department of Computer Science & Technology, Degree College of Physical Education, Sant Gadge Baba Amravati University, Amravati 444605, India.
- Girish Gidaye: Department of Electronics and Computer Science, Vidyalankar Institute of Technology, Mumbai University, Mumbai 400037, India.
- Jagannath Nirmal: Department of Electronics Engineering, Somaiya Vidyavihar University, Mumbai 400077, India.
- Kadria Ezzine: ATISP, ENET’COM, Sfax University, Sfax 3000, Tunisia.
- Kamalakar Desai: Department of Electronics and Telecommunication Engineering, Bharati Vidyapeeth’s College of Engineering, Shivaji University, Kolhapur 416013, India.
- Sachin Shinde: Department of Mechanical Engineering, Datta Meghe College of Engineering, Mumbai University, Airoli, Navi Mumbai 400708, India.
- Ankit D. Oza: Department of Computer Sciences and Engineering, Institute of Advanced Research, The University for Innovation, Gandhinagar 382426, India.
- Dumitru Doru Burduhos-Nergis: Faculty of Materials Science and Engineering, Gheorghe Asachi Technical University of Iasi, 700050 Iasi, Romania.
- Diana Petronela Burduhos-Nergis (corresponding author): Faculty of Materials Science and Engineering, Gheorghe Asachi Technical University of Iasi, 700050 Iasi, Romania.
9
Deb S, Dandapat S. Analysis of out-of-breath speech for assessment of person’s physical fitness. COMPUT SPEECH LANG 2022. [DOI: 10.1016/j.csl.2022.101391]
10
Compton EC, Cruz T, Andreassen M, Beveridge S, Bosch D, Randall DR, Livingstone D. Developing an Artificial Intelligence Tool to Predict Vocal Cord Pathology in Primary Care Settings. Laryngoscope 2022. [PMID: 36226791 DOI: 10.1002/lary.30432]
Abstract
OBJECTIVES Diagnostic tools for voice disorders are lacking for primary care physicians. Artificial intelligence (AI) tools may add to the armamentarium for physicians, decreasing the time to diagnosis and limiting the burden of dysphonia. METHODS Voice recordings of patients were collected from 2019 to 2021 using smartphones. The Saarbruecken dataset was included for comparison. Audio files were converted to mel-spectrograms using TensorFlow. Diagnostic categories were created to group pathology, including neurological and muscular disorders, inflammatory conditions, mass lesions, and normal. The samples were further separated into the sustained /a/ and the rainbow passage. RESULTS Two hundred and three prospective samples were collected, and 1131 samples were taken from the Saarbruecken database. The AI detected abnormal pathology with an F1-score of 98%. The artificial neural network (ANN) differentiated key pathologies, including unilateral paralysis, laryngitis, adductor spasmodic dysphonia (ADSD), mass lesions, and normal samples, with F1-scores of 39%-87%. The Calgary database models had higher F1-scores in a head-to-head comparison with the Saarbruecken and combined datasets (87% vs. 58% and 50%). The AI outperformed otolaryngologists on a standardized test set of recordings (83% compared to 55% ± 15%). CONCLUSION An AI tool was created to differentiate pathology by individual or categorical diagnosis with high evaluation metrics. Prospective data should be collected in a controlled fashion to reduce intrinsic variability between recordings. Multi-center data collaborations are imperative to increase the prediction capability of AI tools for detecting vocal cord pathology. We provide proof-of-concept for an AI tool to assist primary care physicians in managing dysphonic patients. LEVEL OF EVIDENCE 3 Laryngoscope, 2022.
Affiliation(s)
- Evan C Compton: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
- Tim Cruz: Department of Data Science and Analytics, Faculty of Science, University of Calgary, Calgary, Alberta, Canada.
- Meri Andreassen: Section of Otolaryngology-Head and Neck Surgery, Calgary Voice Program, Alberta Health Services, Calgary, Alberta, Canada.
- Shari Beveridge: Section of Otolaryngology-Head and Neck Surgery, Calgary Voice Program, Alberta Health Services, Calgary, Alberta, Canada.
- Doug Bosch: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
- Derrick R Randall: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
- Devon Livingstone: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada.
11
Pakravan M, Jahed M. Significant pathological voice discrimination by computing posterior distribution of balanced accuracy. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103410]
12
Wang SS, Wang CT, Lai CC, Tsao Y, Fang SH. Continuous Speech for Improved Learning Pathological Voice Disorders. IEEE OPEN JOURNAL OF ENGINEERING IN MEDICINE AND BIOLOGY 2022; 3:25-33. [PMID: 35399790 PMCID: PMC8940190 DOI: 10.1109/ojemb.2022.3151233]
Affiliation(s)
- Syu-Siang Wang: Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan.
- Chi-Te Wang: Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan; Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, New Taipei 220, Taiwan.
- Chih-Chung Lai: Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan.
- Yu Tsao: Research Center for Information Technology Innovation, Academia Sinica, Taipei 115, Taiwan.
- Shih-Hau Fang: Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan.
13
Sharma Y, Singh BK. One-dimensional convolutional neural network and hybrid deep-learning paradigm for classification of specific language impaired children using their speech. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 213:106487. [PMID: 34763173 DOI: 10.1016/j.cmpb.2021.106487]
Abstract
BACKGROUND AND OBJECTIVE Screening children for communicational disorders such as specific language impairment (SLI) is always challenging as it requires clinicians to follow a series of steps to evaluate the subjects. Artificial intelligence and computer-aided diagnosis have supported health professionals in making swift and error-free decisions about the neurodevelopmental state of children vis-à-vis language comprehension and production. Past studies have claimed that typically developing (TD) and SLI children show distinct vocal characteristics that can serve as discriminating facets between them. The objective of this study is to group children into SLI or TD categories by processing their raw speech signals using two proposed approaches: a customized convolutional neural network (CNN) model and a hybrid deep-learning framework where CNN is combined with long short-term memory (LSTM). METHOD We considered a publicly available speech database of SLI and typically developing children with Czech accents for this study. The convolution filters in both the proposed CNN and hybrid models (CNN-LSTM) estimated fuzzy-automated features from the speech utterances. We performed the experiments in five separate sessions. Data augmentations were performed in each of those sessions to enhance the training strength. RESULTS Our hybrid model exhibited a perfect 100% accuracy and F-measure for almost all the session-trials compared to CNN alone, which achieved an average accuracy close to 90% and F-measure ≥ 92%. The models have further illustrated their robust classification essences by securing values of reliability indexes over 90%. CONCLUSION The results confirm the effectiveness of the proposed approaches for the detection of SLI in children using their raw speech signals. Both models do not require any dedicated feature extraction unit for their operation. The models may also be suitable for screening SLI and other neurodevelopmental disorders in children of different linguistic accents.
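A hedged tf.keras sketch of the hybrid idea: a 1D convolutional front end on the raw waveform feeding an LSTM with a two-class (SLI vs. TD) output. Layer sizes and the fixed input length are illustrative, not the authors' configuration.

```python
# Sketch: raw-waveform 1D CNN followed by an LSTM, binary SLI vs. TD output.
import tensorflow as tf

n_samples = 16000  # illustrative 1-s clip at 16 kHz
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_samples, 1)),
    tf.keras.layers.Conv1D(16, 64, strides=8, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, 32, strides=4, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.LSTM(32),                       # temporal summary of learned features
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```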
Affiliation(s)
- Yogesh Sharma: Department of Biomedical Engineering, National Institute of Technology Raipur, Chhattisgarh, 492010, India.
- Bikesh Kumar Singh: Department of Biomedical Engineering, National Institute of Technology Raipur, Chhattisgarh, 492010, India.
14
Geng L, Shan H, Xiao Z, Wang W, Wei M. Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method. BIOMED ENG-BIOMED TE 2021; 66:613-625. [PMID: 34845886 DOI: 10.1515/bmt-2021-0112]
Abstract
Automatic voice pathology detection and classification plays an important role in the diagnosis and prevention of voice disorders. To accurately describe the pronunciation characteristics of patients with dysarthria and improve the effect of pathological voice detection, this study proposes a pathological voice detection method based on a multi-modal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to the frequency domain spectrogram via a short-time Fourier transform (STFT). The Mel filter bank acts on the spectrogram to enhance the signal's harmonics and denoise. Second, a pre-trained convolutional neural network (CNN) is used as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To obtain a better classification effect, the fused features are input into the long short-term memory (LSTM) network for voice feature selection and enhancement. The proposed system achieves 95.73% for accuracy with 96.10% F1-score and 96.73% recall using the Saarbrucken Voice Database (SVD); thus, enabling a new method for pathological speech detection.
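The fusion step described above amounts to computing mel-filtered spectrograms for the speech and EGG channels in parallel and joining them before the network stages. A hedged librosa sketch of that front end; the CNN backbone and LSTM are omitted and the file paths are placeholders.

```python
# Sketch: STFT -> mel filter bank for both speech and EGG, stacked as two channels.
import numpy as np
import librosa

def log_mel(path, sr=16000, n_fft=512, hop=128, n_mels=64):
    y, _ = librosa.load(path, sr=sr)
    m = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels)
    return librosa.power_to_db(m)

speech = log_mel("speech.wav")    # placeholder paths
egg = log_mel("egg.wav")
frames = min(speech.shape[1], egg.shape[1])
fused = np.stack([speech[:, :frames], egg[:, :frames]], axis=-1)   # (n_mels, T, 2)
# 'fused' would then be fed to a pre-trained CNN backbone and an LSTM, as in the paper.
```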
Affiliation(s)
- Lei Geng: School of Life Sciences, Tiangong University, Tianjin, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China.
- Hongfeng Shan: School of Electronic and Information Engineering, Tiangong University, Tianjin, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China.
- Zhitao Xiao: School of Life Sciences, Tiangong University, Tianjin, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China.
- Wei Wang: Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; Otolaryngology Clinical Quality Control Centre, Tianjin, China.
- Mei Wei: Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; Otolaryngology Clinical Quality Control Centre, Tianjin, China.
15
Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11083450]
Abstract
The Massachusetts Eye and Ear Infirmary (MEEI) database is an international-standard training database for voice pathology detection (VPD) systems. However, there is a class-imbalanced distribution in normal and pathological voice samples and different types of pathological voice samples in the MEEI database. This study aimed to develop a VPD system that uses the fuzzy clustering synthetic minority oversampling technique algorithm (FC-SMOTE) to automatically detect and classify four types of pathological voices in a multi-class imbalanced database. The proposed FC-SMOTE algorithm processes the initial class-imbalanced dataset. A set of machine learning models was evaluated and validated using the resulting class-balanced dataset as an input. The effectiveness of the VPD system with FC-SMOTE was further verified by an external validation set and another pathological voice database (Saarbruecken Voice Database (SVD)). The experimental results show that, in the multi-classification of pathological voice for the class-imbalanced dataset, the method we propose can significantly improve the diagnostic accuracy. Meanwhile, FC-SMOTE outperforms the traditional imbalanced data oversampling algorithms, and it is preferred for imbalanced voice diagnosis in practical applications.
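FC-SMOTE itself is the paper's own contribution; as a point of reference, the plain SMOTE baseline it is compared against can be sketched with imbalanced-learn. The feature matrix and labels below are synthetic placeholders.

```python
# Sketch: standard SMOTE oversampling baseline (not the paper's FC-SMOTE variant).
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))                        # placeholder acoustic features
y = np.array([0] * 100 + [1] * 20)                    # imbalanced pathology labels

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))               # minority class upsampled to parity
```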
16
Gómez-García J, Moro-Velázquez L, Arias-Londoño J, Godino-Llorente J. On the design of automatic voice condition analysis systems. Part III: review of acoustic modelling strategies. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102049]
17
Alves M, Silva G, Bispo BC, Dajer ME, Rodrigues PM. Voice Disorders Detection Through Multiband Cepstral Features of Sustained Vowel. J Voice 2021; 37:322-331. [PMID: 33663909 DOI: 10.1016/j.jvoice.2021.01.018]
Abstract
This study aims to detect voice disorders related to vocal fold nodule, Reinke's edema and neurological pathologies through multiband cepstral features of the sustained vowel /a/. Detection is performed between pairs of study groups and multiband analysis is accomplished using the wavelet transform. For each pair of groups, a parameters selection is carried out. Time series of the selected parameters are used as input for four classifiers with leave-one-out cross validation. Classification accuracies of 100% are achieved for all pairs including the control group, surpassing the state-of-art methods based on cepstral features, while accuracies higher than 88.50% are obtained for the pathological pairs. The results indicated that the method may be adequate to assist in the diagnosis of the voice disorders addressed. The results must be updated in the future with a larger population to ensure generalization.
Affiliation(s)
- Marco Alves: Universidade Católica Portuguesa, CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Porto, Portugal.
- Gabriel Silva: Universidade Católica Portuguesa, CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Porto, Portugal.
- Bruno C Bispo: Department of Electrical and Electronic Engineering, Federal University of Santa Catarina, Florianópolis-SC, Brazil.
- María E Dajer: Department of Electrical Engineering, Federal University of Technology - Paraná, Cornélio Procópio-PR, Brazil.
- Pedro M Rodrigues: Universidade Católica Portuguesa, CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Porto, Portugal.
18
Voice Pathologies Classification and Detection Using EMD-DWT Analysis Based on Higher Order Statistic Features. Ing Rech Biomed 2020. [DOI: 10.1016/j.irbm.2019.11.004]
19
Zhang T, Shao Y, Wu Y, Pang Z, Liu G. Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder. IEEE J Biomed Health Inform 2020; 24:1940-1951. [PMID: 32149701 DOI: 10.1109/jbhi.2020.2978103]
Abstract
Individuals, such as voice-related professionals, elderly people and smokers, are increasingly suffering from voice disorder, which implies the importance of pathological voice repair. Previous work on pathological voice repair was only concerned with the sustained vowel /a/, and repairing multiple vowels is still challenging due to the unstable extraction of pitch and the unsatisfactory reconstruction of formants. In this paper, a multiple-vowel repair method based on pitch extraction and the Line Spectrum Pair feature for voice disorder is proposed, which broadens the scope of voice repair from the single vowel /a/ to the multiple vowels /a/, /i/ and /u/ and repairs these vowels successfully. Using a deep neural network as the classifier, voice recognition is performed to separate normal from pathological voices. Wavelet Transform and Hilbert-Huang Transform are applied for pitch extraction. Based on the Line Spectrum Pair (LSP) feature, the formants are reconstructed. The final repaired voice is obtained by synthesizing the pitch and the formants. The proposed method is validated on the Saarbrücken Voice Database (SVD). The achieved improvements in three metrics, Segmental Signal-to-Noise Ratio, LSP distance measure and Mel cepstral distance measure, are 45.87%, 50.37% and 15.56%, respectively. In addition, an intuitive spectrogram-based analysis has been carried out and shows a prominent repair effect.
20
Barreira RR, Ling LL. Kullback–Leibler divergence and sample skewness for pathological voice quality assessment. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101697]
21
Karan B, Sahu SS, Mahto K. Parkinson disease prediction using intrinsic mode function based features from speech signal. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2019.05.005]
22
Wu H, Soraghan J, Lowit A, Di Caterina G. Convolutional Neural Networks for Pathological Voice Detection. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2019; 2018:1-4. [PMID: 30440307 DOI: 10.1109/embc.2018.8513222]
Abstract
Acoustic analysis using signal processing tools can be used to extract voice features to distinguish whether a voice is pathological or healthy. The proposed work uses spectrograms of voice recordings from a voice database as the input to a Convolutional Neural Network (CNN) for automatic feature extraction and classification of disordered and normal voice. The novel classifier achieved 88.5%, 66.2% and 77.0% accuracy on the training, validation and testing data sets, respectively, on 482 normal and 482 organic dysphonia speech files. This reveals that the proposed algorithm can effectively be used on the Saarbruecken Voice Database for screening pathological voice recordings.
23
On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.12.024]
24
Hernández-García E, Moro-Velázquez L, González-Herranz R, Godino-Llorente JI, Plaza G. Effect of Functional Endoscopic Sinus Surgery on Voice and Speech Recognition. J Voice 2019; 34:650.e1-650.e6. [PMID: 30853310 DOI: 10.1016/j.jvoice.2019.02.012]
Abstract
OBJECTIVE Functional Endoscopic Sinus Surgery (FESS) is the surgery of choice for nasal polyposis and chronic rhinosinusitis. The aim of our study is to assess the influence of this surgery on the acoustic parameters of voice and its implications for speaker identification or verification systems based on speech. MATERIAL AND METHODS A prospective study was performed between January 2017 and June 2017 including two groups of patients: those undergoing FESS, and a control group. Demographic data and GRBAS assessment were statistically analyzed. In addition, a recording of patients' voices was made, with subsequent acoustic analysis and automatic identification of the speaker through machine learning systems, establishing the equal error rate. Samples were taken before surgery, 2 weeks after surgery and 3 months later. RESULTS After FESS, a significant difference was observed in Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS). In addition, acoustic analysis showed a significant decrease in fundamental frequency (F0) compared with the control group. For the automatic identification of the speaker through computer systems, we found that the equal error rate is higher in the FESS group. CONCLUSIONS Results suggest that FESS produces a decrease in F0 and changes in the vocal tract that lead to an increase in speaker-recognition error for FESS patients.
Affiliation(s)
- Estefanía Hernández-García: Department of Otorhinolaryngology, Hospital Universitario de Fuenlabrada, Universidad Rey Juan Carlos, Madrid, Spain.
- Laureano Moro-Velázquez: Universidad Politécnica de Madrid, Madrid, Spain; Center for Language and Speech Processing, Johns Hopkins University, Baltimore, Maryland.
- Ramón González-Herranz: Department of Otorhinolaryngology, Hospital Universitario de Fuenlabrada, Universidad Rey Juan Carlos, Madrid, Spain.
- Guillermo Plaza: Department of Otorhinolaryngology, Hospital Universitario de Fuenlabrada, Universidad Rey Juan Carlos, Madrid, Spain.
25
On the design of automatic voice condition analysis systems. Part II: Review of speaker recognition techniques and study on the effects of different variability factors. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.09.003]
26
Effect of vowel context in cepstral and entropy analysis of pathological voices. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.08.021]
27
Hegde S, Shetty S, Rai S, Dodderi T. A Survey on Machine Learning Approaches for Automatic Detection of Voice Disorders. J Voice 2018; 33:947.e11-947.e33. [PMID: 30316551 DOI: 10.1016/j.jvoice.2018.07.014]
Abstract
The human voice production system is an intricate biological device capable of modulating pitch and loudness. Inherent internal and/or external factors often damage the vocal folds and result in some change of voice. The consequences are reflected in body functioning and emotional standing. Hence, it is paramount to identify voice changes at an early stage and provide the patient with an opportunity to overcome any ramification and enhance their quality of life. In this line of work, automatic detection of voice disorders using machine learning techniques plays a key role, as it is proven to help ease the process of understanding the voice disorder. In recent years, many researchers have investigated techniques for an automated system that helps clinicians with early diagnosis of voice disorders. In this paper, we present a survey of research work conducted on automatic detection of voice disorders and explore how it is able to identify the different types of voice disorders. We also analyze different databases, feature extraction techniques, and machine learning approaches used in these research works.
Affiliation(s)
- Sarika Hegde: NMAM Institute of Technology, Udupi, Karnataka, India.
- Smitha Rai: NMAM Institute of Technology, Udupi, Karnataka, India.
- Thejaswi Dodderi: Nitte Institute of Speech & Hearing, Mangaluru, Karnataka, India.
28
Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach. J Voice 2018; 33:634-641. [PMID: 29567049 DOI: 10.1016/j.jvoice.2018.02.003]
Abstract
OBJECTIVES Computerized detection of voice disorders has attracted considerable academic and clinical interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation. This study proposes a deep-learning-based approach to detect pathological voice and examines its performance and utility compared with other automatic classification algorithms. METHODS This study retrospectively collected 60 normal voice samples and 402 pathological voice samples of 8 common clinical voice disorders in a voice clinic of a tertiary teaching hospital. We extracted Mel frequency cepstral coefficients from 3-second samples of a sustained vowel. The performances of three machine learning algorithms, namely, deep neural network (DNN), support vector machine, and Gaussian mixture model, were evaluated based on a fivefold cross-validation. Collective cases from the voice disorder database of MEEI (Massachusetts Eye and Ear Infirmary) were used to verify the performance of the classification mechanisms. RESULTS The experimental results demonstrated that DNN outperforms Gaussian mixture model and support vector machine. Its accuracy in detecting voice pathologies reached 94.26% and 90.52% in male and female subjects, based on three representative Mel frequency cepstral coefficient features. When applied to the MEEI database for validation, the DNN also achieved a higher accuracy (99.32%) than the other two classification algorithms. CONCLUSIONS By stacking several layers of neurons with optimized weights, the proposed DNN algorithm can fully utilize the acoustic features and efficiently differentiate between normal and pathological voice samples. Based on this pilot study, future research may proceed to explore more application of DNN from laboratory and clinical perspectives.
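A minimal scikit-learn sketch of the kind of five-fold comparison described, with a small MLP standing in for the DNN and a per-class Gaussian (QDA) standing in for the GMM classifier; the feature matrix is a synthetic placeholder, not the authors' data or architecture.

```python
# Sketch: compare an MLP (DNN stand-in), an SVM and a Gaussian-style classifier with 5-fold CV.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))            # placeholder per-sample MFCC features
y = rng.integers(0, 2, size=200)          # 0 = normal, 1 = pathological

models = {
    "MLP (DNN stand-in)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
    "SVM": SVC(kernel="rbf"),
    "Gaussian (QDA)": QuadraticDiscriminantAnalysis(),   # per-class Gaussian model
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```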
29
Borsky M, Mehta DD, Van Stan JH, Gudnason J. Modal and non-modal voice quality classification using acoustic and electroglottographic features. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2017; 25:2281-2291. [PMID: 33748320 PMCID: PMC7971071 DOI: 10.1109/taslp.2017.2759002]
Abstract
The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set; which included glottal source features, frequency warped cepstrum and harmonic model features; against the mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, acoustic-based glottal inverse filtered (GIF) waveform, and electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from either of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and non-modal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. The classification was done using support vector machines, random forests, deep neural networks and Gaussian mixture model classifiers, which were built as speaker independent using a leave-one-speaker-out strategy. The best classification accuracy of 79.97% was achieved for the full COVAREP set. The harmonic model features were the best performing subset, with 78.47% accuracy, and the static+dynamic MFCCs scored at 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms. Reduced classification performance was exhibited by the EGG waveform.
30
Torabi A, Zareayan Jahromy F, Daliri MR. Semantic Category-Based Classification Using Nonlinear Features and Wavelet Coefficients of Brain Signals. Cognit Comput 2017. [DOI: 10.1007/s12559-017-9487-z]
31
Lopes LW, Batista Simões L, Delfino da Silva J, da Silva Evangelista D, da Nóbrega e Ugulino AC, Oliveira Costa Silva P, Jefferson Dias Vieira V. Accuracy of Acoustic Analysis Measurements in the Evaluation of Patients With Different Laryngeal Diagnoses. J Voice 2017; 31:382.e15-382.e26. [DOI: 10.1016/j.jvoice.2016.08.015]
32
Golabbakhsh M, Abnavi F, Kadkhodaei Elyaderani M, Derakhshandeh F, Khanlar F, Rong P, Kuehn DP. Automatic identification of hypernasality in normal and cleft lip and palate patients with acoustic analysis of speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:929. [PMID: 28253654 DOI: 10.1121/1.4976056]
Abstract
Hypernasality is seen in cleft lip and palate patients who had undergone repair surgery as a consequence of velopharyngeal insufficiency. Hypernasality has been studied by evaluation of perturbation, noise measures, and cepstral analysis of speech. In this study, feature extraction and analysis were performed during running speech using six different sentences. Jitter, shimmer, Mel frequency cepstral coefficients, bionic wavelet transform entropy, and bionic wavelet transform energy were calculated. Support vector machines were employed for classification of data to normal or hypernasal. Finally, results of the automatic classification were compared with true labels to find accuracy, sensitivity, and specificity. Accuracy was higher when Mel frequency cepstral coefficients were combined with bionic wavelet transform energy feature. In the best case, accuracy of 85% with sensitivity of 82% and specificity of 85% was obtained. Results prove that acoustic analysis is a reliable method to find hypernasality in cleft lip and palate patients.
Affiliation(s)
- Marzieh Golabbakhsh: Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Hezarjarib Street, 81745-319, Isfahan, Iran.
- Fatemeh Abnavi: Craniofacial Anomalies and Cleft Palate Research Center, Isfahan University of Medical Sciences, Hezarjarib Street, 81745-319, Isfahan, Iran.
- Mina Kadkhodaei Elyaderani: Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Hezarjarib Street, 81745-319, Isfahan, Iran.
- Fatemeh Derakhshandeh: Craniofacial Anomalies and Cleft Palate Research Center, Isfahan University of Medical Sciences, Hezarjarib Street, 81745-319, Isfahan, Iran.
- Fatemeh Khanlar: Craniofacial Anomalies and Cleft Palate Research Center, Isfahan University of Medical Sciences, Hezarjarib Street, 81745-319, Isfahan, Iran.
- Panying Rong: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, 36 First Avenue, Boston, Massachusetts 02129, USA.
- David P Kuehn: Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, 901 South Sixth Street, Champaign, Illinois 61820, USA.
33
Muhammad G, Alsulaiman M, Ali Z, Mesallam TA, Farahat M, Malki KH, Al-nasheri A, Bencherif MA. Voice pathology detection using interlaced derivative pattern on glottal source excitation. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2016.08.002]
34
Moro-Velázquez L, Gómez-García JA, Godino-Llorente JI. Voice Pathology Detection Using Modulation Spectrum-Optimized Metrics. Front Bioeng Biotechnol 2016; 4:1. [PMID: 26835449 PMCID: PMC4718980 DOI: 10.3389/fbioe.2016.00001]
Abstract
There exist many acoustic parameters employed for pathological assessment tasks, which have served as tools for clinicians to distinguish between normophonic and pathological voices. However, many of these parameters require an appropriate tuning in order to maximize its efficiency. In this work, a group of new and already proposed modulation spectrum (MS) metrics are optimized considering different time and frequency ranges pursuing the maximization of efficiency for the detection of pathological voices. The optimization of the metrics is performed simultaneously in two different voice databases in order to identify what tuning ranges produce a better generalization. The experiments were cross-validated so as to ensure the validity of the results. A third database is used to test the optimized metrics. In spite of some differences, results indicate that the behavior of the metrics in the optimization process follows similar tendencies for the tuning databases, confirming the generalization capabilities of the proposed MS metrics. In addition, the tuning process reveals which bands of the modulation spectra have relevant information for each metric, which has a physical interpretation respecting the phonatory system. Efficiency values up to 90.6% are obtained in one tuning database, while in the other, the maximum efficiency reaches 71.1%. Obtained results also evidence a separability between normophonic and pathological states using the proposed metrics, which can be exploited for voice pathology detection or assessment.
35
Automatic voice pathology detection and classification using vocal tract area irregularity. Biocybern Biomed Eng 2016. [DOI: 10.1016/j.bbe.2016.01.004]
36
Orozco-Arroyave JR, Belalcazar-Bolanos EA, Arias-Londono JD, Vargas-Bonilla JF, Skodda S, Rusz J, Daqrouq K, Honig F, Noth E. Characterization Methods for the Detection of Multiple Voice Disorders: Neurological, Functional, and Laryngeal Diseases. IEEE J Biomed Health Inform 2015; 19:1820-8. [DOI: 10.1109/jbhi.2015.2467375]
37
Mekyska J, Janousova E, Gomez-Vilda P, Smekal Z, Rektorova I, Eliasova I, Kostalova M, Mrackova M, Alonso-Hernandez JB, Faundez-Zanuy M, López-de-Ipiña K. Robust and complex approach of pathological speech signal analysis. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.02.085]
38
Ghasemzadeh H, Tajik Khass M, Khalil Arjmandi M, Pooyan M. Detection of vocal disorders based on phase space parameters and Lyapunov spectrum. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2015.07.002]
39
Entropies from Markov Models as Complexity Measures of Embedded Attractors. Entropy 2015. [DOI: 10.3390/e17063595]
40
Akbari A, Arjmandi MK. Employing linear prediction residual signal of wavelet sub-bands in automatic detection of laryngeal pathology. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2015.02.008]
41
Cordeiro H, Fonseca J, Meneses C. Spectral envelope and periodic component in classification trees for pathological voice diagnostic. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2015; 2014:4607-10. [PMID: 25571018 DOI: 10.1109/embc.2014.6944650]
Abstract
This work investigates the effectiveness of spectral envelope features, such as the frequency and bandwidth of the first peak obtained from 30th-order Linear Predictive Coefficients (LPC), for identifying pathological voices. Other spectral features are also investigated and tested to improve the recognition rate. The value of the Relative Power of the Periodic Component is combined with the spectral features to diagnose pathological voices. Healthy voices and five vocal fold pathologies are tested. Decision Tree classifiers are used to evaluate which features carry pathological voice information. Based on those results, a simple Decision Tree was implemented and 94% of all the subjects in the database were correctly diagnosed.
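A hedged sketch of extracting the first spectral-envelope peak from a 30th-order LPC fit, the kind of feature fed to the decision tree above; librosa and SciPy are assumed, the file path is a placeholder, and the peak estimation is a simplified illustration rather than the authors' exact procedure.

```python
# Sketch: 30th-order LPC spectral envelope -> frequency of its first peak.
import numpy as np
import librosa
from scipy.signal import freqz, find_peaks

y, sr = librosa.load("vowel_a.wav", sr=None)            # placeholder path
a = librosa.lpc(y, order=30)                            # LPC polynomial coefficients [1, a1, ..., a30]
w, h = freqz([1.0], a, worN=2048, fs=sr)                # all-pole envelope |1/A(z)|
env_db = 20 * np.log10(np.abs(h) + 1e-12)

peaks, _ = find_peaks(env_db)
first_peak_hz = w[peaks[0]] if len(peaks) else np.nan    # frequency of the first envelope peak
print(first_peak_hz)
```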
42
Abstract
The development of an Automated System for Asthma Monitoring (ADAM) is described. This consists of a consumer electronics mobile platform running a custom application. The application acquires an audio signal from an external user-worn microphone connected to the device analog-to-digital converter (microphone input). This signal is processed to determine the presence or absence of cough sounds. Symptom tallies and raw audio waveforms are recorded and made easily accessible for later review by a healthcare provider. The symptom detection algorithm is based upon standard speech recognition and machine learning paradigms and consists of an audio feature extraction step followed by a Hidden Markov Model based Viterbi decoder that has been trained on a large database of audio examples from a variety of subjects. Multiple Hidden Markov Model topologies and orders are studied. Performance of the recognizer is presented in terms of the sensitivity and the rate of false alarm as determined in a cross-validation test.
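A minimal sketch of the HMM ingredient described, using hmmlearn to fit a Gaussian HMM on acoustic feature frames and run Viterbi decoding; the feature matrix, state count and the mapping of states to cough versus non-cough are placeholders.

```python
# Sketch: Gaussian HMM over acoustic feature frames with Viterbi decoding.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))        # placeholder MFCC-like frames of one recording

hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
hmm.fit(frames)                             # unsupervised fit; real systems train per class
log_prob, states = hmm.decode(frames, algorithm="viterbi")
print(log_prob, np.bincount(states))        # state occupancy; states would map to cough / non-cough
```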
43
Muhammad G, Melhem M. Pathological voice detection and binary classification using MPEG-7 audio features. Biomed Signal Process Control 2014. [DOI: 10.1016/j.bspc.2014.02.001]
44
45
Tsanas A, Little MA, Fox C, Ramig LO. Objective Automatic Assessment of Rehabilitative Speech Treatment in Parkinson's Disease. IEEE Trans Neural Syst Rehabil Eng 2014; 22:181-90. [PMID: 26271131 DOI: 10.1109/tnsre.2013.2293575]
46
Khan T, Westin J, Dougherty M. Cepstral separation difference: A novel approach for speech impairment quantification in Parkinson's disease. Biocybern Biomed Eng 2014. [DOI: 10.1016/j.bbe.2013.06.001]
47
Hurtado-Jaramillo JS, Guarín DL, Orozco Á. Complex networks: application to pathology detection in voice signals. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2013; 2012:4229-32. [PMID: 23366861 DOI: 10.1109/embc.2012.6346900]
Abstract
The method of complex networks has been proposed as a novel approach to analyze time series from a new perspective. However, only a few studies have applied this methodology to certain types of pseudo-periodic signals. In this article, the network-based technique is applied to voice signals, a kind of pseudo-periodic signal which has not been analyzed using complex networks, to differentiate between a healthy subject and subjects with pathological disorders. The results obtained demonstrate that, using a set of statistics computed from the complex networks, it is possible to differentiate between healthy and non-healthy subjects, contrary to what was observed using well-known non-linear statistics such as Lempel-Ziv complexity and sample entropy. We conclude that by viewing voice signals as complex networks, new information can be extracted from the time series that may help in the diagnosis of pathologies.
48
Analysis of Speech from People with Parkinson’s Disease through Nonlinear Dynamics. Advances in Nonlinear Speech Processing 2013. [DOI: 10.1007/978-3-642-38847-7_15]
49
Blanco JL, Hernández LA, Fernández R, Ramos D. Improving Automatic Detection of Obstructive Sleep Apnea Through Nonlinear Analysis of Sustained Speech. Cognit Comput 2012. [DOI: 10.1007/s12559-012-9168-x]
50