1
Verde L, Marulli F, De Fazio R, Campanile L, Marrone S. HEAR set: A ligHtwEight acoustic paRameters set to assess mental health from voice analysis. Comput Biol Med 2024; 182:109021. [PMID: 39236660 DOI: 10.1016/j.compbiomed.2024.109021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 06/23/2024] [Accepted: 08/09/2024] [Indexed: 09/07/2024]
Abstract
BACKGROUND Voice analysis has significant potential in aiding healthcare professionals with detection, diagnosis, and personalised treatment. It represents an objective and non-intrusive tool for supporting the detection and monitoring of specific pathologies. By calculating various acoustic features, voice analysis extracts valuable information to assess voice quality. The choice of these parameters is crucial for an accurate assessment. METHOD In this paper, we propose a lightweight acoustic parameter set, named HEAR, able to evaluate voice quality to assess mental health. In detail, it consists of jitter, spectral centroid, Mel-frequency cepstral coefficients, and their derivatives. The choice of parameters for the proposed set was influenced by the explainable significance of each acoustic parameter in the voice production process. RESULTS The reliability of the proposed acoustic set in detecting the early symptoms of mental disorders was evaluated in an experimental phase. Voices of subjects suffering from different mental pathologies, selected from available databases, were analysed. The performance obtained from the HEAR features was compared with that obtained by analysing features selected from toolkits widely used in the literature, as well as with that obtained using learned procedures. The best performance in terms of MAE and RMSE was achieved for the detection of depression (5.32 and 6.24, respectively). For the detection of psychogenic dysphonia and anxiety, the highest accuracy rates were about 75% and 97%, respectively. CONCLUSIONS The comparative evaluation was carried out to assess the performance of the proposed approach, demonstrating a reliable capability to highlight affective physiological alterations of voice quality due to the considered mental disorders.
Affiliation(s)
- Laura Verde
  - Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy.
- Fiammetta Marulli
  - Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
- Roberta De Fazio
  - Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
- Lelio Campanile
  - Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
- Stefano Marrone
  - Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Viale Lincoln 5, Caserta, 81100, Italy
2
Herbst CT. Performance Evaluation of Subharmonic-to-Harmonic Ratio (SHR) Computation. J Voice 2021; 35:365-375. [DOI: 10.1016/j.jvoice.2019.11.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/10/2019] [Revised: 11/09/2019] [Accepted: 11/11/2019] [Indexed: 10/24/2022]
3
Acoustic Characteristics of the Voice for Brazilian Portuguese Speakers Across the Life Span. J Voice 2020; 36:876.e17-876.e26. [PMID: 33041178 DOI: 10.1016/j.jvoice.2020.09.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/18/2020] [Revised: 09/21/2020] [Accepted: 09/23/2020] [Indexed: 01/10/2023]
Abstract
INTRODUCTION Vocal changes occur across the life span and can be reflected in acoustic measurements. OBJECTIVE To investigate the characteristics of voice production of Brazilian Portuguese speakers across the life span based on acoustic measures of fundamental frequency (fo) and noise-to-harmonic ratio (NHR), and to verify the differences in these measures between men and women. METHODS A total of 526 recordings from Brazilian Portuguese speakers aged 5-93 years were included. Voices from these speakers were judged to have normal vocal quality for their age using the G parameter of the GRBAS scale. The recordings were divided into 12 age groups (5-7 years; 8-9; 10-11; 12; 13-15; 16-18; 19-29; 30-39; 40-49; 50-59; 60-69; and 70-93 years old). Acoustic analysis was conducted, extracting the parameters fo and NHR with the Multi-Dimensional Voice Program software. RESULTS For women, there was a gradual decrease in fo from childhood to older age. Older women (60-93 years old) showed a lower fo than age groups up to 19-29 years (P < 0.00). For men, there was a decrease in fo up to the age group of 13-15 years (P < 0.00), after which it remained stable. Differences between sexes occurred from 12 years old, with higher fo values for women than men (P < 0.00). The NHR parameter remained stable across the life span for women, while higher values for older subjects were found for men (P < 0.04). Regarding sex, men showed a higher NHR value than women (P < 0.002). CONCLUSION Vocal changes occur across the life span and are reflected in the acoustic measure of fo for men and women. Vocal changes begin at 12 years old, with differences between sexes. The NHR measure was sensitive to changes over a lifetime for men, with higher values for older subjects.
4
Saggio G, Costantini G. Worldwide Healthy Adult Voice Baseline Parameters: A Comprehensive Review. J Voice 2020; 36:637-649. [PMID: 33039203 DOI: 10.1016/j.jvoice.2020.08.028] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/05/2020] [Revised: 08/20/2020] [Accepted: 08/21/2020] [Indexed: 12/17/2022]
Abstract
The voice produces acoustic signals that were first analyzed and synthesized for telecommunication purposes and have more recently been investigated for medical purposes. In particular, voice signal characteristics can reveal individual health conditions, which is useful for screening, diagnosis, and remote monitoring. Within this frame, knowledge of the baseline features of the healthy voice is essential in order to compare them with their unhealthy counterparts. However, the baseline features of the human voice depend on gender, age range, and ethnicity and, as far as we know, no work reports how those features are distributed worldwide. This paper intends to fill this gap. Our database search yielded 179 relevant published studies, retrieved using the digital libraries of IEEE Xplore, Scopus, Web of Science, IOPscience, Taylor and Francis Online, and Scitepress. These studies report different features, among which we consider the most investigated ones, within the most investigated age range. In particular, the features are the fundamental frequency, jitter, shimmer, harmonic-to-noise ratio, and cepstral peak prominence; the most investigated age range is 20-40 years; and, regarding ethnicity, 20 countries are considered.
Affiliation(s)
- Giovanni Saggio
  - Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy.
- Giovanni Costantini
  - Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
5
Lopes LW, da Silva ACF, da Silva IM, de Paiva MAA, Silva SIDN, Almeida LNA, Ribeiro VV. Evidence of Internal Consistency in the Spectrographic Analysis Protocol. J Voice 2020; 36:445-456. [PMID: 32782177 DOI: 10.1016/j.jvoice.2020.07.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/03/2020] [Revised: 07/10/2020] [Accepted: 07/13/2020] [Indexed: 11/26/2022]
Abstract
OBJECTIVE To verify the validity of the internal consistency of the spectrographic analysis protocol (SAP). MATERIAL AND METHODS Thirty-nine students of the Speech-Language Pathology graduate program and 38 speech-language pathologists specialized in voice participated in the study. The participants made visual inspections of 10 spectrograms and marked the items of the SAP. To analyze the internal consistency of the SAP, exploratory factor analysis (EFA) and confirmatory factor analysis were performed. RESULTS Most items showed a corrected item-total correlation above 0.3, indicating that the items have a good relationship with each other and with the SAP as a whole. Six items presented values below the average, suggesting their exclusion from the construct. However, three of these were maintained because they were judged to be important parameters in clinical practice, requiring the training of judges when using the SAP to properly understand the items. The EFA regrouped the previous domains of the SAP into three factors. All items presented a factor load above 0.4, suggesting the retention of all except the items previously indicated for exclusion. The confirmatory factor analysis corroborated the EFA and its indexes. CONCLUSION The SAP has good internal consistency. All items have a good degree of relationship with each other and contribute positively to the protocol as a whole. The final version of the SAP, at this stage, has 15 items (from the 25 items of the initial SAP version), distributed among three domains.
Affiliation(s)
- Leonardo Wanderley Lopes
  - Speech-Language Pathology Department, Universidade Federal da Paraíba - UFPB, Cidade Universitária, João Pessoa, Paraíba, Brazil.
- Allan Carlos França da Silva
  - Speech-Language Pathology Department, Universidade Federal da Paraíba - UFPB, Cidade Universitária, João Pessoa, Paraíba, Brazil
- Itacely Marinho da Silva
  - Speech-Language Pathology Department, Universidade Federal da Paraíba - UFPB, Cidade Universitária, João Pessoa, Paraíba, Brazil
- Maxsuel Alves Avelino de Paiva
  - Speech-Language Pathology Department, Universidade Federal da Paraíba - UFPB, Cidade Universitária, João Pessoa, Paraíba, Brazil
- Saulo Iordan do Nascimento Silva
  - Speech-Language Pathology Department, Universidade Federal da Paraíba - UFPB, Cidade Universitária, João Pessoa, Paraíba, Brazil
- Larissa Nadjara Alves Almeida
  - Program of Decision and Health Models, Universidade Federal da Paraíba - UFPB, Cidade Universitária, João Pessoa, Paraíba, Brazil
- Vanessa Veis Ribeiro
  - Speech-Language Pathology Department, Universidade Federal de Sergipe - UFS, Lagarto, Sergipe, Brazil
6
Pinyopodjanard S, Suppakitjanusant P, Lomprew P, Kasemkosin N, Chailurkit L, Ongphiphadhanakul B. Instrumental Acoustic Voice Characteristics in Adults with Type 2 Diabetes. J Voice 2019; 35:116-121. [PMID: 31427120 DOI: 10.1016/j.jvoice.2019.07.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/04/2019] [Revised: 07/04/2019] [Accepted: 07/08/2019] [Indexed: 01/05/2023]
Abstract
OBJECTIVE The objective of this study was to investigate whether there are differences in acoustic parameters between diabetic patients and normal controls. METHODS A prospective cross-sectional study was performed in 83 diabetic patients and 70 healthy controls. Voice parameters including fundamental frequency (F0), jitter, shimmer, amplitude perturbation quotient, noise-to-harmonic ratio, smoothed amplitude perturbation quotient, and relative average perturbation were analyzed using Computerized Speech Lab with the Multi-Dimensional Voice Program. RESULTS F0 in female diabetic patients was significantly lower than in controls (222.23 ± 27.89 Hz versus 241.08 ± 28.21 Hz, P < 0.01). In female diabetic subgroups with disease duration of more than 10 years, poor glycemic control, or neuropathy, F0 was still significantly lower. Multivariate analysis showed that F0 was significantly associated with diabetes after controlling for age, body mass index, presence of hypertension, and dyslipidemia (P = 0.022). However, F0 was not able to predict the presence of diabetes, as shown by logistic regression analysis (P = 0.243). CONCLUSIONS Voice fundamental frequency is lower in females with diabetes. However, voice fundamental frequency cannot adequately predict the presence of diabetes.
Affiliation(s)
- Sittichai Pinyopodjanard
  - Division of Endocrinology and Metabolism, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Prangorn Lomprew
  - Department of Communication Sciences and Disorders, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Nittaya Kasemkosin
  - Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Laor Chailurkit
  - Division of Endocrinology and Metabolism, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Boonsong Ongphiphadhanakul
  - Division of Endocrinology and Metabolism, Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand.
7
de Souza AJ, Gonçalves DDS, Bastilha GR, Christmann MK, Scapini F, Cielo CA. Acoustic Measurements of the Glottic Source of Female Teachers With Dysphonia. J Voice 2019; 34:838-846. [PMID: 31174883 DOI: 10.1016/j.jvoice.2019.05.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/05/2018] [Revised: 04/15/2019] [Accepted: 05/15/2019] [Indexed: 11/27/2022]
Abstract
OBJECTIVE To verify the acoustic measurements of the glottic source of dysphonic teachers from a medium-sized municipality in the interior of the state. METHODS Retrospective, cross-sectional, and quantitative study with a sample of 34 dysphonic teachers, of whom 21 had no laryngeal affections and 13 had laryngeal affections (mean ages 39.1 and 39.5 years, respectively). Glottic source acoustic analysis was performed with the Multi-Dimensional Voice Program Advanced. The data were analyzed statistically to verify the significance of each acoustic measure between the groups (with laryngeal affection, without laryngeal affection, and total) and in relation to the normal values proposed by the software. RESULTS In the three conditions (groups with and without affection and total), the means were statistically below normal in the measurements of maximum and minimum fundamental frequency. In the group without affection, frequency and noise measurements were above normal. In both groups, measurements of frequency, noise, and subharmonic segments were above normal, and the number of voice breaks was below normal. CONCLUSION Acoustic parameters outside the normal pattern indicated aperiodic vocal production, with the presence of noise and instability in the vocal signal, in dysphonic teachers with or without alteration at the laryngeal level.
Affiliation(s)
- Arieli Jaques de Souza
  - Department of Speech-Language Pathology and Audiology, Universidade Federal de Santa Maria/UFSM, Santa Maria, Rio Grande do Sul, Brazil
- Daniela da Silva Gonçalves
  - Department of Speech-Language Pathology and Audiology, Universidade Federal de Santa Maria/UFSM, Santa Maria, Rio Grande do Sul, Brazil
- Gabriele Rodrigues Bastilha
  - Department of Speech-Language Pathology and Audiology, Universidade Federal de Santa Maria/UFSM, Santa Maria, Rio Grande do Sul, Brazil.
- Mara Keli Christmann
  - Department of Speech-Language Pathology and Audiology, Universidade do Vale do Itajaí (UNIVALI) - Itajaí SC e Associação Educacional Luterana Bom Jesus - IELUSC - Joinville Santa Catarina, Brazil
- Fabrício Scapini
  - Department of Medicine, Universidade Federal de Santa Maria/UFSM, Santa Maria, Rio Grande do Sul, Brazil
- Carla Aparecida Cielo
  - Department of Speech-Language Pathology and Audiology, Universidade Federal de Santa Maria/UFSM, Santa Maria, Rio Grande do Sul, Brazil
8
Spazzapan EA, Marino VCDC, Cardoso VM, Berti LC, Fabron EMG. Acoustic characteristics of voice in different cycles of life: an integrative literature review. Revista CEFAC 2019. [DOI: 10.1590/1982-0216/201921315018] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Purpose: to carry out an integrative literature review of the acoustic characteristics of healthy voice production, from childhood to old age. Methods: a bibliographic survey was conducted on the databases PubMed, SciELO, MEDLINE and LILACS, covering the last 10 years. Nineteen studies meeting the proposed criteria were found, covering the acoustic measurements F0 (fundamental frequency), jitter, shimmer and/or noise, in males and females with normal voices at different stages of life. Results: the analysis showed that F0 is the acoustic parameter that changes most as people grow up and grow old. Its values fall gradually from childhood to old age in the female population, whereas among men the decrease lasts until adulthood. Jitter, shimmer and noise remain stable throughout childhood and adulthood, while shimmer and noise measurements increase in old age. In the literature, there is no consensus regarding an increase in jitter measurements in the elderly. Conclusion: from childhood to old age, in both genders, vocal changes take place that are reflected especially in F0. There is a scarcity of acoustic information on specific populations with a wide age range obtained using the same methodology. The information in this study may guide future investigations aiming to understand natural changes occurring in the human voice, in addition to guiding clinical practice.
9
Nordio S, Bernitsas E, Meneghello F, Palmer K, Stabile MR, Dipietro L, Di Stadio A. Expiratory and phonation times as measures of disease severity in patients with Multiple Sclerosis. A case-control study. Mult Scler Relat Disord 2018; 23:27-32. [DOI: 10.1016/j.msard.2018.04.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/23/2018] [Revised: 03/21/2018] [Accepted: 04/16/2018] [Indexed: 11/16/2022]
10
Acoustic analysis of voice signal: Comparison of four applications software. Biomed Signal Process Control 2018. [DOI: 10.1016/j.bspc.2017.09.031] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
11
Al-Nasheri A, Muhammad G, Alsulaiman M, Ali Z, Mesallam TA, Farahat M, Malki KH, Bencherif MA. An Investigation of Multidimensional Voice Program Parameters in Three Different Databases for Voice Pathology Detection and Classification. J Voice 2016; 31:113.e9-113.e18. [PMID: 27105857 DOI: 10.1016/j.jvoice.2016.03.019] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 09/07/2015] [Accepted: 03/31/2016] [Indexed: 10/21/2022]
Abstract
BACKGROUND AND OBJECTIVE Automatic voice-pathology detection and classification systems may help clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. The main aim of this paper is to investigate Multidimensional Voice Program (MDVP) parameters to automatically detect and classify the voice pathologies in multiple databases, and then to find out which parameters performed well in these two processes. MATERIALS AND METHODS Samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases, which have three voice pathologies in common. The selected databases in this study represent three distinct languages: (1) the Arabic voice pathology database; (2) the Massachusetts Eye and Ear Infirmary database (English database); and (3) the Saarbruecken Voice Database (German database). A computerized speech lab program was used to extract MDVP parameters as features, and an acoustical analysis was performed. The Fisher discrimination ratio was applied to rank the parameters. A t test was performed to highlight any significant differences in the means of the normal and pathological samples. RESULTS The experimental results demonstrate a clear difference in the performance of the MDVP parameters using these databases. The highly ranked parameters also differed from one database to another. The best accuracies were obtained by using the three highest ranked MDVP parameters arranged according to the Fisher discrimination ratio: these accuracies were 99.68%, 88.21%, and 72.53% for the Saarbruecken Voice Database, the Massachusetts Eye and Ear Infirmary database, and the Arabic voice pathology database, respectively.
Affiliation(s)
- Ahmed Al-Nasheri
  - Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
- Ghulam Muhammad
  - Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia.
- Mansour Alsulaiman
  - Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
- Zulfiqar Ali
  - Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia; Centre for Intelligent Signal and Imaging Research (CISIR), Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Tronoh, Perak 31750, Malaysia
- Tamer A Mesallam
  - ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia; Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia; ENT Department, College of Medicine, Al-Menoufiya University, Shebin Alkoum, Egypt
- Mohamed Farahat
  - ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia; Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Khalid H Malki
  - ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia; Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Mohamed A Bencherif
  - Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia