1
Fahed VS, Doheny EP, Busse M, Hoblyn J, Lowery MM. Comparison of Acoustic Voice Features Derived From Mobile Devices and Studio Microphone Recordings. J Voice 2025; 39:559.e1-559.e18. [PMID: 36379826 DOI: 10.1016/j.jvoice.2022.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/15/2022] [Revised: 10/10/2022] [Accepted: 10/10/2022] [Indexed: 11/14/2022]
Abstract
OBJECTIVES/HYPOTHESIS Improvements in mobile device technology offer new opportunities for remote monitoring of voice for home and clinical assessment. However, there is a need to establish equivalence between features derived from signals recorded from mobile devices and gold standard microphone-preamplifiers. In this study acoustic voice features from Android smartphone, tablet, and microphone-preamplifier recordings were compared. METHODS Data were recorded from 37 volunteers (20 female) with no history of speech disorder and six volunteers with Huntington's disease (HD) during sustained vowel (SV) phonation, reading passage (RP), and five syllable repetition (SR) tasks. The following features were estimated: fundamental frequency median and standard deviation (F0 and SD F0), harmonics-to-noise ratio (HNR), local jitter, relative average perturbation of jitter (RAP), five-point period perturbation quotient (PPQ5), difference of differences of amplitude and periods (DDA and DDP), shimmer, and amplitude perturbation quotients (APQ3, APQ5, and APQ11). RESULTS Bland-Altman analysis revealed good agreement between microphone and mobile devices for fundamental frequency, jitter, RAP, PPQ5, and DDP during all tasks and a bias for HNR, shimmer and its variants (APQ3, APQ5, APQ11, and DDA). Significant differences were observed between devices for HNR, shimmer, and its variants for all tasks. High correlation was observed between devices for all features, except SD F0 for RP. Similar results were observed in the HD group for the SV and SR tasks. Biological sex had a significant effect on F0 and HNR during all tests, and for jitter, RAP, PPQ5, DDP, and shimmer for RP and SR. No significant effect of age was observed. CONCLUSIONS Mobile devices provided good agreement with state-of-the-art, high-quality microphones during structured speech tasks for features derived from frequency components of the audio recordings. Caution should be taken when estimating HNR, shimmer and its variants from recordings made with mobile devices.
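Note: the Bland-Altman analysis named in this abstract summarizes agreement between two measurement methods as a bias (mean of the paired differences) and 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The sketch below illustrates that standard calculation only; it is not the authors' code, and the function name and the F0 values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, device):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(reference) != len(device):
        raise ValueError("paired measurements must have equal length")
    # Difference of each device reading from its paired reference reading.
    diffs = [d - r for r, d in zip(reference, device)]
    bias = mean(diffs)
    # 95% limits of agreement, assuming roughly normal differences.
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical median F0 values in Hz (not data from the study).
mic_f0 = [200.0, 210.0, 190.0, 205.0]
phone_f0 = [201.0, 209.0, 192.0, 206.0]

bias, lower, upper = bland_altman(mic_f0, phone_f0)
print(f"bias={bias:.2f} Hz, limits of agreement=({lower:.2f}, {upper:.2f}) Hz")
```

A bias close to zero with narrow limits corresponds to the "good agreement" wording used for frequency-based features, whereas a systematic offset would appear as a non-zero bias.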
Affiliation(s)
- Vitória S Fahed
  - School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
- Emer P Doheny
  - School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
- Monica Busse
  - Centre for Trials Research, Cardiff University, Cardiff, UK
- Jennifer Hoblyn
  - School of Medicine, Trinity College Dublin, Dublin, Ireland; Bloomfield Health Services, Dublin, Ireland
- Madeleine M Lowery
  - School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
2
Boogers LS, Chen BSJ, Coerts MJ, Rinkel RNPM, Hannema SE. Mobile Phone Applications Voice Tools and Voice Pitch Analyzer Validated With LingWAVES to Measure Voice Frequency. J Voice 2025; 39:559.e29-559.e33. [PMID: 36371270 DOI: 10.1016/j.jvoice.2022.10.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 10/17/2022] [Accepted: 10/17/2022] [Indexed: 11/11/2022]
Abstract
OBJECTIVES Voice frequency can be measured to assess the voice change in transgender men and women during treatment. There are many applications that can analyze voice frequency. This validation study aimed to compare the ability to measure voice frequency of the mobile phone applications "Voice Tools" and "Voice Pitch Analyzer" with the registration program LingWAVES as the gold standard. STUDY DESIGN Prospective validation study. METHODS A total of 45 participants, of whom 20 were transgender individuals, were included. They were asked to read "The North Wind and the Sun" twice. The first measurement compared voice frequency measured by Voice Tools with LingWAVES while the second measurement compared Voice Pitch Analyzer with LingWAVES. The two applications that were being compared measured the voice frequency simultaneously. Pearson's correlations were performed to test for correlation between the mobile phone applications and LingWAVES. RESULTS Significant correlations were demonstrated between the measurements of Voice Tools and LingWAVES (P < 0.001) and between Voice Pitch Analyzer and LingWAVES (P < 0.001). Voice Tools overestimated voice frequency with a median deviation of 2 Hz (range -4 to 20). The overestimation was more pronounced in the high ranges. Voice Pitch Analyzer showed underestimation of voice frequency in high ranges. Median deviation was -2 Hz (range -16 to 14). CONCLUSIONS This validation study shows that voice frequency can be reliably measured with the mobile phone applications "Voice Tools" and "Voice Pitch Analyzer". Combined with the ease of use, these applications can be used to measure voice frequency in clinical practice and for research purposes.
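Note: this abstract reports two complementary statistics, a Pearson correlation and a median deviation in Hz. The sketch below computes both for paired readings to show why they answer different questions: a constant offset leaves the correlation at 1 while the median deviation exposes it. The function name and all frequency values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical mean voice frequencies in Hz (illustrative only).
reference_hz = [110.0, 150.0, 180.0, 220.0]
app_hz = [112.0, 152.0, 182.0, 222.0]  # app reads a constant 2 Hz higher

r = pearson_r(reference_hz, app_hz)
median_dev = median(a - b for a, b in zip(app_hz, reference_hz))
print(f"r={r:.3f}, median deviation={median_dev:+.1f} Hz")
```

Reporting the deviation alongside the correlation, as the abstract does, is what allows over- or underestimation to be seen at all.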
Affiliation(s)
- Lidewij S Boogers
  - Department of Endocrinology, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands; Center of Expertise on Gender Dysphoria, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands
- Britney S J Chen
  - Department of Endocrinology, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands; Center of Expertise on Gender Dysphoria, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands
- Marieke J Coerts
  - Center of Expertise on Gender Dysphoria, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands; Department of Otolaryngology, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands
- Rico N P M Rinkel
  - Center of Expertise on Gender Dysphoria, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands; Department of Otolaryngology, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands
- Sabine E Hannema
  - Center of Expertise on Gender Dysphoria, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands; Department of Pediatric Endocrinology, Amsterdam University Medical Center, Amsterdam, North Holland, The Netherlands
3
Queiroz MRG, Pernambuco L, Leão RLDS, Araújo AN, Gomes ADOC, da Silva HJ, Lucena JA. Voice Therapy for Older Adults During the COVID-19 Pandemic in Brazil. J Voice 2025; 39:566.e1-566.e11. [PMID: 36550002 PMCID: PMC9574462 DOI: 10.1016/j.jvoice.2022.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 12/26/2022]
Abstract
OBJECTIVE To characterize the clinical practice of Brazilian speech-language-hearing therapists regarding voice therapy for older adults during the COVID-19 pandemic. METHODS Cross-sectional survey conducted remotely. Data were collected through a form shared online with approximately 1,500 speech-language-hearing therapists. The form covered voice therapy practice with older adults during the COVID-19 pandemic. A total of 155 voice experts responded. RESULTS Most respondents were females with over 21 years' experience in vocal health care, working with both in-person therapy and teletherapy. Obtaining acoustic parameters and using therapy strategies for breathing and body training were the most reported changes in remote therapy during the pandemic. The main difficulties involved wearing masks in in-person therapy and assessing the voice in teletherapy. Patient adherence and goals reached were deemed positive by most participants. Associations were found between place and format of service; between patient adherence and goals reached; and between difficulties in teletherapy and use of complementary therapeutic resources. CONCLUSION The COVID-19 pandemic led Brazilian speech-language-hearing therapists to change their clinical practice with older adults in both remote and in-person therapy. The main changes involved wearing masks in in-person therapy and assessing the voice in teletherapy. Remote therapy proved to be a safe and effective possibility.
Affiliation(s)
- Mariana Rebeka Gomes Queiroz
  - Speech-Language Pathology and Audiology Department, Health Sciences Center, Graduate Program in Human Communication Health at the Federal University of Pernambuco, Recife, Pernambuco, Brazil
- Leandro Pernambuco
  - Department of Speech Therapy, Health Sciences Center, UFPB, João Pessoa, Paraíba, Brazil
- Rebeca Lins de Souza Leão
  - Speech-Language Pathology and Audiology Department, Health Sciences Center, Graduate Program in Human Communication Health at the Federal University of Pernambuco, Recife, Pernambuco, Brazil
- Ana Nery Araújo
  - Speech-Language Pathology and Audiology Department, Health Sciences Center, Federal University of Pernambuco, Recife, Pernambuco, Brazil
- Adriana de Oliveira Camargo Gomes
  - Speech-Language Pathology and Audiology Department, Health Sciences Center, Graduate Program in Human Communication Health at the Federal University of Pernambuco, Recife, Pernambuco, Brazil
- Hilton Justino da Silva
  - Speech-Language Pathology and Audiology Department, Health Sciences Center, Graduate Program in Human Communication Health at the Federal University of Pernambuco, Recife, Pernambuco, Brazil
- Jonia Alves Lucena
  - Speech-Language Pathology and Audiology Department, Health Sciences Center, Graduate Program in Human Communication Health at the Federal University of Pernambuco, Recife, Pernambuco, Brazil
4
Pommée T, Morsomme D. Voice Quality in Telephone Interviews: A preliminary Acoustic Investigation. J Voice 2025; 39:563.e1-563.e20. [PMID: 36192289 DOI: 10.1016/j.jvoice.2022.08.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 08/24/2022] [Accepted: 08/25/2022] [Indexed: 10/07/2022]
Abstract
OBJECTIVES To investigate the impact of standardized mobile phone recordings passed through a telecom channel on acoustic markers of voice quality and on its perception by voice experts in normophonic speakers. METHODS Continuous speech and a sustained vowel were recorded for fourteen female and ten male normophonic speakers. The recordings were done simultaneously with a head-mounted high-quality microphone and through the telephone network on a receiving smartphone. Twenty-two acoustic voice quality, breathiness and pitch-related measures were extracted from the recordings. Nine vocologists perceptually rated the G, R and B parameters of the GRBAS scale on each voice sample. The reproducibility, the recording type, the stimulus type and the gender effects, as well as the correlation between acoustic and perceptual measures were investigated. RESULTS The sustained vowel samples are damped after one second. Only the frequencies between 100 and 3700Hz are passed through the telecom channel and the frequency response is characterized by peaks and troughs. The acoustic measures show a good reproducibility over the three repetitions. All measures significantly differ between the recording types, except for the local jitter, the harmonics-to-noise ratio by Dejonckere and Lebacq, the period standard deviation and all six pitch measures. The AVQI score is higher in telephone recordings, while the ABI score is lower. Significant differences between genders are also found for most of the measures; while the AVQI is similar in men and women, the ABI is higher in women in both recording types. For the perceptual assessment, the interrater agreement is rather low, while the reproducibility over the three repetitions is good. Few significant differences between recording types are observed, except for lower breathiness ratings on telephone recordings. G ratings are significantly more severe on the sustained vowel on both recording types, R ratings only on telephone recordings. 
While roughness is rated higher in men on telephone recordings by most experts, no gender effect is observed for breathiness on either recording type. Finally, neither the AVQI nor the ABI yields strong correlations with any of the perceptual parameters. CONCLUSIONS Our results show that passing a voice signal through a telecom channel induces filter and noise effects that limit the use of common acoustic voice quality measures and indexes. The AVQI and ABI are both significantly impacted by the recording type. The most reliable acoustic measures seem to be pitch perturbation (local jitter and period standard deviation) as well as the harmonics-to-noise ratio from Dejonckere and Lebacq. Our results also underline that raters are not equally sensitive to the various factors, including the recording type, the stimulus type and the gender effects. None of the three perceptual parameters G, R and B seems to be reliably measurable on telephone recordings using the two investigated acoustic indexes. Future studies investigating the impact of voice quality in telephone conversations should thus focus on acoustic measures on continuous speech samples that are limited to the frequency response of the telecom channel and that are not too sensitive to environmental and additive noise.
Affiliation(s)
- Timothy Pommée
  - Research Unit for a life-Course perspective on Health and Education, Voice Unit, University of Liège, Belgium
- Dominique Morsomme
  - Research Unit for a life-Course perspective on Health and Education, Voice Unit, University of Liège, Belgium
5
Yağcıoğlu D, Esen Aydınlı F, Tunç Songur E, Şimşek S, Çetinkaya B, İncebay Ö. Can Smartphones Be Used to Record Children's Voices for Acoustic Analysis? J Voice 2025:S0892-1997(25)00044-X. [PMID: 40011181 DOI: 10.1016/j.jvoice.2025.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Revised: 02/03/2025] [Accepted: 02/03/2025] [Indexed: 02/28/2025]
Abstract
OBJECTIVES While technological advancements have enabled the utilization of smartphones for acoustic voice analysis, existing studies have predominantly focused on the adult population. However, dysphonia is prevalent in children, and their anatomy and physiology are different from those of adults. Thus, this paper aims to investigate the feasibility of using smartphones to record children's voices for acoustic voice analysis. STUDY DESIGN A methodological study. METHODS This study involved 29 children, aged 4-10 years, who had healthy voices. Voice recordings of sustained phonation and reading a sentence were obtained using four devices (1-AKG Micromic C520 headset microphone connected to a computer with the Computerized Speech Lab (CSL), 2-Samsung S9 Plus, 3-iPhone 13 Mini, and 4-Huawei Y9 Prime). Then, all the recorded voice samples were analyzed using CSL, examining a total of 13 acoustic parameters. Pearson correlation analysis, Bland-Altman analysis, and uncertainty of measurement analysis were conducted to assess consistency and agreement across devices. RESULTS The highest correlation and accuracy between the smartphone measurements and the reference recording system were found for the fundamental frequency (F0) (r > 0.90, P < 0.01) in both speech samples. For other parameters, limited reliability was observed; while some showed weak correlations, others had low accuracy and consistency. There were, however, moderate-to-excellent correlations for most measurements and a nonsignificant bias according to the Bland-Altman analysis. CONCLUSION This study is the first to investigate the feasibility of using smartphones for acoustic voice analysis focusing specifically on children. The results indicate that smartphone voice recordings can be used to reliably measure F0. More research is needed to improve measurement reliability for other parameters. 
Nonetheless, the findings demonstrate the potential for smartphones to enable accessible and reliable voice assessment in the pediatric population.
Affiliation(s)
- Damlasu Yağcıoğlu
  - Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Fatma Esen Aydınlı
  - Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Elif Tunç Songur
  - Speech and Language Therapy Department, Faculty of Health Sciences, Selçuk University, Konya, Turkey
- Sinem Şimşek
  - Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Buse Çetinkaya
  - Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Önal İncebay
  - Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
6
Pietrzak MM, Pietruszewska W, Barańska M, Rycerz A, Stawiski K, Niebudek-Bogusz E. Assessment of the Interdependencies Between High-Speed Videoendoscopy and Simultaneously Recorded Audio Data in Various Glottal Pathologies. Biomedicines 2025; 13:511. [PMID: 40002924 PMCID: PMC11852736 DOI: 10.3390/biomedicines13020511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Revised: 01/24/2025] [Accepted: 02/04/2025] [Indexed: 02/27/2025]
Abstract
Background: This study aimed to investigate the relationships between kymographic parameters derived from high-speed videoendoscopy (HSV) and simultaneously recorded acoustic signals. The research provides insights into the vibratory dynamics of various glottal pathologies, assessed across different glottal widths, and their mutual relations with audio data. Methods: The study included 192 participants categorized as normophonic or having functional or organic lesions (benign, premalignant, and malignant). Parameters describing vocal fold oscillations were calculated using HSV kymography for three glottal widths, along with corresponding acoustic data. Initially, linear correlations between these parameters were assessed. Next, the consistency in cycle detection and its influence on the correlation levels were evaluated. Results: The fundamental frequency (F0) and mean Jitter (Jita) showed the highest correlations between the HSV- and audio-determined parameters (F0: 0.97, Jita: 0.40-0.70), with even stronger correlations when the number of detected cycles was consistent (F0: 0.99, Jita: 0.68-0.98). The correlations for other parameters ranged from low to moderate, with no significant differences observed between the diagnostic subgroups (functional changes and benign and malignant glottal lesions). However, in the premalignant lesions group, high correlations (0.77-0.9) were observed between the HSV and audio parameters, but only for measures describing period perturbations. Beyond F0 and mean Jitter, consistency in cycle detection did not significantly affect correlation levels. Conclusions: The simultaneous audio signal proved useful in verifying the accuracy of HSV quantification measures, particularly for F0, which showed strong agreement between the methods. 
Discrepancies in other parameters and low correlations between HSV-derived kymography and audio data may suggest the influence of the throat, mouth, and nose resonators, which are added to the glottal signal. While the kymographic analysis based on HSV provides detailed descriptions of vocal fold oscillations, it does not fully capture the three-dimensional structure and complex functionality of the vocal folds.
Affiliation(s)
- Magdalena M. Pietrzak
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Wioletta Pietruszewska
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Magda Barańska
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Aleksander Rycerz
  - Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
- Konrad Stawiski
  - Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
- Ewa Niebudek-Bogusz
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
  - Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
7
Hu Z, Zhang Z, Li H, Yang LZ. Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices. Behav Res Methods 2024; 57:35. [PMID: 39738817 DOI: 10.3758/s13428-024-02584-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2024] [Indexed: 01/02/2025]
Abstract
In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.
Affiliation(s)
- Zian Hu
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Zhenglin Zhang
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
  - University of Science and Technology of China, Hefei, China
- Hai Li
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
  - University of Science and Technology of China, Hefei, China
  - Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- Li-Zhuang Yang
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
  - University of Science and Technology of China, Hefei, China
  - Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
8
Batthyany C, Latoszek BBV, Maryn Y. Meta-Analysis on the Validity of the Acoustic Voice Quality Index. J Voice 2024; 38:1527.e1-1527.e19. [PMID: 35752532 DOI: 10.1016/j.jvoice.2022.04.022] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 01/20/2022] [Revised: 04/27/2022] [Accepted: 04/27/2022] [Indexed: 02/01/2023]
Abstract
BACKGROUND Acoustic measurements are useful tools to objectively measure overall voice quality. The Acoustic Voice Quality Index (AVQI) has been shown to be a valid multiparametric tool to objectify dysphonia severity. The increasing number of studies investigating the AVQI's validity demands a comprehensive synthesis of the available outcomes. OBJECTIVE OF REVIEW The aim of the present meta-analysis is to quantify the evidence for the diagnostic accuracy of the AVQI, including its sensitivity, specificity and likelihood ratio statistics, and its concurrent validity and sensitivity to changes in auditory-perceptual voice quality ratings. TYPE OF REVIEW Meta-analysis. SEARCH STRATEGY MEDLINE, EMBASE, the Cochrane library and Web of Science were searched from 2010 till April 2021 with an additional manual search, using keywords related to AVQI and common terminologies of validity outcomes. Studies considering the clinical validity of AVQI (i.e., diagnostic accuracy, concurrent validity and sensitivity to change), using auditory-perceptual voice quality evaluation as reference, were included. EVALUATION METHOD The Preferred Reporting Items for Systematic reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines were used. Quality assessment of included studies was conducted using the QUADAS-2 tool. For the diagnostic accuracy of AVQI, the pooled sensitivity, specificity and likelihood ratio statistics were determined using a summary receiver operating characteristic approach. Weighted mean correlation coefficients (r̄W) were used to assess the concurrent validity and sensitivity to change. RESULTS A total of 198 studies were screened and 33 articles were included. In total, voice samples of 11447, 10272, and 367 different subjects were considered for analysis of diagnostic accuracy, concurrent validity and change responsiveness, respectively. Satisfactory diagnostic accuracy results were found with a pooled sensitivity of 0.83 (95% CI: 0.82-0.83), a pooled specificity of 0.89 (95% CI: 0.88-0.90), a pooled positive LR of 7.75 (95% CI: 6.04-9.95), a pooled negative LR of 0.20 (95% CI: 0.16-0.23), and a pooled diagnostic odds ratio of 47.13 (95% CI: 34.82-63.79). Summary receiver operating characteristic curve analysis showed an excellent AUC value of 0.937 and Q* index of 0.874. Strong correlations of r̄W = 0.838 for concurrent validity and r̄W = 0.796 for sensitivity to change were found. CONCLUSIONS Our results confirm the general clinical utility of the AVQI as a robust and valid objective measure for evaluating overall dysphonia severity across languages and study methods.
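Note: for a single 2x2 diagnostic table, the likelihood ratios and the diagnostic odds ratio follow directly from sensitivity and specificity: LR+ = sens / (1 - spec), LR- = (1 - sens) / spec, DOR = LR+ / LR-. The sketch below applies these textbook definitions to the pooled sensitivity and specificity quoted above. It is illustrative only: in a meta-analysis each statistic is pooled separately across studies, so the reported pooled LRs and DOR are not expected to equal the values derived this way. The function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from sensitivity and specificity in (0, 1)."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg


# Pooled sensitivity and specificity as quoted in the abstract.
lr_pos, lr_neg, dor = likelihood_ratios(0.83, 0.89)
print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.2f}, DOR={dor:.1f}")
```

The derived values land in the same range as the pooled figures (7.75, 0.20, 47.13) without matching them, which is consistent with separate pooling of each statistic.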
Affiliation(s)
- Christina Batthyany
  - GZA Sint-Augustinus, Department of Otorhinolaryngology and Head & Neck Surgery, European Institute of ORL-HNS, Antwerp, Belgium
- Ben Barsties V Latoszek
  - SRH University of Applied Health Sciences, Speech-Language Pathology, Düsseldorf, Germany; University of Münster, University Hospital Münster, Department of Phoniatrics and Pediatric Audiology, Münster, Germany
- Youri Maryn
  - GZA Sint-Augustinus, Department of Otorhinolaryngology and Head & Neck Surgery, European Institute of ORL-HNS, Antwerp, Belgium; Ghent University, Faculty of Medicine and Health Sciences, Department of Rehabilitation Sciences, Ghent, Belgium; University College Ghent, Department of Speech-Language Therapy and Audiology, Ghent, Belgium; Université Catholique de Louvain, Faculty of Psychology and Pedagogical Sciences, School of Logopedics, Ottignies-Louvain-La-Neuve, Belgium; Phonanium, Lokeren, Belgium
9
Sresuganthi JR, Nallamuthu A, Boominathan P. Comparison of Client-Led Asynchronous and Clinician-Led Synchronous Online Methods for Evaluation of Subjective Vocal Measures in Teachers: A Feasibility Study. J Voice 2024; 38:1522.e1-1522.e10. [PMID: 35641382 DOI: 10.1016/j.jvoice.2022.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 04/21/2022] [Accepted: 04/21/2022] [Indexed: 10/18/2022]
Abstract
BACKGROUND COVID-19 has transformed face-to-face teaching in classrooms to online and hybrid modes. Increased vocal intensity/pitch to attract students' attention and transact in the online class, and inappropriate posture (head, neck & upper trunk) while using the laptop and other online tools, cause vocal loading leading to voice-related concerns in teachers. Tele voice assessment is a feasible alternative means to seek professional help in the current situation and possibly in the future too. Client-led asynchronous and clinician-led synchronous voice recordings for clinical vocal measures among school teachers were compared in this study. METHOD Twenty-five school teachers (21 females & four males) from Chennai consented to the study. Information on voice use, its impact on day-to-day situations, self-perception of vocal fatigue, and their recorded voice sample (phonation & speaking) were obtained online (asynchronous mode). Within a period of ten days, the clinician-led synchronous session was planned at a mutually convenient time for obtaining voice samples through a Zoom call. The voice samples obtained were compared for clinical measures and perceptual voice evaluation. RESULTS Participants reported vocal symptoms and increased vocal fatigue scores. The maximum phonation time values obtained through synchronous mode were lower than those obtained through asynchronous mode. Also, variability was noted in the perceptual vocal measures of voice samples obtained through synchronous mode. During synchronous voice recording & evaluation, the background noise, internet stability, audio enhancement feature, and microphone placement & quality could be monitored, and immediate feedback was provided. Additionally, the asynchronous recording can be supplemented for synchronous recording, with clear instructions & demonstration. CONCLUSION This study explored the feasibility of using synchronous and asynchronous voice recording for voice analysis in school teachers. 
The findings could serve as a base to understand the advantages and challenges of using client-led asynchronous and clinician-led synchronous methods for estimating vocal measures.
Affiliation(s)
- Aishwarya Nallamuthu
- Department of Speech Language & Hearing Sciences, Sri Ramachandra Institute of Higher Education & Research, Chennai, Tamil Nadu, India.
- Prakash Boominathan
- Department of Speech Language & Hearing Sciences, Sri Ramachandra Institute of Higher Education & Research, Chennai, Tamil Nadu, India.
|
10
|
Awan SN, Bahr R, Watts S, Boyer M, Budinsky R, Bensoussan Y. Evidence-Based Recommendations for Tablet Recordings From the Bridge2AI-Voice Acoustic Experiments. J Voice 2024:S0892-1997(24)00283-2. [PMID: 39306498 PMCID: PMC11922786 DOI: 10.1016/j.jvoice.2024.08.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2024] [Revised: 08/22/2024] [Accepted: 08/23/2024] [Indexed: 03/21/2025]
Abstract
BACKGROUND As part of a larger goal to create best practices for voice data collection to fuel voice artificial intelligence (AI) research, the objective of this study was to investigate the ability of readily available iOS and Android tablets with and without low-cost headset microphones to produce recordings and subsequent acoustic measures of voice comparable to "research quality" instrumentation. METHODS Recordings of 24 sustained vowel samples representing a wide range of typical and disordered voices were played via a head-and-torso model and recorded using a research quality standard microphone/preamplifier/audio interface. Acoustic measurements from the standard were compared with two popular tablets using their built-in microphones and with low-cost headset microphones at different distances from the mouth. RESULTS Voice measurements obtained via tablets + headset microphones close to the mouth (2.5 and 5 cm) strongly correlated (r's > 0.90) with the research standard and resulted in no significant differences for measures of vocal frequency and perturbation. In contrast, voice measurements obtained using the tablets' built-in microphones at typical reading distances (30 and 45 cm) tended to show substantial variability in measurement, greater mean differences in voice measurements, and relatively poorer correlations vs the standard. CONCLUSION Findings from this study support preliminary recommendations from the Bridge2AI-Voice Consortium recommending the use of smartphones paired with low-cost headset microphones as adequate methods of recording for large-scale voice data collection from a variety of clinical and nonclinical settings. Compared with recording using a tablet direct, a headset microphone controls for recording distance and reduces the effects of background noise, resulting in decreased variability in recording quality. DATA AVAILABILITY Data supporting the results reported in this article may be obtained upon request from the contact author.
Affiliation(s)
- Shaheen N Awan
- School of Communication Sciences & Disorders, University of Central Florida, Orlando, Florida.
- Ruth Bahr
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, Florida
- Stephanie Watts
- Department of Otolaryngology - Head & Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida
- Micah Boyer
- Department of Otolaryngology - Head & Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida
- Robert Budinsky
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, Florida
- Yael Bensoussan
- Department of Otolaryngology - Head & Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida
|
11
|
Pardo MA, Lolchuragi DS, Poole J, Granli P, Moss C, Douglas-Hamilton I, Wittemyer G. Female African elephant rumbles differ between populations and sympatric social groups. ROYAL SOCIETY OPEN SCIENCE 2024; 11:241264. [PMID: 39323553 PMCID: PMC11421903 DOI: 10.1098/rsos.241264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/25/2024] [Accepted: 08/11/2024] [Indexed: 09/27/2024]
Abstract
Vocalizations often vary in structure within a species, from the individual to population level. Vocal differences among social groups and populations can provide insight into biological processes such as vocal learning and evolutionary divergence, with important conservation implications. Because elephants are vocal learners of conservation concern, intraspecific vocal variation is of particular interest in this group. We recorded calls from individuals in multiple, wild elephant social groups in two distinct Kenyan populations. We used machine learning to investigate vocal differentiation among individual callers, core groups, bond groups (collections of core groups) and populations. We found clear evidence for vocal distinctiveness at the individual and population level, and evidence for much subtler vocal differences among social groups. Social group membership was a better predictor of call similarity than genetic relatedness, suggesting that subtle vocal differences among social groups may be learned. Vocal divergence among populations and social groups has conservation implications for the effects of social disruption and translocation of elephants.
Affiliation(s)
- Michael A. Pardo
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, CO, USA
- Cynthia Moss
- Amboseli Elephant Research Project, Nairobi, Kenya
- George Wittemyer
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, CO, USA
- Save The Elephants, Nairobi, Kenya
|
12
|
Awan SN, Bahr R, Watts S, Boyer M, Budinsky R, Bensoussan Y. Validity of Acoustic Measures Obtained Using Various Recording Methods Including Smartphones With and Without Headset Microphones. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2024; 67:1712-1730. [PMID: 38749007 DOI: 10.1044/2024_jslhr-23-00759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2024]
Abstract
PURPOSE The goal of this study was to assess various recording methods, including combinations of high- versus low-cost microphones, recording interfaces, and smartphones in terms of their ability to produce commonly used time- and spectral-based voice measurements. METHOD Twenty-four vowel samples representing a diversity of voice quality deviations and severities from a wide age range of male and female speakers were played via a head-and-thorax model and recorded using a high-cost, research standard GRAS 40AF (GRAS Sound & Vibration) microphone and amplification system. Additional recordings were made using various combinations of headset microphones (AKG C555 L [AKG Acoustics GmbH], Shure SM35-XLR [Shure Incorporated], AVID AE-36 [AVID Products, Inc.]) and audio interfaces (Focusrite Scarlett 2i2 [Focusrite Audio Engineering Ltd.] and PC, Focusrite and smartphone, smartphone via a TRRS adapter), as well as smartphones direct (Apple iPhone 13 Pro, Google Pixel 6) using their built-in microphones. The effect of background noise from four different room conditions was also evaluated. Vowel samples were analyzed for measures of fundamental frequency, perturbation, cepstral peak prominence, and spectral tilt (low vs. high spectral ratio). RESULTS Results show that a wide variety of recording methods, including smartphones with and without a low-cost headset microphone, can effectively track the wide range of acoustic characteristics in a diverse set of typical and disordered voice samples. Although significant differences in acoustic measures of voice may be observed, the presence of extremely strong correlations (rs > .90) with the recording standard implies a strong linear relationship between the results of different methods that may be used to predict and adjust any observed differences in measurement results. 
CONCLUSION Because handheld smartphone distance and positioning may be highly variable when used in actual clinical recording situations, smartphone + a low-cost headset microphone is recommended as an affordable recording method that controls mouth-to-microphone distance and positioning and allows both hands to be available for manipulation of the smartphone device.
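The RESULTS above note that a strong linear relationship with the recording standard could be used to predict and adjust differences between methods. A minimal sketch of that idea follows, assuming an ordinary least-squares line fitted on paired measurements. The function names and the numbers are hypothetical illustrations; they are not the authors' procedure or data.

```python
from statistics import mean


def fit_linear_calibration(device_values, reference_values):
    """Ordinary least-squares fit of: reference ~ slope * device + intercept."""
    if len(device_values) != len(reference_values) or len(device_values) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x, mean_y = mean(device_values), mean(reference_values)
    sxx = sum((x - mean_x) ** 2 for x in device_values)
    if sxx == 0:
        raise ValueError("device values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(device_values, reference_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(device_value, slope, intercept):
    """Map a device measurement onto the reference scale."""
    return slope * device_value + intercept


# Hypothetical paired measurements (invented numbers, not study data).
smartphone_values = [1.0, 2.0, 3.0, 4.0]
reference_values = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear_calibration(smartphone_values, reference_values)
adjusted = apply_calibration(5.0, slope, intercept)
```

A calibration of this kind only makes sense when it is fitted and applied under the same recording conditions (device, distance, room), since the abstract reports that those factors can shift the measures.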
Affiliation(s)
- Shaheen N Awan
- School of Communication Sciences and Disorders, University of Central Florida, Orlando
- Ruth Bahr
- Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Stephanie Watts
- Department of Otolaryngology - Head & Neck Surgery, Morsani College of Medicine, University of South Florida, Tampa
- Micah Boyer
- Department of Otolaryngology - Head & Neck Surgery, Morsani College of Medicine, University of South Florida, Tampa
- Robert Budinsky
- Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Yael Bensoussan
- Department of Otolaryngology - Head & Neck Surgery, Morsani College of Medicine, University of South Florida, Tampa
|
13
|
Banks R, Higgins C, Greene BR, Jannati A, Gomes‐Osman J, Tobyne S, Bates D, Pascual‐Leone A. Clinical classification of memory and cognitive impairment with multimodal digital biomarkers. ALZHEIMER'S & DEMENTIA (AMSTERDAM, NETHERLANDS) 2024; 16:e12557. [PMID: 38406610 PMCID: PMC10884988 DOI: 10.1002/dad2.12557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 01/24/2024] [Accepted: 01/24/2024] [Indexed: 02/27/2024]
Abstract
INTRODUCTION Early detection of Alzheimer's disease and cognitive impairment is critical to improving the healthcare trajectories of aging adults, enabling early intervention and potential prevention of decline. METHODS To evaluate multi-modal feature sets for assessing memory and cognitive impairment, feature selection and subsequent logistic regressions were used to identify the most salient features in classifying Rey Auditory Verbal Learning Test-determined memory impairment. RESULTS Multimodal models incorporating graphomotor, memory, and speech and voice features provided the strongest classification performance (area under the curve = 0.83; sensitivity = 0.81, specificity = 0.80). Multimodal models were superior to all single-modality and demographics models. DISCUSSION The current research contributes to the prevailing multimodal profile of those with cognitive impairment, suggesting that it is associated with slower speech with a particular effect on the duration, frequency, and percentage of pauses compared to normal healthy speech.
Affiliation(s)
- Russell Banks
- Department of Communicative Sciences & Disorders, College of Arts & Sciences, Michigan State University, East Lansing, Michigan, USA
- Ali Jannati
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, USA
- Joyce Gomes‐Osman
- Department of Neurology, University of Miami Miller School of Medicine, Miami, Florida, USA
- Alvaro Pascual‐Leone
- Linus Health, Boston, Massachusetts, USA
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, USA
- Hinda and Arthur Marcus Institute for Aging Research and Deanna and Sidney Wolk Center for Memory Health, Hebrew SeniorLife, Boston, Massachusetts, USA
|
14
|
Cavalcanti JC, Eriksson A, Barbosa PA. Multiparametric Analysis of Speaking Fundamental Frequency in Genetically Related Speakers Using Different Speech Materials: Some Forensic Implications. J Voice 2024; 38:243.e11-243.e29. [PMID: 34629229 DOI: 10.1016/j.jvoice.2021.08.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Revised: 08/03/2021] [Accepted: 08/09/2021] [Indexed: 11/18/2022]
Abstract
OBJECTIVES To assess the speaker-discriminatory potential of a set of fundamental frequency estimates in intraidentical twin pair comparisons and cross-pair comparisons (i.e., among all speakers). PARTICIPANTS A total of 20 Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recruited. METHOD The participants were recorded directly through professional microphones while taking part in a spontaneous dialogue over mobile phones. Acoustic measurements were performed in connected speech samples, and in lengthened vowels, at least 160 ms long produced during spontaneous speech. RESULTS f0 baseline, central tendency, and extreme values were found mostly discriminatory in intra-twin pair and cross-pair comparisons. These were also the estimates displaying the largest effect sizes. Overall, only three identical twins were found statistically different regarding their f0 patterns in connected speech, but not for lengthened vowel-based f0 metrics. Estimates of f0 variation and modulation were found the least discriminatory across speakers, which may signal the control of speaking style and dialect on dynamic patterns of f0. Concerning system performance, the base value of f0 (f0 baseline) was found the most reliable metric, displaying the lowest equal error rate (EER). CONCLUSIONS The outcomes suggest that, although identical twins were very closely related regarding their f0 patterns, some pairs could still be differentiated acoustically, only in connected speech. Such findings reinforce the relevance of analyzing long-term f0 metrics for speaker comparison purposes, with particular consideration to f0 baseline. Furthermore, f0 differences across subjects were suggested as more expressive in connected speech than in lengthened vowels.
Affiliation(s)
- Julio Cesar Cavalcanti
- Department of linguistics, Stockholm University, Stockholm, Sweden; Institute of language studies, Campinas State University, Campinas, São Paulo, Brazil.
- Anders Eriksson
- Department of linguistics, Stockholm University, Stockholm, Sweden.
- Plinio A Barbosa
- Institute of language studies, Campinas State University, Campinas, São Paulo, Brazil.
|
15
|
Barsties V Latoszek B, Watts CR, Hetjens S. The Efficacy of the Manual Circumlaryngeal Therapy for Muscle Tension Dysphonia: A Systematic Review and Meta-analysis. Laryngoscope 2024; 134:18-26. [PMID: 37366280 DOI: 10.1002/lary.30850] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/11/2023] [Indexed: 06/28/2023]
Abstract
OBJECTIVE Muscle tension dysphonia (MTD) is the most common functional voice disorder. Behavioral voice therapy is the front-line treatment for MTD, and laryngeal manual therapy may be a part of this treatment. The objective of this study was to investigate the effect of manual circumlaryngeal therapy (MCT) on acoustic markers of voice quality (jitter, shimmer, and harmonics-to-noise ratio) and vocal function (fundamental frequency) through a systematic review with meta-analysis. DATA SOURCES Four databases were searched from inception to December 2022, and a manual search was performed. REVIEW METHODS The PRISMA extension statement for reporting systematic reviews incorporating a meta-analysis of health care interventions was applied, and a random effects model was used for the meta-analyses. RESULTS We identified 6 eligible studies from 30 studies (without duplicates). The MCT approach was highly effective on acoustics with large effect sizes (Cohen's d > 0.8). Significant improvements were obtained in jitter in percent (mean difference of -.58; 95% CI -1.00 to 0.16), shimmer in percent (mean difference of -5.66; 95% CI -8.16 to 3.17), and harmonics-to-noise ratio in dB (mean difference of 4.65; 95% CI 1.90-7.41), with the latter two measurements continuing to be significantly improved by MCT when measurement variability is considered. CONCLUSION The efficacy of MCT for MTD was confirmed in most clinical studies by assessing jitter, shimmer, and harmonics-to-noise ratio related to voice quality. The effects of MCT on the fundamental frequency changes could not be verified. Further contributions of high-quality randomized control trials are needed to support evidence-based practice in laryngology. Laryngoscope, 134:18-26, 2024.
Affiliation(s)
- Christopher R Watts
- Harris College of Nursing & Health Sciences, Texas Christian University, Fort Worth, Texas, USA
- Svetlana Hetjens
- Department for Medical Statistics and Biomathematics, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
|
16
|
Ceylan ME, Cangi ME, Yılmaz G, Peru BS, Yiğit Ö. Are smartphones and low-cost external microphones comparable for measuring time-domain acoustic parameters? Eur Arch Otorhinolaryngol 2023; 280:5433-5444. [PMID: 37584753 DOI: 10.1007/s00405-023-08179-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Accepted: 08/05/2023] [Indexed: 08/17/2023]
Abstract
PURPOSE This study examined and compared the diagnostic accuracy and correlation levels of the acoustic parameters of the audio recordings obtained from smartphones on two operating systems and from dynamic and condenser types of external microphones. METHOD The study included 87 adults: 57 with voice disorder and 30 with a healthy voice. Each participant was asked to perform a sustained vowel phonation (/a/). The recordings were taken simultaneously using five microphones (AKG-P220, Shure-SM58, Samson Go Mic, Apple iPhone 6, and Samsung Galaxy J7 Pro) in an acoustically insulated cabinet. Acoustic examinations were performed using Praat version 6.2.09. The data were examined using Pearson correlation and receiver-operating characteristic (ROC) analyses. RESULTS The parameters with the highest area under curve (AUC) values among all microphone recordings in the time-domain analyses were the frequency perturbation parameters. Additionally, considering the correlation coefficients obtained by synchronizing the microphones with each other and the AUC values together, the parameter with the highest correlation coefficient and diagnostic accuracy values was the jitter-local parameter. CONCLUSION Period-to-period perturbation parameters obtained from audio recordings made with smartphones show similar levels of diagnostic accuracy to external microphones used in clinical conditions.
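For readers unfamiliar with the jitter-local parameter highlighted above, the sketch below shows the usual definition: the mean absolute difference between consecutive glottal periods, divided by the mean period, expressed in percent. It assumes the period durations have already been extracted (for example by a pitch tracker), and the function name is a hypothetical illustration. It is not the study's analysis script and omits the period-validity checks that Praat applies.

```python
def jitter_local_percent(periods):
    """Local jitter (%) from a sequence of consecutive period durations.

    jitter = mean(|T[i+1] - T[i]|) / mean(T) * 100
    """
    if len(periods) < 2:
        raise ValueError("at least two periods are required")
    if any(p <= 0 for p in periods):
        raise ValueError("period durations must be positive")
    # Absolute differences between each pair of neighbouring periods.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period


# Hypothetical period durations in seconds (invented, roughly a 100 Hz voice).
example_periods = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102]
example_jitter = jitter_local_percent(example_periods)
```

Because the measure depends only on period timing, it is plausible that it is less sensitive to a microphone's amplitude response than shimmer-type measures, which is in line with the frequency-perturbation findings reported in the abstract.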
Affiliation(s)
- M Enes Ceylan
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- M Emrah Cangi
- University of Health Sciences, Speech and Language Therapy, Selimiye, Tıbbiye Cd No: 38, Istanbul, 34668, Üsküdar, Türkiye.
- Göksu Yılmaz
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Beyza Sena Peru
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Özgür Yiğit
- Istanbul Şişli Hamidiye Etfal Training and Research Hospital, Istanbul, Türkiye
|
17
|
Dhawan K, Varghese A, Kumar N, Varghese SS. Utility of Smart Phones as a Voice Acquisition Device for Assessing Pre and Post Treatment Voice Using PRAAT. Indian J Otolaryngol Head Neck Surg 2023; 75:2901-2906. [PMID: 37974690 PMCID: PMC10645755 DOI: 10.1007/s12070-023-03884-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Accepted: 05/08/2023] [Indexed: 11/19/2023] Open
Abstract
Voice assessment before and after treatment helps the clinician to assess the effectiveness of the treatment given and facilitates comparison between different treatment modalities. The Voice Handicap Index-10 (VHI-10) questionnaire is a tool that allows the voice to be evaluated subjectively from the patient's perspective. PRAAT is a freely available software programme that acoustically analyses voice signals. Smart phones are widely used, and the high quality of their embedded microphones makes them suitable and easily available voice recording devices. This study aims at using PRAAT and the VHI-10 questionnaire in evaluating voice before and after treatment. The utility of smart phones as a voice acquisition device is also explored in the study. This was a prospective, observational study carried out from 1st November 2019 to 30th September 2021 in the ENT out-patient department at a tertiary hospital in Punjab. Fifty-eight patients complaining of dysphonia were enrolled consecutively in the study. All patients underwent detailed history taking and examination of the larynx using a 70-degree rigid laryngoscope. The voice handicap was scored with the VHI-10 questionnaire and acoustic evaluation of voice was done using the PRAAT software. Patients' voice was further evaluated 3 months post-therapy with the VHI-10 questionnaire and acoustic analysis. The parameters measured on PRAAT were mean pitch, jitter (local), shimmer (local), and mean harmonics to noise ratio (HNR). The voice was recorded using a smart phone and later transferred onto a laptop for analysis. The pre and post treatment acoustic parameters and VHI-10 scores were compared and correlated. There was a significant difference (p < 0.001) between the pre and post treatment VHI-10 scores and all the acoustic parameters measured except for median pitch (p = 0.995). A poor positive correlation was found between the pre treatment VHI-10 scores and jitter (r = 0.188, p = 0.157) and shimmer (r = 0.288, p = 0.028) values. 
A negative correlation was observed between pre treatment VHI-10 scores and pitch (r = -0.151, p = 0.259) and HNR (r = -0.424, p = 0.001). Post treatment VHI-10 scores showed positive correlation with jitter (r = 0.302, p = 0.021) and shimmer (r = 0.162, p = 0.225) values and negative correlation with pitch (r = -0.10, p = 0.457) and HNR (r = -0.356, p = 0.006) values. We found significant differences in the VHI-10 scores and PRAAT voice analysis results before and after treatment in patients complaining of voice change (dysphonia). The VHI-10 questionnaire and PRAAT are good and convenient tools for assessing the voice subjectively and objectively. Only a poor to fair correlation was found between VHI-10 scores and PRAAT analysis results. More studies must be done to confirm the utility of smart phones as a voice acquisition device and PRAAT software in voice analysis.
Affiliation(s)
- Kaffy Dhawan
- Department of ENT, Christian Medical College, Ludhiana, Punjab 141008 India
- Ashish Varghese
- Department of ENT, Christian Medical College, Ludhiana, Punjab 141008 India
- Navneet Kumar
- Department of ENT, Christian Medical College, Ludhiana, Punjab 141008 India
- Sunil Sam Varghese
- Department of ENT, Christian Medical College, Ludhiana, Punjab 141008 India
|
18
|
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Pribuišis K, Ulozienė I, Blažauskas T, Damaševičius R, Maskeliūnas R. Smartphone-Based Voice Wellness Index Application for Dysphonia Screening and Assessment: Development and Reliability. J Voice 2023:S0892-1997(23)00330-2. [PMID: 37980209 DOI: 10.1016/j.jvoice.2023.10.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Revised: 10/12/2023] [Accepted: 10/12/2023] [Indexed: 11/20/2023]
Abstract
OBJECTIVE This study aimed to develop a Voice Wellness Index (VWI) application combining the acoustic voice quality index (AVQI) and glottal function index (GFI) data and to evaluate its reliability in quantitative voice assessment and normal versus pathological voice differentiation. STUDY DESIGN Cross-sectional study. METHODS A total of 135 adult participants (86 patients with voice disorders and 49 patients with normal voices) were included in this study. Five iOS and Android smartphones with the "Voice Wellness Index" app installed were used to estimate VWI. The VWI data obtained using smartphones were compared with VWI measurements computed from voice recordings collected from a reference studio microphone. The diagnostic efficacy of VWI in differentiating between normal and disordered voices was assessed using receiver operating characteristics (ROC). RESULTS With a Cronbach's alpha of 0.972 and an ICC of 0.972 (0.964-0.979), the VWI scores of the individual smartphones demonstrated remarkable inter-smartphone agreement and reliability. The VWI data obtained from different smartphones and a studio microphone showed nearly perfect direct linear correlations (r = 0.993-0.998). Depending on the individual smartphone device used, the cutoff scores of VWI related to differentiating between normal and pathological voice groups were calculated as 5.6-6.0 with the best balance between sensitivity (94.10-95.15%) and specificity (93.68-95.72%). The diagnostic accuracy was excellent in all cases, with an area under the curve (AUC) of 0.970-0.974. CONCLUSION The "Voice Wellness Index" application is an accurate and reliable tool for voice quality measurement and normal versus pathological voice screening and has considerable potential to be used by healthcare professionals and patients for voice assessment.
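The abstract reports cutoff scores chosen for the best balance between sensitivity and specificity but does not state the selection rule. One common choice is Youden's J statistic (sensitivity + specificity - 1), sketched below as an assumption rather than as the authors' method. The sketch also assumes that higher scores indicate pathology; the function names and the data are hypothetical.

```python
def sensitivity_specificity(scores, is_pathological, cutoff):
    """Classify score >= cutoff as pathological and return (sensitivity, specificity)."""
    pairs = list(zip(scores, is_pathological))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff_youden(scores, is_pathological):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J.

    Candidate cutoffs are the observed scores; ties keep the lowest cutoff.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, is_pathological, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical index scores and labels (invented, not study data).
example_scores = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
example_labels = [False, False, False, True, True, True]
example_best = best_cutoff_youden(example_scores, example_labels)
```

With real data the two groups overlap, so the selected cutoff trades missed cases against false alarms, and it can differ slightly between devices, as the reported 5.6-6.0 range suggests.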
Affiliation(s)
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tadas Petrauskas
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania.
- Ingrida Ulozienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tomas Blažauskas
- Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
- Rytis Maskeliūnas
- Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
|
19
|
McKenna VS, Roberts RM, Friedman AD, Shanley SN, Llico AF. Impact of naturalistic smartphone positioning on acoustic measures of voice. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 154:323-333. [PMID: 37450331 DOI: 10.1121/10.0020176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/21/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023]
Abstract
Smartphone technology has been used for at-home health monitoring, but there are few available applications (apps) for tracking acoustic measures of voice for those with chronic voice problems. Current apps limit the user by restricting the range of smartphone positions to those that are unnatural and non-interactive. Therefore, we aimed to understand how more natural smartphone positions impacted the accuracy of acoustic measures in comparison to clinically acquired and derived measures. Fifty-six adults (11 vocally healthy, 45 voice disordered, aged 18-80 years) completed voice recordings while holding their smartphones in four different positions (e.g., as if reading from the phone, up to the ear, etc.) while a head-mounted high-quality microphone attached to a handheld acoustic recorder simultaneously captured voice recordings. Comparisons revealed that mean fundamental frequency (Hz), maximum phonation time (s), and cepstral peak prominence (CPP; dB) were not impacted by phone position; however, CPP was significantly lower on smartphone recordings than handheld recordings. Spectral measures (low-to-high spectral ratio, harmonics-to-noise ratio) were impacted by the phone position and the recording device. These results indicate that more natural phone positions can be used to capture specific voice measures, but not all are directly comparable to clinically derived values.
Affiliation(s)
- Victoria S McKenna
- Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio 45267, USA
- Rachel M Roberts
- Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio 45267, USA
- Aaron D Friedman
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio 45267, USA
- Savannah N Shanley
- Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio 45267, USA
- Andres F Llico
- Department of Biomedical Engineering, University of Cincinnati, Cincinnati, Ohio 45221, USA
|
20
|
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Pribuišis K, Blažauskas T, Damaševičius R, Maskeliūnas R. Reliability of Universal-Platform-Based Voice Screen Application in AVQI Measurements Captured with Different Smartphones. J Clin Med 2023; 12:4119. [PMID: 37373811 DOI: 10.3390/jcm12124119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Revised: 06/15/2023] [Accepted: 06/16/2023] [Indexed: 06/29/2023] Open
Abstract
The aim of the study was to develop a universal-platform-based (UPB) application suitable for different smartphones for estimation of the Acoustic Voice Quality Index (AVQI) and evaluate its reliability in AVQI measurements and normal and pathological voice differentiation. Our study group consisted of 135 adult individuals, including 49 with normal voices and 86 patients with pathological voices. The developed UPB "Voice Screen" application installed on five iOS and Android smartphones was used for AVQI estimation. The AVQI measures calculated from voice recordings obtained from a reference studio microphone were compared with AVQI results obtained using smartphones. The diagnostic accuracy of differentiating normal and pathological voices was evaluated by applying receiver-operating characteristics. One-way ANOVA did not detect statistically significant differences between mean AVQI scores obtained using a studio microphone and different smartphones (F = 0.759; p = 0.58). Almost perfect direct linear correlations (r = 0.987-0.991) were observed between the AVQI results obtained with a studio microphone and different smartphones. An acceptable level of precision of the AVQI in discriminating between normal and pathological voices was yielded, with areas under the curve (AUC) of 0.834-0.862. There were no statistically significant differences between the AUCs (p > 0.05) obtained from studio and smartphones' microphones. The largest difference between the AUCs was only 0.028. The UPB "Voice Screen" application represented an accurate and robust tool for voice quality measurements and normal vs. pathological voice screening purposes, demonstrating the potential to be used by patients and clinicians for voice assessment, employing both iOS and Android smartphones.
Affiliation(s)
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Tadas Petrauskas
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Tomas Blažauskas
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
- Rytis Maskeliūnas
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
|
21
|
Awan SN, Shaikh MA, Awan JA, Abdalla I, Lim KO, Misono S. Smartphone Recordings are Comparable to "Gold Standard" Recordings for Acoustic Measurements of Voice. J Voice 2023:S0892-1997(23)00031-0. [PMID: 37019804 PMCID: PMC10545813 DOI: 10.1016/j.jvoice.2023.01.031] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/14/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 04/07/2023]
Abstract
PURPOSE The purpose of this study was to assess the relationship and comparability of cepstral and spectral measures of voice obtained from a high-cost "flat" microphone and precision sound level meter (SLM) vs. high-end and entry level models of commonly and currently used smartphones (iPhone i12 and iSE; Samsung s21 and s9 smartphones). Device comparisons were also conducted in different settings (sound-treated booth vs. typical "quiet" office room) and at different mouth-to-microphone distances (15 and 30 cm). METHODS The SLM and smartphone devices were used to record a series of speech and vowel samples from a prerecorded diverse set of 24 speakers representing a wide range of sex, age, fundamental frequency (F0), and voice quality types. Recordings were analyzed for the following measures: smoothed cepstral peak prominence (CPP in dB); the low vs high spectral ratio (L/H Ratio in dB); and the Cepstral Spectral Index of Dysphonia (CSID). RESULTS A strong device effect was observed for L/H Ratio (dB) in both vowel and sentence contexts and for CSID in the sentence context. In contrast, device had a weak effect on CPP (dB), regardless of context. Recording distance was observed to have a small-to-moderate effect on measures of CPP and CSID but had a negligible effect on L/H Ratio. With the exception of L/H Ratio in the vowel context, setting was observed to have a strong effect on all three measures. While these aforementioned effects resulted in significant differences between measures obtained with SLM vs. smartphone devices, the intercorrelations of the measurements were extremely strong (r's > 0.90), indicating that all devices were able to capture the range of voice characteristics represented in the voice sample corpus. 
Regression modeling showed that acoustic measurements obtained from smartphone recordings could be successfully converted to comparable measurements obtained by a "gold standard" (precision SLM recordings conducted in a sound-treated booth at 15 cm) with small degrees of error. CONCLUSIONS These findings indicate that a variety of commonly available modern smartphones can be used to collect high quality voice recordings usable for informative acoustic analysis. While device, setting, and distance can have significant effects on acoustic measurements, these effects are predictable and can be accounted for using regression modeling.
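The regression-based conversion described in this abstract can be sketched as a simple least-squares calibration. This is a minimal sketch that assumes a single-predictor linear model; the abstract does not specify the model form, and the helper `fit_line` and the paired CPP values below are hypothetical, invented purely for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical paired CPP values in dB (not data from the study):
# the same voice samples measured from a smartphone and from a reference chain.
phone_cpp = [4.0, 6.0, 8.0, 10.0, 12.0]
reference_cpp = [5.1, 6.9, 9.0, 11.1, 12.9]

slope, intercept = fit_line(phone_cpp, reference_cpp)

# Convert smartphone measurements onto the reference scale and check the error.
converted = [slope * v + intercept for v in phone_cpp]
rmse = (sum((c - r) ** 2 for c, r in zip(converted, reference_cpp)) / len(converted)) ** 0.5
print(f"slope={slope:.3f} intercept={intercept:.3f} rmse={rmse:.3f} dB")
```

In practice a separate fit would presumably be needed for each device, setting, and distance, since the abstract reports that all three affect the measurements.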
Affiliation(s)
- Shaheen N Awan
  - University of South Florida, Dept. of Communication Sciences & Disorders, Tampa FL 33620
- Mohsin Ahmed Shaikh
  - Commonwealth University of Pennsylvania, Dept. of Communication Sciences & Disorders, Bloomsburg PA 17815
- Jordan A Awan
  - Purdue University, Dept. of Statistics, Mathematical Sciences Building, 150 N. University Street, West Lafayette, IN 47907
- Ibrahim Abdalla
  - University of Minnesota Medical School, 420 Delaware Street SE, Minneapolis, MN 55455
- Kelvin O Lim
  - University of Minnesota Medical School, Dept. of Psychiatry and Behavioral Sciences, 420 Delaware Street SE, Minneapolis, MN 55455
- Stephanie Misono
  - University of Minnesota Medical School, Division of Laryngology, Department of Otolaryngology, Head and Neck Surgery, 420 Delaware Street SE, Minneapolis, MN 55455
22
Cavalcanti JC, Englert M, Oliveira M, Constantini AC. Microphone and Audio Compression Effects on Acoustic Voice Analysis: A Pilot Study. J Voice 2023; 37:162-172. [PMID: 33451892 DOI: 10.1016/j.jvoice.2020.12.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/28/2020] [Revised: 12/01/2020] [Accepted: 12/03/2020] [Indexed: 11/29/2022]
Abstract
OBJECTIVE This study aimed to analyze the effects of microphone and audio compression variables on the acquisition of voice and speech parameters. METHOD Acoustic measures were recorded and compared using a high-quality reference microphone and three test microphones that differed in specifications and acoustic properties. Furthermore, the impact of audio compression was assessed by converting the original uncompressed audio files into the MPEG-1/2 Audio Layer 3 (mp3) format at three different bit rates (128 kbps, 64 kbps, 32 kbps). Eight speakers were recruited in each recording session and asked to produce four sustained vowels: two [a] segments and two [ɛ] segments. The audio was captured simultaneously by the reference and test microphones. The recordings were synchronized and analyzed using the Praat software. RESULTS From a set of eight acoustic parameters assessed (f0, F1, F2, jitter%, shimmer%, HNR, H1-H2, and CPP), three (f0, F2, and jitter%) were suggested to be robust to both the microphone and audio compression variables. In contrast, some parameters seemed to be significantly affected by both factors: HNR, H1-H2, and CPP; shimmer% was found to be sensitive only to the latter factor (audio compression). Moreover, higher compression rates appeared to yield more frequent acoustic distortions than lower rates. CONCLUSION Overall, the outcomes suggest that acoustic parameters are influenced by both microphone selection and audio compression, which has practical implications for the reliability of acoustic analysis.
Affiliation(s)
- Julio Cesar Cavalcanti
  - Universidade Estadual de Campinas (UNICAMP), Institute of Language Studies, Campinas - SP, Brazil
- Marina Englert
  - Universidade Federal de São Paulo (UNIFESP), Department of Communication Disorders, São Paulo - SP, Brazil; Centro de Estudos da Voz (CEV), São Paulo - SP, Brazil
- Miguel Oliveira
  - Universidade Federal de Alagoas (UFAL), Department of Letters, Maceió - AL, Brazil
23
Comparison of In-Person and Online Recordings in the Clinical Teleassessment of Speech Production: A Pilot Study. Brain Sci 2023; 13:brainsci13020342. [PMID: 36831885 PMCID: PMC9953872 DOI: 10.3390/brainsci13020342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2023] [Revised: 02/12/2023] [Accepted: 02/15/2023] [Indexed: 02/22/2023]
Abstract
In certain circumstances, speech and language therapy is proposed in telepractice as a practical alternative to in-person services. However, little is known about the minimum quality requirements of recordings in the teleassessment of motor speech disorders (MSD) utilizing validated tools. The aim here is to examine the comparability of offline analyses based on speech samples acquired from three sources: (1) in-person recordings with high quality material, serving as the baseline/gold standard; (2) in-person recordings with standard equipment; (3) online recordings from videoconferencing. Speech samples were recorded simultaneously from these three sources in fifteen neurotypical speakers performing a screening battery of MSD and analyzed by three speech and language therapists. Intersource and interrater agreements were estimated with intraclass correlation coefficients on seventeen perceptual and acoustic parameters. While the interrater agreement was excellent for most speech parameters, especially on high quality in-person recordings, it decreased in online recordings. The intersource agreement was excellent for speech rate and mean fundamental frequency measures when comparing high quality in-person recordings to the other conditions. The intersource agreement was poor for voice parameters, but also for perceptual measures of intelligibility and articulation. Clinicians who plan to teleassess MSD should adapt their recording setting to the parameters they want to reliably interpret.
24
An iOS-based VoiceScreen application: feasibility for use in clinical settings-a pilot study. Eur Arch Otorhinolaryngol 2023; 280:277-284. [PMID: 35906420 PMCID: PMC9811036 DOI: 10.1007/s00405-022-07546-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/04/2022] [Accepted: 07/06/2022] [Indexed: 01/07/2023]
Abstract
OBJECTIVES To develop a smartphone application for estimation of the Acoustic Voice Quality Index (AVQI) and to evaluate its usability in the clinical setting. METHODS Automated AVQI calculation and background noise monitoring functions were implemented in a mobile "VoiceScreen" application running on the iOS operating system. The study group consisted of 103 adults: 30 individuals with normal voices and 73 patients with pathological voices. Voice recordings were performed in the clinical setting with the "VoiceScreen" app using iPhone 8 microphones. Voices of 30 patients were recorded before and 1 month after phonosurgical intervention. To evaluate the diagnostic accuracy of differentiating normal and pathological voices, receiver-operating characteristic statistics, i.e., area under the curve (AUC), sensitivity and specificity, and correct classification rate (CCR), were used. RESULTS The AVQI discriminated between normal and dysphonic voices with a high level of precision (AUC = 0.937). The AVQI cutoff score of 3.4 demonstrated a sensitivity of 86.3% and a specificity of 95.6%, with a CCR of 89.2%. The preoperative mean AVQI [6.01 (SD 2.39)] in the post-phonosurgical follow-up group decreased to 2.00 (SD 1.08). No statistically significant difference (p = 0.216) was found between AVQI measurements in the normal voice group and the group followed up 1 month after phonosurgery. CONCLUSIONS The "VoiceScreen" app represents an accurate and robust tool for voice quality measurement and demonstrates the potential to be used in clinical settings as a sensitive measure of voice changes across phonosurgical treatment outcomes.
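To make the screening statistics in this abstract concrete, the sketch below shows how sensitivity, specificity, and CCR follow from applying a cutoff to two groups of scores. The cutoff of 3.4 is taken from the abstract; the function `screen_metrics`, the rule that a score strictly above the cutoff counts as a positive screen, and the AVQI values are assumptions made for illustration only and are not data from the study.

```python
def screen_metrics(normal_scores, pathological_scores, cutoff):
    """Return (sensitivity, specificity, correct classification rate).

    Assumption: a score strictly above the cutoff is classified as pathological.
    """
    true_pos = sum(s > cutoff for s in pathological_scores)
    true_neg = sum(s <= cutoff for s in normal_scores)
    total = len(normal_scores) + len(pathological_scores)
    sensitivity = true_pos / len(pathological_scores)
    specificity = true_neg / len(normal_scores)
    ccr = (true_pos + true_neg) / total
    return sensitivity, specificity, ccr


# Hypothetical AVQI scores, invented for illustration only
normal = [1.9, 2.7, 3.1, 3.6]
pathological = [3.0, 3.9, 4.4, 5.2, 6.0]

sens, spec, ccr = screen_metrics(normal, pathological, cutoff=3.4)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} CCR={ccr:.2f}")
```

Sweeping the cutoff over a range of values and recording each sensitivity and specificity pair would trace out the ROC curve from which the AUC is computed.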
25
Rodríguez Marconi D, Morales C, Araya P, Ferrada R, Ibarra M, Catrifol MT. Uso del smartphone en telepráctica para trastornos de la voz. Una revisión desde el concepto de Mhealth. Revista de Investigación en Logopedia 2022. [DOI: 10.5209/rlog.78550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
The use of smartphones and the concept of mobile health (mHealth) are recent in vocology, as are their possible benefits for vocal treatment and training in the context of telepractice. A narrative review was conducted with the aim of describing the benefits of smartphone-based mHealth in the context of speech-language pathology telepractice for voice disorders. Scientific articles on the use of smartphones in vocology were searched in PubMed, ScienceDirect, and Google Scholar, considering normal and pathological human voices and synthetic voices, and relating to vocal intervention, evaluation, assessment, monitoring, prevention, supervision, education, consultation, and training. Forty-two studies were reviewed, of which 15 were selected according to the inclusion criteria. The studies analyzed concern voice recording for acoustic analysis with a smartphone, teletherapy with a smartphone, and peripheral devices for vocal analysis and follow-up. The review highlights the potential of mobile devices to increase accessibility, reduce costs, and support therapeutic follow-up with objective measures in diverse vocal health contexts.
26
Zakariah M, B R, Ajmi Alotaibi Y, Guo Y, Tran-Trung K, Elahi MM. An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks. Computational and Mathematical Methods in Medicine 2022; 2022:7814952. [PMID: 35529259 PMCID: PMC9071878 DOI: 10.1155/2022/7814952] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/21/2022] [Revised: 02/17/2022] [Accepted: 03/07/2022] [Indexed: 11/17/2022]
Abstract
Diseases of internal organs other than the vocal folds can also affect a person's voice. As a result, voice problems are on the rise, even though they are frequently overlooked. According to a recent study, voice pathology detection systems can successfully support the assessment of voice abnormalities and enable the early diagnosis of voice pathology. For instance, automatic systems for distinguishing healthy and diseased voices have received much attention for the early identification and diagnosis of voice problems. As a result, artificial intelligence-assisted voice analysis opens up new possibilities in healthcare. This work aimed to assess the utility of several automatic speech signal analysis methods for diagnosing voice disorders and to suggest a strategy for classifying healthy and diseased voices. The proposed framework combines three voice features: chroma, mel spectrogram, and mel frequency cepstral coefficients (MFCC). We also designed a deep neural network (DNN) capable of learning from the extracted features and producing a highly accurate voice-based disease prediction model. The study describes a series of experiments using the Saarbruecken Voice Database (SVD) to detect abnormal voices. The model was developed and tested using the vowels /a/, /i/, and /u/ pronounced at high, low, and average pitches. We also retained the "continuous sentence" audio files from the SVD to assess how well the developed model generalizes to completely new data. The highest accuracy achieved was 77.49%, superior to prior attempts in the same domain. Additionally, the model attains an accuracy of 88.01% when speaker gender information is integrated. The designed model, trained on selected diseases, can also obtain a maximum accuracy of 96.77% (cordectomy × healthy). As a result, the authors consider the suggested framework well suited to the healthcare industry.
Affiliation(s)
- Mohammed Zakariah
  - Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 21574, Saudi Arabia
- Reshma B
  - Division of Electronics Engineering, School of Engineering, Cochin University of Science and Technology, India
- Yousef Ajmi Alotaibi
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 21574, Saudi Arabia
- Kiet Tran-Trung
  - Faculty of Computer Science, Ho Chi Minh City Open University, 97 Vo Van Tan, Ward Vo Thi Sau, District 3, Ho Chi Minh City, postal code 70000, Vietnam
- Mohammad Mamun Elahi
  - Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
27
Awan SN, Shaikh MA, Desjardins M, Feinstein H, Abbott KV. The Effect of Microphone Frequency Response on Spectral and Cepstral Measures of Voice: An Examination of Low-Cost Electret Headset Microphones. American Journal of Speech-Language Pathology 2022; 31:959-973. [PMID: 35050724 PMCID: PMC9150670 DOI: 10.1044/2021_ajslp-21-00156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/27/2021] [Revised: 08/12/2021] [Accepted: 10/11/2021] [Indexed: 06/14/2023]
Abstract
PURPOSE The purpose of this study was to establish the frequency response of a selection of low-cost headset microphones that could be given to subjects for remote voice recordings and to examine the effect of microphone type and frequency response on key acoustic measures related to voice quality obtained from speech and vowel samples. METHOD The frequency responses of three low-cost headset microphones were evaluated using pink noise generated via a head-and-torso model. Each of the headset microphones was then used to record a series of speech and vowel samples prerecorded from 24 speakers who represented a diversity of sex, age, fundamental frequency (Fo), and voice quality types. Recordings were later analyzed for the following measures: smoothed cepstral peak prominence (CPP; dB), low versus high spectral ratio (L/H ratio; dB), CPP Fo (Hz), and cepstral spectral index of dysphonia (CSID). RESULTS The frequency response of the microphones under test was observed to have nonsignificant effects on measures of the CPP and CPP Fo, significant effects on the CSID in speech contexts, and strong and significant effects on the measure of spectral tilt (L/H ratio). However, the correlations between the various headset microphones and a reference precision microphone were excellent (rs > .90). CONCLUSIONS The headset microphones under test all showed the capability to track a wide range of diversity in the voice signal. Though the use of higher quality microphones that have demonstrated specifications is recommended for typical research and clinical purposes, low-cost electret microphones may be used to provide valid measures of voice, specifically when the same microphone and signal chain is used for the evaluation of pre- versus posttreatment change or intergroup comparisons.
Affiliation(s)
- Shaheen N. Awan
  - Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Mohsin A. Shaikh
  - Department of Communication Sciences and Disorders, Bloomsburg University of Pennsylvania
- Maude Desjardins
  - Department of Communication Sciences & Disorders, University of Delaware, Newark
- Hagar Feinstein
  - Department of Communication Sciences & Disorders, University of Delaware, Newark
28
Castillo-Allendes A, Contreras-Ruston F, Cantor L, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Terapia de voz en el contexto de la pandemia covid-19; recomendaciones para la práctica clínica. J Voice 2021; 35:808.e1-808.e12. [PMID: 32917457 PMCID: PMC7442931 DOI: 10.1016/j.jvoice.2020.08.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Since the beginning of the COVID-19 pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions through telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to COVID-19. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the context of the COVID-19 pandemic. METHODS A group of 11 experts in voice and swallowing disorders from 5 different countries developed consensus recommendations following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists during the pandemic. RESULTS The clinical guide provides 79 recommendations for clinicians in the management of voice disorders during the pandemic and includes advice on assessment, direct treatment, telepractice, and teamwork. Consensus reached 95% for all topics. CONCLUSION This guideline should be taken only as a set of recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Francisco Contreras-Ruston
  - Speech-Language Pathology and Audiology Department, Universidad de Valparaíso, San Felipe, Chile
  - Address correspondence and reprint requests to Francisco Contreras-Ruston, CEV–Centro de Estudos da Voz, Rua Machado Bittencourt, 361, SP 04044-001, Brazil
- Lady Cantor
  - Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
  - Universidad de los Andes, Chile, Santiago, Chile
- Celina Malebran
  - Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
  - Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
  - Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública. Santiago, Chile
- Thays Vaiano
  - CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
  - Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoudiología, Hospital de Clínicas “José de San Martin”, Buenos Aires, Argentina
- Mara Behlau
  - CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
29
Castillo-Allendes A, Contreras-Ruston F, Cantor L, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Terapia Vocal No Contexto Da Pandemia Do Covid-19; Orientações Para A Prática Clínica. J Voice 2021; 35:808.e13-808.e24. [PMID: 32917460 PMCID: PMC7439998 DOI: 10.1016/j.jvoice.2020.08.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Since the beginning of the Corona Virus Disease 2019 (COVID-19) pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions through telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to COVID-19. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the context of the COVID-19 pandemic. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed consensus recommendations following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists during the pandemic. RESULTS The clinical guide provides 79 recommendations for clinicians in the management of voice disorders during the pandemic and includes advice on assessment, direct treatment, telepractice, and teamwork. Consensus reached 95% for all topics. CONCLUSION This guideline should be taken only as a set of recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Francisco Contreras-Ruston
  - Speech-Language Pathology and Audiology Department, Universidad de Valparaíso, San Felipe, Chile
  - Address correspondence and reprint requests to Francisco Contreras-Ruston, CEV–Centro de Estudos da Voz, Rua Machado Bittencourt, 361, SP 04044-001, Brazil
- Lady Cantor
  - Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
  - Universidad de los Andes, Chile, Santiago, Chile
- Celina Malebran
  - Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
  - Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
  - Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública. Santiago, Chile
- Thays Vaiano
  - CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
  - Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoudiología, Hospital de Clínicas “José de San Martin,” Buenos Aires, Argentina
- Mara Behlau
  - CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
30
Castillo-Allendes A, Contreras-Ruston F, Cantor-Cutiva LC, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Voice Therapy in the Context of the COVID-19 Pandemic: Guidelines for Clinical Practice. J Voice 2021; 35:717-727. [PMID: 32878736 PMCID: PMC7413113 DOI: 10.1016/j.jvoice.2020.08.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 07/01/2020] [Revised: 07/30/2020] [Accepted: 08/03/2020] [Indexed: 01/14/2023]
Abstract
INTRODUCTION Since the beginning of the COVID-19 pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions through telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to COVID-19. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the context of the COVID-19 pandemic. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed consensus recommendations following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists during the pandemic. RESULTS The clinical guide provides 65 recommendations for clinicians in the management of voice disorders during the pandemic and includes advice on assessment, direct treatment, telepractice, and teamwork. Consensus reached 95% for all topics. CONCLUSION This guideline should be taken only as a set of recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Lady Catherine Cantor-Cutiva
  - Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
  - Universidad de los Andes, Chile, Santiago, Chile
- Celina Malebran
  - Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
  - Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
  - Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública. Santiago, Chile
- Thays Vaiano
  - CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
  - Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoudiología, Hospital de Clínicas "José de San Martin", Buenos Aires, Argentina
- Mara Behlau
  - CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
31
Weerathunge HR, Segina RK, Tracy L, Stepp CE. Accuracy of Acoustic Measures of Voice via Telepractice Videoconferencing Platforms. Journal of Speech, Language, and Hearing Research 2021; 64:2586-2599. [PMID: 34157251 PMCID: PMC8632479 DOI: 10.1044/2021_jslhr-20-00625] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/26/2020] [Revised: 12/19/2020] [Accepted: 03/23/2021] [Indexed: 05/31/2023]
Abstract
PURPOSE Telepractice improves patient access to clinical care for voice disorders. Acoustic assessment has the potential to provide critical, objective information during telepractice, yet its validity via telepractice is currently unknown. The current study investigated the accuracy of acoustic measures of voice in a variety of telepractice platforms. METHOD Twenty-nine voice samples from individuals with dysphonia were transmitted over six video conferencing platforms (Zoom with and without enhancements, Cisco WebEx, Microsoft Teams, Doxy.me, and VSee Messenger). Standard time-, spectral-, and cepstral-based acoustic measures were calculated. The effect of transmission condition on each acoustic measure was assessed using repeated-measures analyses of variance. For those acoustic measures for which transmission condition was a significant factor, linear regression analysis was performed on the difference between the original recording and each telepractice platform, with the overall severity of dysphonia, Internet speed, and ambient noise from the transmitter as predictors. RESULTS Transmission condition was a statistically significant factor for all acoustic measures except for mean fundamental frequency (fo). Ambient noise from the transmitter was a significant predictor of differences between platforms and the original recordings for all acoustic measures except fo measures. All telepractice platforms affected acoustic measures in a statistically significant manner, although the effects of platforms varied by measure. CONCLUSIONS Overall, measures of fo were the least impacted by telepractice transmission. Microsoft Teams had the least and Zoom (with enhancements) had the most pronounced effects on acoustic measures. These results provide valuable insight into the relative validity of acoustic measures of voice when collected via telepractice. Supplemental Material https://doi.org/10.23641/asha.14794812.
Affiliation(s)
- Hasini R. Weerathunge
  - Department of Biomedical Engineering, Boston University, MA
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
- Roxanne K. Segina
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
- Lauren Tracy
  - Department of Otolaryngology—Head & Neck Surgery, Boston University School of Medicine, MA
- Cara E. Stepp
  - Department of Biomedical Engineering, Boston University, MA
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
  - Department of Otolaryngology—Head & Neck Surgery, Boston University School of Medicine, MA
|
32
|
Zhang C, Jepson K, Lohfink G, Arvaniti A. Comparing acoustic analyses of speech data collected remotely. The Journal of the Acoustical Society of America 2021; 149:3910. [PMID: 34241427 PMCID: PMC8269758 DOI: 10.1121/10.0005132] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/02/2021] [Revised: 05/11/2021] [Accepted: 05/12/2021] [Indexed: 06/01/2023]
Abstract
Face-to-face speech data collection has been next to impossible globally as a result of the COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth, Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth, Phone). F0 was tracked accurately by all of the devices; however, for formant analysis (F1, F2, F3), Phone performed better than Zoom, i.e., more similarly to H6, although the data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless format phone recordings present a viable option for at least some phonetic studies.
Affiliation(s)
- Cong Zhang
  - Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
- Kathleen Jepson
  - Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
- Georg Lohfink
  - School of European Culture and Languages, University of Kent, Canterbury, Kent, CT2 7NF, United Kingdom
- Amalia Arvaniti
  - Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
|
33
|
Pierce JL, Tanner K, Merrill RM, Shnowske L, Roy N. A Field-Based Approach to Establish Normative Acoustic Data for Healthy Female Voices. Journal of Speech, Language, and Hearing Research 2021; 64:691-706. [PMID: 33561361 DOI: 10.1044/2020_jslhr-20-00490] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Purpose The primary aim of this study was to obtain high-quality acoustic normative data in natural field environments for female voices. A secondary aim was to examine acoustic measurement variability in field environments. Method This study employed a within-subject repeated-measures experimental design that included 45 young female adults with normal voices. Participants were stratified by age (18-23, 24-29, and 30-35 years). After initial evaluation and instruction, participants completed voice recordings during seven consecutive days using a standard protocol, including both connected speech and sustained vowels. Thirty-two cepstral-, spectral-, and time-based acoustic measures were acquired using Praat and the Analysis of Dysphonia in Speech and Voice. Results Among the 958 total recordings, greater than 90% satisfied inclusion criteria based on protocol compliance, peak clipping, and signal-to-noise ratio. Significant differences were observed for age (p < .05). For 19 acoustic measures, values improved significantly as signal-to-noise ratio increased. Cepstral- and spectral-based measures demonstrated less measurement variability as compared with time-based measures. Conclusions With adequate training, field audio recordings represent a viable option for clinical voice management. The significant age effects observed in this study support the need for more specific criteria when collecting and applying normative data. Cepstral- and spectral-based measures demonstrated the least measurement variability. This study provides additional evidence for multiparameter acoustic voice measurement, specifically toward ecologically valid sampling in natural environments. Future studies should expand on these findings in other populations with normal and disordered voices.
Affiliation(s)
- Jenny L Pierce
  - Department of Surgery, The University of Utah, Salt Lake City
  - Department of Communication Sciences & Disorders, The University of Utah, Salt Lake City
- Kristine Tanner
  - Department of Communication Disorders, Brigham Young University, Provo, UT
- Ray M Merrill
  - Department of Public Health, Brigham Young University, Provo, UT
- Lauren Shnowske
  - Department of Communication Sciences & Disorders, The University of Utah, Salt Lake City
  - Department of Communication Sciences and Disorders, University of Kentucky, Lexington
- Nelson Roy
  - Department of Communication Sciences & Disorders, The University of Utah, Salt Lake City
|
34
|
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Kregždytė R. Accuracy of Acoustic Voice Quality Index Captured With a Smartphone - Measurements With Added Ambient Noise. J Voice 2021; 37:465.e19-465.e26. [PMID: 33676807 DOI: 10.1016/j.jvoice.2021.01.025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/15/2020] [Revised: 01/22/2021] [Accepted: 01/26/2021] [Indexed: 11/27/2022]
Abstract
OBJECTIVE To evaluate the accuracy of Acoustic Voice Quality Index (AVQI) measures obtained from voice recordings made simultaneously using oral and smartphone microphones in a sound-proof room, and to compare them with AVQIs obtained from the same smartphone voice recordings with added ambient noise. METHODS A study group of 183 subjects with normal voices (n = 86) and various voice disorders (n = 97) was asked to read aloud a standard text and sustain the vowel /a/. Controlled ambient noise averaging 29.61 dB SPL was added digitally to the smartphone voice recordings. Repeated measures analysis of variance (ANOVA) with Greenhouse-Geisser correction was used to evaluate AVQI changes within subjects. Bland-Altman plots were used to evaluate the level of agreement between AVQI measurements obtained from different voice recordings. RESULTS Repeated measures ANOVA showed that differences among AVQI results obtained from recordings made with an oral studio microphone, recordings made with a smartphone microphone, and recordings made with a smartphone microphone with added ambient noise were not statistically significant (P = 0.07). No significant systemic differences and an acceptable level of random error were found in AVQI measurements of voice recordings made with oral and smartphone microphones (including added noise). CONCLUSION The AVQI measures obtained from smartphone microphone voice recordings with experimentally added ambient noise showed acceptable agreement with results of oral microphone recordings, suggesting that smartphone microphone recordings are suitable for estimation of AVQI even in the presence of acceptable ambient noise.
Affiliation(s)
- Virgilijus Uloza
  - Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
  - Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tadas Petrauskas
  - Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Rima Kregždytė
  - Department of Preventive Medicine, Lithuanian University of Health Sciences, Kaunas, Lithuania
|
35
|
Are Acoustic Markers of Voice and Speech Signals Affected by Nose-and-Mouth-Covering Respiratory Protective Masks? J Voice 2021; 37:468.e1-468.e12. [PMID: 33608184 PMCID: PMC7885637 DOI: 10.1016/j.jvoice.2021.01.013] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/24/2020] [Revised: 01/11/2021] [Accepted: 01/13/2021] [Indexed: 11/24/2022]
Abstract
Background Worldwide use of nose-and-mouth-covering respiratory protective masks (RPMs) has become ubiquitous during the COVID-19 pandemic. Consequences of wearing RPMs, especially regarding perception and production of spoken communication, are gradually emerging. The present study explored how three prevalent RPMs affect various speech and voice sound properties. Methods Pre-recorded sustained [a] vowels and read sentences from 47 subjects were played by a speech production model (‘Voice Emitted by Spare Parts’, or ‘VESPA’) in four conditions: without RPM (C1), with disposable surgical mask (C2), with FFP2 mask (C3), and with transparent plastic mask (C4). Differences between C1 and masked conditions were assessed with Dunnett's t test in 26 speech sound properties related to voice production (fundamental frequency, sound intensity level), voice quality (jitter percent, shimmer percent, harmonics-to-noise ratio, smoothed cepstral peak prominence, Acoustic Voice Quality Index), articulation and resonance (first and second formant frequencies, first and second formant bandwidths, spectral center of gravity, spectral standard deviation, spectral skewness, spectral kurtosis, spectral slope, and spectral energy in ten 1-kHz bands from 0 to 10 kHz). Results C2, C3, and C4 significantly affected 10, 15, and 19 of the acoustic speech markers, respectively. Furthermore, absolute differences between unmasked and masked conditions were largest for C4 and smallest for C2. Conclusions All RPMs influenced speech sound properties to a greater or lesser degree. However, this influence was smallest for surgical RPMs and largest for plastic RPMs. Surgical RPMs are therefore preferred when spoken communication is a priority alongside respiratory protection.
|
36
|
Robin J, Harrison JE, Kaufman LD, Rudzicz F, Simpson W, Yancheva M. Evaluation of Speech-Based Digital Biomarkers: Review and Recommendations. Digit Biomark 2020; 4:99-108. [PMID: 33251474 DOI: 10.1159/000510820] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 06/12/2020] [Accepted: 08/11/2020] [Indexed: 12/23/2022] Open
Abstract
Speech represents a promising novel biomarker by providing a window into brain health, as shown by its disruption in various neurological and psychiatric diseases. As with many novel digital biomarkers, however, rigorous evaluation is currently lacking and is required for these measures to be used effectively and safely. This paper outlines and provides examples from the literature of evaluation steps for speech-based digital biomarkers, based on the recent V3 framework (Goldsack et al., 2020). The V3 framework describes 3 components of evaluation for digital biomarkers: verification, analytical validation, and clinical validation. Verification includes assessing the quality of speech recordings and comparing the effects of hardware and recording conditions on the integrity of the recordings. Analytical validation includes checking the accuracy and reliability of data processing and computed measures, including understanding test-retest reliability, demographic variability, and comparing measures to reference standards. Clinical validity involves verifying the correspondence of a measure to clinical outcomes which can include diagnosis, disease progression, or response to treatment. For each of these sections, we provide recommendations for the types of evaluation necessary for speech-based biomarkers and review published examples. The examples in this paper focus on speech-based biomarkers, but they can be used as a template for digital biomarker development more generally.
Affiliation(s)
- John E Harrison
  - Metis Cognition Ltd., Park House, Kilmington Common, Warminster, United Kingdom
  - Alzheimer Center, AUmc, Amsterdam, The Netherlands
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Frank Rudzicz
  - Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- William Simpson
  - Winterlight Labs, Toronto, Ontario, Canada
  - Department of Psychiatry and Behavioural Neuroscience, McMaster University, Hamilton, Ontario, Canada
|
37
|
Petrizzo D, Popolo PS. Smartphone Use in Clinical Voice Recording and Acoustic Analysis: A Literature Review. J Voice 2020; 35:499.e23-499.e28. [PMID: 32736910 DOI: 10.1016/j.jvoice.2019.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/21/2019] [Revised: 10/11/2019] [Accepted: 10/11/2019] [Indexed: 11/30/2022]
Abstract
OBJECTIVE With the increase of smartphone use and availability over the last decade, mobile healthcare applications have become more accessible. Many of these applications allow users to track behaviors and goals, and acquire feedback and information while on the go. Recent studies appearing in the literature suggest that smartphones may offer a means of augmenting clinical voice assessment by recording individuals with voice disorders outside the clinic for the purpose of extracting acoustic characteristics. This review examines the effectiveness of smartphones in clinical voice assessment and treatment, as reported in the current literature. METHODS The PubMed database was searched using combinations and variations of different terms related to smartphones, voice, and recording apps, in order to find articles that address the role of smartphones in clinical voice recording and assessment. RESULTS AND CONCLUSION Six studies published in the last 3 years were reviewed and examined in terms of types of device and operating systems used, types of subjects and disorders studied, voice parameters extracted, and microphones used. Considerations such as the impact of environmental noise, and privacy and security issues, are also examined. While smartphones and mobile apps have the potential to be valuable tools in voice assessment outside the clinic, further efforts are needed for them to be effectively used in a clinical setting.
Affiliation(s)
- Danielle Petrizzo
  - Department of Communication Sciences and Disorders, Montclair State University, Montclair, New Jersey
|
38
|
Illner V, Sovka P, Rusz J. Validation of freely-available pitch detection algorithms across various noise levels in assessing speech captured by smartphone in Parkinson’s disease. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101831] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
|
39
|
Maryn Y, Leblans M, Zarowski A, Barkmeier-Kraemer J. Objective Acoustic Quantification of Perceived Voice Tremor Severity. Journal of Speech, Language, and Hearing Research 2019; 62:3689-3705. [PMID: 31619112 DOI: 10.1044/2019_jslhr-s-19-0024] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Purpose This study compared auditory-perceptual measures of presence/absence and severity of vocal tremor to acoustic markers of vocal tremor. The validity (both concurrent and diagnostic) of various acoustic markers of vocal tremor was also assessed. Method Fifty-six midvowel sustained [a:] recordings were selected to yield a representative convenience sample of vocal tremor. After training with 10 synthesized samples, 4 female audiologists rated these samples on "voice tremor severity" on a continuous 10-cm scale. Afterward, 15 randomly selected recordings were presented a 2nd time for intrarater reliability assessment. Customized audio signal processing in Praat yielded 12 acoustic measures of rate, extent and perturbation of fundamental frequency (f0), and intensity level (IL) modulation. Enter-type multiple linear regression analysis was applied to weight and combine these acoustic variables into an acoustic model of vocal tremor severity. Results After removing the vocal tremor severity ratings of 1 of the audiologists because of insufficient intra- and interrater reliability, mean single-measures consistency-type intraclass correlation coefficients equaled .83 within raters and .72 between raters. Correlation between mean ratings and the 12 acoustic markers ranged from .76 for median extent of f0 modulation to .11 for rate of IL modulation. Correlation between mean ratings and the acoustic model was .89. Analysis of this model's receiver operating characteristics yielded an area under receiver operating characteristic curve of .93, denoting sensitivity of .87 and specificity of .91. Conclusions This study demonstrated that auditory-perceptual ratings of vocal tremor severity are guided primarily by f0 and IL modulation extent, less by modulation perturbation, and least by modulation rate. The acoustic model covering all these modulation properties yielded acceptable results in terms of both concurrent and diagnostic validity. However, external cross-validation of this model is warranted before applying it in clinical voice/speech assessment.
Affiliation(s)
- Youri Maryn
  - Department of Otorhinolaryngology and Head & Neck Surgery, European Institute for ORL-HNS, GZA Sint-Augustinus, Wilrijk, Belgium
  - Department of Speech, Language and Hearing Sciences, Faculty of Medicine and Health Sciences, University of Ghent, Belgium
  - Faculty of Education, Health & Social Work, University College Ghent, Belgium
  - Faculty of Psychology and Educational Sciences, School of Logopedics, Louvain-La-Neuve, Belgium
  - Phonanium, Lokeren, Belgium
- Marc Leblans
  - Department of Otorhinolaryngology and Head & Neck Surgery, European Institute for ORL-HNS, GZA Sint-Augustinus, Wilrijk, Belgium
- Andrzej Zarowski
  - Department of Otorhinolaryngology and Head & Neck Surgery, European Institute for ORL-HNS, GZA Sint-Augustinus, Wilrijk, Belgium
|
40
|
Jannetts S, Schaeffler F, Beck J, Cowen S. Assessing voice health using smartphones: bias and random error of acoustic voice parameters captured by different smartphone types. International Journal of Language & Communication Disorders 2019; 54:292-305. [PMID: 30779425 DOI: 10.1111/1460-6984.12457] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 02/19/2018] [Revised: 10/31/2018] [Accepted: 01/16/2019] [Indexed: 06/09/2023]
Abstract
BACKGROUND Occupational voice problems constitute a serious public health issue with substantial financial and human consequences for society. Modern mobile technologies such as smartphones have the potential to enhance approaches to prevention and management of voice problems. This paper addresses an important aspect of smartphone-assisted voice care: the reliability of smartphone-based acoustic analysis for voice health state monitoring. AIM To assess the reliability of acoustic parameter extraction for a range of commonly used smartphones by comparison with studio recording equipment. METHODS & PROCEDURES Twenty-two vocally healthy speakers (12 female, 10 male) were recorded producing sustained vowels and connected speech under studio conditions using a high-quality studio microphone and an array of smartphones. For both types of utterance, Bland-Altman analysis was used to assess overall reliability for mean F0, cepstral peak prominence (CPPS), Jitter (RAP) and Shimmer %. OUTCOMES & RESULTS Analysis of the systematic and random error indicated significant bias for CPPS across both sustained vowels and passage reading. Analysis of the random error of the devices indicated that mean F0 and CPPS showed acceptable random error size, while jitter and shimmer random error was judged as problematic. CONCLUSIONS & IMPLICATIONS Confidence in the feasibility of smartphone-based voice assessment is increased by the experimental finding of high levels of reliability for some clinically relevant acoustic parameters, while the use of other parameters is discouraged. We also challenge the practice of using statistical tests (e.g., t-tests) for measurement reliability assessment.
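Note: the Bland-Altman analysis named in this abstract separates systematic error (bias) from random error (limits of agreement). The following is a minimal sketch of that calculation, not the authors' code: the function name `bland_altman` and the F0 values are hypothetical, and the 1.96 multiplier assumes approximately normally distributed differences.

```python
import statistics


def bland_altman(reference, test):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (test - reference); the limits of agreement are
    bias +/- 1.96 sample standard deviations of those differences.
    """
    if len(reference) != len(test):
        raise ValueError("reference and test must be paired")
    diffs = [t - r for r, t in zip(reference, test)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical mean F0 values (Hz) for five speakers, for illustration only.
studio_f0 = [110.0, 205.0, 180.0, 125.0, 220.0]
phone_f0 = [111.0, 204.0, 182.0, 125.0, 223.0]

bias, lower, upper = bland_altman(studio_f0, phone_f0)
print(f"bias = {bias:.2f} Hz, limits of agreement = [{lower:.2f}, {upper:.2f}] Hz")
```

A bias near zero with narrow limits suggests the two devices can be used interchangeably for that parameter; whether the limits are narrow enough is a clinical judgment rather than a statistical test.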
Affiliation(s)
- Janet Beck
  - CASL Research Centre, Queen Margaret University, Edinburgh, UK
- Steve Cowen
  - CASL Research Centre, Queen Margaret University, Edinburgh, UK
|
41
|
Munnings AJ. The Current State and Future Possibilities of Mobile Phone "Voice Analyser" Applications, in Relation to Otorhinolaryngology. J Voice 2019; 34:527-532. [PMID: 30655018 DOI: 10.1016/j.jvoice.2018.12.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/12/2018] [Revised: 12/21/2018] [Accepted: 12/26/2018] [Indexed: 10/27/2022]
Abstract
BACKGROUND A large proportion of the population suffers from voice disorders. The use of mobile phone technology in healthcare is increasing, and this includes applications that can analyze voice. OBJECTIVE This study aimed to review the potential for voice analyzer applications to aid the management of voice disorders. METHODS A literature search was conducted, yielding eight studies, which were further analyzed. RESULTS Seven out of the eight studies concluded that smartphone assessments were comparable to current techniques. Nevertheless, there remained some common issues with using applications, such as the voice parameters used, the voice pathology tested, smartphone software consistency, and microphone specifications. CONCLUSIONS It is clear that further developments are required before a mobile application can be used widely in voice analysis. However, promising results have been obtained thus far, and the benefits of mobile technology in this field, particularly in voice rehabilitation, warrant further research into its widespread implementation.
|
42
|
Rusz J, Hlavnicka J, Tykalova T, Novotny M, Dusek P, Sonka K, Ruzicka E. Smartphone Allows Capture of Speech Abnormalities Associated With High Risk of Developing Parkinson’s Disease. IEEE Trans Neural Syst Rehabil Eng 2018; 26:1495-1507. [DOI: 10.1109/tnsre.2018.2851787] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Indexed: 11/05/2022]
|