1. Edmonds TJ, Howard DM. An Investigation in the Measurable Differences between Pitch Perception in the Voice and Pitch Perception of External Sound Sources. J Voice 2025;39:656-663. PMID: 36710198. DOI: 10.1016/j.jvoice.2022.11.026.
Abstract
OBJECTIVES Pitch perception is an important part of accurate singing; accurate singing therefore requires the ability to assess the pitch of one's own voice. This study had two objectives: first, to investigate whether there is a measurable difference between the pitch one perceives in one's own voice and the pitch one perceives from an external sound source; second, to measure the effects of occlusion on pitch accuracy over a melodic phrase. STUDY DESIGN We recruited 16 participants. The investigation of the perceptual difference was split into two parts: a one-to-one pitch-matching test, in which participants recreated pitches both by singing and by matching external pitches, and a performance of the familiar song 'Happy Birthday', used to measure pitch accuracy over a melodic phrase and the effects of occlusion on pitch accuracy while singing. METHODS The one-to-one study involved singing back a series of five notes on a set vowel; the same five notes were used when matching them against a series of possible pitches in the button test. The melodic test was to sing 'Happy Birthday' three times: first normally, second wearing headphones to occlude the ears and reduce air-conducted hearing, and third with added white noise to mask all hearing. RESULTS Participants matched pitch more accurately against external sounds than with their voice, and some form of occlusion (headphones alone or headphones with white noise) produced the version with the higher pitch accuracy. CONCLUSIONS Pitch accuracy was better when comparing two external sounds, and some form of occlusion improved pitch accuracy while singing. This could suggest a difference between recreating pitch with the voice and matching external sound sources. Furthermore, the improvement from occluding the ears could further suggest a difference in pitch-perception ability between the voice and external sound sources, with implications for improving pitch accuracy in a studio environment.
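Pitch-matching error in studies like this is conventionally scored as the deviation from the target in cents; the abstract does not state the scoring method used, so the following is only an illustrative sketch of that conversion.

```python
import math

def cents_error(f_produced: float, f_target: float) -> float:
    """Signed deviation of a produced pitch from a target, in cents
    (100 cents = one equal-tempered semitone, 1200 cents = one octave)."""
    return 1200.0 * math.log2(f_produced / f_target)

# A sung 223 Hz against a 220 Hz (A3) target is about 23 cents sharp.
error = cents_error(223.0, 220.0)
```

Positive values mean the production was sharp, negative flat; a full octave error comes out at exactly 1200 cents.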
2. Rahmatallah Y, Kemp AS, Iyer A, Pillai L, Larson-Prior LJ, Virmani T, Prior F. Pre-trained convolutional neural networks identify Parkinson's disease from spectrogram images of voice samples. Sci Rep 2025;15:7337. PMID: 40025201. PMCID: PMC11873116. DOI: 10.1038/s41598-025-92105-6.
Abstract
Machine learning approaches, including deep learning models, have shown promising performance in the automatic detection of Parkinson's disease. These approaches rely on different types of data, with voice recordings being the most used owing to the convenient and non-invasive nature of data acquisition. Our group previously developed a novel approach that uses a convolutional neural network with transfer learning to analyze spectrogram images of the sustained vowel /a/ to identify people with Parkinson's disease. We tested this approach on a dataset of voice recordings collected over analog telephone lines, which support limited bandwidth. The convolutional neural network with transfer learning showed superior performance against conventional machine learning methods that collapse measurements across time to generate feature vectors. This study builds upon our prior results and presents two novel contributions. First, we tested the performance of our approach on a larger voice dataset recorded using smartphones with wide bandwidth. Our results show comparable performance between the two datasets generated using different recording platforms, despite differences in the most important features resulting from the limited bandwidth of analog telephone lines. Second, we compared the classification performance achieved using linear-scale and mel-scale spectrogram images and found a small but statistically significant gain using mel-scale spectrograms.
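The linear- versus mel-scale comparison rests on the standard mel frequency mapping; the sketch below uses the common HTK-style formula, since the paper's exact filterbank settings are not given in the abstract.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style convention; the authors' exact variant is not stated
    # in the abstract, so treat this as one common choice.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, float) / 2595.0) - 1.0)

# 22 band edges equally spaced in mel over 0-4 kHz: equal perceptual
# steps, hence progressively wider spacing in Hz toward high frequency.
edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), 22))
```

The widening Hz spacing is what concentrates resolution in the low-frequency region where voice harmonics live, one plausible reason for the small mel-scale gain reported.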
Affiliation(s)
- Yasir Rahmatallah: Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Aaron S Kemp: Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Anu Iyer: Georgia Institute of Technology, Atlanta, 30332, USA
- Lakshmi Pillai: Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Linda J Larson-Prior: Biomedical Informatics, Neurology, and Neuroscience, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Tuhin Virmani: Biomedical Informatics and Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Fred Prior: Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
3. Fahed VS, Doheny EP, Busse M, Hoblyn J, Lowery MM. Comparison of Acoustic Voice Features Derived From Mobile Devices and Studio Microphone Recordings. J Voice 2025;39:559.e1-559.e18. PMID: 36379826. DOI: 10.1016/j.jvoice.2022.10.006.
Abstract
OBJECTIVES/HYPOTHESIS Improvements in mobile device technology offer new opportunities for remote monitoring of voice for home and clinical assessment. However, there is a need to establish equivalence between features derived from signals recorded with mobile devices and with gold-standard microphone-preamplifiers. In this study, acoustic voice features from Android smartphone, tablet, and microphone-preamplifier recordings were compared. METHODS Data were recorded from 37 volunteers (20 female) with no history of speech disorder and six volunteers with Huntington's disease (HD) during sustained vowel (SV) phonation, reading passage (RP), and five-syllable repetition (SR) tasks. The following features were estimated: median and standard deviation of fundamental frequency (F0 and SD F0), harmonics-to-noise ratio (HNR), local jitter, relative average perturbation of jitter (RAP), five-point period perturbation quotient (PPQ5), difference of differences of amplitudes and periods (DDA and DDP), shimmer, and amplitude perturbation quotients (APQ3, APQ5, and APQ11). RESULTS Bland-Altman analysis revealed good agreement between microphone and mobile devices for fundamental frequency, jitter, RAP, PPQ5, and DDP during all tasks, and a bias for HNR, shimmer, and its variants (APQ3, APQ5, APQ11, and DDA). Significant differences were observed between devices for HNR, shimmer, and its variants for all tasks. High correlation was observed between devices for all features, except SD F0 for RP. Similar results were observed in the HD group for the SV and SR tasks. Biological sex had a significant effect on F0 and HNR during all tests, and on jitter, RAP, PPQ5, DDP, and shimmer for RP and SR. No significant effect of age was observed. CONCLUSIONS Mobile devices showed good agreement with state-of-the-art, high-quality microphones during structured speech tasks for features derived from the frequency components of the audio recordings. Caution should be taken when estimating HNR, shimmer, and its variants from recordings made with mobile devices.
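The Bland-Altman analysis central to this study reduces to a mean bias and 95% limits of agreement over paired differences; a minimal sketch with hypothetical F0 values (not the study's data):

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement statistics for paired measurements:
    mean bias and 95% limits of agreement (bias +/- 1.96 SD of the
    pairwise differences)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical F0 values (Hz) from a studio microphone vs. a phone.
mic   = [198.2, 210.5, 224.0, 189.9, 205.3]
phone = [198.0, 210.9, 223.5, 190.2, 205.1]
bias, (lo, hi) = bland_altman(mic, phone)
```

"Good agreement" then means a bias near zero with narrow limits; a systematic offset, as reported here for HNR and shimmer, shows up as a bias well away from zero.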
Affiliation(s)
- Vitória S Fahed: School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
- Emer P Doheny: School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
- Monica Busse: Centre for Trials Research, Cardiff University, Cardiff, UK
- Jennifer Hoblyn: School of Medicine, Trinity College Dublin, Dublin, Ireland; Bloomfield Health Services, Dublin, Ireland
- Madeleine M Lowery: School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
4. Pommée T, Morsomme D. Voice Quality in Telephone Interviews: A Preliminary Acoustic Investigation. J Voice 2025;39:563.e1-563.e20. PMID: 36192289. DOI: 10.1016/j.jvoice.2022.08.027.
Abstract
OBJECTIVES To investigate the impact of standardized mobile phone recordings passed through a telecom channel on acoustic markers of voice quality and on its perception by voice experts in normophonic speakers. METHODS Continuous speech and a sustained vowel were recorded for fourteen female and ten male normophonic speakers. The recordings were made simultaneously with a head-mounted high-quality microphone and through the telephone network on a receiving smartphone. Twenty-two acoustic measures related to voice quality, breathiness, and pitch were extracted from the recordings. Nine vocologists perceptually rated the G, R, and B parameters of the GRBAS scale on each voice sample. Reproducibility, the effects of recording type, stimulus type, and gender, and the correlation between acoustic and perceptual measures were investigated. RESULTS The sustained vowel samples are damped after one second. Only frequencies between 100 and 3700 Hz are passed through the telecom channel, and the frequency response is characterized by peaks and troughs. The acoustic measures show good reproducibility over the three repetitions. All measures differ significantly between the recording types, except for local jitter, the harmonics-to-noise ratio by Dejonckere and Lebacq, the period standard deviation, and all six pitch measures. The AVQI score is higher in telephone recordings, while the ABI score is lower. Significant differences between genders are also found for most of the measures; while the AVQI is similar in men and women, the ABI is higher in women in both recording types. For the perceptual assessment, interrater agreement is rather low, while reproducibility over the three repetitions is good. Few significant differences between recording types are observed, except for lower breathiness ratings on telephone recordings. G ratings are significantly more severe for the sustained vowel on both recording types, R ratings only on telephone recordings. While roughness is rated higher in men on telephone recordings by most experts, no gender effect is observed for breathiness on either recording type. Finally, neither the AVQI nor the ABI yields strong correlations with any of the perceptual parameters. CONCLUSIONS Our results show that passing a voice signal through a telecom channel induces filter and noise effects that limit the use of common acoustic voice quality measures and indexes. The AVQI and ABI are both significantly affected by the recording type. The most reliable acoustic measures appear to be pitch perturbation (local jitter and period standard deviation) and the harmonics-to-noise ratio of Dejonckere and Lebacq. Our results also underline that raters are not equally sensitive to the various factors, including recording type, stimulus type, and gender. None of the three perceptual parameters G, R, and B seems to be reliably measurable on telephone recordings using the two investigated acoustic indexes. Future studies investigating the impact of voice quality in telephone conversations should therefore focus on acoustic measures of continuous speech samples that are limited to the frequency response of the telecom channel and are not overly sensitive to environmental and additive noise.
Affiliation(s)
- Timothy Pommée: Research Unit for a Life-Course Perspective on Health and Education, Voice Unit, University of Liège, Belgium
- Dominique Morsomme: Research Unit for a Life-Course Perspective on Health and Education, Voice Unit, University of Liège, Belgium
5. Yağcıoğlu D, Esen Aydınlı F, Tunç Songur E, Şimşek S, Çetinkaya B, İncebay Ö. Can Smartphones Be Used to Record Children's Voices for Acoustic Analysis? J Voice 2025:S0892-1997(25)00044-X. PMID: 40011181. DOI: 10.1016/j.jvoice.2025.02.004.
Abstract
OBJECTIVES While technological advancements have enabled the use of smartphones for acoustic voice analysis, existing studies have predominantly focused on the adult population. However, dysphonia is prevalent in children, whose anatomy and physiology differ from those of adults. This paper therefore investigates the feasibility of using smartphones to record children's voices for acoustic voice analysis. STUDY DESIGN A methodological study. METHODS This study involved 29 children, aged 4-10 years, with healthy voices. Voice recordings of sustained phonation and a read sentence were obtained using four devices: (1) an AKG Micromic C520 headset microphone connected to a computer running the Computerized Speech Lab (CSL), (2) a Samsung S9 Plus, (3) an iPhone 13 Mini, and (4) a Huawei Y9 Prime. All recorded voice samples were then analyzed using CSL, examining a total of 13 acoustic parameters. Pearson correlation analysis, Bland-Altman analysis, and measurement-uncertainty analysis were conducted to assess consistency and agreement across devices. RESULTS The highest correlation and accuracy between the smartphone measurements and the reference recording system were found for fundamental frequency (F0) (r > 0.90, P < 0.01) in both speech samples. For the other parameters, reliability was limited: some showed weak correlations, while others had low accuracy and consistency. There were, however, moderate-to-excellent correlations for most measurements and a nonsignificant bias according to the Bland-Altman analysis. CONCLUSION This study is the first to investigate the feasibility of using smartphones for acoustic voice analysis focusing specifically on children. The results indicate that smartphone voice recordings can be used to measure F0 reliably. More research is needed to improve measurement reliability for the other parameters. Nonetheless, the findings demonstrate the potential of smartphones to enable accessible and reliable voice assessment in the pediatric population.
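F0, the one parameter that survived smartphone recording reliably here, is also the simplest to estimate. The toy autocorrelation sketch below shows the principle on a synthetic tone; real analyzers such as the CSL used in the study are far more robust on actual child voices.

```python
import numpy as np

def f0_autocorr(x, fs, fmin=75.0, fmax=500.0):
    """Toy F0 estimator: pick the autocorrelation peak whose lag lies
    in the plausible pitch-period range [fs/fmax, fs/fmin] samples."""
    x = np.asarray(x, float) - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
f0 = f0_autocorr(np.sin(2 * np.pi * 250 * t), fs)  # child-like 250 Hz tone
```

Because F0 depends only on the repetition period, not on fine spectral detail, it tolerates smartphone codecs and microphones far better than jitter- or shimmer-type measures do.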
Affiliation(s)
- Damlasu Yağcıoğlu: Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Fatma Esen Aydınlı: Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Elif Tunç Songur: Speech and Language Therapy Department, Faculty of Health Sciences, Selçuk University, Konya, Turkey
- Sinem Şimşek: Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Buse Çetinkaya: Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
- Önal İncebay: Speech and Language Therapy Department, Faculty of Health Sciences, Hacettepe University, Ankara, Turkey
6. Hu Z, Zhang Z, Li H, Yang LZ. Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices. Behav Res Methods 2024;57:35. PMID: 39738817. DOI: 10.3758/s13428-024-02584-0.
Abstract
In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.
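Cross-device and test-retest reliability of this kind is conventionally quantified with an intraclass correlation. The abstract does not state which ICC form the authors used, so the sketch below shows one standard choice, ICC(2,1) (two-way random effects, absolute agreement, single measure), on hypothetical values.

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random-effects, absolute-agreement,
    single-measure intraclass correlation.
    x has shape (subjects, devices-or-sessions)."""
    x = np.asarray(x, float)
    n, k = x.shape
    grand = x.mean()
    msr = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)  # subjects
    msc = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)  # devices
    sse = np.sum((x - grand) ** 2) - (n - 1) * msr - (k - 1) * msc
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical F0 (Hz) for five speakers on two recording devices.
icc = icc_2_1([[100, 101], [110, 109], [120, 121], [130, 129], [140, 141]])
```

Values near 1 indicate that device and session contribute little variance relative to between-subject differences, the pattern reported here for F0 and cepstral peak prominence in sustained phonation.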
Affiliation(s)
- Zian Hu: Institutes of Physical Science and Information Technology, Anhui University, Hefei, China; Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Zhenglin Zhang: Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China; University of Science and Technology of China, Hefei, China
- Hai Li: Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China; University of Science and Technology of China, Hefei, China; Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- Li-Zhuang Yang: Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China; University of Science and Technology of China, Hefei, China; Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
7. Sresuganthi JR, Nallamuthu A, Boominathan P. Comparison of Client-Led Asynchronous and Clinician-Led Synchronous Online Methods for Evaluation of Subjective Vocal Measures in Teachers: A Feasibility Study. J Voice 2024;38:1522.e1-1522.e10. PMID: 35641382. DOI: 10.1016/j.jvoice.2022.04.015.
Abstract
BACKGROUND COVID-19 has transformed face-to-face classroom teaching into online and hybrid modes. Increased vocal intensity and pitch used to call students' attention and conduct online classes, together with inappropriate posture (head, neck, and upper trunk) while using laptops and other online tools, cause vocal loading, leading to voice-related concerns in teachers. Tele-voice assessment is a feasible alternative means of seeking professional help in the current situation, and possibly in the future too. This study compared client-led asynchronous and clinician-led synchronous voice recordings for clinical vocal measures among school teachers. METHOD Twenty-five school teachers (21 females and four males) from Chennai consented to the study. Information on voice use, its impact on day-to-day situations, self-perception of vocal fatigue, and a recorded voice sample (phonation and speaking) were obtained online (asynchronous mode). Within ten days, a clinician-led synchronous session was scheduled at a mutually convenient time to obtain voice samples through a Zoom call. The voice samples were compared for clinical measures and perceptual voice evaluation. RESULTS Participants reported vocal symptoms and increased vocal fatigue scores. Maximum phonation time values obtained through the synchronous mode were lower than those obtained through the asynchronous mode. Variability was also noted in the perceptual vocal measures of voice samples obtained through the synchronous mode. During synchronous voice recording and evaluation, background noise, internet stability, the audio enhancement feature, and microphone placement and quality could be monitored, and immediate feedback could be provided. Additionally, asynchronous recording can supplement synchronous recording, given clear instructions and demonstration. CONCLUSION This study explored the feasibility of using synchronous and asynchronous voice recording for voice analysis in school teachers. The findings could serve as a base for understanding the advantages and challenges of using client-led asynchronous and clinician-led synchronous methods for estimating vocal measures.
Affiliation(s)
- Aishwarya Nallamuthu: Department of Speech Language & Hearing Sciences, Sri Ramachandra Institute of Higher Education & Research, Chennai, Tamil Nadu, India
- Prakash Boominathan: Department of Speech Language & Hearing Sciences, Sri Ramachandra Institute of Higher Education & Research, Chennai, Tamil Nadu, India
8. Song Y, Yun I, Giovanoli S, Easthope CA, Chung Y. A Wearable System for Monitoring Neurological Disorder Events with Multi-Class Classification Model in Daily Life. Annu Int Conf IEEE Eng Med Biol Soc 2024;2024:1-4. PMID: 40039282. DOI: 10.1109/embc53108.2024.10782047.
Abstract
Dysphagia and dysarthria are prominent sequelae of neurological disorders. Treatment and rehabilitation of these impairments necessitate continuous monitoring of symptoms related to swallowing and speaking. However, current medical technologies require large and diverse equipment to record these symptoms, which is predominantly limited to clinical environments. In this study, we propose an innovative wearable system for distinguishing neurological disorder events using a mechano-acoustic (MA) sensor and a multi-class ensemble classification model. The MA sensor exhibits high sensitivity to neck vibration without interference from ambient sounds. A multi-class classification model was developed to accurately discern symptoms from the recorded signals. The proposed model is an ensemble neural network trained on waveforms and mel spectrograms. It achieves a classification accuracy of 91.94%, surpassing the performance of previous single neural networks.
9. Awan SN, Bahr R, Watts S, Boyer M, Budinsky R, Bensoussan Y. Validity of Acoustic Measures Obtained Using Various Recording Methods Including Smartphones With and Without Headset Microphones. J Speech Lang Hear Res 2024;67:1712-1730. PMID: 38749007. DOI: 10.1044/2024_jslhr-23-00759.
Abstract
PURPOSE The goal of this study was to assess various recording methods, including combinations of high- versus low-cost microphones, recording interfaces, and smartphones, in terms of their ability to produce commonly used time- and spectral-based voice measurements. METHOD Twenty-four vowel samples, representing a diversity of voice quality deviations and severities from male and female speakers across a wide age range, were played via a head-and-thorax model and recorded using a high-cost, research-standard GRAS 40AF (GRAS Sound & Vibration) microphone and amplification system. Additional recordings were made using various combinations of headset microphones (AKG C555 L [AKG Acoustics GmbH], Shure SM35-XLR [Shure Incorporated], AVID AE-36 [AVID Products, Inc.]) and audio interfaces (Focusrite Scarlett 2i2 [Focusrite Audio Engineering Ltd.] with a PC, Focusrite with a smartphone, smartphone via a TRRS adapter), as well as smartphones directly (Apple iPhone 13 Pro, Google Pixel 6) using their built-in microphones. The effect of background noise from four different room conditions was also evaluated. Vowel samples were analyzed for measures of fundamental frequency, perturbation, cepstral peak prominence, and spectral tilt (low vs. high spectral ratio). RESULTS A wide variety of recording methods, including smartphones with and without a low-cost headset microphone, can effectively track the wide range of acoustic characteristics in a diverse set of typical and disordered voice samples. Although significant differences in acoustic measures of voice may be observed, the presence of extremely strong correlations (rs > .90) with the recording standard implies a strong linear relationship between the results of different methods that may be used to predict and adjust any observed measurement differences. CONCLUSION Because handheld smartphone distance and positioning may be highly variable in actual clinical recording situations, a smartphone with a low-cost headset microphone is recommended as an affordable recording method that controls mouth-to-microphone distance and positioning while leaving both hands available for manipulating the smartphone device.
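The prediction-and-adjustment idea the authors describe follows directly from the strong linear relationship: regress the low-cost device's values onto the reference standard and apply the fitted line. A minimal sketch with hypothetical cepstral-peak-prominence values (not the study's data):

```python
import numpy as np

# Hypothetical CPP values (dB) for the same vowels measured with the
# reference microphone and a smartphone. A strong linear relation
# (rs > .90, as reported) lets a simple regression correct the
# smartphone's systematic offset.
reference  = np.array([4.1, 6.8, 9.2, 11.5, 13.9, 16.2])
smartphone = np.array([3.0, 5.5, 8.1, 10.2, 12.8, 15.1])

slope, intercept = np.polyfit(smartphone, reference, 1)
adjusted = slope * smartphone + intercept
r = np.corrcoef(reference, smartphone)[0, 1]
```

After the linear correction, the smartphone values sit close to the reference scale even though the raw readings were biased low.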
Affiliation(s)
- Shaheen N Awan: School of Communication Sciences and Disorders, University of Central Florida, Orlando
- Ruth Bahr: Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Stephanie Watts: Department of Otolaryngology - Head & Neck Surgery, Morsani College of Medicine, University of South Florida, Tampa
- Micah Boyer: Department of Otolaryngology - Head & Neck Surgery, Morsani College of Medicine, University of South Florida, Tampa
- Robert Budinsky: Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Yael Bensoussan: Department of Otolaryngology - Head & Neck Surgery, Morsani College of Medicine, University of South Florida, Tampa
10. Robotti C, Costantini G, Saggio G, Cesarini V, Calastri A, Maiorano E, Piloni D, Perrone T, Sabatini U, Ferretti VV, Cassaniti I, Baldanti F, Gravina A, Sakib A, Alessi E, Pietrantonio F, Pascucci M, Casali D, Zarezadeh Z, Zoppo VD, Pisani A, Benazzo M. Machine Learning-based Voice Assessment for the Detection of Positive and Recovered COVID-19 Patients. J Voice 2024;38:796.e1-796.e13. PMID: 34965907. PMCID: PMC8616736. DOI: 10.1016/j.jvoice.2021.11.004.
Abstract
Many virological tests have been implemented during the Coronavirus Disease 2019 (COVID-19) pandemic for diagnostic purposes, but they appear unsuitable for screening, and current screening strategies are not accurate enough to effectively curb the spread of the disease. The present study was therefore conducted within a controlled clinical environment to determine whether there are detectable variations in the voices of COVID-19 patients, recovered subjects, and healthy subjects, and whether machine learning-based voice assessment (MLVA) can accurately discriminate between them, thus potentially serving as a more effective mass-screening tool. Three subpopulations were consecutively recruited: positive COVID-19 patients, recovered COVID-19 patients, and healthy individuals as controls. Positive patients were recruited within 10 days of nasal-swab positivity. Recovery from COVID-19 was established clinically, virologically, and radiologically. Healthy individuals reported no COVID-19 symptoms and yielded negative results on serological testing. All study participants provided three trials of multiple vocal tasks (sustained vowel phonation, speech, cough). All recordings were initially divided into three different binary classifications and processed with a feature selection, ranking, and cross-validated RBF-SVM pipeline. This yielded a mean accuracy of 90.24%, a mean sensitivity of 91.15%, a mean specificity of 89.13%, and a mean AUC of 0.94 across all tasks and comparisons, and identified the sustained vowel as the most effective vocal task for COVID-19 discrimination. Moreover, a three-way classification was carried out on an external test set of 30 subjects, 10 per class, with a mean accuracy of 80% and an accuracy of 100% for the detection of positive subjects. In this assessment, recovered individuals proved the most difficult class to identify, and all misclassified subjects were declared positive; this might be related to short- and mid-term vocal traces of COVID-19 persisting even after clinical resolution of the infection. In conclusion, MLVA may accurately discriminate between positive COVID-19 patients, recovered COVID-19 patients, and healthy individuals. Future studies should test MLVA in larger populations and in asymptomatic positive COVID-19 patients to validate this novel screening technology and assess its potential as a more effective surveillance strategy for COVID-19.
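The accuracy, sensitivity, and specificity figures reported for the binary comparisons all derive from a confusion matrix; a minimal sketch with hypothetical counts chosen to land near the reported means (not the study's actual counts):

```python
def binary_metrics(cm):
    """Accuracy, sensitivity, and specificity from a 2x2 confusion
    matrix laid out as [[TP, FN], [FP, TN]]."""
    (tp, fn), (fp, tn) = cm
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: ~90% accuracy, ~91% sensitivity, ~89% specificity.
m = binary_metrics([[41, 4], [5, 41]])
```

For a screening application, the sensitivity row is the one that matters most, since a missed positive case is costlier than a false alarm.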
Collapse
Affiliation(s)
- Carlo Robotti
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy.
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy.
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy.
- Valerio Cesarini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Anna Calastri
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Eugenia Maiorano
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Davide Piloni
- Pneumology Unit, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Tiziano Perrone
- Department of Internal Medicine, Fondazione IRCCS Policlinico San Matteo, University of Pavia, Pavia, Italy
- Umberto Sabatini
- Department of Internal Medicine, Fondazione IRCCS Policlinico San Matteo, University of Pavia, Pavia, Italy
- Virginia Valeria Ferretti
- Clinical Epidemiology and Biometry Unit, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Irene Cassaniti
- Molecular Virology Unit, Microbiology and Virology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Fausto Baldanti
- Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy; Molecular Virology Unit, Microbiology and Virology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Andrea Gravina
- Otorhinolaryngology Department, University of Rome Tor Vergata, Rome, Italy
- Ahmed Sakib
- Otorhinolaryngology Department, University of Rome Tor Vergata, Rome, Italy
- Elena Alessi
- Internal Medicine Unit, Ospedale dei Castelli ASL Roma 6, Ariccia, Italy
- Matteo Pascucci
- Internal Medicine Unit, Ospedale dei Castelli ASL Roma 6, Ariccia, Italy
- Daniele Casali
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Zakarya Zarezadeh
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Vincenzo Del Zoppo
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; IRCCS Mondino Foundation, Pavia, Italy
- Marco Benazzo
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy
11
Busquet F, Efthymiou F, Hildebrand C. Voice analytics in the wild: Validity and predictive accuracy of common audio-recording devices. Behav Res Methods 2024; 56:2114-2134. [PMID: 37253958 PMCID: PMC10228884 DOI: 10.3758/s13428-023-02139-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/27/2023] [Indexed: 06/01/2023]
Abstract
The use of voice recordings in both research and industry practice has increased dramatically in recent years, from diagnosing a COVID-19 infection based on patients' self-recorded voice samples to predicting customer emotions during a service center call. Crowdsourced audio data collection in participants' natural environment using their own recording devices has opened up new avenues for researchers and practitioners to conduct research at scale across a broad range of disciplines. The current research examines whether fundamental properties of the human voice are reliably and validly captured through the common consumer-grade audio-recording devices used in current medical, behavioral science, business, and computer science research. Specifically, this work provides evidence from a tightly controlled laboratory experiment analyzing 1800 voice samples, and from subsequent simulations, that recording devices with high proximity to a speaker (such as a headset or a lavalier microphone) lead to inflated measures of amplitude compared to a benchmark studio-quality microphone, while recording devices with lower proximity to a speaker (such as a laptop or a smartphone in front of the speaker) systematically reduce measures of amplitude and can lead to biased measures of the speaker's true fundamental frequency. We further demonstrate through simulation studies that these differences can lead to biased and ultimately invalid conclusions in, for example, an emotion detection task. Finally, we outline a set of recording guidelines to ensure reliable and valid voice recordings and offer initial evidence for a machine-learning approach to bias correction in the case of distorted speech signals.
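The proximity-driven amplitude bias described above follows directly from how recording gain enters an RMS-level measurement; a small numpy sketch (the synthetic vowel and the gain factors are made-up values, purely illustrative):

```python
import numpy as np

def rms_db(x, ref=1.0):
    """Root-mean-square level in dB relative to a reference amplitude."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) / ref)

fs = 16000
t = np.arange(fs) / fs
voice = 0.1 * np.sin(2 * np.pi * 220 * t)   # stand-in for a sustained vowel

headset = 3.0 * voice    # close microphone: higher effective gain
laptop = 0.3 * voice     # distant microphone: lower effective gain

print(rms_db(headset) - rms_db(voice))   # ≈ +9.5 dB inflation
print(rms_db(laptop) - rms_db(voice))    # ≈ -10.5 dB attenuation
```

The measured level shifts by exactly 20·log10(gain), so without a calibration step, amplitude measures reflect the device and its placement as much as the speaker.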
Affiliation(s)
- Francesc Busquet
- Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, St. Gallen, 9000, Switzerland.
- Fotis Efthymiou
- Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, St. Gallen, 9000, Switzerland
- Christian Hildebrand
- Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, St. Gallen, 9000, Switzerland.
12
Di Cesare MG, Perpetuini D, Cardone D, Merla A. Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson's Disease: A Study on Speaker Diarization and Classification Techniques. Sensors (Basel) 2024; 24:1499. [PMID: 38475034 DOI: 10.3390/s24051499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 02/22/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024]
Abstract
Parkinson's disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One of the notable non-motor symptoms of PD is the presence of vocal disorders, attributed to the underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, the integration of machine learning (ML) techniques in the analysis of speech signals has significantly contributed to the detection and diagnosis of PD. In particular, Mel-Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) are feature extraction techniques commonly used in speech and audio signal processing that exhibit great potential for vocal disorder identification. This study presents a novel approach to the early detection of PD through ML applied to speech analysis, leveraging both MFCCs and GTCCs. The recordings contained in the Mobile Device Voice Recordings at King's College London (MDVR-KCL) dataset were used. These recordings were collected from healthy individuals and PD patients while they read a passage and during a spontaneous conversation on the phone. The speech data from the spontaneous dialogue task were processed through speaker diarization, a technique that partitions an audio stream into homogeneous segments according to speaker identity. ML applied to MFCCs and GTCCs allowed us to classify PD patients with a test accuracy of 92.3%. This research further demonstrates the potential to employ mobile phones as a non-invasive, cost-effective tool for the early detection of PD, significantly improving patient prognosis and quality of life.
Affiliation(s)
- David Perpetuini
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
- Daniela Cardone
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
- Arcangelo Merla
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
13
Schneider SL, Habich L, Weston ZM, Rosen CA. Observations and Considerations for Implementing Remote Acoustic Voice Recording and Analysis in Clinical Practice. J Voice 2024; 38:69-76. [PMID: 34366193 DOI: 10.1016/j.jvoice.2021.06.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Revised: 06/12/2021] [Accepted: 06/15/2021] [Indexed: 10/20/2022]
Abstract
OBJECTIVES/HYPOTHESIS Remote voice recording and acoustic analysis allow for comprehensive voice assessment and outcome tracking without the requirements of travel to the clinic, an in-person visit, or expensive equipment. This paper delineates the process and considerations for implementing remote voice recording and acoustic analysis in a high-volume university voice clinic. STUDY DESIGN Clinical focus. METHODS Acoustic voice recordings were attempted on 108 unique patients over a 6-month period using a remote voice recording phone application. Development of the clinical process is described, including determining normative data against which to compare acoustic results, clinician training, and clinical application. The treating speech-language pathologists (SLPs) were surveyed 2 months after implementation to assess ease of application, identify challenges, and assess implementation of potential solutions. RESULTS Of 108 unique patients, 83 successfully completed the process of synchronous remote acoustic voice recording in conjunction with their SLP clinician. The process of downloading the application, setting up, and obtaining voice recordings most commonly took 10-20 minutes according to the 8 SLPs surveyed. Challenges and helpful techniques were identified. CONCLUSIONS Remote acoustic voice recordings have allowed SLPs to continue to complete a comprehensive voice evaluation in a telepractice model. The emerging knowledge about the viability of remote voice recordings, the success in obtaining acoustic data remotely, and the accessibility of a low-cost app make remote voice recordings a viable option to facilitate remote clinical care and research investigation.
Affiliation(s)
- Sarah L Schneider
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California.
- Laura Habich
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California
- Zoe M Weston
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California
- Clark A Rosen
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California
14
Ceylan ME, Cangi ME, Yılmaz G, Peru BS, Yiğit Ö. Are smartphones and low-cost external microphones comparable for measuring time-domain acoustic parameters? Eur Arch Otorhinolaryngol 2023; 280:5433-5444. [PMID: 37584753 DOI: 10.1007/s00405-023-08179-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Accepted: 08/05/2023] [Indexed: 08/17/2023]
Abstract
PURPOSE This study examined and compared the diagnostic accuracy and correlation levels of acoustic parameters from audio recordings obtained with smartphones running two operating systems and with dynamic and condenser external microphones. METHOD The study included 87 adults: 57 with a voice disorder and 30 with a healthy voice. Each participant was asked to perform a sustained vowel phonation (/a/). The recordings were taken simultaneously using five microphones (AKG P220, Shure SM58, Samson Go Mic, Apple iPhone 6, and Samsung Galaxy J7 Pro) in an acoustically insulated cabinet. Acoustic examinations were performed using Praat version 6.2.09. The data were examined using Pearson correlation and receiver-operating characteristic (ROC) analyses. RESULTS The parameters with the highest area under the curve (AUC) values among all microphone recordings in the time-domain analyses were the frequency perturbation parameters. Additionally, considering the correlation coefficients obtained by synchronizing the microphones with each other together with the AUC values, the parameter with the highest correlation coefficient and diagnostic accuracy was jitter-local. CONCLUSION Period-to-period perturbation parameters obtained from audio recordings made with smartphones show levels of diagnostic accuracy similar to those of external microphones used in clinical conditions.
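Jitter-local, the best-performing parameter here, is conventionally defined as the mean absolute difference between consecutive glottal periods divided by the mean period; a minimal numpy sketch of that definition (the period values are illustrative, not study data):

```python
import numpy as np

def jitter_local(periods):
    """Local jitter (%): mean absolute difference between consecutive
    glottal periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# Perfectly periodic phonation -> 0% jitter.
print(jitter_local([5.0, 5.0, 5.0, 5.0]))        # 0.0

# Cycle-to-cycle perturbation of a ~5 ms (200 Hz) period, in ms.
print(round(jitter_local([5.0, 5.1, 4.9, 5.05]), 2))   # 2.99
```

Tools such as Praat extract the period sequence from the waveform first; the perturbation statistic itself reduces to this ratio.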
Affiliation(s)
- M Enes Ceylan
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- M Emrah Cangi
- University of Health Sciences, Speech and Language Therapy, Selimiye, Tıbbiye Cd No: 38, Istanbul, 34668, Üsküdar, Türkiye.
- Göksu Yılmaz
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Beyza Sena Peru
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Özgür Yiğit
- Istanbul Şişli Hamidiye Etfal Training and Research Hospital, Istanbul, Türkiye
15
Iyer A, Kemp A, Rahmatallah Y, Pillai L, Glover A, Prior F, Larson-Prior L, Virmani T. A machine learning method to process voice samples for identification of Parkinson's disease. Sci Rep 2023; 13:20615. [PMID: 37996478 PMCID: PMC10667335 DOI: 10.1038/s41598-023-47568-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 11/15/2023] [Indexed: 11/25/2023] Open
Abstract
Machine learning approaches have been used for the automatic detection of Parkinson's disease with voice recordings being the most used data type due to the simple and non-invasive nature of acquiring such data. Although voice recordings captured via telephone or mobile devices allow much easier and wider access for data collection, current conflicting performance results limit their clinical applicability. This study has two novel contributions. First, we show the reliability of personal telephone-collected voice recordings of the sustained vowel /a/ in natural settings by collecting samples from 50 people with specialist-diagnosed Parkinson's disease and 50 healthy controls and applying machine learning classification with voice features related to phonation. Second, we utilize a novel application of a pre-trained convolutional neural network (Inception V3) with transfer learning to analyze the spectrograms of the sustained vowel from these samples. This approach considers speech intensity estimates across time and frequency scales rather than collapsing measurements across time. We show the superiority of our deep learning model for the task of classifying people with Parkinson's disease as distinct from healthy controls.
Affiliation(s)
- Anu Iyer
- Georgia Institute of Technology, Atlanta, 30332, USA
- Aaron Kemp
- Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA.
- Yasir Rahmatallah
- Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Lakshmi Pillai
- Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Aliyah Glover
- Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Fred Prior
- Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Linda Larson-Prior
- Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Neurobiology and Developmental Sciences, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Tuhin Virmani
- Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
16
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Pribuišis K, Ulozienė I, Blažauskas T, Damaševičius R, Maskeliūnas R. Smartphone-Based Voice Wellness Index Application for Dysphonia Screening and Assessment: Development and Reliability. J Voice 2023:S0892-1997(23)00330-2. [PMID: 37980209 DOI: 10.1016/j.jvoice.2023.10.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Revised: 10/12/2023] [Accepted: 10/12/2023] [Indexed: 11/20/2023]
Abstract
OBJECTIVE This study aimed to develop a Voice Wellness Index (VWI) application combining acoustic voice quality index (AVQI) and glottal function index (GFI) data and to evaluate its reliability in quantitative voice assessment and normal versus pathological voice differentiation. STUDY DESIGN Cross-sectional study. METHODS A total of 135 adult participants (86 patients with voice disorders and 49 subjects with normal voices) were included in this study. Five iOS and Android smartphones with the "Voice Wellness Index" app installed were used to estimate VWI. The VWI data obtained using smartphones were compared with VWI measurements computed from voice recordings collected with a reference studio microphone. The diagnostic efficacy of VWI in differentiating between normal and disordered voices was assessed using receiver operating characteristics (ROC). RESULTS With a Cronbach's alpha of 0.972 and an ICC of 0.972 (0.964-0.979), the VWI scores of the individual smartphones demonstrated remarkable inter-smartphone agreement and reliability. The VWI data obtained from different smartphones and a studio microphone showed nearly perfect direct linear correlations (r = 0.993-0.998). Depending on the individual smartphone device used, the cutoff scores of VWI for differentiating between normal and pathological voice groups were calculated as 5.6-6.0, with the best balance between sensitivity (94.10-95.15%) and specificity (93.68-95.72%). The diagnostic accuracy was excellent in all cases, with an area under the curve (AUC) of 0.970-0.974. CONCLUSION The "Voice Wellness Index" application is an accurate and reliable tool for voice quality measurement and normal versus pathological voice screening and has considerable potential to be used by healthcare professionals and patients for voice assessment.
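Inter-device agreement of the kind reported here is often quantified with Cronbach's alpha over a subjects-by-devices score matrix; a minimal numpy sketch on simulated scores (the simulated "true" index and the noise level are assumptions for illustration):

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for an (n_subjects, k_raters) score matrix;
    here: one VWI-style score per subject per device."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    item_var = ratings.var(axis=0, ddof=1).sum()     # sum of per-device variances
    total_var = ratings.sum(axis=1).var(ddof=1)      # variance of per-subject totals
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
true_score = rng.uniform(1, 10, size=40)                           # per-subject index
devices = true_score[:, None] + rng.normal(0, 0.3, size=(40, 5))   # 5 devices, small noise
print(round(cronbach_alpha(devices), 3))
```

With small device-specific noise relative to between-subject spread, alpha approaches 1, matching the near-perfect agreement the abstract reports.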
Affiliation(s)
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tadas Petrauskas
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania.
- Ingrida Ulozienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tomas Blažauskas
- Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
- Rytis Maskeliūnas
- Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
17
Friedman L, Lauber M, Behroozmand R, Fogerty D, Kunecki D, Berry-Kravis E, Klusek J. Atypical vocal quality in women with the FMR1 premutation: an indicator of impaired sensorimotor control. Exp Brain Res 2023; 241:1975-1987. [PMID: 37347418 PMCID: PMC10863608 DOI: 10.1007/s00221-023-06653-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Accepted: 06/13/2023] [Indexed: 06/23/2023]
Abstract
Women with the FMR1 premutation are susceptible to motor involvement related to atypical cerebellar function, including risk for developing fragile X tremor ataxia syndrome. Vocal quality analyses are sensitive to subtle differences in motor skills but have not yet been applied to the FMR1 premutation. This study examined whether women with the FMR1 premutation demonstrate differences in vocal quality, and whether such differences relate to FMR1 genetic, executive, motor, or health features of the FMR1 premutation. Participants included 35 women with the FMR1 premutation and 45 age-matched women without the FMR1 premutation who served as a comparison group. Three sustained /a/ vowels were analyzed for pitch (mean F0), variability of pitch (standard deviation of F0), and overall vocal quality (jitter, shimmer, and harmonics-to-noise ratio). Executive, motor, and health indices were obtained from direct and self-report measures and genetic samples were analyzed for FMR1 CGG repeat length and activation ratio. Women with the FMR1 premutation had a lower pitch, larger pitch variability, and poorer vocal quality than the comparison group. Working memory was related to harmonics-to-noise ratio and shimmer in women with the FMR1 premutation. Vocal quality abnormalities differentiated women with the FMR1 premutation from the comparison group and were evident even in the absence of other clinically evident motor deficits. This study supports vocal quality analyses as a tool that may prove useful in the detection of early signs of motor involvement in this population.
Affiliation(s)
- Laura Friedman
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA
- Meagan Lauber
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA
- Roozbeh Behroozmand
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, USA
- Dariusz Kunecki
- Department of Pediatrics, Rush University Medical Center, Chicago, USA
- Jessica Klusek
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA.
18
Vinney LA, Tripp R, Shelly S, Gillespie A. Indexing Cognitive Resource Usage for Acquisition of Initial Voice Therapy Targets. Am J Speech Lang Pathol 2023; 32:717-732. [PMID: 36701805 DOI: 10.1044/2022_ajslp-22-00197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
PURPOSE The purpose of this study was to index cognitive resource usage for acquisition of initial targets of two common voice therapy techniques (resonant voice therapy [RVT] and conversation training therapy [CTT]) based on the theorized depletion effect (i.e., when an initial task requiring high cognitive load leads to poorer performance on a subsequent task). METHOD Eleven vocally healthy participants, ages 23-41 years, read aloud the Rainbow Passage and produced consonant-vowel resonant targets (/mi, ma, mu/) followed by a baseline computerized Stroop task and a 15-min washout. Following this baseline period, participants watched and interacted with two videos instructing them in RVT or CTT initial targets. After viewing each video and practicing the associated vocal skills, participants rated the degree of mental effort required to engage in the target vocal technique on a modified Borg scale. Participants recorded their attempts at RVT on /mi, ma, mu/ and CTT on the Rainbow Passage, which were later rated by three voice-specialized speech-language pathologists as to how representative they were of each respective target technique. Changes in fundamental frequency and average auditory-perceptual ratings from baseline were examined to determine if participants adjusted their technique from RVT and CTT baseline to acquisition. RESULTS Performance on the Stroop task was, on average, worse post CTT than post RVT, but both post-CTT and post-RVT Stroop scores were poorer than baseline. These results suggest that both treatment techniques taxed cognitive resources but that CTT was more cognitively taxing than RVT. However, despite differences in raw averages, no statistically significant differences were found between the baseline, post-CTT, and post-RVT Stroop scores, likely due to the small sample size. Participant ratings of mental effort for CTT and RVT were statistically similar. 
Likewise, poorer post-RVT Stroop scores were associated with participants' greater perceived mental effort with RVT acquisition, but there was no significant association between mental effort ratings for CTT acquisition and post-CTT Stroop scores. Significantly higher fundamental frequency and perceived ratings of the accuracy of technique from baseline to acquisition for both CTT and RVT were found, providing evidence of vocal behavior changes as a result of each technique. CONCLUSIONS Brief exposure to initial treatment tasks in CTT is more cognitively depleting than initial RVT tasks. Results also indicate that vocally healthy participants are able to make a voice change in response to a brief therapy prompt. Finally, participant-rated measures of mental effort and secondary measures of cognitive depletion do not always correlate.
Affiliation(s)
- Raquel Tripp
- Department of Communicative Sciences and Disorders, New York University, NY
- Sandeep Shelly
- Emory Voice Center, Department of Otolaryngology-Head and Neck Surgery, Emory University, Atlanta, GA
- Amanda Gillespie
- Emory Voice Center, Department of Otolaryngology-Head and Neck Surgery, Emory University, Atlanta, GA
19
Cavalcanti JC, Englert M, Oliveira M, Constantini AC. Microphone and Audio Compression Effects on Acoustic Voice Analysis: A Pilot Study. J Voice 2023; 37:162-172. [PMID: 33451892 DOI: 10.1016/j.jvoice.2020.12.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 12/01/2020] [Accepted: 12/03/2020] [Indexed: 11/29/2022]
Abstract
OBJECTIVE This study aimed to analyze the effects of microphone and audio compression variables on voice and speech parameter acquisition. METHOD Acoustic measures were recorded and compared using a high-quality reference microphone and three testing microphones. The tested microphones displayed differences in specifications and acoustic properties. Furthermore, the impact of audio compression was assessed by resampling the original uncompressed audio files into the MPEG-1/2 Audio Layer 3 (mp3) format at three different compression rates (128 kbps, 64 kbps, 32 kbps). Eight speakers were recruited for each recording session and asked to produce four sustained vowels: two [a] segments and two [ɛ] segments. The audio was captured simultaneously by the reference and tested microphones. The recordings were synchronized and analyzed using the Praat software. RESULTS From a set of eight acoustic parameters assessed (f0, F1, F2, jitter%, shimmer%, HNR, H1-H2, and CPP), three (f0, F2, and jitter%) proved robust to both the microphone and audio compression variables. In contrast, some parameters were significantly affected by both factors (HNR, H1-H2, and CPP), while shimmer% was found to be sensitive only to the latter factor. Moreover, higher compression rates appeared to yield more frequent acoustic distortions than lower rates. CONCLUSION Overall, the outcomes suggest that acoustic parameters are influenced by both the microphone selection and the use of audio compression, which may have practical implications for the reliability of acoustic analysis.
Affiliation(s)
- Julio Cesar Cavalcanti
- Universidade Estadual de Campinas (UNICAMP), Institute of Language Studies, Campinas - SP, Brazil.
- Marina Englert
- Universidade Federal de São Paulo (UNIFESP), Department of Communication Disorders, São Paulo - SP, Brazil; Centro de Estudos da Voz (CEV), São Paulo - SP, Brazil
- Miguel Oliveira
- Universidade Federal de Alagoas (UFAL), Department of Letters, Maceió - AL, Brazil
20
An iOS-based VoiceScreen application: feasibility for use in clinical settings-a pilot study. Eur Arch Otorhinolaryngol 2023; 280:277-284. [PMID: 35906420 PMCID: PMC9811036 DOI: 10.1007/s00405-022-07546-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 07/06/2022] [Indexed: 01/07/2023]
Abstract
OBJECTIVES To develop a smartphone application for estimation of the Acoustic Voice Quality Index (AVQI) and evaluate its usability in the clinical setting. METHODS AVQI automation and background-noise monitoring functions were implemented into a mobile "VoiceScreen" application running on the iOS operating system. The study group consisted of 103 adult individuals: 30 with normal voices and 73 patients with pathological voices. Voice recordings were performed in the clinical setting with the "VoiceScreen" app using iPhone 8 microphones. Voices of 30 patients were recorded before and 1 month after phonosurgical intervention. To evaluate the diagnostic accuracy in differentiating normal and pathological voice, receiver-operating characteristic statistics, i.e., area under the curve (AUC), sensitivity, specificity, and correct classification rate (CCR), were used. RESULTS A high level of precision of the AVQI in discriminating between normal and dysphonic voices was yielded, with a corresponding AUC of 0.937. The AVQI cutoff score of 3.4 demonstrated a sensitivity of 86.3% and a specificity of 95.6% with a CCR of 89.2%. The mean preoperative AVQI of 6.01 (SD 2.39) in the post-phonosurgical follow-up group decreased to 2.00 (SD 1.08). No statistically significant differences (p = 0.216) were revealed between AVQI measurements in the normal voice group and at 1-month follow-up after phonosurgery. CONCLUSIONS The "VoiceScreen" app represents an accurate and robust tool for voice quality measurement and demonstrates the potential to be used in clinical settings as a sensitive measure of voice changes across phonosurgical treatment outcomes.
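A cutoff such as the AVQI score of 3.4 reported above is typically chosen from the ROC curve by balancing sensitivity and specificity, for example by maximizing Youden's J; a small numpy sketch on toy scores (the score values are illustrative, not study data):

```python
import numpy as np

def best_cutoff(scores, labels):
    """Sweep candidate cutoffs and return the one maximizing
    Youden's J = sensitivity + specificity - 1 (higher score = pathological)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best = (None, -1.0, 0.0, 0.0)
    for c in np.unique(scores):
        pred = scores >= c
        sens = np.mean(pred[labels == 1])     # true positive rate
        spec = np.mean(~pred[labels == 0])    # true negative rate
        j = sens + spec - 1
        if j > best[1]:
            best = (c, j, sens, spec)
    return best

# Toy AVQI-like scores: controls cluster low, dysphonic voices high.
normal = [1.2, 2.0, 2.8, 3.1, 2.5]
dysphonic = [3.9, 5.2, 4.4, 6.0, 3.6]
cutoff, j, sens, spec = best_cutoff(normal + dysphonic, [0] * 5 + [1] * 5)
print(cutoff, sens, spec)   # 3.6 1.0 1.0
```

In practice each candidate smartphone yields its own score distribution, which is why the study reports a per-device cutoff range (5.6-6.0) rather than a single value.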
21
Schultz BG, Vogel AP. A Tutorial Review on Clinical Acoustic Markers in Speech Science. J Speech Lang Hear Res 2022; 65:3239-3263. [PMID: 36044888 DOI: 10.1044/2022_jslhr-21-00647] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
PURPOSE The human voice changes with the progression of neurological disease and the onset of diseases that affect articulators, often decreasing the effectiveness of communication. These changes can be objectively measured using signal processing techniques that extract acoustic features. When measuring acoustic features, there are often several steps and assumptions that might be known to experts in acoustics and phonetics, but are less transparent for other disciplines (e.g., clinical medicine, speech pathology, engineering, and data science). This tutorial describes these signal processing techniques, explicitly outlines the underlying steps for accurate measurement, and discusses the implications of clinical acoustic markers. CONCLUSIONS We establish a vocabulary using straightforward terms, provide visualizations to achieve common ground, and guide understanding for those outside the domains of acoustics and auditory signal processing. Where possible, we highlight the best practices for measuring clinical acoustic markers and suggest resources for obtaining and further understanding these measures.
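One of the acoustic markers such a tutorial covers, fundamental frequency, can be estimated from a voiced frame by peak-picking the autocorrelation within a plausible pitch range; a simplified numpy sketch (real estimators add voicing decisions, windowing, and octave-error handling):

```python
import numpy as np

def estimate_f0(x, fs, fmin=75.0, fmax=500.0):
    """Crude autocorrelation-based F0 estimate for a single voiced frame."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)             # lag search range
    lag = lo + np.argmax(ac[lo:hi])                     # strongest periodicity
    return fs / lag

fs = 16000
t = np.arange(int(0.05 * fs)) / fs          # 50 ms frame
frame = np.sin(2 * np.pi * 220 * t)         # synthetic 220 Hz phonation
print(estimate_f0(frame, fs))               # close to 220 Hz
```

Restricting the lag search to the expected pitch range is what keeps the estimator from locking onto harmonics or subharmonics.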
Affiliation(s)
- Benjamin Glenn Schultz
- Centre for Neuroscience of Speech, The University of Melbourne, Victoria, Australia
- Department of Audiology and Speech Pathology, The University of Melbourne, Victoria, Australia
- Adam P Vogel
- Centre for Neuroscience of Speech, The University of Melbourne, Victoria, Australia
- Department of Audiology and Speech Pathology, The University of Melbourne, Victoria, Australia
- Redenlab, Melbourne, Victoria, Australia
22
Lokwani P, Prabhu P, Nisha KV. Profiles and predictors of onset based differences in vocal characteristics of adults with auditory neuropathy spectrum disorder (ANSD). J Otol 2022; 17:218-225. PMID: 36249919; PMCID: PMC9547112; DOI: 10.1016/j.joto.2022.08.001.
23
Deep learning and machine learning-based voice analysis for the detection of COVID-19: A proposal and comparison of architectures. Knowl Based Syst 2022; 253:109539. PMID: 35915642; PMCID: PMC9328841; DOI: 10.1016/j.knosys.2022.109539.
Abstract
Alongside the currently used nasal swab testing, the COVID-19 pandemic situation would gain noticeable advantages from low-cost tests that are available at any-time, anywhere, at a large-scale, and with real time answers. A novel approach for COVID-19 assessment is adopted here, discriminating negative subjects versus positive or recovered subjects. The scope is to identify potential discriminating features, highlight mid and short-term effects of COVID on the voice and compare two custom algorithms. A pool of 310 subjects took part in the study; recordings were collected in a low-noise, controlled setting employing three different vocal tasks. Binary classifications followed, using two different custom algorithms. The first was based on the coupling of boosting and bagging, with an AdaBoost classifier using Random Forest learners. A feature selection process was employed for the training, identifying a subset of features acting as clinically relevant biomarkers. The other approach was centered on two custom CNN architectures applied to mel-Spectrograms, with a custom knowledge-based data augmentation. Performances, evaluated on an independent test set, were comparable: Adaboost and CNN differentiated COVID-19 positive from negative with accuracies of 100% and 95% respectively, and recovered from negative individuals with accuracies of 86.1% and 75% respectively. This study highlights the possibility to identify COVID-19 positive subjects, foreseeing a tool for on-site screening, while also considering recovered subjects and the effects of COVID-19 on the voice. The two proposed novel architectures allow for the identification of biomarkers and demonstrate the ongoing relevance of traditional ML versus deep learning in speech analysis.
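The first algorithm described above couples boosting and bagging: an AdaBoost classifier whose base learners are Random Forests, preceded by feature selection. A hedged scikit-learn sketch of that structure, with synthetic features standing in for the paper's acoustic biomarkers (the actual feature set, hyperparameters, and data are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 20))             # 20 candidate acoustic features (synthetic)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only two features carry the label signal

# boosting over bagged learners: AdaBoost with Random Forest base estimators
forest = RandomForestClassifier(n_estimators=10, random_state=0)
try:                                     # scikit-learn >= 1.2 keyword
    boosted = AdaBoostClassifier(estimator=forest, n_estimators=5, random_state=0)
except TypeError:                        # older releases used base_estimator
    boosted = AdaBoostClassifier(base_estimator=forest, n_estimators=5, random_state=0)

# feature selection during training narrows to an interpretable subset
model = make_pipeline(SelectKBest(f_classif, k=5), boosted)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)       # held-out accuracy on the synthetic task
```

The selected feature indices (`model[0].get_support()`) play the role of the "clinically relevant biomarkers" the abstract mentions.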
24
Maskeliūnas R, Kulikajevas A, Damaševičius R, Pribuišis K, Ulozaitė-Stanienė N, Uloza V. Lightweight Deep Learning Model for Assessment of Substitution Voicing and Speech after Laryngeal Carcinoma Surgery. Cancers (Basel) 2022; 14:2366. PMID: 35625971; PMCID: PMC9139213; DOI: 10.3390/cancers14102366.
Abstract
Laryngeal carcinoma is the most common malignant tumor of the upper respiratory tract. Total laryngectomy produces complete and permanent separation of the upper and lower airways, causing loss of voice and leaving the patient unable to communicate verbally in the postoperative period. This paper aims to exploit modern deep learning research to objectively classify, extract, and measure substitution voicing after laryngeal oncosurgery from the audio signal. We propose using well-known convolutional neural networks (CNNs), developed for image classification, for analysis of the voice audio signal. Our approach takes a mel-frequency spectrogram as the input to a deep neural network architecture. A database of digital speech recordings of 367 male subjects (279 normal and 88 pathological speech samples) was used. Our approach showed the best true-positive rate of any of the compared state-of-the-art approaches, achieving an overall accuracy of 89.47%.
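A mel-spectrogram input of the kind described can be computed from scratch; in practice libraries such as librosa or torchaudio are used, but the sketch below shows the underlying steps (frame, window, FFT power, triangular mel filterbank). All sizes here (FFT length, hop, number of mel bands) are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """Log-mel spectrogram in dB: windowed STFT power pooled through
    triangular filters spaced evenly on the mel scale."""
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2  # (frames, bins)

    # triangular filterbank centred on equally spaced mel points
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        if center > left:      # rising edge of the triangle
            fb[m - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:     # falling edge of the triangle
            fb[m - 1, center:right] = (right - np.arange(center, right)) / (right - center)
    return 10.0 * np.log10(power @ fb.T + 1e-10)  # (frames, n_mels) in dB
```

For CNN input, the resulting (frames, n_mels) matrix is typically treated as a single-channel image.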
Affiliation(s)
- Rytis Maskeliūnas
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
- Audrius Kulikajevas
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
- Robertas Damaševičius (correspondence)
- Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
25
Awan SN, Shaikh MA, Desjardins M, Feinstein H, Abbott KV. The Effect of Microphone Frequency Response on Spectral and Cepstral Measures of Voice: An Examination of Low-Cost Electret Headset Microphones. Am J Speech Lang Pathol 2022; 31:959-973. PMID: 35050724; PMCID: PMC9150670; DOI: 10.1044/2021_ajslp-21-00156.
Abstract
PURPOSE The purpose of this study was to establish the frequency response of a selection of low-cost headset microphones that could be given to subjects for remote voice recordings and to examine the effect of microphone type and frequency response on key acoustic measures related to voice quality obtained from speech and vowel samples. METHOD The frequency responses of three low-cost headset microphones were evaluated using pink noise generated via a head-and-torso model. Each of the headset microphones was then used to record a series of speech and vowel samples prerecorded from 24 speakers who represented a diversity of sex, age, fundamental frequency (F o), and voice quality types. Recordings were later analyzed for the following measures: smoothed cepstral peak prominence (CPP; dB), low versus high spectral ratio (L/H ratio; dB), CPP F o (Hz), and cepstral spectral index of dysphonia (CSID). RESULTS The frequency response of the microphones under test was observed to have nonsignificant effects on measures of the CPP and CPP F o, significant effects on the CSID in speech contexts, and strong and significant effects on the measure of spectral tilt (L/H ratio). However, the correlations between the various headset microphones and a reference precision microphone were excellent (rs > .90). CONCLUSIONS The headset microphones under test all showed the capability to track a wide range of diversity in the voice signal. Though the use of higher quality microphones that have demonstrated specifications is recommended for typical research and clinical purposes, low-cost electret microphones may be used to provide valid measures of voice, specifically when the same microphone and signal chain is used for the evaluation of pre- versus posttreatment change or intergroup comparisons.
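Cepstral peak prominence, the measure central to this study, follows tool-specific conventions (the smoothed CPP reported above is one of several published variants). The sketch below implements a common unsmoothed variant, the cepstral peak height above a regression line over the plausible-f0 quefrency range, purely to make the measure concrete; it is not calibrated against the analysis software used in the study:

```python
import numpy as np

def cepstral_peak_prominence(frame, sr, fmin=60.0, fmax=300.0):
    """Unsmoothed cepstral peak prominence: height of the dominant
    rahmonic above a straight line fitted to the cepstrum over the
    quefrency range of plausible speaking f0 (fmin..fmax Hz)."""
    windowed = frame * np.hanning(len(frame))
    log_spec = 20.0 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.abs(np.fft.irfft(log_spec))   # quefrency domain
    quef = np.arange(len(cepstrum)) / sr        # quefrency in seconds

    lo, hi = int(sr / fmax), int(sr / fmin)     # search window in samples
    peak = lo + np.argmax(cepstrum[lo:hi])

    # a linear trend over the searched region models the cepstral decay
    slope, intercept = np.polyfit(quef[lo:hi], cepstrum[lo:hi], 1)
    return cepstrum[peak] - (slope * quef[peak] + intercept)
```

A strongly periodic frame produces a sharp rahmonic well above the trend line; breathy or noisy voices flatten the peak, which is why CPP tracks dysphonia.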
Affiliation(s)
- Shaheen N. Awan
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Mohsin A. Shaikh
- Department of Communication Sciences and Disorders, Bloomsburg University of Pennsylvania
- Maude Desjardins
- Department of Communication Sciences & Disorders, University of Delaware, Newark
- Hagar Feinstein
- Department of Communication Sciences & Disorders, University of Delaware, Newark
26
Marsano-Cornejo MJ, Roco-Videla Á. Comparison of the Acoustic Parameters Obtained With Different Smartphones and a Professional Microphone. Acta Otorrinolaringol Esp 2022; 73:51-55. DOI: 10.1016/j.otoeng.2020.08.009.
27
Castillo-Allendes A, Contreras-Ruston F, Cantor L, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Terapia de voz en el contexto de la pandemia COVID-19; recomendaciones para la práctica clínica [Voice therapy in the context of the COVID-19 pandemic; recommendations for clinical practice]. J Voice 2021; 35:808.e1-808.e12. PMID: 32917457; PMCID: PMC7442931; DOI: 10.1016/j.jvoice.2020.08.018.
Abstract
INTRODUCTION Since the beginning of the new COVID-19 pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions through telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to COVID-19. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the context of the COVID-19 pandemic. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed a consensus recommendation following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists in this pandemic context. RESULTS The clinical guide provides 79 recommendations for clinicians on the management of voice disorders during the pandemic, covering assessment, direct treatment, telepractice, and teamwork. Consensus of 95% was reached for all topics. CONCLUSION This guideline should be taken only as recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Francisco Contreras-Ruston
- Speech-Language Pathology and Audiology Department, Universidad de Valparaíso, San Felipe, Chile. Address correspondence and reprint requests to Francisco Contreras-Ruston, CEV - Centro de Estudos da Voz, Rua Machado Bittencourt, 361, SP 04044-001, Brazil
- Lady Cantor
- Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
- Universidad de los Andes, Santiago, Chile
- Celina Malebran
- Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
- Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
- Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública, Santiago, Chile
- Thays Vaiano
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
- Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoaudiología, Hospital de Clínicas "José de San Martin", Buenos Aires, Argentina
- Mara Behlau
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
28
Castillo-Allendes A, Contreras-Ruston F, Cantor L, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Terapia vocal no contexto da pandemia do COVID-19; orientações para a prática clínica [Voice therapy in the context of the COVID-19 pandemic; guidelines for clinical practice]. J Voice 2021; 35:808.e13-808.e24. PMID: 32917460; PMCID: PMC7439998; DOI: 10.1016/j.jvoice.2020.08.019.
Abstract
INTRODUCTION Since the beginning of the new COVID-19 pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions through telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to COVID-19. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the context of the COVID-19 pandemic. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed a consensus recommendation following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists in this pandemic context. RESULTS The clinical guide provides 79 recommendations for clinicians on the management of voice disorders during the pandemic, covering assessment, direct treatment, telepractice, and teamwork. Consensus of 95% was reached for all topics. CONCLUSION This guideline should be taken only as recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Francisco Contreras-Ruston
- Speech-Language Pathology and Audiology Department, Universidad de Valparaíso, San Felipe, Chile. Address correspondence and reprint requests to Francisco Contreras-Ruston, CEV - Centro de Estudos da Voz, Rua Machado Bittencourt, 361, SP 04044-001, Brazil
- Lady Cantor
- Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
- Universidad de los Andes, Santiago, Chile
- Celina Malebran
- Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
- Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
- Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública, Santiago, Chile
- Thays Vaiano
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
- Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoaudiología, Hospital de Clínicas "José de San Martin", Buenos Aires, Argentina
- Mara Behlau
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
29
Castillo-Allendes A, Contreras-Ruston F, Cantor-Cutiva LC, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Voice Therapy in the Context of the COVID-19 Pandemic: Guidelines for Clinical Practice. J Voice 2021; 35:717-727. PMID: 32878736; PMCID: PMC7413113; DOI: 10.1016/j.jvoice.2020.08.001.
Abstract
INTRODUCTION Since the beginning of the new COVID-19 pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions through telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to COVID-19. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the context of the COVID-19 pandemic. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed a consensus recommendation following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists in this pandemic context. RESULTS The clinical guide provides 65 recommendations for clinicians on the management of voice disorders during the pandemic, covering assessment, direct treatment, telepractice, and teamwork. Consensus of 95% was reached for all topics. CONCLUSION This guideline should be taken only as recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Lady Catherine Cantor-Cutiva
- Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
- Universidad de los Andes, Santiago, Chile
- Celina Malebran
- Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
- Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
- Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública, Santiago, Chile
- Thays Vaiano
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
- Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoaudiología, Hospital de Clínicas "José de San Martin", Buenos Aires, Argentina
- Mara Behlau
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
30
Thomas JA, Burkhardt HA, Chaudhry S, Ngo AD, Sharma S, Zhang L, Au R, Hosseini Ghomi R. Assessing the Utility of Language and Voice Biomarkers to Predict Cognitive Impairment in the Framingham Heart Study Cognitive Aging Cohort Data. J Alzheimers Dis 2021; 76:905-922. PMID: 32568190; DOI: 10.3233/jad-190783.
Abstract
BACKGROUND There is a need for fast, accessible, low-cost, and accurate diagnostic methods for early detection of cognitive decline. Dementia diagnoses are usually made years after symptom onset, missing a window of opportunity for early intervention. OBJECTIVE To evaluate the use of recorded voice features as proxies for cognitive function by using neuropsychological test measures and existing dementia diagnoses. METHODS This study analyzed 170 audio recordings, transcripts, and paired neuropsychological test results from 135 participants selected from the Framingham Heart Study (FHS), which includes 97 recordings of cognitively normal participants and 73 recordings of cognitively impaired participants. Acoustic and linguistic features of the voice samples were correlated with cognitive performance measures to verify their association. RESULTS Language and voice features, when combined with demographic variables, performed with an AUC of 0.942 (95% CI 0.929-0.983) in predicting cognitive status. Features with good predictive power included the acoustic features mean spectral slope in the 500-1500 Hz band, variation in the F2 bandwidth, and variation in the Mel-Frequency Cepstral Coefficient (MFCC) 1; the demographic features employment, education, and age; and the text features of number of words, number of compound words, number of unique nouns, and number of proper names. CONCLUSION Several linguistic and acoustic biomarkers show correlations and predictive power with regard to neuropsychological testing results and cognitive impairment diagnoses, including dementia. This initial study paves the way for a follow-up comprehensive study incorporating the entire FHS cohort.
Affiliation(s)
- Rhoda Au
- Boston University, Boston, MA, USA
31
Zhang C, Jepson K, Lohfink G, Arvaniti A. Comparing acoustic analyses of speech data collected remotely. J Acoust Soc Am 2021; 149:3910. PMID: 34241427; PMCID: PMC8269758; DOI: 10.1121/10.0005132.
Abstract
Face-to-face speech data collection has been next to impossible globally as a result of the COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth, Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth, Phone). F0 was tracked accurately by all of the devices; however, for formant analysis (F1, F2, F3), Phone performed better than Zoom, i.e., more similarly to H6, although the data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless format phone recordings present a viable option for at least some phonetic studies.
Affiliation(s)
- Cong Zhang
- Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
- Kathleen Jepson
- Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
- Georg Lohfink
- School of European Culture and Languages, University of Kent, Canterbury, Kent, CT2 7NF, United Kingdom
- Amalia Arvaniti
- Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
32
Abstract
The pandemic caused by the new coronavirus (SARS-CoV-2) had led to more than two million deaths worldwide by March 2021. The worldwide call to reduce transmission is enormous. Recently, there has been rapid growth of telemedicine and the use of mobile health (mHealth) in the context of the COVID-19 pandemic. Smartphone components such as the flashlight, camera, microphone, and microprocessor can measure different clinical parameters, including oxygen saturation, blood pressure, heart rate, breathing rate, fever, pulmonary auscultation, and even voice analysis. All these parameters are of great clinical importance when evaluating patients suspected of having COVID-19, or when monitoring infected patients admitted to hospital or in home isolation. In remote medical care, the results for these parameters can be sent to a call center or a health unit for interpretation by a qualified health professional. The patient can then receive orientation or be immediately referred for in-patient care. The application of machine learning and other artificial intelligence strategies assumes a central role in signal processing and is gaining considerable ground in the medical field. In this work, we present different approaches for evaluating clinical parameters that are valuable in the case of COVID-19, and we hope that soon all these parameters can be measured by a single smartphone application, facilitating remote clinical assessment.
33
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Kregždytė R. Accuracy of Acoustic Voice Quality Index Captured With a Smartphone - Measurements With Added Ambient Noise. J Voice 2021; 37:465.e19-465.e26. PMID: 33676807; DOI: 10.1016/j.jvoice.2021.01.025.
Abstract
OBJECTIVE To evaluate the accuracy of Acoustic Voice Quality Index (AVQI) measures obtained from voice recordings made simultaneously with oral and smartphone microphones in a sound-proof room, and to compare them with AVQIs obtained from the same smartphone voice recordings with added ambient noise. METHODS A study group of 183 subjects with normal voices (n = 86) and various voice disorders (n = 97) was asked to read aloud a standard text and sustain the vowel /a/. Controlled ambient noise averaging 29.61 dB SPL was added digitally to the smartphone voice recordings. Repeated-measures analysis of variance (ANOVA) with Greenhouse-Geisser correction was used to evaluate AVQI changes within subjects. Bland-Altman plots were used to evaluate the level of agreement between AVQI measurements obtained from the different voice recordings. RESULTS Repeated-measures ANOVA showed that differences among AVQI results obtained from recordings made with an oral studio microphone, with a smartphone microphone, and with a smartphone microphone with added ambient noise were not statistically significant (P = 0.07). No significant systematic differences, and an acceptable level of random error, were revealed in AVQI measurements from recordings made with oral and smartphone microphones (including those with added noise). CONCLUSION The AVQI measures obtained from smartphone voice recordings with experimentally added ambient noise showed acceptable agreement with the results of oral microphone recordings, suggesting that smartphone microphone recordings are suitable for AVQI estimation even in the presence of acceptable ambient noise.
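The Bland-Altman agreement analysis used above reduces to two quantities: the mean difference between the paired measurements (bias) and the 95% limits of agreement (bias ± 1.96 SD of the differences). A minimal sketch; the values in the usage example are toy numbers, not the study's data:

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement statistics for two measurement methods
    applied to the same subjects: returns (bias, lower limit of
    agreement, upper limit of agreement)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a - b                 # paired differences, e.g. studio minus smartphone AVQI
    bias = diff.mean()           # systematic difference between the methods
    sd = diff.std(ddof=1)        # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For example, `bland_altman([2.0, 3.0, 4.0], [2.1, 2.9, 4.2])` returns a small negative bias with limits of agreement that bracket it; a plot of `(a + b) / 2` against `a - b` with these three horizontal lines is the usual Bland-Altman figure.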
Affiliation(s)
- Virgilijus Uloza
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Nora Ulozaitė-Stanienė
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tadas Petrauskas
- Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Rima Kregždytė
- Department of Preventive Medicine, Lithuanian University of Health Sciences, Kaunas, Lithuania
34
Acoustic analysis of vowels in patients with sleep apnea syndrome in sitting and supine positions. Sleep Breath 2021; 25:2107-2108. PMID: 33483907; DOI: 10.1007/s11325-021-02304-4.
35
Marsano-Cornejo MJ, Roco-Videla Á. Comparison of the acoustic parameters obtained with different smartphones and a professional microphone. Acta Otorrinolaringol Esp 2021; 73:S0001-6519(20)30169-2. PMID: 33413843; DOI: 10.1016/j.otorri.2020.08.006.
Abstract
Smartphones allow good-quality recordings; however, it cannot be assumed that the acoustic parameters obtained from them are comparable to those obtained with a professional microphone. The objective of this study was to establish whether there are significant differences when comparing the values of six acoustic parameters obtained from recordings made with four smartphones and a professional microphone. The Praat program was used to obtain the acoustic parameters f0, jitter, shimmer, HNR, alpha ratio, and L1 - L0 from recordings of a sustained vowel /a/ made with an iPhone SE, iPhone 6, Samsung S8, Huawei Y7, and the Behringer ECM8000 microphone. The sample comprised 26 men and 26 women, aged 18 to 26 years, with no declared voice pathology. A repeated-measures ANOVA was used to compare the values. All devices showed reproducibility between consecutive repeated measurements. The parameters f0 and jitter were the only ones that did not show significant differences between the smartphones and the professional microphone. None of the smartphones studied can replace the professional microphone in voice recording for the evaluation of the six parameters analysed, except for f0 and jitter.
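Two of the six parameters compared above, Praat's "local" jitter and shimmer, are simple ratios once the cycle-to-cycle periods and peak amplitudes have been extracted from the waveform (the extraction itself, which Praat performs via pulse detection, is assumed here). A sketch of the definitional step only:

```python
import numpy as np

def jitter_local(periods):
    """Jitter (local), in %: mean absolute difference between consecutive
    glottal periods, relative to the mean period (Praat-style definition)."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer_local(amplitudes):
    """Shimmer (local), in %: the same measure applied to the peak
    amplitudes of successive glottal cycles."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)
```

Microphone frequency response barely affects the period sequence (hence the stability of f0 and jitter reported above) but does reshape cycle amplitudes, which is consistent with shimmer differing across devices.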
Affiliation(s)
- María-José Marsano-Cornejo
- Facultad de Ciencias de la Salud, Universidad Autónoma de Chile, Santiago, Chile; Facultad de Ciencias de la Salud, Universidad de Las Américas, Santiago, Chile
- Ángel Roco-Videla
- Facultad de Salud, Programa de Magíster en Ciencias Químico-Biológicas, Universidad Bernardo O'Higgins, Chile; Facultad de Ingeniería, Departamento de Ingeniería Civil, Universidad Católica de la Santísima Concepción, Concepción, Chile
36
van der Woerd B, Wu M, Parsa V, Doyle PC, Fung K. Evaluation of Acoustic Analyses of Voice in Nonoptimized Conditions. J Speech Lang Hear Res 2020; 63:3991-3999. PMID: 33186510; DOI: 10.1044/2020_jslhr-20-00212.
Abstract
Objectives This study aimed to evaluate the fidelity and accuracy of a smartphone microphone and recording environment on acoustic measurements of voice. Method A prospective cohort proof-of-concept study. Two sets of prerecorded samples (a) sustained vowels (/a/) and (b) Rainbow Passage sentences were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA), at a fixed distance (15 in.). Each set of recordings (iPhone-audio booth, Blue Yeti-audio booth, iPhone-office, and Blue Yeti-office) was time-windowed to ensure the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered, with an equal number of male (n = 12) and female (n = 12) speakers evaluated. Results The vowel analyses indicated that jitter, shimmer, HNR, and CPP differed significantly by microphone choice, and that shimmer, HNR, and CPP differed significantly by recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. While statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges.
Conclusions Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.
Affiliation(s)
- Benjamin van der Woerd
- Department of Otolaryngology-Head and Neck Surgery, Western University, London, Ontario, Canada
- Min Wu
- School of Communication Sciences and Disorders, Western University, London, Ontario, Canada
- Vijay Parsa
- School of Communication Sciences and Disorders, Western University, London, Ontario, Canada
- Department of Electrical and Computer Engineering, Western University, London, Ontario, Canada
- Philip C Doyle
- Department of Otolaryngology-Head and Neck Surgery, Western University, London, Ontario, Canada
- School of Communication Sciences and Disorders, Western University, London, Ontario, Canada
- Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, CA
- Kevin Fung
- Department of Otolaryngology-Head and Neck Surgery, Western University, London, Ontario, Canada

37
Robin J, Harrison JE, Kaufman LD, Rudzicz F, Simpson W, Yancheva M. Evaluation of Speech-Based Digital Biomarkers: Review and Recommendations. Digit Biomark 2020; 4:99-108. [PMID: 33251474] [DOI: 10.1159/000510820]
Abstract
Speech represents a promising novel biomarker by providing a window into brain health, as shown by its disruption in various neurological and psychiatric diseases. As with many novel digital biomarkers, however, rigorous evaluation is currently lacking and is required for these measures to be used effectively and safely. This paper outlines and provides examples from the literature of evaluation steps for speech-based digital biomarkers, based on the recent V3 framework (Goldsack et al., 2020). The V3 framework describes 3 components of evaluation for digital biomarkers: verification, analytical validation, and clinical validation. Verification includes assessing the quality of speech recordings and comparing the effects of hardware and recording conditions on the integrity of the recordings. Analytical validation includes checking the accuracy and reliability of data processing and computed measures, including understanding test-retest reliability, demographic variability, and comparing measures to reference standards. Clinical validity involves verifying the correspondence of a measure to clinical outcomes which can include diagnosis, disease progression, or response to treatment. For each of these sections, we provide recommendations for the types of evaluation necessary for speech-based biomarkers and review published examples. The examples in this paper focus on speech-based biomarkers, but they can be used as a template for digital biomarker development more generally.
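The analytical-validation step described above includes test-retest reliability, which is commonly quantified with an intraclass correlation coefficient. The review does not prescribe a specific estimator; the following is a minimal numpy sketch of ICC(2,1) (two-way random effects, absolute agreement, single measurement, per Shrout and Fleiss), with subjects in rows and repeated recording sessions in columns:

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-
    measurement intraclass correlation. x: (subjects, sessions)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)            # per-subject means
    col_means = x.mean(axis=0)            # per-session means
    ssr = k * np.sum((row_means - grand) ** 2)   # between subjects
    ssc = n * np.sum((col_means - grand) ** 2)   # between sessions
    sse = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Identical scores across two sessions -> perfect agreement
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

A constant offset between sessions (e.g., a device that reads systematically high) lowers ICC(2,1), since the absolute-agreement form penalizes systematic bias as well as random error.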
Affiliation(s)
- John E Harrison
- Metis Cognition Ltd., Park House, Kilmington Common, Warminster, United Kingdom; Alzheimer Center, AUmc, Amsterdam, The Netherlands; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Frank Rudzicz
- Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- William Simpson
- Winterlight Labs, Toronto, Ontario, Canada; Department of Psychiatry and Behavioural Neuroscience, McMaster University, Hamilton, Ontario, Canada

38
Saggio G, Costantini G. Worldwide Healthy Adult Voice Baseline Parameters: A Comprehensive Review. J Voice 2020; 36:637-649. [PMID: 33039203] [DOI: 10.1016/j.jvoice.2020.08.028]
Abstract
The voice produces acoustic signals that were first analyzed and synthesized for telecommunication purposes and have more recently been investigated for medical applications. In particular, voice signal characteristics can reveal individual health conditions, which is useful for screening, diagnostic, and remote-monitoring aims. Within this frame, knowledge of the baseline features of the healthy voice is mandatory in order to enable comparison with their unhealthy counterparts. However, the baseline features of the human voice depend on gender, age range, and ethnicity, and, as far as we know, no work reports how those features are distributed worldwide. This paper intends to fill this gap. Our database search yielded 179 relevant published studies, retrieved using the digital libraries of IEEE Xplore, Scopus, Web of Science, IOP Science, Taylor and Francis Online, and Scitepress. These studies report different features, of which we consider here the most investigated ones, within the most investigated age range. In particular, the features are the fundamental frequency, jitter, shimmer, the harmonic-to-noise ratio, and the cepstral peak prominence; the most investigated age range is 20-40 years; and, regarding ethnicity, 20 countries are considered.
Affiliation(s)
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy

39
Illner V, Sovka P, Rusz J. Validation of freely-available pitch detection algorithms across various noise levels in assessing speech captured by smartphone in Parkinson's disease. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101831]
40
Psaltos D, Chappie K, Karahanoglu FI, Chasse R, Demanuele C, Kelekar A, Zhang H, Marquez V, Kangarloo T, Patel S, Czech M, Caouette D, Cai X. Multimodal Wearable Sensors to Measure Gait and Voice. Digit Biomark 2019; 3:133-144. [PMID: 32095772] [DOI: 10.1159/000503282]
Abstract
Background Traditional measurement systems utilized in clinical trials are limited because they are episodic and thus cannot capture the day-to-day fluctuations and longitudinal changes that frequently affect patients across different therapeutic areas. Objectives The aim of this study was to collect and evaluate data from multiple devices, including wearable sensors, and compare them to standard lab-based instruments across multiple domains of daily tasks. Methods Healthy volunteers aged 18-65 years were recruited for a 1-h study to collect and assess data from wearable sensors. They performed walking tasks on a gait mat while instrumented with a watch, phone, and sensor insoles as well as several speech tasks on multiple recording devices. Results Step count and temporal gait metrics derived from a single lumbar accelerometer are highly precise; spatial gait metrics are consistently 20% shorter than gait mat measurements. The insole's algorithm only captures about 72% of steps but does have precision in measuring temporal gait metrics. Mobile device voice recordings provide similar results to traditional recorders for average signal pitch and sufficient signal-to-noise ratio for analysis when hand-held. Lossless compression techniques are advised for signal processing. Conclusions Gait metrics from a single lumbar accelerometer sensor are in reasonable concordance with standard measurements, with some variation between devices and across individual metrics. Finally, participants in this study were familiar with mobile devices and had high acceptance of potential future continuous wear for clinical trials.
Affiliation(s)
- Dimitrios Psaltos
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Kara Chappie
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Rachel Chasse
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Amey Kelekar
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Hao Zhang
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Vanessa Marquez
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Tairmae Kangarloo
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Shyamal Patel
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Matthew Czech
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- David Caouette
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA
- Xuemei Cai
- Early Clinical Development, Pfizer Inc., Cambridge, Massachusetts, USA; Tufts Medical Center, Boston, Massachusetts, USA

41
On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.12.024]
42
Ulozaite-Staniene N, Petrauskas T, Šaferis V, Uloza V. Exploring the feasibility of the combination of acoustic voice quality index and glottal function index for voice pathology screening. Eur Arch Otorhinolaryngol 2019; 276:1737-1745. [DOI: 10.1007/s00405-019-05433-5]
43
Jannetts S, Schaeffler F, Beck J, Cowen S. Assessing voice health using smartphones: bias and random error of acoustic voice parameters captured by different smartphone types. Int J Lang Commun Disord 2019; 54:292-305. [PMID: 30779425] [DOI: 10.1111/1460-6984.12457]
Abstract
BACKGROUND Occupational voice problems constitute a serious public health issue with substantial financial and human consequences for society. Modern mobile technologies such as smartphones have the potential to enhance approaches to the prevention and management of voice problems. This paper addresses an important aspect of smartphone-assisted voice care: the reliability of smartphone-based acoustic analysis for voice health state monitoring. AIM To assess the reliability of acoustic parameter extraction for a range of commonly used smartphones by comparison with studio recording equipment. METHODS & PROCEDURES Twenty-two vocally healthy speakers (12 female, 10 male) were recorded producing sustained vowels and connected speech under studio conditions using a high-quality studio microphone and an array of smartphones. For both types of utterance, Bland-Altman analysis was used to assess overall reliability for mean F0, cepstral peak prominence (CPPS), jitter (RAP), and shimmer %. OUTCOMES & RESULTS Analysis of the systematic and random error indicated significant bias for CPPS across both sustained vowels and passage reading. Analysis of the random error of the devices indicated that mean F0 and CPPS showed acceptable random error size, while jitter and shimmer random error was judged as problematic. CONCLUSIONS & IMPLICATIONS Confidence in the feasibility of smartphone-based voice assessment is increased by the experimental finding of high reliability for some clinically relevant acoustic parameters, while the use of other parameters is discouraged. We also challenge the practice of using statistical tests (e.g., t-tests) for measurement reliability assessment.
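The Bland-Altman analysis used here reduces, for each acoustic parameter, to the mean bias of the paired device differences and the 95% limits of agreement around it. A minimal numpy sketch, with made-up CPPS values standing in for paired studio/smartphone measurements:

```python
import numpy as np

def bland_altman(reference, device):
    """Return mean bias and 95% limits of agreement between paired
    measurements from two devices (Bland & Altman's method)."""
    ref = np.asarray(reference, dtype=float)
    dev = np.asarray(device, dtype=float)
    diff = dev - ref                 # per-pair differences
    bias = diff.mean()               # systematic offset of the device
    sd = diff.std(ddof=1)            # spread of the disagreement
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical CPPS values (dB): studio microphone vs. a smartphone
studio = [14.2, 12.8, 15.1, 13.5, 14.9]
phone = [13.9, 12.1, 14.6, 13.2, 14.1]
bias, lo, hi = bland_altman(studio, phone)
print(bias, lo, hi)
```

A clinically meaningful question is then whether the limits of agreement fall inside the normative range of the parameter, not merely whether the bias differs significantly from zero, which is the point the authors make against t-test-based reliability assessment.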
Affiliation(s)
- Janet Beck
- CASL Research Centre, Queen Margaret University, Edinburgh, UK
- Steve Cowen
- CASL Research Centre, Queen Margaret University, Edinburgh, UK

44
Jaber B, Remman R, Matar N. Repetitive Voice Evaluation in Dysphonic Teachers: Office Versus Home. J Voice 2019; 34:675-681. [PMID: 30765321] [DOI: 10.1016/j.jvoice.2019.01.012]
Abstract
INTRODUCTION A patient's voice can vary from one moment to another, and these variations cannot be captured by a one-time assessment. Multiple assessments may give a more holistic idea of the severity of the patient's dysphonia, and by asking the patient to do the recordings, the patient becomes involved in the therapeutic plan from the beginning. AIM This study aims to evaluate the added value of repetitive assessment outside the speech-language pathologist's (SLP) clinic, to gain a broader view of the voice disorder and to identify parameters that change after working hours, in order to explain this disorder and find solutions for it. METHODOLOGY Twelve dysphonic Lebanese teachers, aged between 20 and 60 years, recorded their voices once at the SLP's office and five more times at home, every day after working hours. The recordings included a standardized text and a sustained /ɑ/. For perceptual evaluation of voice quality, six SLPs (three experts and three naïve) analyzed the recordings using the GRBAS scale. For self-assessment, patients completed two self-assessment scales at the office: the SSVS (subjective assessment for vocal overwork) and the Lebanese Voice Handicap Index (VHI-10lb) questionnaire. They responded orally to a third scale, ranging from 0 to 100, assessing the severity of dysphonia every day after completing the repetitive home recordings. For objective evaluation of the acoustic parameters, Praat software was used. RESULTS Results revealed a significant difference between the scores of the voices recorded in the office and those from the repetitive home assessment for the G and R of the perceptual evaluation (P < 0.01), as well as for jitter, the fundamental frequency, and the harmonic-to-noise ratio (P < 0.05). The recordings made at home revealed a more severe dysphonia.
The self-evaluation scales 1 and 2 (VHI-10lb, SSVS) did not correlate with the results of the objective and perceptual analysis, whereas the results of the oral self-assessment 3 were in agreement with the results for jitter (P < 0.05) and grade of dysphonia (P < 0.05). CONCLUSIONS In teachers, the severity of dysphonia is more pronounced when the voice is recorded after working hours. Daily self-evaluation allows the patient to be more aware of the vocal disorder and voice fluctuations and might improve participation and compliance with therapy. It may also be used to monitor the response to speech therapy.
Affiliation(s)
- Batoul Jaber
- Higher Institute of Speech-Language Pathology, Saint-Joseph University, Beirut, Lebanon
- Reina Remman
- Higher Institute of Speech-Language Pathology, Saint-Joseph University, Beirut, Lebanon
- Nayla Matar
- Department of Otolaryngology Head and Neck Surgery, Saint Joseph University, Faculty of Medicine, Hotel Dieu Hospital of France, Beirut, Lebanon

45
Munnings AJ. The Current State and Future Possibilities of Mobile Phone "Voice Analyser" Applications, in Relation to Otorhinolaryngology. J Voice 2019; 34:527-532. [PMID: 30655018] [DOI: 10.1016/j.jvoice.2018.12.018]
Abstract
BACKGROUND A large proportion of the population suffers from voice disorders. The use of mobile phone technology in healthcare is increasing, and this includes applications that can analyze voice. OBJECTIVE This study aimed to review the potential for voice analyzer applications to aid the management of voice disorders. METHODS A literature search was conducted, yielding eight studies which were further analyzed. RESULTS Seven of the eight studies concluded that smartphone assessments were comparable to current techniques. Nevertheless, some common issues remained with using such applications: the voice parameters used, the voice pathologies tested, smartphone software consistency, and microphone specifications. CONCLUSIONS It is clear that further developments are required before a mobile application can be used widely in voice analysis. However, promising results have been obtained thus far, and the benefits of mobile technology in this field, particularly in voice rehabilitation, warrant further research into its widespread implementation.
46
Rusz J, Hlavnicka J, Tykalova T, Novotny M, Dusek P, Sonka K, Ruzicka E. Smartphone Allows Capture of Speech Abnormalities Associated With High Risk of Developing Parkinson's Disease. IEEE Trans Neural Syst Rehabil Eng 2018; 26:1495-1507. [DOI: 10.1109/tnsre.2018.2851787]
47
Chen H, Muros-Cobos JL, Holgado-Terriza JA, Amirfazli A. Surface tension measurement with a smartphone using a pendant drop. Colloids Surf A Physicochem Eng Asp 2017. [DOI: 10.1016/j.colsurfa.2017.08.019]
48
Maryn Y, Ysenbaert F, Zarowski A, Vanspauwen R. Mobile Communication Devices, Ambient Noise, and Acoustic Voice Measures. J Voice 2017; 31:248.e11-248.e23. [DOI: 10.1016/j.jvoice.2016.07.023]
49
Manfredi C, Lebacq J, Cantarella G, Schoentgen J, Orlandi S, Bandini A, DeJonckere P. Smartphones Offer New Opportunities in Clinical Voice Research. J Voice 2017; 31:111.e1-111.e7. [DOI: 10.1016/j.jvoice.2015.12.020]
50
Grillo EU, Brosious JN, Sorrell SL, Anand S. Influence of Smartphones and Software on Acoustic Voice Measures. Int J Telerehabil 2016; 8:9-14. [PMID: 28775797] [PMCID: PMC5536725] [DOI: 10.5195/ijt.2016.6202]
Abstract
This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and head mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software and that some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate to record daily voice measures representing the effects of vocal loading within individuals. In addition, even though different algorithms are used to compute voice measures across software programs, some of the programs and measures share a similar relationship.
Affiliation(s)
- Elizabeth U Grillo
- West Chester University of Pennsylvania, Department of Communication Sciences and Disorders, West Chester, Pennsylvania, USA
- Jenna N Brosious
- West Chester University of Pennsylvania, Department of Communication Sciences and Disorders, West Chester, Pennsylvania, USA
- Staci L Sorrell
- West Chester University of Pennsylvania, Department of Communication Sciences and Disorders, West Chester, Pennsylvania, USA
- Supraja Anand
- University of South Florida, Department of Communication Sciences and Disorders, Tampa, Florida, USA