1. Hawley JL, Hancock AB. Incorporating Mobile App Technology in Voice Modification Protocol for Transgender Women. J Voice 2024;38:337-345. [PMID: 34706847] [DOI: 10.1016/j.jvoice.2021.09.001]
Abstract
PURPOSE Motivated by the practice and feedback principles of motor learning, a hybrid clinic-home protocol for voice feminization was developed to minimize the role of speech-language pathologists (SLPs) to one of supervision and professional guidance and to maximize learning during independent practice apart from intervention sessions. The purpose was to explore the effectiveness and acceptability of this innovative service delivery. METHOD This single-subject changing-criterion design included four transgender women who completed a 10-week hybrid clinic-home voice intervention program delivered via 30-minute weekly in-clinic sessions and a technology-supported home program. The program was client-centered and capitalized on principles of motor learning in that it incorporated frequent practice with intermittent, knowledge-of-results feedback. Participants' desired outcomes were measured using acoustics, self- and listener ratings of audio samples, and a program evaluation questionnaire. RESULTS Average speaking fundamental frequency of phrases and picture descriptions gradually increased into the 170-220 Hz range for all but one participant. All four transgender women were perceived to sound more feminine following treatment compared to baseline. Participants found the in-clinic sessions useful and the app easy to use, and noted limited fatigue or discomfort. CONCLUSION All four transgender women met their goals using this hybrid clinic-home service delivery format. Further investigations with comparison delivery models and other populations may elucidate the key factors behind the success achieved in the current study.
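The mean speaking fundamental frequency tracked in this protocol can be illustrated with a basic autocorrelation pitch estimator. The sketch below runs on a synthetic tone and is a simplified stand-in, not the study's measurement pipeline (clinical work typically delegates this to tools such as Praat); all parameter choices are illustrative.

```python
import numpy as np

def estimate_f0(signal, sr, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) of a voiced frame via autocorrelation."""
    x = signal - np.mean(signal)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # autocorrelation, lags >= 0
    lo, hi = int(sr / fmax), int(sr / fmin)            # plausible pitch-period lags
    lag = lo + np.argmax(ac[lo:hi + 1])                # strongest periodicity
    return sr / lag

# Synthetic 200 Hz vowel-like tone (fundamental + one harmonic) at 44.1 kHz
sr = 44100
t = np.arange(0, 0.05, 1 / sr)
tone = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)
f0 = estimate_f0(tone, sr)  # lands inside the 170-220 Hz target range
```

Real speech needs framing, voicing detection, and averaging across frames; this shows only the core periodicity search.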
Affiliation(s)
- Janet L Hawley, Speech, Language, and Hearing Sciences Department, University of Arizona, Tucson, Arizona
- Adrienne B Hancock, Department of Speech, Language, and Hearing Sciences, George Washington University, Washington, District of Columbia
2. Evangelista E, Kale R, McCutcheon D, Rameau A, Gelbard A, Powell M, Johns M, Law A, Song P, Naunheim M, Watts S, Bryson PC, Crowson MG, Pinto J, Bensoussan Y. Current Practices in Voice Data Collection and Limitations to Voice AI Research: A National Survey. Laryngoscope 2024;134:1333-1339. [PMID: 38087983] [DOI: 10.1002/lary.31052]
Abstract
INTRODUCTION The accuracy and validity of voice AI algorithms rely on substantial amounts of high-quality voice data. Although considerable amounts of voice data are captured daily in voice centers across North America, there is no standardized protocol for acoustic data management, which limits the usability of these datasets for voice artificial intelligence (AI) research. OBJECTIVE The aim was to capture current practices of voice data collection, storage, and analysis, and perceived limitations to collaborative voice research. METHODS A 30-question online survey was developed with expert guidance from members of voicecollab.ai, an international collaborative of voice AI researchers. The survey was disseminated via REDCap to an estimated 200 practitioners at North American voice centers. Survey questions assessed respondents' current practices in terms of acoustic data collection, storage, and retrieval, as well as limitations to collaborative voice research. RESULTS Seventy-two respondents completed the survey, of whom 81.7% were laryngologists and 18.3% were speech-language pathologists (SLPs). Eighteen percent of respondents reported seeing 40-60 patients with voice disorders weekly, and 55% reported seeing more than 60 (a conservative estimate of over 4,000 patients/week in total). Only 28% of respondents reported utilizing standardized protocols for collection and storage of acoustic data. Although 87% of respondents conduct voice research, only 38% report doing so on a multi-institutional level. Perceived limitations to conducting collaborative voice research include lack of standardized methodology for collection (30%) and lack of human resources to prepare and label voice data adequately (55%). CONCLUSION To conduct large-scale multi-institutional voice research with AI, there is a pressing need for standardization of acoustic data management, as well as an infrastructure for secure and efficient data sharing. LEVEL OF EVIDENCE 5.
Affiliation(s)
- Emily Evangelista, University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A.
- Rohan Kale, Department of Biology, University of South Florida, Tampa, Florida, U.S.A.
- Anais Rameau, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medical College, Ithaca, New York, U.S.A.
- Alexander Gelbard, Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, U.S.A.
- Maria Powell, Department of Otolaryngology-Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, U.S.A.
- Michael Johns, Department of Otolaryngology-Head and Neck Surgery, Keck School of Medicine, University of Southern California, Los Angeles, California, U.S.A.
- Anthony Law, Department of Otolaryngology, Emory University School of Medicine, Atlanta, Georgia, U.S.A.
- Phillip Song, Division of Laryngology, Massachusetts Eye and Ear, Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, U.S.A.
- Matthew Naunheim, Division of Laryngology, Massachusetts Eye and Ear, Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, U.S.A.
- Stephanie Watts, Department of Otolaryngology-Head and Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A.
- Paul C Bryson, Department of Otolaryngology-Head and Neck Surgery, Cleveland Clinic, Cleveland, Ohio, U.S.A.
- Matthew G Crowson, Department of Otolaryngology-Head and Neck Surgery, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, U.S.A.
- Jeremy Pinto, Mila Quebec Artificial Intelligence Institute, Montreal, Quebec, Canada
- Yael Bensoussan, Division of Laryngology, Department of Otolaryngology-Head and Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A.
3. Busquet F, Efthymiou F, Hildebrand C. Voice analytics in the wild: Validity and predictive accuracy of common audio-recording devices. Behav Res Methods 2024;56:2114-2134. [PMID: 37253958] [PMCID: PMC10228884] [DOI: 10.3758/s13428-023-02139-9]
Abstract
The use of voice recordings in both research and industry practice has increased dramatically in recent years, from diagnosing a COVID-19 infection based on patients' self-recorded voice samples to predicting customer emotions during a service-center call. Crowdsourced audio data collection in participants' natural environment using their own recording devices has opened up new avenues for researchers and practitioners to conduct research at scale across a broad range of disciplines. The current research examines whether fundamental properties of the human voice are reliably and validly captured through the common consumer-grade audio-recording devices used in current medical, behavioral science, business, and computer science research. Specifically, this work provides evidence, from a tightly controlled laboratory experiment analyzing 1,800 voice samples and from subsequent simulations, that recording devices with high proximity to the speaker (such as a headset or a lavalier microphone) inflate measures of amplitude compared to a benchmark studio-quality microphone, while recording devices with lower proximity to the speaker (such as a laptop or a smartphone in front of the speaker) systematically reduce measures of amplitude and can bias measures of the speaker's true fundamental frequency. We further demonstrate through simulation studies that these differences can lead to biased and ultimately invalid conclusions in, for example, an emotion-detection task. Finally, we outline a set of recording guidelines to ensure reliable and valid voice recordings and offer initial evidence for a machine-learning approach to bias correction in the case of distorted speech signals.
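The proximity effect on measured amplitude described here can be mimicked with a toy simulation. The inverse-distance gain model, device names, and distances below are illustrative assumptions, not the paper's actual recording setup:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(0, 0.2, 1 / sr)
speech = np.sin(2 * np.pi * 180 * t)           # stand-in for a voiced signal

def rms_db(x):
    """Root-mean-square level in dB (arbitrary reference)."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

ref_dist = 0.30                                # hypothetical studio mic at 30 cm
offsets = {}
for name, dist in [("headset", 0.05), ("studio", 0.30), ("laptop", 0.80)]:
    gain = ref_dist / dist                     # idealized inverse-distance law
    captured = gain * speech + 0.001 * rng.standard_normal(len(t))
    offsets[name] = rms_db(captured) - rms_db(speech)
# Close devices read louder than the reference; distant devices read quieter.
```

Under this crude model the headset inflates level by roughly +15 dB and the laptop attenuates it by roughly -8 dB, echoing the direction (not the magnitude) of the paper's finding.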
Affiliation(s)
- Francesc Busquet, Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, 9000 St. Gallen, Switzerland
- Fotis Efthymiou, Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, 9000 St. Gallen, Switzerland
- Christian Hildebrand, Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, 9000 St. Gallen, Switzerland
4. Ceylan ME, Cangi ME, Yılmaz G, Peru BS, Yiğit Ö. Are smartphones and low-cost external microphones comparable for measuring time-domain acoustic parameters? Eur Arch Otorhinolaryngol 2023;280:5433-5444. [PMID: 37584753] [DOI: 10.1007/s00405-023-08179-3]
Abstract
PURPOSE This study examined and compared the diagnostic accuracy and correlation levels of acoustic parameters from audio recordings obtained with smartphones on two operating systems and with dynamic and condenser external microphones. METHOD The study included 87 adults: 57 with voice disorders and 30 with healthy voices. Each participant was asked to produce a sustained vowel phonation (/a/). The recordings were taken simultaneously using five microphones (AKG P220, Shure SM58, Samson Go Mic, Apple iPhone 6, and Samsung Galaxy J7 Pro) in an acoustically insulated booth. Acoustic analyses were performed using Praat version 6.2.09. The data were examined using Pearson correlation and receiver-operating characteristic (ROC) analyses. RESULTS Across all microphone recordings, the parameters with the highest area under the curve (AUC) values in the time-domain analyses were the frequency perturbation parameters. Additionally, when the between-microphone correlation coefficients and the AUC values were considered together, the parameter with the highest correlation and diagnostic accuracy was jitter (local). CONCLUSION Period-to-period perturbation parameters obtained from audio recordings made with smartphones show levels of diagnostic accuracy similar to the external microphones used in clinical conditions.
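Jitter (local), the best-performing parameter in this comparison, is simple to state: the mean absolute difference between consecutive glottal periods divided by the mean period, expressed as a percentage. A minimal sketch on synthetic period tracks (the data and thresholds are illustrative, not the study's):

```python
import numpy as np

def jitter_local(periods_ms):
    """Jitter (local), %: mean absolute difference between consecutive
    periods divided by the mean period (Praat's definition)."""
    p = np.asarray(periods_ms, dtype=float)
    return 100 * np.mean(np.abs(np.diff(p))) / np.mean(p)

rng = np.random.default_rng(1)
# Steady ~5 ms periods (200 Hz) vs. a more perturbed (dysphonic-like) track
steady = 5.0 + 0.01 * rng.standard_normal(200)
perturbed = 5.0 + 0.15 * rng.standard_normal(200)
jitter_steady = jitter_local(steady)        # small cycle-to-cycle variation
jitter_perturbed = jitter_local(perturbed)  # markedly higher jitter
```

The diagnostic use in the study amounts to thresholding such values and summarizing discrimination with ROC/AUC.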
Affiliation(s)
- M Enes Ceylan, Speech and Language Therapy, Üsküdar University, Istanbul, Türkiye
- M Emrah Cangi, Speech and Language Therapy, University of Health Sciences, Selimiye, Tıbbiye Cd No: 38, 34668 Üsküdar, Istanbul, Türkiye
- Göksu Yılmaz, Speech and Language Therapy, Üsküdar University, Istanbul, Türkiye
- Beyza Sena Peru, Speech and Language Therapy, Üsküdar University, Istanbul, Türkiye
- Özgür Yiğit, Istanbul Şişli Hamidiye Etfal Training and Research Hospital, Istanbul, Türkiye
5. Dhawan K, Varghese A, Kumar N, Varghese SS. Utility of Smart Phones as a Voice Acquisition Device for Assessing Pre and Post Treatment Voice Using PRAAT. Indian J Otolaryngol Head Neck Surg 2023;75:2901-2906. [PMID: 37974690] [PMCID: PMC10645755] [DOI: 10.1007/s12070-023-03884-1]
Abstract
Voice assessment before and after treatment helps the clinician gauge the effectiveness of the treatment given and facilitates comparison between different treatment modalities. The Voice Handicap Index-10 (VHI-10) questionnaire is a tool that allows the voice to be evaluated subjectively from the patient's perspective. PRAAT is a freely available software program that acoustically analyzes voice signals. Smartphones are widely used, and the high quality of their embedded microphones makes them suitable and easily available voice-recording devices. This study aimed to use PRAAT and the VHI-10 questionnaire to evaluate voice before and after treatment; the utility of smartphones as a voice acquisition device was also explored. This prospective, observational study was carried out from 1st November 2019 to 30th September 2021 in the ENT outpatient department at a tertiary hospital in Punjab. Fifty-eight patients complaining of dysphonia were enrolled consecutively. All patients underwent detailed history-taking and examination of the larynx using a 70-degree rigid laryngoscope. Voice handicap was scored with the VHI-10 questionnaire, and acoustic evaluation of voice was done using the PRAAT software. Patients' voices were evaluated again 3 months post-therapy with the VHI-10 questionnaire and acoustic analysis. The parameters measured in PRAAT were mean pitch, jitter (local), shimmer (local), and mean harmonics-to-noise ratio (HNR). The voice was recorded using a smartphone and later transferred to a laptop for analysis. The pre- and post-treatment acoustic parameters and VHI-10 scores were compared and correlated. There was a significant difference (p < 0.001) between the pre- and post-treatment VHI-10 scores and all acoustic parameters measured except median pitch (p = 0.995). A poor positive correlation was found between pre-treatment VHI-10 scores and jitter (r = 0.188, p = 0.157) and shimmer (r = 0.288, p = 0.028) values. A negative correlation was observed between pre-treatment VHI-10 scores and pitch (r = -0.151, p = 0.259) and HNR (r = -0.424, p = 0.001). Post-treatment VHI-10 scores showed positive correlations with jitter (r = 0.302, p = 0.021) and shimmer (r = 0.162, p = 0.225) values and negative correlations with pitch (r = -0.10, p = 0.457) and HNR (r = -0.356, p = 0.006) values. We found significant differences in VHI-10 scores and PRAAT voice analysis results before and after treatment in patients presenting with voice change (dysphonia). The VHI-10 questionnaire and PRAAT are good and convenient tools for assessing voice subjectively and objectively, though only a poor to fair correlation was found between VHI-10 scores and PRAAT analysis results. More studies are needed to confirm the utility of smartphones as a voice acquisition device and of the PRAAT software in voice analysis.
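The VHI-10-versus-acoustics relationships reported here are plain Pearson correlations. A sketch with hypothetical data (the coupling strengths and noise levels below are invented for illustration; only the cohort size of 58 mirrors the study):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n = 58                                   # same cohort size as the study
vhi10 = rng.integers(0, 41, size=n)      # VHI-10 totals range 0-40

# Hypothetical acoustic measures loosely coupled to the VHI-10 score:
# shimmer rises with handicap, HNR falls with it (directions match the abstract)
shimmer = 3.0 + 0.05 * vhi10 + rng.standard_normal(n)
hnr = 20.0 - 0.15 * vhi10 + rng.standard_normal(n)

r_shimmer, p_shimmer = pearsonr(vhi10, shimmer)  # positive correlation
r_hnr, p_hnr = pearsonr(vhi10, hnr)              # negative correlation
```

With real clinical data the coupling is far weaker, which is exactly the "poor to fair correlation" the authors report.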
Affiliation(s)
- Kaffy Dhawan, Department of ENT, Christian Medical College, Ludhiana, Punjab 141008, India
- Ashish Varghese, Department of ENT, Christian Medical College, Ludhiana, Punjab 141008, India
- Navneet Kumar, Department of ENT, Christian Medical College, Ludhiana, Punjab 141008, India
- Sunil Sam Varghese, Department of ENT, Christian Medical College, Ludhiana, Punjab 141008, India
6. Calà F, Frassineti L, Sforza E, Onesimo R, D'Alatri L, Manfredi C, Lanata A, Zampino G. Artificial Intelligence Procedure for the Screening of Genetic Syndromes Based on Voice Characteristics. Bioengineering (Basel) 2023;10:1375. [PMID: 38135966] [PMCID: PMC10741055] [DOI: 10.3390/bioengineering10121375]
Abstract
Perceptual and statistical evidence has highlighted voice characteristics of individuals affected by genetic syndromes that differ from those of normophonic subjects. In this paper, we propose a procedure for systematically collecting such pathological voices and developing AI-based automated tools to support differential diagnosis. Guidelines on the most appropriate recording devices, vocal tasks, and acoustical parameters are provided to simplify and speed up the whole procedure and make it homogeneous and reproducible. The proposed procedure was applied to a group of 56 subjects affected by Costello syndrome (CS), Down syndrome (DS), Noonan syndrome (NS), and Smith-Magenis syndrome (SMS). The entire database was divided into three groups: pediatric subjects (PS; individuals < 12 years of age), female adults (FA), and male adults (MA). In line with the literature, the Kruskal-Wallis test and post hoc analysis with the Dunn-Bonferroni test revealed several significant differences in the acoustical features, not only between healthy subjects and patients but also between syndromes within the PS, FA, and MA groups. Machine learning provided a k-nearest-neighbor classifier with 86% accuracy for the PS group, a support vector machine (SVM) model with 77% accuracy for the FA group, and an SVM model with 84% accuracy for the MA group. These preliminary results suggest that the proposed method based on acoustical analysis and AI could be useful for effective, non-invasive automatic characterization of genetic syndromes. In addition, clinicians could benefit in the case of genetic syndromes that are extremely rare or that present multiple variants and facial phenotypes.
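The k-nearest-neighbor recipe quoted for the pediatric group can be sketched in scikit-learn. Everything below (the two-feature acoustic space, group means, sample sizes) is synthetic and illustrative; it demonstrates the classification pipeline, not the study's data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
# Hypothetical two-feature space (e.g., mean f0 in Hz and jitter %)
# for two groups: healthy controls vs. one syndrome group
healthy = rng.normal([220.0, 0.5], [15.0, 0.1], size=(30, 2))
syndrome = rng.normal([260.0, 1.5], [15.0, 0.3], size=(30, 2))
X = np.vstack([healthy, syndrome])
y = np.array([0] * 30 + [1] * 30)

knn = KNeighborsClassifier(n_neighbors=5)
acc = cross_val_score(knn, X, y, cv=5).mean()  # cross-validated accuracy
```

In practice the features would be standardized first (k-NN is scale-sensitive) and the problem would be multiclass across syndromes, as in the paper.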
Affiliation(s)
- Federico Calà, Department of Information Engineering, University of Florence, 50139 Florence, Italy
- Lorenzo Frassineti, Department of Information Engineering, University of Florence, 50139 Florence, Italy; Department of Information Engineering, Università degli Studi di Pisa, 56122 Pisa, Italy
- Elisabetta Sforza, Department of Life Sciences and Public Health, Faculty of Medicine and Surgery, Catholic University of Sacred Heart, 00168 Rome, Italy
- Roberta Onesimo, Centre for Rare Diseases and Transition, Department of Woman and Child Health and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Lucia D'Alatri, Unit for Ear, Nose and Throat Medicine, Department of Neuroscience, Sensory Organs and Chest, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Claudia Manfredi, Department of Information Engineering, University of Florence, 50139 Florence, Italy
- Antonio Lanata, Department of Information Engineering, University of Florence, 50139 Florence, Italy
- Giuseppe Zampino, Department of Life Sciences and Public Health, Faculty of Medicine and Surgery, Catholic University of Sacred Heart, 00168 Rome, Italy; Centre for Rare Diseases and Transition, Department of Woman and Child Health and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy; European Reference Network for Rare Malformation Syndromes, Intellectual and Other Neurodevelopmental Disorders (ERN ITHACA)
7. Llico AF, Shanley SN, Friedman AD, Bamford LM, Roberts RM, McKenna VS. Comparison Between Custom Smartphone Acoustic Processing Algorithms and Praat in Healthy and Disordered Voices. J Voice 2023:S0892-1997(23)00241-2. [PMID: 37690854] [DOI: 10.1016/j.jvoice.2023.07.032]
Abstract
OBJECTIVE The aim of this study was to understand the relationship between temporal and spectral acoustic measures derived using Praat and custom smartphone algorithms across patients with a wide range of vocal pathologies. METHODS Voice samples were collected from 56 adults (11 vocally healthy, 45 dysphonic, aged 18-80 years) performing three speech tasks: (a) a sustained vowel, (b) maximum phonation, and (c) the second and third sentences of the Rainbow Passage. Data were analyzed to extract mean fundamental frequency (fo), maximum phonation time (MPT), and cepstral peak prominence (CPP) using Praat and our custom smartphone algorithms. Linear regression models were calculated with and without outliers to determine relationships. RESULTS Statistically significant relationships were found between the smartphone algorithms and Praat for all three measures (r2 = 0.68-0.95 with outliers; r2 = 0.80-0.98 without outliers). An offset was found between CPP measures, with Praat values consistently lower than those computed by the smartphone app. Outlying data were identified and described; samples from speakers with high levels of clinician-perceived dysphonia produced smartphone algorithm errors. CONCLUSIONS These results suggest that the proposed algorithms can provide measurements comparable to clinically derived values. However, clinicians should take caution when analyzing severely dysphonic voices, as the current algorithms show reduced accuracy for measures of mean fo and MPT for these voice types.
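Method-agreement results like the r2 values and the constant CPP offset reported here reduce to a linear regression between paired measurements. A sketch on synthetic paired values (the ~1 dB offset and noise level are invented; only the sample size of 56 mirrors the study):

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(4)
# Hypothetical CPP values (dB) from a reference analysis...
cpp_reference = rng.uniform(5.0, 15.0, size=56)
# ...and from a second pipeline that tracks them closely but reads ~1 dB higher
cpp_app = cpp_reference + 1.0 + 0.2 * rng.standard_normal(56)

fit = linregress(cpp_reference, cpp_app)
r_squared = fit.rvalue ** 2                    # strength of method-to-method agreement
mean_offset = np.mean(cpp_app - cpp_reference)  # systematic bias between pipelines
```

Note that a high r2 with a nonzero mean offset, as simulated here, is exactly the pattern the study reports for CPP: the two pipelines rank voices the same way but disagree on absolute level.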
Affiliation(s)
- Andres F Llico, Department of Biomedical Engineering, University of Cincinnati, Cincinnati, Ohio
- Savannah N Shanley, Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio
- Aaron D Friedman, Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio
- Leigh M Bamford, Department of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, Ohio
- Rachel M Roberts, Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio
- Victoria S McKenna, Department of Biomedical Engineering; Department of Communication Sciences and Disorders; Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio
8. Frassineti L, Calà F, Sforza E, Onesimo R, Leoni C, Lanatà A, Zampino G, Manfredi C. Quantitative acoustical analysis of genetic syndromes in the number listing task. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104887]
9. Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Pribuišis K, Blažauskas T, Damaševičius R, Maskeliūnas R. Reliability of Universal-Platform-Based Voice Screen Application in AVQI Measurements Captured with Different Smartphones. J Clin Med 2023;12:4119. [PMID: 37373811] [DOI: 10.3390/jcm12124119]
Abstract
The aim of the study was to develop a universal-platform-based (UPB) application suitable for different smartphones for estimation of the Acoustic Voice Quality Index (AVQI) and to evaluate its reliability in AVQI measurements and in differentiating normal and pathological voices. Our study group consisted of 135 adults, including 49 with normal voices and 86 patients with pathological voices. The developed UPB "Voice Screen" application, installed on five iOS and Android smartphones, was used for AVQI estimation. The AVQI measures calculated from voice recordings obtained with a reference studio microphone were compared with AVQI results obtained using the smartphones. The diagnostic accuracy of differentiating normal and pathological voices was evaluated by applying receiver-operating characteristic (ROC) analysis. One-way ANOVA did not detect statistically significant differences between mean AVQI scores from the studio microphone and the different smartphones (F = 0.759; p = 0.58). Almost perfect direct linear correlations (r = 0.987-0.991) were observed between the AVQI results obtained with the studio microphone and the different smartphones. The AVQI showed an acceptable level of precision in discriminating between normal and pathological voices, with areas under the curve (AUC) of 0.834-0.862. There were no statistically significant differences between the AUCs (p > 0.05) obtained from the studio and smartphone microphones; the largest difference between the AUCs was only 0.028. The UPB "Voice Screen" application represented an accurate and robust tool for voice quality measurement and for normal vs. pathological voice screening, demonstrating the potential to be used by patients and clinicians for voice assessment on both iOS and Android smartphones.
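The screening accuracy quoted here is an ROC analysis over AVQI scores. A sketch with synthetic scores (the score distributions are invented; the group sizes of 49 and 86 mirror the study):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
# Hypothetical AVQI-like scores: higher values indicate worse voice quality
normal = rng.normal(2.5, 1.0, size=49)        # 49 normal voices, as in the study
pathological = rng.normal(5.5, 1.5, size=86)  # 86 pathological voices
scores = np.concatenate([normal, pathological])
labels = np.concatenate([np.zeros(49), np.ones(86)])

auc = roc_auc_score(labels, scores)  # area under the ROC curve
```

An AUC near 0.5 would mean the score cannot separate the groups; the study's reported 0.834-0.862 sits in the "acceptable discrimination" band.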
Affiliation(s)
- Virgilijus Uloza, Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Nora Ulozaitė-Stanienė, Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Tadas Petrauskas, Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Kipras Pribuišis, Department of Otorhinolaryngology, Lithuanian University of Health Sciences, 50061 Kaunas, Lithuania
- Tomas Blažauskas, Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
- Rytis Maskeliūnas, Faculty of Informatics, Kaunas University of Technology, 51368 Kaunas, Lithuania
10. Vinney LA, Tripp R, Shelly S, Gillespie A. Indexing Cognitive Resource Usage for Acquisition of Initial Voice Therapy Targets. Am J Speech Lang Pathol 2023;32:717-732. [PMID: 36701805] [DOI: 10.1044/2022_ajslp-22-00197]
Abstract
PURPOSE The purpose of this study was to index cognitive resource usage for acquisition of initial targets of two common voice therapy techniques (resonant voice therapy [RVT] and conversation training therapy [CTT]), based on the theorized depletion effect (i.e., when an initial task requiring high cognitive load leads to poorer performance on a subsequent task). METHOD Eleven vocally healthy participants, ages 23-41 years, read aloud the Rainbow Passage and produced consonant-vowel resonant targets (/mi, ma, mu/), followed by a baseline computerized Stroop task and a 15-min washout. Following this baseline period, participants watched and interacted with two videos instructing them in RVT or CTT initial targets. After viewing each video and practicing the associated vocal skills, participants rated the degree of mental effort required to engage in the target vocal technique on a modified Borg scale. Participants recorded their attempts at RVT on /mi, ma, mu/ and CTT on the Rainbow Passage, which were later rated by three voice-specialized speech-language pathologists as to how representative they were of each respective target technique. Changes in fundamental frequency and average auditory-perceptual ratings from baseline were examined to determine whether participants adjusted their technique from RVT and CTT baseline to acquisition. RESULTS Performance on the Stroop task was, on average, worse post CTT than post RVT, but both post-CTT and post-RVT Stroop scores were poorer than baseline. These results suggest that both treatment techniques taxed cognitive resources but that CTT was more cognitively taxing than RVT. However, despite differences in raw averages, no statistically significant differences were found between the baseline, post-CTT, and post-RVT Stroop scores, likely due to the small sample size. Participant ratings of mental effort for CTT and RVT were statistically similar. Likewise, poorer post-RVT Stroop scores were associated with participants' greater perceived mental effort with RVT acquisition, but there was no significant association between mental effort ratings for CTT acquisition and post-CTT Stroop scores. Significantly higher fundamental frequency and perceived ratings of the accuracy of technique from baseline to acquisition for both CTT and RVT were found, providing evidence of vocal behavior changes as a result of each technique. CONCLUSIONS Brief exposure to initial treatment tasks in CTT is more cognitively depleting than initial RVT tasks. Results also indicate that vocally healthy participants are able to make a voice change in response to a brief therapy prompt. Finally, participant-rated measures of mental effort and secondary measures of cognitive depletion do not always correlate.
Affiliation(s)
- Raquel Tripp, Department of Communicative Sciences and Disorders, New York University, New York, NY
- Sandeep Shelly, Emory Voice Center, Department of Otolaryngology-Head and Neck Surgery, Emory University, Atlanta, GA
- Amanda Gillespie, Emory Voice Center, Department of Otolaryngology-Head and Neck Surgery, Emory University, Atlanta, GA
11. Cavalcanti JC, Englert M, Oliveira M, Constantini AC. Microphone and Audio Compression Effects on Acoustic Voice Analysis: A Pilot Study. J Voice 2023;37:162-172. [PMID: 33451892] [DOI: 10.1016/j.jvoice.2020.12.005]
Abstract
OBJECTIVE This study aimed to analyze the effects of microphone and audio compression variables on the acquisition of voice and speech parameters. METHOD Acoustic measures were recorded and compared using a high-quality reference microphone and three testing microphones, which differed in specifications and acoustic properties. Furthermore, the impact of audio compression was assessed by resampling the original uncompressed audio files into the MPEG-1/2 Audio Layer 3 (mp3) format at three compression rates (128 kbps, 64 kbps, and 32 kbps). Eight speakers were recruited in each recording session and asked to produce four sustained vowels: two [a] segments and two [ɛ] segments. The audio was captured simultaneously by the reference and tested microphones. The recordings were synchronized and analyzed using the Praat software. RESULTS Of the eight acoustic parameters assessed (f0, F1, F2, jitter%, shimmer%, HNR, H1-H2, and CPP), three (f0, F2, and jitter%) proved resistant to both the microphone and audio compression variables. In contrast, HNR, H1-H2, and CPP were significantly affected by both factors, while shimmer% was sensitive only to audio compression. Moreover, higher compression rates yielded more frequent acoustic distortions than lower rates. CONCLUSION Overall, the outcomes suggest that acoustic parameters are influenced by both microphone selection and audio compression, underscoring the practical implications of these factors for the reliability of acoustic analysis.
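Shimmer%, the parameter this study found sensitive to compression, is the amplitude analogue of jitter: the mean absolute difference between consecutive cycle peak amplitudes over the mean amplitude. A sketch on synthetic amplitude tracks, where "compression damage" is crudely modeled as added amplitude noise purely for illustration (real mp3 artifacts are far more structured):

```python
import numpy as np

def shimmer_local(amplitudes):
    """Shimmer (local), %: mean absolute difference between consecutive
    cycle peak amplitudes divided by the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return 100 * np.mean(np.abs(np.diff(a))) / np.mean(a)

rng = np.random.default_rng(6)
clean = 1.0 + 0.005 * rng.standard_normal(300)    # steady cycle amplitudes
degraded = clean + 0.05 * rng.standard_normal(300)  # extra perturbation
s_clean = shimmer_local(clean)
s_degraded = shimmer_local(degraded)  # inflated by the added perturbation
```

The point of the toy model matches the paper's caution: any processing that perturbs cycle-to-cycle amplitude inflates shimmer% even when the voice itself is unchanged.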
Affiliation(s)
- Julio Cesar Cavalcanti
- Universidade Estadual de Campinas (UNICAMP), Institute of Language Studies, Campinas - SP, Brazil
- Marina Englert
- Universidade Federal de São Paulo (UNIFESP), Department of Communication Disorders, São Paulo - SP, Brazil; Centro de Estudos da Voz (CEV), São Paulo - SP, Brazil
- Miguel Oliveira
- Universidade Federal de Alagoas (UFAL), Department of Letters, Maceió - AL, Brazil
12
Calà F, Manfredi C, Battilocchi L, Frassineti L, Cantarella G. Speaking with mask in the COVID-19 era: Multiclass machine learning classification of acoustic and perceptual parameters. J Acoust Soc Am 2023; 153:1204. [PMID: 36859154 DOI: 10.1121/10.0017244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
The intensive use of personal protective equipment often requires increased voice intensity, with possible development of voice disorders. This paper exploits machine learning approaches to investigate the impact of different types of masks on the sustained vowels /a/, /i/, and /u/ and the sequence /a'jw/ inside a standardized sentence. Both objective acoustical parameters and subjective ratings were used for statistical analysis, multiple comparisons, and multivariate machine learning classification experiments. Significant differences were found between the mask+shield configuration and the no-mask condition, and between the mask and mask+shield conditions. Power spectral density decreases with statistical significance above 1.5 kHz when wearing masks. Subjective ratings confirmed increasing discomfort from the no-mask condition to protective masks and shield. Machine learning techniques showed that masks alter voice production: in a multiclass experiment, random forest (RF) models were able to distinguish amongst seven mask conditions with up to 94% validation accuracy, to separate masked from unmasked conditions with up to 100% validation accuracy, and to detect the presence of a shield with up to 86% validation accuracy. Moreover, an RF classifier allowed distinguishing male from female subjects in masked conditions with 100% validation accuracy. Combining acoustic and perceptual analysis represents a robust approach to characterizing mask configurations and quantifying the corresponding level of discomfort.
Affiliation(s)
- F Calà
- Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
- C Manfredi
- Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
- L Battilocchi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- L Frassineti
- Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
- G Cantarella
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
13
Uloza V, Ulozaite-Staniene N, Petrauskas T. An iOS-based VoiceScreen application: feasibility for use in clinical settings-a pilot study. Eur Arch Otorhinolaryngol 2023; 280:277-284. [PMID: 35906420 DOI: 10.1007/s00405-022-07546-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 07/06/2022] [Indexed: 01/07/2023]
Abstract
OBJECTIVES To develop a smartphone application for estimation of the Acoustic Voice Quality Index (AVQI) and evaluate its usability in the clinical setting. METHODS Automated AVQI calculation and background noise monitoring functions were implemented in the mobile "VoiceScreen" application running on the iOS operating system. The study group consisted of 103 adult individuals: 30 with normal voices and 73 patients with pathological voices. Voice recordings were performed in the clinical setting with the "VoiceScreen" app using iPhone 8 microphones. Voices of 30 patients were recorded before and 1 month after phonosurgical intervention. To evaluate the diagnostic accuracy in differentiating normal and pathological voice, receiver-operating characteristic statistics, i.e., area under the curve (AUC), sensitivity, specificity, and correct classification rate (CCR), were used. RESULTS The AVQI discriminated between normal and dysphonic voices with a high level of precision (AUC = 0.937). An AVQI cutoff score of 3.4 demonstrated a sensitivity of 86.3% and specificity of 95.6%, with a CCR of 89.2%. In the post-phonosurgical follow-up group, the mean AVQI decreased from a preoperative value of 6.01 (SD 2.39) to 2.00 (SD 1.08). No statistically significant difference (p = 0.216) was found between AVQI measurements in the normal voice group and the 1-month post-phonosurgery follow-up group. CONCLUSIONS The "VoiceScreen" app represents an accurate and robust tool for voice quality measurement and demonstrates the potential to be used in clinical settings as a sensitive measure of voice changes across phonosurgical treatment.
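The cutoff statistics reported above (sensitivity, specificity, CCR) all follow from a 2×2 classification table once a threshold is applied to the scores. A minimal sketch with hypothetical AVQI scores and diagnoses (not the study's data):

```python
def screening_stats(scores, labels, cutoff):
    """Sensitivity, specificity, and correct classification rate (CCR)
    for a score-based screen: score >= cutoff is called pathological.
    labels: 1 = pathological voice, 0 = normal voice."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)

# Hypothetical AVQI scores and diagnoses, using the paper's 3.4 cutoff
scores = [1.8, 2.5, 3.9, 5.2, 6.1, 2.9, 4.4, 3.1]
labels = [0, 0, 1, 1, 1, 0, 1, 1]
sens, spec, ccr = screening_stats(scores, labels, cutoff=3.4)
print(sens, spec, ccr)
```

Sweeping the cutoff over the observed score range and plotting sensitivity against 1 - specificity yields the ROC curve whose area is the reported AUC.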
14
Fahed VS, Doheny EP, Busse M, Hoblyn J, Lowery MM. Comparison of Acoustic Voice Features Derived From Mobile Devices and Studio Microphone Recordings. J Voice 2022:S0892-1997(22)00312-5. [PMID: 36379826 DOI: 10.1016/j.jvoice.2022.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2022] [Revised: 10/10/2022] [Accepted: 10/10/2022] [Indexed: 11/14/2022]
Abstract
OBJECTIVES/HYPOTHESIS Improvements in mobile device technology offer new opportunities for remote monitoring of voice for home and clinical assessment. However, there is a need to establish equivalence between features derived from signals recorded with mobile devices and with gold-standard microphone-preamplifiers. In this study, acoustic voice features from Android smartphone, tablet, and microphone-preamplifier recordings were compared. METHODS Data were recorded from 37 volunteers (20 female) with no history of speech disorder and six volunteers with Huntington's disease (HD) during sustained vowel (SV) phonation, reading passage (RP), and five syllable repetition (SR) tasks. The following features were estimated: fundamental frequency median and standard deviation (F0 and SD F0), harmonics-to-noise ratio (HNR), local jitter, relative average perturbation of jitter (RAP), five-point period perturbation quotient (PPQ5), difference of differences of amplitudes and periods (DDA and DDP), shimmer, and amplitude perturbation quotients (APQ3, APQ5, and APQ11). RESULTS Bland-Altman analysis revealed good agreement between microphone and mobile devices for fundamental frequency, jitter, RAP, PPQ5, and DDP during all tasks, and a bias for HNR, shimmer, and its variants (APQ3, APQ5, APQ11, and DDA). Significant differences were observed between devices for HNR, shimmer, and its variants for all tasks. High correlation was observed between devices for all features, except SD F0 for RP. Similar results were observed in the HD group for the SV and SR tasks. Biological sex had a significant effect on F0 and HNR during all tasks, and on jitter, RAP, PPQ5, DDP, and shimmer for RP and SR. No significant effect of age was observed. CONCLUSIONS Mobile devices provided good agreement with state-of-the-art, high-quality microphones during structured speech tasks for features derived from frequency components of the audio recordings. Caution should be taken when estimating HNR, shimmer, and its variants from recordings made with mobile devices.
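The Bland-Altman analysis used above summarizes device agreement as a bias (the mean paired difference) and 95% limits of agreement (bias ± 1.96 × SD of the differences). A minimal sketch with hypothetical paired F0 estimates, invented for illustration:

```python
import math

def bland_altman(a, b):
    """Bland-Altman agreement between two measurement devices:
    returns the bias (mean paired difference) and the 95% limits
    of agreement (bias +/- 1.96 * SD of the differences)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired F0 estimates (Hz): studio microphone vs. smartphone
mic = [198.0, 210.5, 185.2, 202.3]
phone = [198.4, 210.1, 185.8, 202.5]
bias, lo, hi = bland_altman(mic, phone)
print(bias, lo, hi)
```

A bias near zero with narrow limits, as in this toy example, corresponds to the "good agreement" reported for F0 and jitter; a systematic offset between devices shows up as a non-zero bias, as reported for HNR and shimmer.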
Affiliation(s)
- Vitória S Fahed
- School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
- Emer P Doheny
- School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
- Monica Busse
- Centre for Trials Research, Cardiff University, Cardiff, UK
- Jennifer Hoblyn
- School of Medicine, Trinity College Dublin, Dublin, Ireland; Bloomfield Health Services, Dublin, Ireland
- Madeleine M Lowery
- School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
15
Di Pietro DA, Olivares A, Comini L, Vezzadini G, Luisa A, Petrolati A, Boccola S, Boccali E, Pasotti M, Danna L, Vitacca M. Voice Alterations, Dysarthria, and Respiratory Derangements in Patients With Parkinson's Disease. J Speech Lang Hear Res 2022; 65:3749-3757. [PMID: 36194769 DOI: 10.1044/2022_jslhr-21-00539] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
PURPOSE Almost 90% of people with Parkinson's disease (PD) develop voice and speech disorders during the course of the disease. Ventilatory dysfunction is one of the main causes. We aimed to evaluate relationships between respiratory impairments and speech/voice changes in PD. METHOD At Day 15 from admission, in consecutive clinically stable PD patients in a neurorehabilitation unit, we collected the following clinical data: comorbidities, PD severity, motor function and balance, respiratory function at rest (including muscle strength and cough ability), during exercise, and at night, voice function (Voice Handicap Index [VHI] and acoustic analysis [Praat]), speech disorders (Robertson Dysarthria Profile [RDP]), and postural abnormalities. Based on an arbitrary RDP cutoff, two groups with different dysarthria degrees were identified (moderate-severe versus no-mild dysarthria) and compared. RESULTS Of 55 patients analyzed (median Unified Parkinson's Disease Rating Scale scores: Part II, 9; Part III, 17), we found significant impairments in inspiratory and expiratory muscle pressure (>90%, both), exercise tolerance at the 6-min walking distance (96%), nocturnal (12.7%) and exercise-induced (21.8%) desaturation, VHI (34%), and Praat shimmer% (89%). Patients with moderate-severe dysarthria (16% of the total sample) had more comorbidities/disabilities and worse respiratory pattern and postural abnormalities (camptocormia) than those with no-mild dysarthria. Moreover, the risk of nocturnal desaturation, reduced peak expiratory flow, and reduced cough ability was about 11, 13, and 8 times higher, respectively, in the moderate-severe group. CONCLUSIONS Dysarthria and respiratory dysfunction are closely associated in PD patients, particularly nocturnal desaturation and reduced cough ability. In addition, postural condition could underlie both respiratory and voice impairments. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21210944.
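Group-risk figures like the reported "about 11, 13, and 8 times higher" are typically computed as odds ratios from 2×2 contingency tables. A minimal sketch of the calculation, with invented counts (not the study's actual table):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
        a = exposed with outcome,   b = exposed without outcome,
        c = unexposed with outcome, d = unexposed without outcome.
    OR = (a/b) / (c/d) = (a*d) / (b*c)."""
    return (a * d) / (b * c)

# Invented counts: nocturnal desaturation in the moderate-severe vs.
# no-mild dysarthria groups (illustrative only)
print(odds_ratio(6, 3, 7, 39))
```

With these made-up counts the odds of nocturnal desaturation come out roughly eleven times higher in the exposed group, mirroring the magnitude of the association the abstract reports.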
Affiliation(s)
- Davide Antonio Di Pietro
- Neurorehabilitation Unit of the Institute of Lumezzane, Istituti Clinici Scientifici Maugeri IRCCS, Brescia, Italy
- Adriana Olivares
- Scientific Direction of the Institute of Lumezzane, Istituti Clinici Scientifici Maugeri IRCCS, Brescia, Italy
- Laura Comini
- Scientific Direction of the Institute of Lumezzane, Istituti Clinici Scientifici Maugeri IRCCS, Brescia, Italy
- Giuliana Vezzadini
- Neurorehabilitation Unit of the Institute of Castel Goffredo, Istituti Clinici Scientifici Maugeri IRCCS, Mantova, Italy
- Alberto Luisa
- Neurorehabilitation Unit of the Institute of Lumezzane, Istituti Clinici Scientifici Maugeri IRCCS, Brescia, Italy
- Anna Petrolati
- Neurorehabilitation Unit of the Institute of Castel Goffredo, Istituti Clinici Scientifici Maugeri IRCCS, Mantova, Italy
- Sara Boccola
- Neurorehabilitation Unit of the Institute of Castel Goffredo, Istituti Clinici Scientifici Maugeri IRCCS, Mantova, Italy
- Elisa Boccali
- Neurorehabilitation Unit of the Institute of Lumezzane, Istituti Clinici Scientifici Maugeri IRCCS, Brescia, Italy
- Monica Pasotti
- Neurorehabilitation Unit of the Institute of Castel Goffredo, Istituti Clinici Scientifici Maugeri IRCCS, Mantova, Italy
- Laura Danna
- Neurorehabilitation Unit of the Institute of Lumezzane, Istituti Clinici Scientifici Maugeri IRCCS, Brescia, Italy
- Michele Vitacca
- Respiratory Rehabilitation of the Institute of Lumezzane, Istituti Clinici Scientifici Maugeri IRCCS, Brescia, Italy
16
Gerosa M, Kenny C. The Effects of Vocal Loading and Steam Inhalation on Acoustic, Aerodynamic and Vocal Tract Discomfort Measures in Adults. J Voice 2022. [DOI: 10.1016/j.jvoice.2022.09.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
17
Pommée T, Morsomme D. Voice Quality in Telephone Interviews: A Preliminary Acoustic Investigation. J Voice 2022:S0892-1997(22)00268-5. [PMID: 36192289 DOI: 10.1016/j.jvoice.2022.08.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Revised: 08/24/2022] [Accepted: 08/25/2022] [Indexed: 10/07/2022]
Abstract
OBJECTIVES To investigate the impact of standardized mobile phone recordings passed through a telecom channel on acoustic markers of voice quality and on its perception by voice experts in normophonic speakers. METHODS Continuous speech and a sustained vowel were recorded for fourteen female and ten male normophonic speakers. The recordings were made simultaneously with a head-mounted high-quality microphone and through the telephone network on a receiving smartphone. Twenty-two acoustic voice quality, breathiness, and pitch-related measures were extracted from the recordings. Nine vocologists perceptually rated the G, R, and B parameters of the GRBAS scale on each voice sample. Reproducibility, recording type, stimulus type, and gender effects, as well as the correlation between acoustic and perceptual measures, were investigated. RESULTS The sustained vowel samples are damped after one second. Only frequencies between 100 and 3700 Hz are passed through the telecom channel, and the frequency response is characterized by peaks and troughs. The acoustic measures show good reproducibility over the three repetitions. All measures differ significantly between the recording types, except for local jitter, the harmonics-to-noise ratio by Dejonckere and Lebacq, the period standard deviation, and all six pitch measures. The AVQI score is higher in telephone recordings, while the ABI score is lower. Significant differences between genders are also found for most of the measures; while the AVQI is similar in men and women, the ABI is higher in women in both recording types. For the perceptual assessment, the interrater agreement is rather low, while the reproducibility over the three repetitions is good. Few significant differences between recording types are observed, except for lower breathiness ratings on telephone recordings. G ratings are significantly more severe on the sustained vowel in both recording types, R ratings only on telephone recordings. While roughness is rated higher in men on telephone recordings by most experts, no gender effect is observed for breathiness in either recording type. Finally, neither the AVQI nor the ABI yields strong correlations with any of the perceptual parameters. CONCLUSIONS Our results show that passing a voice signal through a telecom channel induces filter and noise effects that limit the use of common acoustic voice quality measures and indexes. The AVQI and ABI are both significantly impacted by the recording type. The most reliable acoustic measures seem to be pitch perturbation measures (local jitter and period standard deviation) as well as the harmonics-to-noise ratio from Dejonckere and Lebacq. Our results also underline that raters are not equally sensitive to the various factors, including recording type, stimulus type, and gender. None of the three perceptual parameters G, R, and B seems to be reliably measurable on telephone recordings using the two investigated acoustic indexes. Future studies investigating the impact of voice quality in telephone conversations should thus focus on acoustic measures on continuous speech samples that are limited to the frequency response of the telecom channel and that are not too sensitive to environmental and additive noise.
Affiliation(s)
- Timothy Pommée
- Research Unit for a life-Course perspective on Health and Education, Voice Unit, University of Liège, Belgium
- Dominique Morsomme
- Research Unit for a life-Course perspective on Health and Education, Voice Unit, University of Liège, Belgium
18
Rodríguez Marconi D, Morales C, Araya P, Ferrada R, Ibarra M, Catrifol MT. Uso del smartphone en telepráctica para trastornos de la voz. Una revisión desde el concepto de Mhealth. Rev investig logop 2022. [DOI: 10.5209/rlog.78550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
The use of smartphones and the concept of mobile health (mHealth) are recent in vocology, as are their potential benefits for voice treatment and vocal training in the context of telepractice. A narrative review was conducted with the objective of describing the benefits of mHealth via smartphone in the context of speech-language telepractice for voice disorders. Scientific articles were searched in PubMed, ScienceDirect, and Google Scholar relating to smartphone use in vocology, considering normal, pathological, and synthetic human voices, and relating to intervention, evaluation, assessment, monitoring, prevention, supervision, education, consultation, and vocal training. Forty-two studies were reviewed, of which 15 were selected according to the inclusion criteria. The analyzed studies concern voice recording for acoustic analysis with smartphones, teletherapy with smartphones, and peripheral devices for vocal analysis and follow-up. The potential of mobile devices to increase accessibility, reduce costs, and support therapeutic follow-up with objective measures across diverse vocal health contexts is highlighted.
19
Yamada Y, Shinkawa K, Nemoto M, Arai T. Automatic Assessment of Loneliness in Older Adults Using Speech Analysis on Responses to Daily Life Questions. Front Psychiatry 2021; 12:712251. [PMID: 34966297 PMCID: PMC8710612 DOI: 10.3389/fpsyt.2021.712251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/20/2021] [Accepted: 11/19/2021] [Indexed: 11/13/2022] Open
Abstract
Loneliness is a perceived state of social and emotional isolation that has been associated with a wide range of adverse health effects in older adults. Automatically assessing loneliness by passively monitoring daily behaviors could potentially contribute to early detection and intervention for mitigating loneliness. Speech data has been successfully used for inferring changes in emotional states and mental health conditions, but its association with loneliness in older adults remains unexplored. In this study, we developed a tablet-based application and collected speech responses of 57 older adults to daily life questions regarding, for example, one's feelings and future travel plans. From audio data of these speech responses, we automatically extracted speech features characterizing acoustic, prosodic, and linguistic aspects, and investigated their associations with self-rated scores of the UCLA Loneliness Scale. Consequently, we found that with increasing loneliness scores, speech responses tended to have fewer inflections, longer pauses, reduced second formant frequencies, reduced variances of the speech spectrum, more filler words, and fewer positive words. The cross-validation results showed that regression and binary-classification models using speech features could estimate loneliness scores with an R² of 0.57 and detect individuals with high loneliness scores with 95.6% accuracy, respectively. Our study provides the first empirical results suggesting the possibility of using speech data that can be collected in everyday life for the automatic assessment of loneliness in older adults, which could help develop monitoring technologies for early detection and intervention for mitigating loneliness.
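Pause-related features like those used above can be derived from voiced-segment timestamps produced by any voice activity detector. A minimal sketch, with a hypothetical segmentation of one spoken response (the segment boundaries are invented, and the feature set is only a simplified stand-in for the study's):

```python
def pause_features(segments, total_duration):
    """Prosodic pause features from voiced-segment (start, end) timestamps
    in seconds: number of pauses, mean pause duration, pause-time ratio."""
    pauses = [b2 - e1 for (_, e1), (b2, _) in zip(segments, segments[1:])]
    pause_time = sum(pauses)
    mean_pause = pause_time / len(pauses) if pauses else 0.0
    return len(pauses), mean_pause, pause_time / total_duration

# Hypothetical voiced segments (start, end) from one 5-second response
segments = [(0.0, 1.2), (1.8, 3.0), (3.5, 5.0)]
print(pause_features(segments, total_duration=5.0))
```

Aggregating such features per response, alongside acoustic and lexical ones, gives the feature vectors fed to the regression and classification models.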
Affiliation(s)
- Miyuki Nemoto
- Dementia Medical Center, University of Tsukuba Hospital, Tsukuba, Japan
- Tetsuaki Arai
- Division of Clinical Medicine, Department of Psychiatry, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
20
Castillo-Allendes A, Contreras-Ruston F, Cantor L, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Terapia de voz en el contexto de la pandemia covid-19; recomendaciones para la práctica clínica. J Voice 2021; 35:808.e1-808.e12. [PMID: 32917457 PMCID: PMC7442931 DOI: 10.1016/j.jvoice.2020.08.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Since the beginning of the new COVID-19 pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions using telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to the COVID-19 disease. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the COVID-19 pandemic context. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed consensus recommendations following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists during this pandemic. RESULTS The clinical guide provides 79 recommendations for clinicians on the management of voice disorders during the pandemic, including advice on assessment, direct treatment, telepractice, and teamwork. Consensus of 95% was reached for all topics. CONCLUSION This guideline should be taken only as recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Francisco Contreras-Ruston
- Speech-Language Pathology and Audiology Department, Universidad de Valparaíso, San Felipe, Chile
- Lady Cantor
- Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
- Universidad de los Andes, Chile, Santiago, Chile
- Celina Malebran
- Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
- Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
- Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública, Santiago, Chile
- Thays Vaiano
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
- Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoaudiología, Hospital de Clínicas "José de San Martin", Buenos Aires, Argentina
- Mara Behlau
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
21
Castillo-Allendes A, Contreras-Ruston F, Cantor L, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Terapia Vocal No Contexto Da Pandemia Do Covid-19; Orientações Para A Prática Clínica. J Voice 2021; 35:808.e13-808.e24. [PMID: 32917460 PMCID: PMC7439998 DOI: 10.1016/j.jvoice.2020.08.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
INTRODUCTION Since the beginning of the new Coronavirus Disease 2019 (COVID-19) pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions using telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to the COVID-19 disease. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the COVID-19 pandemic context. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed consensus recommendations following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists during this pandemic. RESULTS The clinical guide provides 79 recommendations for clinicians on the management of voice disorders during the pandemic, including advice on assessment, direct treatment, telepractice, and teamwork. Consensus of 95% was reached for all topics. CONCLUSION This guideline should be taken only as recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Francisco Contreras-Ruston
- Speech-Language Pathology and Audiology Department, Universidad de Valparaíso, San Felipe, Chile
- Lady Cantor
- Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman
- Universidad de los Andes, Chile, Santiago, Chile
- Celina Malebran
- Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano
- Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez
- Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública, Santiago, Chile
- Thays Vaiano
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder
- Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoaudiología, Hospital de Clínicas "José de San Martin", Buenos Aires, Argentina
- Mara Behlau
- CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
22
Castillo-Allendes A, Contreras-Ruston F, Cantor-Cutiva LC, Codino J, Guzman M, Malebran C, Manzano C, Pavez A, Vaiano T, Wilder F, Behlau M. Voice Therapy in the Context of the COVID-19 Pandemic: Guidelines for Clinical Practice. J Voice 2021; 35:717-727. [PMID: 32878736 PMCID: PMC7413113 DOI: 10.1016/j.jvoice.2020.08.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Revised: 07/30/2020] [Accepted: 08/03/2020] [Indexed: 01/14/2023]
Abstract
INTRODUCTION Since the beginning of the new COVID-19 pandemic, health services have had to face a new scenario. Voice therapy faces a double challenge: delivering interventions using telepractice, and providing rehabilitation services to a growing population of patients at risk of functional impairment related to the COVID-19 disease. Moreover, as COVID-19 is transmitted through droplets, it is critical to understand how to mitigate these risks during assessment and treatment. OBJECTIVE To promote safe and effective clinical practice in voice assessment and rehabilitation for speech-language pathologists in the COVID-19 pandemic context. METHODS A group of 11 experts in voice and swallowing disorders from five different countries developed consensus recommendations following the American Academy of Otolaryngology-Head and Neck Surgery rules, building a clinical guide for speech-language pathologists during this pandemic. RESULTS The clinical guide provides 65 recommendations for clinicians on the management of voice disorders during the pandemic, including advice on assessment, direct treatment, telepractice, and teamwork. Consensus of 95% was reached for all topics. CONCLUSION This guideline should be taken only as recommendations; each clinician must attempt to mitigate the risk of infection and achieve the best therapeutic results, taking into account the patient's particular reality.
Affiliation(s)
- Adrián Castillo-Allendes: Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Lady Catherine Cantor-Cutiva: Department of Collective Health, Universidad Nacional de Colombia, Bogotá, Colombia; Program of Speech and Language Pathology, Universidad Manuela Beltrán, Bogotá, Colombia
- Juliana Codino: Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Lakeshore Professional Voice Center, Lakeshore Ear, Nose, and Throat Center, St. Clair Shores, Michigan
- Marco Guzman: Universidad de los Andes, Chile, Santiago, Chile
- Celina Malebran: Escuela de Fonoaudiología, Universidad Católica Silva Henríquez, Santiago, Chile
- Carlos Manzano: Hospital Médica Sur, Ciudad de México, México; Centro Médico ABC, Ciudad de México, México
- Axel Pavez: Physical Medicine and Rehabilitation Service, Hospital de Urgencia Asistencia Pública, Santiago, Chile
- Thays Vaiano: CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
- Fabiana Wilder: Carrera de Fonoaudiología, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina; Servicio de Fonoudiología, Hospital de Clínicas "José de San Martin", Buenos Aires, Argentina
- Mara Behlau: CEV - Centro de Estudos da Voz, São Paulo, Brazil; Speech-Language Pathology and Audiology Department, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, Brazil
23
Zhang C, Jepson K, Lohfink G, Arvaniti A. Comparing acoustic analyses of speech data collected remotely. J Acoust Soc Am 2021; 149:3910. [PMID: 34241427 PMCID: PMC8269758 DOI: 10.1121/10.0005132]
Abstract
Face-to-face speech data collection has been next to impossible globally as a result of COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth, Zoom) and two lossless mobile phone applications (Awesome Voice Recorder and Recorder; henceforth, Phone). F0 was tracked accurately by all of the devices; for formant analysis (F1, F2, F3), however, Phone performed better than Zoom, i.e., more similarly to H6, although the data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless-format phone recordings present a viable option for at least some phonetic studies.
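As a rough illustration of the F0 tracking this study found robust across recording devices, the sketch below estimates F0 from a synthetic vowel-like frame via autocorrelation. This is not the authors' pipeline (they used VoiceSauce and Praat); the signal, sample rate, search range, and voicing threshold are illustrative assumptions.

```python
import math

SR = 8000  # sample rate (Hz), assumed for the synthetic example

def estimate_f0(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate F0 by picking the autocorrelation peak within [fmin, fmax]."""
    lo, hi = int(sr / fmax), int(sr / fmin)
    energy = sum(s * s for s in frame)
    best_lag, best_r = 0, 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        if r > best_r:
            best_lag, best_r = lag, r
    # Require a strong periodicity peak before reporting a pitch.
    return sr / best_lag if best_lag and best_r > 0.5 * energy else None

# Synthetic 200 Hz "vowel": fundamental plus two weaker harmonics.
frame = [sum(math.sin(2 * math.pi * 200 * k * n / SR) / k for k in (1, 2, 3))
         for n in range(1024)]
print(round(estimate_f0(frame, SR)))  # 200
```

A production pitch tracker would add windowing, normalization, and octave-error checks; this only shows the core peak-picking idea.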
Affiliation(s)
- Cong Zhang: Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
- Kathleen Jepson: Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
- Georg Lohfink: School of European Culture and Languages, University of Kent, Canterbury, Kent, CT2 7NF, United Kingdom
- Amalia Arvaniti: Faculty of Arts, Radboud University, Nijmegen, Gelderland, 6500 HD, The Netherlands
24
Yamada Y, Shinkawa K, Kobayashi M, Takagi H, Nemoto M, Nemoto K, Arai T. Using Speech Data From Interactions With a Voice Assistant to Predict the Risk of Future Accidents for Older Drivers: Prospective Cohort Study. J Med Internet Res 2021; 23:e27667. [PMID: 33830066 PMCID: PMC8063093 DOI: 10.2196/27667]
Abstract
Background With the rapid growth of the older adult population worldwide, car accidents involving this population group have become an increasingly serious problem. Cognitive impairment, which is assessed using neuropsychological tests, has been reported as a risk factor for being involved in car accidents; however, it remains unclear whether this risk can be predicted using daily behavior data. Objective The objective of this study was to investigate whether speech data that can be collected in everyday life can be used to predict the risk of an older driver being involved in a car accident. Methods At baseline, we collected (1) speech data during interactions with a voice assistant and (2) cognitive assessment data—neuropsychological tests (Mini-Mental State Examination, revised Wechsler immediate and delayed logical memory, Frontal Assessment Battery, Trail Making Test parts A and B, and Clock Drawing Test), Geriatric Depression Scale, magnetic resonance imaging, and demographics (age, sex, education)—from older adults. Approximately one and a half years later, we followed up to collect information about their driving experiences (with respect to car accidents) using a questionnaire. We investigated the association between speech data and future accident risk using statistical analysis and machine learning models. Results We found that older drivers (n=60) with accident or near-accident experiences had statistically discernible differences in speech features that suggest cognitive impairment, such as reduced speech rate (P=.048) and increased response time (P=.040). Moreover, the model that used speech features could predict future accident or near-accident experiences with 81.7% accuracy, which was 6.7% higher than that using cognitive assessment data, and could achieve up to 88.3% accuracy when the model used both types of data. Conclusions Our study provides the first empirical results suggesting that analysis of speech data recorded during interactions with voice assistants could help predict future accident risk for older drivers by capturing subtle impairments in cognitive function.
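The two speech features the study highlights, speech rate and response time, can be derived from timestamped interaction logs. The log format below is a hypothetical sketch, not the study's actual data schema.

```python
# Hypothetical interaction log, one tuple per voice-assistant turn:
# (prompt_end, reply_start, reply_end, n_words), all times in seconds.
turns = [
    (0.0, 0.8, 3.2, 7),
    (5.0, 6.1, 9.5, 8),
    (12.0, 12.9, 15.0, 5),
]

# Response time: silence between the assistant's prompt and the reply onset.
response_times = [start - prompt for prompt, start, _, _ in turns]
# Speech rate: words uttered per second of reply duration.
speech_rates = [n / (end - start) for _, start, end, n in turns]

mean_rt = sum(response_times) / len(response_times)
mean_sr = sum(speech_rates) / len(speech_rates)
print(f"mean response time: {mean_rt:.2f} s, mean speech rate: {mean_sr:.2f} words/s")
# mean response time: 0.93 s, mean speech rate: 2.55 words/s
```

In the study these kinds of features fed a machine learning classifier; the averaging here only illustrates how the raw measures come out of the log.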
Affiliation(s)
- Miyuki Nemoto: Department of Psychiatry, University of Tsukuba Hospital, Ibaraki, Japan
- Kiyotaka Nemoto: Department of Psychiatry, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
- Tetsuaki Arai: Department of Psychiatry, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
25
Angelakis E, Andreopoulou A, Georgaki A. Multisensory biofeedback: Promoting the recessive somatosensory control in operatic singing pedagogy. Biomed Signal Process Control 2021; 66:102400. [DOI: 10.1016/j.bspc.2020.102400]
26
Uloza V, Ulozaitė-Stanienė N, Petrauskas T, Kregždytė R. Accuracy of Acoustic Voice Quality Index Captured With a Smartphone - Measurements With Added Ambient Noise. J Voice 2021; 37:465.e19-465.e26. [PMID: 33676807 DOI: 10.1016/j.jvoice.2021.01.025]
Abstract
OBJECTIVE To evaluate the accuracy of Acoustic Voice Quality Index (AVQI) measures obtained from voice recordings made simultaneously with oral and smartphone microphones in a sound-proof room, and to compare them with AVQIs obtained from the same smartphone voice recordings with added ambient noise. METHODS A study group of 183 subjects with normal voices (n = 86) and various voice disorders (n = 97) was asked to read aloud a standard text and sustain the vowel /a/. Controlled ambient noise averaging 29.61 dB SPL was added digitally to the smartphone voice recordings. Repeated-measures analysis of variance (ANOVA) with Greenhouse-Geisser correction was used to evaluate AVQI changes within subjects. Bland-Altman plots were used to evaluate the level of agreement between AVQI measurements obtained from the different voice recordings. RESULTS Repeated-measures ANOVA showed that differences among AVQI results obtained from recordings made with an oral studio microphone, recordings made with a smartphone microphone, and smartphone recordings with added ambient noise were not statistically significant (P = 0.07). No significant systematic differences, and an acceptable level of random error, were revealed in AVQI measurements from recordings made with oral and smartphone microphones (including added noise). CONCLUSION The AVQI measures obtained from smartphone-microphone voice recordings with experimentally added ambient noise showed acceptable agreement with the results of oral microphone recordings, suggesting the suitability of smartphone recordings for AVQI estimation even in the presence of acceptable ambient noise.
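A Bland-Altman analysis like the one used here reduces to a bias (the mean paired difference) and 95% limits of agreement (bias ± 1.96 × SD of the differences). A minimal sketch with hypothetical AVQI value pairs, not the study's data:

```python
import statistics

def bland_altman(a, b):
    """Bias (mean difference) and 95% limits of agreement for paired measures."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical AVQI values: studio oral microphone vs. smartphone microphone.
studio = [2.1, 4.8, 6.3, 3.0, 5.5, 7.2]
phone  = [2.3, 4.6, 6.5, 3.1, 5.9, 7.0]

bias, (lo, hi) = bland_altman(studio, phone)
print(f"bias = {bias:.2f}, 95% limits of agreement = ({lo:.2f}, {hi:.2f})")
```

The plot itself adds the per-pair means on the x-axis; the agreement statistics above are what "acceptable level of random error" refers to.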
Affiliation(s)
- Virgilijus Uloza: Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Nora Ulozaitė-Stanienė: Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Tadas Petrauskas: Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Rima Kregždytė: Department of Preventive Medicine, Lithuanian University of Health Sciences, Kaunas, Lithuania
27
Tabatabaei SAH, Fischer P, Schneider H, Koehler U, Gross V, Sohrabi K. Methods for Adventitious Respiratory Sound Analyzing Applications Based on Smartphones: A Survey. IEEE Rev Biomed Eng 2021; 14:98-115. [PMID: 32746364 DOI: 10.1109/rbme.2020.3002970]
Abstract
Detection and classification of adventitious lung sounds plays an important role in diagnosing, monitoring, controlling, and caring for patients with lung diseases. Such systems can be delivered on different platforms, such as medical devices, standalone software, or smartphone applications. The ubiquity of smartphones and the widespread use of their applications make them an attractive platform for hosting detection and classification systems for adventitious lung sounds. In this paper, smartphone-based systems for automatic detection and classification of adventitious lung sounds are surveyed. Such adventitious sounds include cough, wheeze, crackle, and snore; relevant sounds related to abnormal respiratory activities are considered as well. The methods are described briefly and their analysis algorithms explained; the analysis includes detection and/or classification of the sound events. For comparison, a summary of the main surveyed methods, together with their classification parameters and the features used, is given. Existing challenges, open issues, and future trends are discussed as well.
28
Ditthapron A, Agu EO, Lammert AC. Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment. IEEE Open J Eng Med Biol 2021; 2:304-313. [PMID: 35402977 PMCID: PMC8940203 DOI: 10.1109/ojemb.2021.3063994]
Abstract
Goal: Smartphones can be used to passively assess and monitor patients’ speech impairments caused by ailments such as Parkinson’s disease, Traumatic Brain Injury (TBI), Post-Traumatic Stress Disorder (PTSD) and neurodegenerative diseases such as Alzheimer’s disease and dementia. However, passive audio recordings in natural settings often capture the speech of non-target speakers (cross-talk). Consequently, speaker separation, which identifies the target speakers’ speech in audio recordings with two or more speakers’ voices, is a crucial pre-processing step in such scenarios. Prior speech separation methods analyzed raw audio. However, in order to preserve speaker privacy, passively recorded smartphone audio and machine learning-based speech assessment are often performed on derived speech features such as Mel-Frequency Cepstral Coefficients (MFCCs). In this paper, we propose a novel Deep MFCC bAsed SpeaKer Separation (Deep-MASKS). Methods: Deep-MASKS uses an autoencoder to reconstruct MFCC components of an individual’s speech from an i-vector, x-vector or d-vector representation of their speech learned during the enrollment period. Deep-MASKS utilizes a Deep Neural Network (DNN) for MFCC signal reconstructions, which yields a more accurate, higher-order function compared to prior work that utilized a mask. Unlike prior work that operates on utterances, Deep-MASKS operates on continuous audio recordings. Results: Deep-MASKS outperforms baselines, reducing the Mean Squared Error (MSE) of MFCC reconstruction by up to 44% and the number of additional bits required to represent clean speech entropy by 36%.
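Deep-MASKS operates on MFCCs rather than raw audio; the first step of any MFCC front end is warping frequency onto the mel scale before filterbank and cepstral stages. A minimal sketch of that warping using the standard HTK-style formula (this illustrates the general MFCC ingredient, not the paper's specific configuration):

```python
import math

def hz_to_mel(f):
    """Standard HTK-style mel scale used when building MFCC filterbanks."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, needed to place filterbank edges back in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Center frequencies of a 10-filter mel filterbank spanning 0-4000 Hz:
# evenly spaced in mel, hence increasingly far apart in Hz.
n_filters, f_max = 10, 4000.0
m_max = hz_to_mel(f_max)
centers = [mel_to_hz(m_max * (i + 1) / (n_filters + 1)) for i in range(n_filters)]
print([round(c) for c in centers])
```

The uneven Hz spacing of the printed centers is the point: MFCCs allocate resolution the way human hearing does, which is also why reconstructing speech content from them (as Deep-MASKS does) is lossy by design.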
Affiliation(s)
- Apiwat Ditthapron: Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Emmanuel O Agu: Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Adam C Lammert: Biomedical Engineering Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
29
Robin J, Harrison JE, Kaufman LD, Rudzicz F, Simpson W, Yancheva M. Evaluation of Speech-Based Digital Biomarkers: Review and Recommendations. Digit Biomark 2020; 4:99-108. [PMID: 33251474 DOI: 10.1159/000510820]
Abstract
Speech represents a promising novel biomarker by providing a window into brain health, as shown by its disruption in various neurological and psychiatric diseases. As with many novel digital biomarkers, however, rigorous evaluation is currently lacking and is required for these measures to be used effectively and safely. This paper outlines and provides examples from the literature of evaluation steps for speech-based digital biomarkers, based on the recent V3 framework (Goldsack et al., 2020). The V3 framework describes 3 components of evaluation for digital biomarkers: verification, analytical validation, and clinical validation. Verification includes assessing the quality of speech recordings and comparing the effects of hardware and recording conditions on the integrity of the recordings. Analytical validation includes checking the accuracy and reliability of data processing and computed measures, including understanding test-retest reliability, demographic variability, and comparing measures to reference standards. Clinical validation involves verifying the correspondence of a measure to clinical outcomes, which can include diagnosis, disease progression, or response to treatment. For each of these sections, we provide recommendations for the types of evaluation necessary for speech-based biomarkers and review published examples. The examples in this paper focus on speech-based biomarkers, but they can be used as a template for digital biomarker development more generally.
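Analytical validation as described above includes test-retest reliability. One simple way to quantify it is the correlation between repeated sessions of the same measure; the sketch below uses Pearson correlation on hypothetical values for an illustrative speech-rate measure (reliability studies more commonly report an intraclass correlation coefficient, which is not shown here):

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation, a simple test-retest reliability check."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Sum of cross-products over n * population SDs equals Pearson's r.
    return cov / (statistics.pstdev(x) * statistics.pstdev(y) * len(x))

# Hypothetical speech-rate values (words/s) from two sessions a week apart.
session1 = [3.1, 2.4, 4.0, 3.5, 2.8]
session2 = [3.0, 2.6, 3.9, 3.6, 2.7]
print(round(pearson_r(session1, session2), 3))  # ≈ 0.975
```

High test-retest agreement like this is necessary before demographic variability or clinical validity can be meaningfully assessed.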
Affiliation(s)
- John E Harrison: Metis Cognition Ltd., Park House, Kilmington Common, Warminster, United Kingdom; Alzheimer Center, AUmc, Amsterdam, The Netherlands; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Frank Rudzicz: Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- William Simpson: Winterlight Labs, Toronto, Ontario, Canada; Department of Psychiatry and Behavioural Neuroscience, McMaster University, Hamilton, Ontario, Canada
30
Baldanzi C, Crispiatico V, Foresti S, Groppo E, Rovaris M, Cattaneo D, Vitali C. Effects of Intensive Voice Treatment (The Lee Silverman Voice Treatment [LSVT LOUD]) in Subjects With Multiple Sclerosis: A Pilot Study. J Voice 2020; 36:585.e1-585.e13. [PMID: 32819780 DOI: 10.1016/j.jvoice.2020.07.025]
Abstract
AIM The rehabilitation of voice disorders is an unmet need in multiple sclerosis (MS). The Lee Silverman Voice Treatment (LSVT LOUD) is a well-documented and effective speech treatment developed to treat voice disorders in Parkinson disease. The purpose of the present study was to examine the viability of applying LSVT LOUD to individuals with MS and to verify short- and long-term improvements in acoustic and perceptual voice parameters. METHODS A single-subject design was used with a consecutive sample of 8 subjects with MS. The subjects' voices were recorded with the PRAAT software for 5 days at baseline, during the 16 treatment sessions, and at follow-up (FU) 6/12 months later. PRAAT provided data on the voice intensity (SPL/a/) and maximum phonation time (MPT/a/) of sustained /a/, and on the voice intensity of functional sentences. In addition, the self-assessment Voice Handicap Index questionnaire, the perceptual GIRBAS scale, and intensity of monologue were collected on the first day of baseline, post-treatment, and at FU. In the treatment phase each subject received treatment according to the LSVT LOUD protocol. Visual analysis of the daily acoustic variables was used to determine baseline stability and analyse changes following treatment. The Wilcoxon test was used to assess statistically significant differences between baseline and post-treatment. RESULTS All participants completed the LSVT LOUD programme; one participant dropped out at FU. Improvements in the acoustic analysis were found: SPL/a/ improved on average (± standard deviation) by 11.64 ± 4.19 dB, with 7 subjects showing statistically significant improvement (P < 0.05); MPT/a/ improved on average by 1.2 ± 1.53 seconds and intensity of functional sentences by 8.11 ± 3.46 dB, with 4 and 5 subjects, respectively, showing statistically significant improvement. Intensity of monologue improved by 14.90 ± 3.33 dB. Acoustic values were maintained or increased at FU with respect to baseline. All subjects improved perceptual ratings on the Voice Handicap Index, and results were maintained at FU. These changes were associated with improvements on five parameters of the GIRBAS scale at post-treatment; however, no further improvements were observed at FU. CONCLUSION Intensive LSVT LOUD treatment is a viable approach to treat hypophonia in MS. LSVT LOUD improved both quantitative-instrumental and perceptive-subjective assessments. Randomised controlled trials are needed to provide firm support for the effectiveness of LSVT LOUD in MS.
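The Wilcoxon signed-rank test used for the baseline vs. post-treatment comparisons is built on a simple statistic: rank the absolute paired differences and take the smaller of the positive and negative rank sums. A sketch with hypothetical SPL values (statistic only, no p-value; real analyses would use a statistics package):

```python
def wilcoxon_w(pre, post):
    """Wilcoxon signed-rank statistic: smaller of positive/negative rank sums."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    ranked = sorted(diffs, key=abs)
    # Assign average ranks to tied |differences|.
    ranks = {}
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and abs(ranked[j]) == abs(ranked[i]):
            j += 1
        for k in range(i, j):
            ranks[k] = (i + j + 1) / 2  # mean of 1-based ranks i+1 .. j
        i = j
    w_pos = sum(ranks[k] for k, d in enumerate(ranked) if d > 0)
    w_neg = sum(ranks[k] for k, d in enumerate(ranked) if d < 0)
    return min(w_pos, w_neg)

# Hypothetical SPL (dB) for 8 subjects before and after intensive treatment.
pre  = [62.0, 58.5, 60.1, 64.2, 59.0, 61.3, 63.5, 57.8]
post = [73.9, 70.0, 71.5, 75.0, 68.2, 72.8, 74.1, 69.9]
print(wilcoxon_w(pre, post))  # 0: every subject improved
```

A statistic of 0 (all differences in one direction) is the strongest possible result for n paired observations, which is the pattern the SPL/a/ findings describe.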
Affiliation(s)
- Elisabetta Groppo: Ospedale San Paolo - Azienda Socio-Sanitaria Territoriale (ASST), Milano, Italy
31
Petrizzo D, Popolo PS. Smartphone Use in Clinical Voice Recording and Acoustic Analysis: A Literature Review. J Voice 2020; 35:499.e23-499.e28. [PMID: 32736910 DOI: 10.1016/j.jvoice.2019.10.006]
Abstract
OBJECTIVE With the increase in smartphone use and availability over the last decade, mobile healthcare applications have become more accessible. Many of these applications allow users to track behaviors and goals, and to acquire feedback and information while on the go. Recent studies in the literature suggest that smartphones may offer a means of augmenting clinical voice assessment by recording individuals with voice disorders outside the clinic for the purpose of extracting acoustic characteristics. This review examines the effectiveness of smartphones in clinical voice assessment and treatment, as reported in the current literature. METHODS The PubMed database was searched using combinations and variations of different terms related to smartphones, voice, and recording apps, in order to find articles that address the role of smartphones in clinical voice recording and assessment. RESULTS AND CONCLUSION Six studies published in the last 3 years were reviewed and examined in terms of the types of devices and operating systems used, the types of subjects and disorders studied, the voice parameters extracted, and the microphones used. Considerations such as the impact of environmental noise and privacy and security issues are also examined. While smartphones and mobile apps have the potential to be valuable tools in voice assessment outside the clinic, further efforts are needed for them to be used effectively in a clinical setting.
Affiliation(s)
- Danielle Petrizzo: Department of Communication Sciences and Disorders, Montclair State University, Montclair, New Jersey
32
Illner V, Sovka P, Rusz J. Validation of freely-available pitch detection algorithms across various noise levels in assessing speech captured by smartphone in Parkinson's disease. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101831]
33
Yamada Y, Shinkawa K, Shimmei K. Atypical Repetition in Daily Conversation on Different Days for Detecting Alzheimer Disease: Evaluation of Phone-Call Data From Regular Monitoring Service. JMIR Ment Health 2020; 7:e16790. [PMID: 31934870 PMCID: PMC6996758 DOI: 10.2196/16790]
Abstract
BACKGROUND Identifying signs of Alzheimer disease (AD) through longitudinal and passive monitoring techniques has become increasingly important. Previous studies have succeeded in quantifying language dysfunctions and identifying AD from speech data collected during neuropsychological tests. However, whether and how language dysfunction can be quantified in daily conversation remains unexplored. OBJECTIVE The objective of this study was to explore the linguistic features that can be used to differentiate patients with AD on the basis of daily conversations. METHODS We analyzed daily conversational data of seniors with and without AD obtained through longitudinal follow-up in a regular monitoring service (n=15 individuals, including 2 patients with AD, with an average follow-up period of 16.1 months; 1032 conversational data items obtained during phone calls, totaling approximately 221 person-hours). In addition to the standard linguistic features used in previous studies on connected speech collected during neuropsychological tests, we extracted novel features related to atypical repetition of words and topics, reported by previous observational and descriptive studies as one of the prominent characteristics of the everyday conversations of patients with AD. RESULTS When we compared discriminative power for AD, we found that atypical repetition across two conversations on different days outperformed the linguistic features used in previous studies on speech data from neuropsychological tests. It was also a better indicator than atypical repetition within single conversations, as well as repetition across two conversations separated by a specific number of conversations. CONCLUSIONS Our results show how linguistic features related to atypical repetition across days could be used to detect AD from daily conversations in a passive manner by taking advantage of longitudinal data.
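A cross-day repetition feature of this kind can be approximated by vocabulary overlap between two conversations. The sketch below uses a Jaccard index over content words; the stopword list and sentences are invented for illustration and are far simpler than the paper's actual features.

```python
# Tiny illustrative stopword list (a real one would be much larger).
STOPWORDS = {"the", "a", "and", "i", "to", "it", "was", "of"}

def content_words(text):
    """Lowercased, punctuation-stripped words minus stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS

def repetition_score(conv_a, conv_b):
    """Jaccard overlap of content words between conversations on different days."""
    a, b = content_words(conv_a), content_words(conv_b)
    return len(a & b) / len(a | b) if a | b else 0.0

day1 = "I went to the market and bought fresh bread."
day2 = "I went to the market and bought fresh bread today."
day3 = "My grandson visited and we talked about his school."

print(round(repetition_score(day1, day2), 2))  # high overlap: near-verbatim retelling
print(round(repetition_score(day1, day3), 2))  # low overlap: different topic
```

A persistently high score across days would flag the atypical retelling behavior the study describes; the paper's features additionally handle topics, not just word sets.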
Affiliation(s)
- Keita Shimmei: IBM Research, Tokyo, Japan; Poverty and Equity Global Practice, The World Bank, Washington, DC, United States
34
Ulozaite-Staniene N, Petrauskas T, Šaferis V, Uloza V. Exploring the feasibility of the combination of acoustic voice quality index and glottal function index for voice pathology screening. Eur Arch Otorhinolaryngol 2019; 276:1737-1745. [DOI: 10.1007/s00405-019-05433-5]
35
Jannetts S, Schaeffler F, Beck J, Cowen S. Assessing voice health using smartphones: bias and random error of acoustic voice parameters captured by different smartphone types. Int J Lang Commun Disord 2019; 54:292-305. [PMID: 30779425 DOI: 10.1111/1460-6984.12457]
Abstract
BACKGROUND Occupational voice problems constitute a serious public health issue with substantial financial and human consequences for society. Modern mobile technologies such as smartphones have the potential to enhance approaches to prevention and management of voice problems. This paper addresses an important aspect of smartphone-assisted voice care: the reliability of smartphone-based acoustic analysis for voice health state monitoring. AIM To assess the reliability of acoustic parameter extraction for a range of commonly used smartphones by comparison with studio recording equipment. METHODS & PROCEDURES Twenty-two vocally healthy speakers (12 female, 10 male) were recorded producing sustained vowels and connected speech under studio conditions using a high-quality studio microphone and an array of smartphones. For both types of utterance, Bland-Altman analysis was used to assess overall reliability for mean F0, cepstral peak prominence (CPPS), Jitter (RAP), and Shimmer %. OUTCOMES & RESULTS Analysis of the systematic and random error indicated significant bias for CPPS across both sustained vowels and passage reading. Analysis of the random error of the devices indicated that mean F0 and CPPS showed acceptable random error size, while jitter and shimmer random error was judged problematic. CONCLUSIONS & IMPLICATIONS Confidence in the feasibility of smartphone-based voice assessment is increased by the experimental finding of high levels of reliability for some clinically relevant acoustic parameters, while the use of other parameters is discouraged. We also challenge the practice of using statistical tests (e.g., t-tests) for measurement reliability assessment.
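Jitter (RAP) and Shimmer %, the two parameters judged unreliable here, are defined as relative perturbations of glottal cycle periods and amplitudes, which is exactly why they are sensitive to recording-chain noise. A minimal sketch using their standard definitions on hypothetical cycle data (real analyses extract periods and amplitudes with a tool such as Praat):

```python
def jitter_rap(periods):
    """Relative Average Perturbation (%): each period compared with the
    3-point moving average centred on it, relative to the mean period."""
    n = len(periods)
    pert = sum(abs(periods[i] - sum(periods[i - 1:i + 2]) / 3)
               for i in range(1, n - 1))
    return 100 * (pert / (n - 2)) / (sum(periods) / n)

def shimmer_percent(amps):
    """Local shimmer (%): mean absolute amplitude difference between
    consecutive cycles, relative to the mean amplitude."""
    n = len(amps)
    diff = sum(abs(amps[i] - amps[i + 1]) for i in range(n - 1))
    return 100 * (diff / (n - 1)) / (sum(amps) / n)

# Hypothetical glottal cycle periods (ms) and peak amplitudes from a sustained vowel.
periods = [5.00, 5.02, 4.98, 5.01, 4.99, 5.00]
amps = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78]
print(f"jitter RAP: {jitter_rap(periods):.3f}%  shimmer: {shimmer_percent(amps):.2f}%")
```

Because both measures are built from tiny cycle-to-cycle differences, small codec or microphone distortions inflate them, consistent with the problematic random error reported for smartphones.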
Affiliation(s)
- Janet Beck: CASL Research Centre, Queen Margaret University, Edinburgh, UK
- Steve Cowen: CASL Research Centre, Queen Margaret University, Edinburgh, UK
36
Munnings AJ. The Current State and Future Possibilities of Mobile Phone "Voice Analyser" Applications, in Relation to Otorhinolaryngology. J Voice 2019; 34:527-532. [PMID: 30655018 DOI: 10.1016/j.jvoice.2018.12.018]
Abstract
BACKGROUND A large proportion of the population suffers from voice disorders. The use of mobile phone technology in healthcare is increasing, and this includes applications that can analyze voice. OBJECTIVE This study aimed to review the potential for voice analyzer applications to aid the management of voice disorders. METHODS A literature search was conducted, yielding eight studies that were further analyzed. RESULTS Seven of the eight studies concluded that smartphone assessments were comparable to current techniques. Nevertheless, some common issues remained with using applications, such as the voice parameters used, the voice pathologies tested, smartphone software consistency, and microphone specifications. CONCLUSIONS It is clear that further developments are required before a mobile application can be used widely in voice analysis. However, promising results have been obtained thus far, and the benefits of mobile technology in this field, particularly in voice rehabilitation, warrant further research into its widespread implementation.
37
Rusz J, Hlavnicka J, Tykalova T, Novotny M, Dusek P, Sonka K, Ruzicka E. Smartphone Allows Capture of Speech Abnormalities Associated With High Risk of Developing Parkinson's Disease. IEEE Trans Neural Syst Rehabil Eng 2018; 26:1495-1507. [DOI: 10.1109/tnsre.2018.2851787]
38
Lebacq J, Schoentgen J, Cantarella G, Bruss FT, Manfredi C, DeJonckere P. Maximal Ambient Noise Levels and Type of Voice Material Required for Valid Use of Smartphones in Clinical Voice Research. J Voice 2017; 31:550-556. [DOI: 10.1016/j.jvoice.2017.02.017]