1
Shin J, Bae SM. Use of voice features from smartphones for monitoring depressive disorders: Scoping review. Digit Health 2024;10:20552076241261920. PMID: 38882248; PMCID: PMC11179519; DOI: 10.1177/20552076241261920.
Abstract
Objective This review evaluates the use of smartphone-based voice data for predicting and monitoring depression. Methods A scoping review was conducted, examining 14 studies from Medline, Scopus, and Web of Science (2010-2023) on voice data collection methods and the use of voice features for monitoring depression. Results Voice data, especially prosodic features such as fundamental frequency and pitch, show promise for predicting depression, though their sole predictive power requires further validation. Integrating voice with multimodal sensor data has been shown to improve accuracy significantly. Conclusion Smartphone-based voice monitoring offers a promising, noninvasive, and cost-effective approach to depression management. The integration of machine learning with sensor data could significantly enhance mental health monitoring, necessitating further research and longitudinal studies for validation.
Affiliation(s)
- Jaeeun Shin
- Department of Psychology, Chung-Ang University, Seoul, Republic of Korea
- Sung Man Bae
- Department of Psychology and Psychotherapy, Dankook University, Cheonan, Republic of Korea
- Department of Psychology, Graduate School, Dankook University, Cheonan, Republic of Korea
2
Ceylan ME, Cangi ME, Yılmaz G, Peru BS, Yiğit Ö. Are smartphones and low-cost external microphones comparable for measuring time-domain acoustic parameters? Eur Arch Otorhinolaryngol 2023;280:5433-5444. PMID: 37584753; DOI: 10.1007/s00405-023-08179-3.
Abstract
PURPOSE This study examined and compared the diagnostic accuracy and correlation levels of the acoustic parameters of audio recordings obtained from smartphones on two operating systems and from dynamic and condenser types of external microphones. METHODS The study included 87 adults: 57 with voice disorders and 30 with healthy voices. Each participant was asked to perform a sustained vowel phonation (/a/). The recordings were taken simultaneously using five microphones (AKG P220, Shure SM58, Samson Go Mic, Apple iPhone 6, and Samsung Galaxy J7 Pro) in an acoustically insulated cabinet. Acoustic examinations were performed using Praat version 6.2.09. The data were examined using Pearson correlation and receiver-operating characteristic (ROC) analyses. RESULTS The parameters with the highest area under the curve (AUC) values among all microphone recordings in the time-domain analyses were the frequency perturbation parameters. Additionally, considering the correlation coefficients obtained by synchronizing the microphones with each other together with the AUC values, the parameter with the highest correlation coefficient and diagnostic accuracy was the jitter-local parameter. CONCLUSION Period-to-period perturbation parameters obtained from audio recordings made with smartphones show levels of diagnostic accuracy similar to those of external microphones used in clinical conditions.
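The jitter-local parameter highlighted above is simple to compute once cycle-to-cycle period lengths have been extracted from the recording. A minimal sketch of the Praat-style definition (mean absolute difference between consecutive periods, divided by the mean period), with made-up period values for illustration:

```python
def jitter_local(periods):
    """Jitter (local), in percent: mean absolute difference between
    consecutive glottal periods, divided by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

# Illustrative periods (seconds) from a sustained /a/ near 100 Hz
periods = [0.0100, 0.0101, 0.0099, 0.0102, 0.0100]
print(round(jitter_local(periods), 2))  # → 1.99
```

In practice the periods come from a pitch tracker; the difficult part of time-domain analysis is reliable period extraction (which is exactly where microphone quality matters), not this ratio.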
Affiliation(s)
- M Enes Ceylan
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- M Emrah Cangi
- University of Health Sciences, Speech and Language Therapy, Selimiye, Tıbbiye Cd No: 38, Istanbul, 34668, Üsküdar, Türkiye.
- Göksu Yılmaz
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Beyza Sena Peru
- Üsküdar University, Speech and Language Therapy, Istanbul, Türkiye
- Özgür Yiğit
- Istanbul Şişli Hamidiye Etfal Training and Research Hospital, Istanbul, Türkiye
3
Llico AF, Shanley SN, Friedman AD, Bamford LM, Roberts RM, McKenna VS. Comparison Between Custom Smartphone Acoustic Processing Algorithms and Praat in Healthy and Disordered Voices. J Voice 2023:S0892-1997(23)00241-2. PMID: 37690854; DOI: 10.1016/j.jvoice.2023.07.032.
Abstract
OBJECTIVE The aim of this study was to understand the relationship between temporal and spectral-based acoustic measures derived using Praat and custom smartphone algorithms across patients with a wide range of vocal pathologies. METHODS Voice samples were collected from 56 adults (11 vocally healthy, 45 dysphonic, aged 18-80 years) performing three speech tasks: (a) sustained vowel, (b) maximum phonation, and (c) the second and third sentences of the Rainbow passage. Data were analyzed to extract mean fundamental frequency (fo), maximum phonation time (MPT), and cepstral peak prominence (CPP) using Praat and our custom smartphone algorithms. Linear regression models were calculated with and without outliers to determine relationships. RESULTS Statistically significant relationships were found between the smartphone algorithms and Praat for all three measures (r2 = 0.68-0.95 with outliers; r2 = 0.80-0.98 without outliers). An offset between CPP measures was found, with Praat values consistently lower than those computed by the smartphone app. Outlying data were identified and described; voices with high levels of clinician-perceived dysphonia produced smartphone algorithm errors. CONCLUSIONS These results suggest that the proposed algorithms can provide measurements comparable to clinically derived values. However, clinicians should take caution when analyzing severely dysphonic voices, as the current algorithms show reduced accuracy for measures of mean fo and MPT for these voice types.
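Cepstral peak prominence, one of the measures compared above, can be sketched as the height of the cepstral peak in the expected-fo quefrency band over a linear trend fitted to the cepstrum. The following is a simplified illustration only (it is not equivalent to Praat's smoothed CPPS, whose windowing, smoothing, and trend fitting differ), applied to a synthetic harmonic "vowel" and to noise:

```python
import numpy as np

def cpp(x, fs, fmin=60.0, fmax=300.0):
    """Simplified cepstral-peak-prominence sketch: height of the cepstral
    peak in the fmin..fmax quefrency band above a fitted linear trend."""
    n = len(x)
    spec = np.abs(np.fft.fft(x * np.hanning(n)))
    log_spec = 20.0 * np.log10(spec + 1e-12)          # log power spectrum (dB)
    cep = np.abs(np.fft.fft(log_spec)) / n            # cepstrum magnitude
    quef = np.arange(n) / fs                          # quefrency (seconds)
    lo, hi = int(fs / fmax), int(fs / fmin)           # band for 1/fo
    peak_i = lo + int(np.argmax(cep[lo:hi]))
    a, b = np.polyfit(quef[lo:hi], cep[lo:hi], 1)     # linear trend over band
    return cep[peak_i] - (a * quef[peak_i] + b)

fs = 16000
t = np.arange(fs) / fs                                # 1 s of signal
# Synthetic "sustained vowel": 100 Hz fundamental plus decaying harmonics
vowel = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 11))
noise = np.random.default_rng(0).normal(size=fs)      # aperiodic comparison
print(cpp(vowel, fs), cpp(noise, fs))
```

A strongly periodic signal shows a pronounced rahmonic peak near 1/fo (here 10 ms), so its prominence comes out well above that of noise; the offset the abstract reports between implementations is unsurprising given how many of these choices (windowing, trend model, smoothing) vary.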
Affiliation(s)
- Andres F Llico
- Department of Biomedical Engineering, University of Cincinnati, Cincinnati, Ohio
- Savannah N Shanley
- Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio
- Aaron D Friedman
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio
- Leigh M Bamford
- Department of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, Ohio
- Rachel M Roberts
- Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio
- Victoria S McKenna
- Department of Biomedical Engineering, University of Cincinnati, Cincinnati, Ohio; Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio; Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio.
4
Awan SN, Shaikh MA, Awan JA, Abdalla I, Lim KO, Misono S. Smartphone Recordings are Comparable to "Gold Standard" Recordings for Acoustic Measurements of Voice. J Voice 2023:S0892-1997(23)00031-0. PMID: 37019804; PMCID: PMC10545813; DOI: 10.1016/j.jvoice.2023.01.031.
Abstract
PURPOSE The purpose of this study was to assess the relationship and comparability of cepstral and spectral measures of voice obtained from a high-cost "flat" microphone and precision sound level meter (SLM) vs. high-end and entry level models of commonly and currently used smartphones (iPhone i12 and iSE; Samsung s21 and s9 smartphones). Device comparisons were also conducted in different settings (sound-treated booth vs. typical "quiet" office room) and at different mouth-to-microphone distances (15 and 30 cm). METHODS The SLM and smartphone devices were used to record a series of speech and vowel samples from a prerecorded diverse set of 24 speakers representing a wide range of sex, age, fundamental frequency (F0), and voice quality types. Recordings were analyzed for the following measures: smoothed cepstral peak prominence (CPP in dB); the low vs high spectral ratio (L/H Ratio in dB); and the Cepstral Spectral Index of Dysphonia (CSID). RESULTS A strong device effect was observed for L/H Ratio (dB) in both vowel and sentence contexts and for CSID in the sentence context. In contrast, device had a weak effect on CPP (dB), regardless of context. Recording distance was observed to have a small-to-moderate effect on measures of CPP and CSID but had a negligible effect on L/H Ratio. With the exception of L/H Ratio in the vowel context, setting was observed to have a strong effect on all three measures. While these aforementioned effects resulted in significant differences between measures obtained with SLM vs. smartphone devices, the intercorrelations of the measurements were extremely strong (r's > 0.90), indicating that all devices were able to capture the range of voice characteristics represented in the voice sample corpus. Regression modeling showed that acoustic measurements obtained from smartphone recordings could be successfully converted to comparable measurements obtained by a "gold standard" (precision SLM recordings conducted in a sound-treated booth at 15 cm) with small degrees of error. CONCLUSIONS These findings indicate that a variety of commonly available modern smartphones can be used to collect high quality voice recordings usable for informative acoustic analysis. While device, setting, and distance can have significant effects on acoustic measurements, these effects are predictable and can be accounted for using regression modeling.
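The regression conversion described in the conclusions amounts to fitting a linear map from smartphone-derived values to gold-standard values on paired recordings of the same speakers. A sketch with hypothetical paired CPP values (the numbers are invented for illustration, not taken from the study):

```python
import numpy as np

# Hypothetical paired CPP values (dB) for the same speakers, measured on a
# smartphone and on the gold-standard setup (illustrative numbers only)
phone = np.array([10.2, 12.5, 14.1, 9.8, 13.0, 11.4])
gold = np.array([11.0, 13.1, 14.9, 10.5, 13.8, 12.1])

# Least-squares fit: gold ≈ slope * phone + intercept
slope, intercept = np.polyfit(phone, gold, 1)

def to_gold(x):
    """Map a smartphone-derived measurement onto the gold-standard scale."""
    return slope * x + intercept

residuals = np.abs(to_gold(phone) - gold)
print(slope, intercept, float(residuals.max()))
```

With a systematic device offset like this toy example, the fit absorbs the bias and leaves small residuals, which is exactly the "predictable and can be accounted for" behavior the abstract describes.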
Affiliation(s)
- Shaheen N Awan
- University of South Florida, Dept. of Communication Sciences & Disorders, Tampa FL 33620.
- Mohsin Ahmed Shaikh
- Commonwealth University of Pennsylvania, Dept. of Communication Sciences & Disorders, Bloomsburg PA 17815
- Jordan A Awan
- Purdue University, Dept. of Statistics, Mathematical Sciences Building, 150 N. University Street, West Lafayette, IN 47907
- Ibrahim Abdalla
- University of Minnesota Medical School, 420 Delaware Street SE, Minneapolis, MN 55455
- Kelvin O Lim
- University of Minnesota Medical School, Dept. of Psychiatry and Behavioral Sciences, 420 Delaware Street SE, Minneapolis, MN 55455
- Stephanie Misono
- University of Minnesota Medical School, Division of Laryngology, Department of Otolaryngology, Head and Neck Surgery, 420 Delaware Street SE, Minneapolis, MN 55455
5
An iOS-based VoiceScreen application: feasibility for use in clinical settings-a pilot study. Eur Arch Otorhinolaryngol 2023;280:277-284. PMID: 35906420; PMCID: PMC9811036; DOI: 10.1007/s00405-022-07546-w.
Abstract
OBJECTIVES To develop a smartphone application for estimation of the Acoustic Voice Quality Index (AVQI) and to evaluate its usability in the clinical setting. METHODS AVQI automation and background-noise monitoring functions were implemented into a mobile "VoiceScreen" application running on the iOS operating system. The study group consisted of 103 adults: 30 individuals with normal voices and 73 patients with pathological voices. Voice recordings were performed in the clinical setting with the "VoiceScreen" app using iPhone 8 microphones. Voices of 30 patients were recorded before and 1 month after phonosurgical intervention. To evaluate diagnostic accuracy in differentiating normal and pathological voice, receiver-operating characteristic statistics, i.e., area under the curve (AUC), sensitivity, specificity, and correct classification rate (CCR), were used. RESULTS AVQI showed a high level of precision in discriminating between normal and dysphonic voices (AUC = 0.937). The AVQI cutoff score of 3.4 demonstrated a sensitivity of 86.3% and specificity of 95.6%, with a CCR of 89.2%. In the post-phonosurgical follow-up group, the mean AVQI decreased from 6.01 (SD 2.39) preoperatively to 2.00 (SD 1.08). No statistically significant differences (p = 0.216) were revealed between AVQI measurements in the normal-voice group and at 1-month follow-up after phonosurgery. CONCLUSIONS The "VoiceScreen" app represents an accurate and robust tool for voice quality measurement and demonstrates the potential to be used in clinical settings as a sensitive measure of voice changes across phonosurgical treatment outcomes.
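The cutoff statistics reported above follow directly from applying a threshold to AVQI scores: sensitivity is the fraction of pathological voices at or above the cutoff, specificity the fraction of normal voices below it. A sketch with the abstract's 3.4 cutoff but invented scores (the values below are illustrative, not the study's data):

```python
# Hypothetical AVQI scores (illustrative values, not the study's data)
avqi_normal = [1.8, 2.4, 3.0, 2.1, 3.3]       # normal voices
avqi_dysphonic = [4.2, 6.1, 3.5, 5.0, 2.9]    # pathological voices
CUTOFF = 3.4                                  # AVQI >= cutoff -> "dysphonic"

tp = sum(s >= CUTOFF for s in avqi_dysphonic)  # dysphonic, correctly flagged
tn = sum(s < CUTOFF for s in avqi_normal)      # normal, correctly cleared
sensitivity = tp / len(avqi_dysphonic)
specificity = tn / len(avqi_normal)
ccr = (tp + tn) / (len(avqi_normal) + len(avqi_dysphonic))
print(sensitivity, specificity, ccr)  # → 0.8 1.0 0.9
```

Sweeping the cutoff and plotting sensitivity against 1 - specificity traces the ROC curve from which the reported AUC is computed.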
6
The Effects of Vocal Loading and Steam Inhalation on Acoustic, Aerodynamic and Vocal Tract Discomfort Measures in Adults. J Voice 2022. DOI: 10.1016/j.jvoice.2022.09.027.
7
Tonn P, Seule L, Degani Y, Herzinger S, Klein A, Schulze N. Evaluation of a Digital Content-free Speech Analysis Tool to Measure Affective Distress in Mental Health. JMIR Form Res 2022;6:e37061. PMID: 36040767; PMCID: PMC9472064; DOI: 10.2196/37061.
Affiliation(s)
- Peter Tonn
- Neuropsychiatric Center of Hamburg, Hamburg, Germany
- Lea Seule
- Neuropsychiatric Center of Hamburg, Hamburg, Germany
- Nina Schulze
- Neuropsychiatric Center of Hamburg, Hamburg, Germany
8
Carrón J, Campos-Roca Y, Madruga M, Pérez CJ. A mobile-assisted voice condition analysis system for Parkinson's disease: assessment of usability conditions. Biomed Eng Online 2021;20:114. PMID: 34802448; PMCID: PMC8607631; DOI: 10.1186/s12938-021-00951-y.
Abstract
BACKGROUND AND OBJECTIVE Automatic voice condition analysis systems to detect Parkinson's disease (PD) are generally based on speech data recorded under acoustically controlled conditions and professional supervision. The performance of these approaches in a free-living scenario is unknown. The aim of this research is to investigate the impact of uncontrolled conditions (realistic acoustic environment and lack of supervision) on the performance of automatic PD detection systems based on speech. METHODS A mobile-assisted voice condition analysis system is proposed to aid in the detection of PD using speech. The system is based on a server-client architecture. In the server, feature extraction and machine learning algorithms are designed and implemented to discriminate subjects with PD from healthy ones. The Android app allows patients to submit phonations and physicians to check the complete record of every patient. Six different machine learning classifiers are applied to compare their performance on two different speech databases. One of them is an in-house database (UEX database), collected under professional supervision by using the same Android-based smartphone in the same room, whereas the other one is an age, sex and health-status balanced subset of the mPower study for PD, which provides real-world data. By applying identical methodology, single-database experiments have been performed on each database, as well as cross-database tests. Cross-validation has been applied to assess generalization performance, and hypothesis tests have been used to report statistically significant differences. RESULTS In the single-database experiments, a best accuracy rate of 0.92 (AUC = 0.98) has been obtained on the UEX database, while a considerably lower best accuracy rate of 0.71 (AUC = 0.76) has been achieved using the mPower-based database. The cross-database tests provided very degraded accuracy metrics. CONCLUSION The results clearly show the potential of the proposed system as an aid for general practitioners to conduct triage or an additional tool for neurologists to perform diagnosis. However, due to the performance degradation observed using data from the mPower study, semi-controlled conditions are encouraged, i.e., voices recorded at home by the patients themselves following a strict recording protocol, with control of the patient information by the medical doctor in charge.
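The gap between single-database and cross-database results can be illustrated with a toy experiment: a simple classifier trained and tested within one synthetic corpus, then tested on a second corpus whose features carry a database-specific recording-condition offset. Everything below is synthetic and illustrative; the study's actual features and six classifiers differ, and a nearest-centroid rule stands in for them only to show the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_db(n_per_class, shift):
    """Synthetic two-class voice-feature corpus; `shift` models a
    database-specific recording-condition offset (illustrative only)."""
    X0 = rng.normal(0.0 + shift, 1.0, size=(n_per_class, 4))  # healthy
    X1 = rng.normal(1.5 + shift, 1.0, size=(n_per_class, 4))  # PD-like
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """Classify test points by the nearer class centroid; return accuracy."""
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float((pred == yte).mean())

# Single-database: random train/test split within one corpus
Xa, ya = make_db(100, shift=0.0)
order = rng.permutation(len(ya))
train, test = order[:100], order[100:]
acc_within = nearest_centroid_acc(Xa[train], ya[train], Xa[test], ya[test])

# Cross-database: train on corpus A, test on corpus B with a condition shift
Xb, yb = make_db(100, shift=2.0)
acc_cross = nearest_centroid_acc(Xa, ya, Xb, yb)
print(acc_within, acc_cross)
```

The shift moves both classes of corpus B relative to the centroids learned on corpus A, so cross-database accuracy collapses toward chance even though within-database accuracy stays high, mirroring the degradation the abstract reports.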
Affiliation(s)
- Javier Carrón
- Departamento de Matemáticas, Universidad de Extremadura, Cáceres, Spain
- Yolanda Campos-Roca
- Departamento de Tecnología de los Computadores y las Comunicaciones, Universidad de Extremadura, Cáceres, Spain
- Mario Madruga
- Departamento de Matemáticas, Universidad de Extremadura, Cáceres, Spain
- Carlos J Pérez
- Departamento de Matemáticas, Universidad de Extremadura, Cáceres, Spain.
9
Gagliardi G, Kokkinakis D, Duñabeitia JA. Editorial: Digital Linguistic Biomarkers: Beyond Paper and Pencil Tests. Front Psychol 2021;12:752238. PMID: 34603170; PMCID: PMC8481582; DOI: 10.3389/fpsyg.2021.752238.
Affiliation(s)
- Gloria Gagliardi
- Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy
- Dimitrios Kokkinakis
- Department of Swedish, Faculty of Humanities, University of Gothenburg, Gothenburg, Sweden; AgeCap, The Centre for Ageing and Health, Gothenburg, Sweden
- Jon Andoni Duñabeitia
- Centro de Investigación Nebrija en Cognición, Nebrija University, Madrid, Spain; UiT The Arctic University of Norway, Tromsø, Norway
10
Stasak B, Huang Z, Razavi S, Joachim D, Epps J. Automatic Detection of COVID-19 Based on Short-Duration Acoustic Smartphone Speech Analysis. J Healthc Inform Res 2021;5:201-217. PMID: 33723525; PMCID: PMC7948650; DOI: 10.1007/s41666-020-00090-4.
Abstract
Currently, there is an increasing global need for COVID-19 screening to help reduce the rate of infection and at-risk patient workload at hospitals. Smartphone-based screening for COVID-19 along with other respiratory illnesses offers excellent potential due to its rapid-rollout remote platform, user convenience, symptom tracking, comparatively low cost, and prompt result processing timeframe. In particular, speech-based analysis embedded in smartphone app technology can measure physiological effects relevant to COVID-19 screening that are not yet digitally available at scale in the healthcare field. Using a selection of the Sonde Health COVID-19 2020 dataset, this study examines the speech of COVID-19-negative participants exhibiting mild and moderate COVID-19-like symptoms as well as that of COVID-19-positive participants with mild to moderate symptoms. Our study investigates the classification potential of acoustic features (e.g., glottal, prosodic, spectral) from short-duration speech segments (e.g., held vowel, pataka phrase, nasal phrase) for automatic COVID-19 classification using machine learning. Experimental results indicate that certain feature-task combinations can produce COVID-19 classification accuracy of up to 80%, compared with the all-acoustic feature baseline (68%). Further, with brute-forced n-best feature selection and speech task fusion, automatic COVID-19 classification accuracies of 82-86% were achieved, depending on whether the COVID-19-negative participant had mild or moderate COVID-19-like symptom severity.
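N-best feature selection of the kind mentioned above can be sketched as scoring each acoustic feature individually and keeping the top n for the final classifier. A toy version with synthetic features, where only the first two columns carry class signal and a crude midpoint-threshold rule stands in for the study's actual scoring:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic corpus: 6 "acoustic features", only the first two are informative
n = 300
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 6))
X[:, 0] += 1.5 * y   # informative feature
X[:, 1] += 1.0 * y   # weaker informative feature (columns 2-5 are pure noise)

def single_feature_accuracy(col):
    """Score one feature: threshold at the midpoint between class means."""
    m0, m1 = col[y == 0].mean(), col[y == 1].mean()
    pred = (col > (m0 + m1) / 2).astype(int)
    if m1 < m0:
        pred = 1 - pred
    return float((pred == y).mean())

scores = [single_feature_accuracy(X[:, j]) for j in range(X.shape[1])]
n_best = 2
selected = sorted(np.argsort(scores)[::-1][:n_best].tolist())
print(selected)  # indices of the two best-scoring features
```

Brute-forcing over n and over feature subsets, as the study describes, generalizes this per-feature ranking at higher computational cost.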
Affiliation(s)
- Brian Stasak
- School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW Australia
- Zhaocheng Huang
- School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW Australia
- Julien Epps
- School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, NSW Australia
11
Nemr K, Simões-Zenari M, de Almeida VC, Martins GA, Saito IT. COVID-19 and the teacher's voice: self-perception and contributions of speech therapy to voice and communication during the pandemic. Clinics (Sao Paulo) 2021;76:e2641. PMID: 33787658; PMCID: PMC7978665; DOI: 10.6061/clinics/2021/e2641.
Abstract
OBJECTIVES We aimed to analyze the vocal self-perception of Brazilian teachers and their communication needs, vocal signs and symptoms, and voice-related lifestyles during the coronavirus disease (COVID-19) pandemic and, based on this information, to develop guidance materials intended for dissemination to these teachers and the general community. METHODS An online questionnaire designed for this survey was distributed via the researchers' networks and was available for completion by any teacher, except those who were not working at the time. There were 1,253 teachers from all over Brazil, of both sexes, covering a wide age range, working at different levels of education, and most with more than ten years of experience. Descriptive and inferential analyses of the data were performed. RESULTS On comparing the prepandemic period with the current one, participants indicated voice improvements. In contrast, they presented symptoms such as dry throat, vocal effort during remote classes, hoarseness after classes, and difficulties with the use of headphones, among others. They further indicated stress, general fatigue, impact of the pandemic on mental health, and the overlapping of many home tasks with professional tasks. Some smoked, and others hydrated insufficiently. CONCLUSION Although teachers generally noticed voice improvements during the pandemic, a proportion of them perceived worsening of their voices. Many indicated several factors in which speech-language pathologists could guide them with the aim of improving performance and comfort during remote and hybrid classes, an initiative that will positively impact not only their voice and communication but also their quality of life.
Affiliation(s)
- Katia Nemr
- Departamento de Fisioterapia, Fonoaudiologia e Terapia Ocupacional, Faculdade de Medicina FMUSP, Universidade de Sao Paulo, Sao Paulo, SP, BR
- Marcia Simões-Zenari
- Departamento de Fisioterapia, Fonoaudiologia e Terapia Ocupacional, Faculdade de Medicina FMUSP, Universidade de Sao Paulo, Sao Paulo, SP, BR
- *Corresponding author
- Vanessa Cássia de Almeida
- Departamento de Fisioterapia, Fonoaudiologia e Terapia Ocupacional, Faculdade de Medicina FMUSP, Universidade de Sao Paulo, Sao Paulo, SP, BR
- Glauciene Amaral Martins
- Departamento de Fisioterapia, Fonoaudiologia e Terapia Ocupacional, Faculdade de Medicina FMUSP, Universidade de Sao Paulo, Sao Paulo, SP, BR
- Isabele Tiemi Saito
- Departamento de Fisioterapia, Fonoaudiologia e Terapia Ocupacional, Faculdade de Medicina FMUSP, Universidade de Sao Paulo, Sao Paulo, SP, BR