1. Heller Murray ES, Chao A, Colletti L. A Practical Guide to Calculating Cepstral Peak Prominence in Praat. J Voice 2025;39:365-370. [PMID: 36210224] [DOI: 10.1016/j.jvoice.2022.09.002]
Abstract
The acoustic measure of cepstral peak prominence (CPP) is recommended for the analysis of dysphonia. Yet, clinical use of CPP is not universal, as clinicians and researchers are still learning its strengths and limitations. Furthermore, affordable access to specialized acoustic software is a significant barrier to universal CPP use. This article provides a guide to calculating CPP in Praat, a free software program, using a new CPP plugin. Important external factors that could influence CPP measures are discussed, and suggestions for clinical use are provided. As CPP becomes more widely used by clinicians and researchers, it is important to consider external factors that may inadvertently influence CPP values. Controlling for these factors will reduce variability across CPP values, making CPP a valuable tool for both clinical and research purposes.
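The cepstral computation behind CPP can also be sketched outside Praat. The following numpy sketch (illustrative only, not the Praat plugin the article describes) follows the common Hillenbrand-style recipe: cepstrum of the dB spectrum, a linear regression over quefrency, and the peak's height above that trend. The window, quefrency ranges, and trend-fitting choices here are assumptions; as the article stresses, published CPP values depend on exactly such settings.

```python
import numpy as np

def cepstral_peak_prominence(x, fs, f0min=60.0, f0max=330.0):
    """Hillenbrand-style CPP sketch: cepstral peak height (dB) above a
    linear trend fitted over quefrency. Settings are illustrative only."""
    x = np.asarray(x, float) * np.hanning(len(x))
    n = len(x)
    # Log power spectrum (dB), then the log power spectrum of *that*
    # sequence -> the power cepstrum, also expressed in dB.
    spec_db = 10.0 * np.log10(np.abs(np.fft.fft(x)) ** 2 + 1e-12)
    cep_db = 10.0 * np.log10(np.abs(np.fft.fft(spec_db)) ** 2 + 1e-12)
    quef = np.arange(n) / fs                      # quefrency axis (s)
    # Linear regression of cepstral magnitude on quefrency (1 ms .. 1/f0min).
    trend = (quef >= 0.001) & (quef <= 1.0 / f0min)
    slope, intercept = np.polyfit(quef[trend], cep_db[trend], 1)
    # Peak search restricted to quefrencies of plausible F0 periods.
    search = (quef >= 1.0 / f0max) & (quef <= 1.0 / f0min)
    peak = np.flatnonzero(search)[np.argmax(cep_db[search])]
    return cep_db[peak] - (slope * quef[peak] + intercept)

# Sanity check: a strongly periodic 150 Hz pulse-like signal should score
# a much higher CPP than white noise.
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
voiced = np.sign(np.sin(2 * np.pi * 150.0 * t))
noise = np.random.default_rng(0).normal(0.0, 1.0, t.size)
cpp_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise = cepstral_peak_prominence(noise, fs)
```

A periodic signal produces a sharp cepstral peak at the quefrency of its period (here 1/150 s), which is exactly the structure CPP quantifies.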
Affiliation(s)
- Elizabeth S Heller Murray
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, Pennsylvania.
- Andie Chao
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, Pennsylvania
- Lauren Colletti
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, Pennsylvania
2. Fahed VS, Doheny EP, Busse M, Hoblyn J, Lowery MM. Comparison of Acoustic Voice Features Derived From Mobile Devices and Studio Microphone Recordings. J Voice 2025;39:559.e1-559.e18. [PMID: 36379826] [DOI: 10.1016/j.jvoice.2022.10.006]
Abstract
OBJECTIVES/HYPOTHESIS Improvements in mobile device technology offer new opportunities for remote monitoring of voice for home and clinical assessment. However, there is a need to establish equivalence between features derived from signals recorded with mobile devices and gold-standard microphone-preamplifiers. In this study, acoustic voice features from Android smartphone, tablet, and microphone-preamplifier recordings were compared. METHODS Data were recorded from 37 volunteers (20 female) with no history of speech disorder and six volunteers with Huntington's disease (HD) during sustained vowel (SV) phonation, reading passage (RP), and five syllable repetition (SR) tasks. The following features were estimated: fundamental frequency median and standard deviation (F0 and SD F0), harmonics-to-noise ratio (HNR), local jitter, relative average perturbation of jitter (RAP), five-point period perturbation quotient (PPQ5), difference of differences of amplitude and periods (DDA and DDP), shimmer, and amplitude perturbation quotients (APQ3, APQ5, and APQ11). RESULTS Bland-Altman analysis revealed good agreement between microphone and mobile devices for fundamental frequency, jitter, RAP, PPQ5, and DDP during all tasks, and a bias for HNR, shimmer, and its variants (APQ3, APQ5, APQ11, and DDA). Significant differences were observed between devices for HNR, shimmer, and its variants for all tasks. High correlation was observed between devices for all features, except SD F0 for RP. Similar results were observed in the HD group for the SV and SR tasks. Biological sex had a significant effect on F0 and HNR during all tests, and on jitter, RAP, PPQ5, DDP, and shimmer for RP and SR. No significant effect of age was observed. CONCLUSIONS Mobile devices showed good agreement with state-of-the-art, high-quality microphones during structured speech tasks for features derived from the frequency components of the audio recordings. Caution should be taken when estimating HNR, shimmer, and its variants from recordings made with mobile devices.
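The device-agreement method used in this study is Bland-Altman analysis: the bias between paired measurements and its 95% limits of agreement. A minimal sketch of that computation, on hypothetical paired jitter values rather than the study's data:

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement between paired measurements: the bias
    (mean difference a - b) and the 95% limits of agreement."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

# Hypothetical paired jitter (%) measurements: studio microphone vs a
# smartphone recording of the same phonations (values invented).
rng = np.random.default_rng(0)
mic = rng.uniform(0.2, 1.5, 40)
phone = mic + rng.normal(0.02, 0.05, 40)   # small systematic bias + noise
bias, lower, upper = bland_altman(phone, mic)
```

A non-zero bias with narrow limits (as found here for shimmer and HNR) indicates a systematic device offset even when correlation between devices is high.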
Affiliation(s)
- Vitória S Fahed
- School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland.
- Emer P Doheny
- School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
- Monica Busse
- Centre for Trials Research, Cardiff University, Cardiff, UK
- Jennifer Hoblyn
- School of Medicine, Trinity College Dublin, Dublin, Ireland; Bloomfield Health Services, Dublin, Ireland
- Madeleine M Lowery
- School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
3. Paulmann S, Weinstein N. Motivating tones to enhance education: The effects of vocal awareness on teachers' voices. Br J Educ Psychol 2025. [PMID: 39797652] [DOI: 10.1111/bjep.12737]
Abstract
BACKGROUND Effective classroom communication is key to shaping the learning environment and inspiring student engagement. It is not just what is said, but how it is said, that influences students. Yet few current or future teachers receive education on vocal pedagogy. AIMS This study examined the impact of raising teachers' vocal awareness on their voice production through a voice training program. METHOD Specifically, we explored how primary school teacher trainees produced motivational (either soft, warm, and encouraging, or harsh, pressuring, and controlling) and neutral communications before and after a voice education program that concentrated on raising voice awareness, vocal anatomy, exercise techniques (e.g. breath control, voice modulation), and voice care. HYPOTHESES We hypothesised that trainees' voice production would change over the course of the program, leading to more 'prototypical' displays of motivational prosody (e.g. softly spoken encouraging intentions vs. harshly spoken controlling intentions). RESULTS Results indicated a noticeable difference in the communication of motivational intentions between pre- and post-training voice samples: post training, trainees spoke more slowly and with reduced vocal effort irrespective of motivational intention, suggesting that raising vocal awareness can alter classroom communications. CONCLUSION The results underscore the importance of vocal awareness training in creating a supportive and autonomy-enhancing learning environment.
4. Sforza E, Calà F, Manfredi C, Lanatà A, Guala A, Danesino C, Cistaro A, Mazzocca M, D'Alatri L, Onesimo R, Frassineti L, Zampino G. From phenotype to phonotype: a comprehensive description of voice features of Cri du chat syndrome. Eur J Pediatr 2024;184:60. [PMID: 39627468] [DOI: 10.1007/s00431-024-05828-5]
Abstract
Extensive research on genetic syndromes has allowed a better definition of their clinical manifestations, natural history, and etiopathogenetic mechanisms. Nevertheless, some relevant but still unexplored aspects of these multisystemic conditions need to be clarified. One of these aspects is the characterization of vocal production, especially in genetic syndromes in which a distinctive voice is the hallmark of the condition (e.g., Cri du chat syndrome, CdCS). The aim of this study is to provide a detailed description of the phonotype of patients affected by CdCS. We prospectively recorded and analysed the acoustical features of the three corner vowels [a], [i], and [u] and of number listing from 1 to 10 in 29 patients with molecularly confirmed CdCS (age range 4-21 years; mean 11 ± 6; median 10 years). For perceptual analysis, the GIRBAS scale was completed. The acoustical analysis was performed with the BioVoice software. When stratified by age and gender, the older male subgroup showed the highest mean grade, roughness, and asthenia values for each vowel compared with the same parameters in the other subgroups. Statistical analysis highlighted 26 significant differences: 38% (10) concerned the sustained phonation of /a/, 27% (7) related to /i/, and 19% (5) to /u/. Ratio1, Ratio2, VSA, and FCR were also significant. Conclusion: Voice production not only conveys linguistic and paralinguistic information but can also give information regarding the speaker's biological and clinical characteristics.
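Two of the significant articulatory metrics above, VSA and FCR, are simple functions of the corner-vowel formants. A sketch under the standard definitions (triangular vowel space area via the shoelace formula; the Formant Centralization Ratio of Sapir and colleagues), using hypothetical formant values rather than the study's measurements:

```python
def vowel_space_area(formants):
    """Shoelace area (Hz^2) of the /a/-/i/-/u/ triangle in (F1, F2) space."""
    (F1a, F2a), (F1i, F2i), (F1u, F2u) = (formants[v] for v in "aiu")
    return 0.5 * abs(F1a * (F2i - F2u) + F1i * (F2u - F2a) + F1u * (F2a - F2i))

def formant_centralization_ratio(formants):
    """FCR per Sapir et al.; larger values indicate centralized formants."""
    (F1a, F2a), (F1i, F2i), (F1u, F2u) = (formants[v] for v in "aiu")
    return (F2u + F2a + F1i + F1u) / (F2i + F1a)

# Hypothetical corner-vowel formants (Hz), {vowel: (F1, F2)}.
typical = {"a": (800, 1300), "i": (300, 2300), "u": (320, 800)}
centralized = {"a": (650, 1400), "i": (400, 1900), "u": (420, 1100)}
```

A centralized vowel system shrinks the triangle (smaller VSA) and pushes FCR upward, which is why the two metrics move in opposite directions.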
Affiliation(s)
- Elisabetta Sforza
- Università Cattolica del Sacro Cuore, Rome, 00168, Italy
- A.B.C. Associazione Bambini Cri du chat Scientific Committee, Firenze, Italy
- Federico Calà
- Department of Information Engineering, University of Florence, Florence, 50139, Italy
- Claudia Manfredi
- Department of Information Engineering, University of Florence, Florence, 50139, Italy
- Antonio Lanatà
- Department of Information Engineering, University of Florence, Florence, 50139, Italy
- Andrea Guala
- A.B.C. Associazione Bambini Cri du chat Scientific Committee, Firenze, Italy
- Department of Pediatrics, Castelli Hospital, Verbania, Italy
- Cesare Danesino
- A.B.C. Associazione Bambini Cri du chat Scientific Committee, Firenze, Italy
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Angelina Cistaro
- A.B.C. Associazione Bambini Cri du chat Scientific Committee, Firenze, Italy
- Nuclear Medicine Department, Salus Alliance Medical, Genoa, Italy
- Matelda Mazzocca
- A.B.C. Associazione Bambini Cri du chat Scientific Committee, Firenze, Italy
- Lucia D'Alatri
- Unit for Ear, Nose and Throat Medicine, Department of Neuroscience, Sensory Organs and Chest, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, 00168, Italy
- Roberta Onesimo
- Center for Rare Diseases and Birth Defects, Department of Woman and Child Health and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, 00168, Italy
- Lorenzo Frassineti
- Department of Information Engineering, University of Florence, Florence, 50139, Italy
- Giuseppe Zampino
- Università Cattolica del Sacro Cuore, Rome, 00168, Italy
- Center for Rare Diseases and Birth Defects, Department of Woman and Child Health and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, 00168, Italy
5. Li R, Huang G, Wang X, Lawler K, Goldberg LR, Roccati E, St George RJ, Aiyede M, King AE, Bindoff AD, Vickers JC, Bai Q, Alty J. Smartphone automated motor and speech analysis for early detection of Alzheimer's disease and Parkinson's disease: Validation of TapTalk across 20 different devices. Alzheimers Dement (Amst) 2024;16:e70025. [PMID: 39445342] [PMCID: PMC11496774] [DOI: 10.1002/dad2.70025]
Abstract
INTRODUCTION Smartphones are proving useful in assessing movement and speech function in Alzheimer's disease and other neurodegenerative conditions. Valid outcomes across different smartphones are needed before population-level tests are deployed. This study introduces the TapTalk protocol, a novel app designed to capture hand and speech function, and validates it in smartphones against gold-standard measures. METHODS Twenty different smartphones collected video data from motor tests and audio data from speech tests. Features were extracted using Google Mediapipe (movement) and Python audio analysis packages (speech). Electromagnetic sensors (60 Hz) and a microphone acquired simultaneous movement and voice data, respectively. RESULTS TapTalk video and audio outcomes were comparable to gold-standard data: 90.3% of video, and 98.3% of audio, data recorded tapping/speech frequencies within ±1 Hz of the gold-standard measures. DISCUSSION Validation of TapTalk across a range of devices is an important step in the development of smartphone-based telemedicine and was achieved in this study. HIGHLIGHTS TapTalk evaluates hand motor and speech functions across a wide range of smartphones. Data showed 90.3% motor and 98.3% speech accuracy within ±1 Hz of gold standards. Validation advances smartphone-based telemedicine for neurodegenerative diseases.
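The ±1 Hz criterion amounts to comparing dominant repetition frequencies between the device signal and the gold-standard signal. A minimal sketch of that check on a synthetic 3 Hz tapping trace; the 60 Hz sampling rate matches the electromagnetic sensors mentioned above, but everything else here is invented for illustration:

```python
import numpy as np

def dominant_frequency(x, fs, fmin=0.5, fmax=10.0):
    """Dominant repetition frequency (Hz) of a tapping/syllable signal,
    taken as the largest spectral peak in a physiological band."""
    x = np.asarray(x, float) - np.mean(x)
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spec[band])]

# Synthetic 3 Hz tapping trace at 60 Hz, plus a noisier "smartphone"
# version of the same movement.
fs = 60
t = np.arange(0, 10, 1 / fs)
gold = np.sin(2 * np.pi * 3.0 * t)
phone = gold + np.random.default_rng(1).normal(0.0, 0.3, t.size)
agree = abs(dominant_frequency(phone, fs) - dominant_frequency(gold, fs)) <= 1.0
```

A 10 s window gives 0.1 Hz frequency resolution, comfortably finer than the ±1 Hz tolerance being tested.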
Affiliation(s)
- Renjie Li
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- School of ICT, University of Tasmania, Hobart, Tasmania, Australia
- Guan Huang
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Xinyi Wang
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Katherine Lawler
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- School of Allied Health, Human Services and Sport, La Trobe University, Melbourne, Victoria, Australia
- Lynette R. Goldberg
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Eddy Roccati
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Mimieveshiofuo Aiyede
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Anna E. King
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Aidan D. Bindoff
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- James C. Vickers
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Quan Bai
- School of ICT, University of Tasmania, Hobart, Tasmania, Australia
- Jane Alty
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- School of Medicine, University of Tasmania, Hobart, Tasmania, Australia
- Neurology Department, Royal Hobart Hospital, Hobart, Tasmania, Australia
6. Awan SN, Bahr R, Watts S, Boyer M, Budinsky R, Bensoussan Y. Evidence-Based Recommendations for Tablet Recordings From the Bridge2AI-Voice Acoustic Experiments. J Voice 2024:S0892-1997(24)00283-2. [PMID: 39306498] [PMCID: PMC11922786] [DOI: 10.1016/j.jvoice.2024.08.029]
Abstract
BACKGROUND As part of a larger goal to create best practices for voice data collection to fuel voice artificial intelligence (AI) research, the objective of this study was to investigate the ability of readily available iOS and Android tablets, with and without low-cost headset microphones, to produce recordings and subsequent acoustic measures of voice comparable to "research quality" instrumentation. METHODS Recordings of 24 sustained vowel samples representing a wide range of typical and disordered voices were played via a head-and-torso model and recorded using a research-quality standard microphone/preamplifier/audio interface. Acoustic measurements from the standard were compared with two popular tablets using their built-in microphones and with low-cost headset microphones at different distances from the mouth. RESULTS Voice measurements obtained via tablets + headset microphones close to the mouth (2.5 and 5 cm) correlated strongly (r's > 0.90) with the research standard and resulted in no significant differences for measures of vocal frequency and perturbation. In contrast, voice measurements obtained using the tablets' built-in microphones at typical reading distances (30 and 45 cm) tended to show substantial measurement variability, greater mean differences in voice measurements, and relatively poorer correlations vs the standard. CONCLUSION Findings from this study support the Bridge2AI-Voice Consortium's preliminary recommendation of smartphones paired with low-cost headset microphones as an adequate method of recording for large-scale voice data collection in a variety of clinical and nonclinical settings. Compared with recording directly with a tablet, a headset microphone controls for recording distance and reduces the effects of background noise, resulting in decreased variability in recording quality. DATA AVAILABILITY Data supporting the results reported in this article may be obtained upon request from the contact author.
Affiliation(s)
- Shaheen N Awan
- School of Communication Sciences & Disorders, University of Central Florida, Orlando, Florida.
- Ruth Bahr
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, Florida
- Stephanie Watts
- Department of Otolaryngology - Head & Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida
- Micah Boyer
- Department of Otolaryngology - Head & Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida
- Robert Budinsky
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, Florida
- Yael Bensoussan
- Department of Otolaryngology - Head & Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida
7. Heller Murray E. Conducting high-quality and reliable acoustic analysis: A tutorial focused on training research assistants. J Acoust Soc Am 2024;155:2603-2611. [PMID: 38629881] [PMCID: PMC11026110] [DOI: 10.1121/10.0025536]
Abstract
Open science practices have led to an increase in available speech datasets for researchers interested in acoustic analysis. Accurate evaluation of these databases frequently requires manual or semi-automated analysis. The time-intensive nature of these analyses makes them ideally suited for research assistants (RAs) in laboratories focused on speech and voice production. However, the completion of high-quality, consistent, and reliable analyses requires clear rules and guidelines for all research assistants to follow. This tutorial provides information on training and mentoring research assistants to complete these analyses, covering RA training, ongoing monitoring of data analysis, and the documentation needed for reliable and re-creatable findings.
Affiliation(s)
- Elizabeth Heller Murray
- Department of Communication Sciences and Disorders, Temple University, Philadelphia, Pennsylvania 19122, USA
8. Di Cesare MG, Perpetuini D, Cardone D, Merla A. Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson's Disease: A Study on Speaker Diarization and Classification Techniques. Sensors (Basel) 2024;24:1499. [PMID: 38475034] [DOI: 10.3390/s24051499]
Abstract
Parkinson's disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One of the notable non-motor symptoms of PD is the presence of vocal disorders, attributed to underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, the integration of machine learning (ML) techniques in the analysis of speech signals has significantly contributed to the detection and diagnosis of PD. In particular, Mel-Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) are feature extraction techniques commonly used in speech and audio signal processing that could offer great potential for vocal disorder identification. This study presents a novel approach to the early detection of PD through ML applied to speech analysis, leveraging both MFCCs and GTCCs. The recordings contained in the Mobile Device Voice Recordings at King's College London (MDVR-KCL) dataset were used. These recordings were collected from healthy individuals and PD patients while they read a passage and during a spontaneous conversation on the phone. The speech data from the spontaneous dialogue task were processed through speaker diarization, a technique that partitions an audio stream into homogeneous segments according to speaker identity. ML applied to MFCCs and GTCCs classified PD patients with a test accuracy of 92.3%. This research further demonstrates the potential of mobile phones as a non-invasive, cost-effective tool for the early detection of PD, with the prospect of significantly improving patient prognosis and quality of life.
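MFCCs, one of the two feature sets used, follow a fixed pipeline: power spectrum, mel-spaced triangular filter bank, log energies, then a DCT-II. A compact numpy sketch for a single frame; the filter count, coefficient count, and frame length are arbitrary illustrative choices, not the study's configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs, n_filters=26, n_coeffs=13):
    """MFCCs for one pre-windowed frame: power spectrum -> mel filter
    bank -> log energies -> DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Filter edges spaced evenly on the mel scale, mapped to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):           # rising edge of triangle i
            fbank[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):           # falling edge of triangle i
            fbank[i, k] = (hi - k) / max(hi - mid, 1)
    log_energies = np.log(fbank @ power + 1e-12)
    # DCT-II of the log filter-bank energies, keeping n_coeffs terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return basis @ log_energies

fs = 16000
t = np.arange(512) / fs
frame = np.sin(2 * np.pi * 440.0 * t) * np.hanning(512)
coeffs = mfcc(frame, fs)
```

GTCCs follow the same overall structure but replace the mel triangles with a gammatone filter bank.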
Affiliation(s)
- David Perpetuini
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
- Daniela Cardone
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
- Arcangelo Merla
- Department of Engineering and Geology, University G. D'Annunzio of Chieti-Pescara, 65127 Pescara, Italy
9. Schneider SL, Habich L, Weston ZM, Rosen CA. Observations and Considerations for Implementing Remote Acoustic Voice Recording and Analysis in Clinical Practice. J Voice 2024;38:69-76. [PMID: 34366193] [DOI: 10.1016/j.jvoice.2021.06.011]
Abstract
OBJECTIVES/HYPOTHESIS Remote voice recording and acoustic analysis allow for comprehensive voice assessment and outcome tracking without the requirements of travel to the clinic, an in-person visit, or expensive equipment. This paper delineates the process and considerations for implementing remote voice recording and acoustic analysis in a high-volume university voice clinic. STUDY DESIGN Clinical focus. METHODS Acoustic voice recordings were attempted on 108 unique patients over a 6-month period using a remote voice recording phone application. Development of the clinical process is described, including determining normative data against which to compare acoustic results, clinician training, and clinical application. The treating speech-language pathologists (SLPs) were surveyed 2 months after implementation to assess ease of application, identify challenges, and assess implementation of potential solutions. RESULTS Of 108 unique patients, 83 successfully completed the process of synchronous remote acoustic voice recording in conjunction with their SLP clinician. The process of downloading the application, setting up, and obtaining voice recordings most commonly took 10-20 minutes according to the 8 SLPs surveyed. Challenges and helpful techniques were identified. CONCLUSIONS Remote acoustic voice recordings have allowed SLPs to continue to complete comprehensive voice evaluations in a telepractice model. Emerging knowledge about the viability of remote voice recordings, the success in obtaining acoustic data remotely, and the accessibility of a low-cost app make remote voice recording a viable option to facilitate remote clinical care and research investigation.
Affiliation(s)
- Sarah L Schneider
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California.
- Laura Habich
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California
- Zoe M Weston
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California
- Clark A Rosen
- UCSF Voice and Swallowing Center, Department of Otolaryngology Head and Neck Surgery, University of California San Francisco, San Francisco, California
10. Shembel AC, Lee J, Sacher JR, Johnson AM. Characterization of Primary Muscle Tension Dysphonia Using Acoustic and Aerodynamic Voice Metrics. J Voice 2023;37:897-906. [PMID: 34281751] [PMCID: PMC9762233] [DOI: 10.1016/j.jvoice.2021.05.019]
Abstract
OBJECTIVES/HYPOTHESIS The objectives of this study were to (1) identify optimal clusters of 15 standard acoustic and aerodynamic voice metrics recommended by the American Speech-Language-Hearing Association (ASHA) to improve characterization of patients with primary muscle tension dysphonia (pMTD) and (2) identify combinations of these 15 metrics that could differentiate pMTD from other types of voice disorders. STUDY DESIGN Retrospective multiparametric analysis. METHODS Random forest modeling, independent t-tests, logistic regression, and affinity propagation clustering were applied to a retrospective dataset of 15 acoustic and aerodynamic metrics. RESULTS Ten percent of patients seen at the New York University (NYU) Voice Center over two years met the study criteria for pMTD (92 of 983 patients); 65 pMTD patients and 701 non-pMTD patients had complete data across all 15 acoustic and aerodynamic voice metrics. PCA plots and affinity propagation clustering demonstrated substantial overlap between the two groups on these parameters. The parameters ranked highest by level of importance in the random forest models - (1) mean airflow during voicing (L/sec), (2) mean SPL during voicing (dB), (3) mean peak air pressure (cmH2O), (4) highest F0 (Hz), and (5) CPP mean vowel (dB) - accounted for only 65% of variance. T-tests showed three of these parameters - (1) CPP mean vowel (dB), (2) highest F0 (Hz), and (3) mean peak air pressure (cmH2O) - were statistically significant; however, the log2-fold change for each parameter was minimal. CONCLUSION Computational models and multivariate statistical testing on 15 acoustic and aerodynamic voice metrics were unable to adequately characterize pMTD or determine differences between the pMTD and non-pMTD groups. Further validation of these metrics is needed with voice elicitation tasks that target physiological challenges to the vocal system beyond baseline vocal acoustic and aerodynamic output. Future work should also place greater focus on validating metrics of physiological correlates (eg, neuromuscular processes, laryngeal-respiratory kinematics) across the vocal subsystems over traditional vocal output measures (eg, acoustics, aerodynamics) for patients with pMTD. LEVEL OF EVIDENCE II.
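Random-forest feature ranking of the kind described can be sketched as follows. The data here are synthetic stand-ins for the 15 ASHA metrics (two heavily overlapping groups, with only three weakly informative features, mimicking the overlap the study reports), not the NYU dataset, and scikit-learn is assumed to be available:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for 15 acoustic/aerodynamic metrics: two groups
# (pMTD vs non-pMTD) that overlap heavily, differing only on 3 features.
n_per, n_feat = 200, 15
X0 = rng.normal(0.0, 1.0, (n_per, n_feat))
X1 = rng.normal(0.0, 1.0, (n_per, n_feat))
X1[:, :3] += 0.5                   # small shift on 3 "informative" metrics
X = np.vstack([X0, X1])
y = np.array([0] * n_per + [1] * n_per)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# Indices of metrics ranked by Gini importance, most important first.
ranking = np.argsort(forest.feature_importances_)[::-1]
```

With heavily overlapping groups, even the top-ranked features carry modest importance, which parallels the study's finding that the five best parameters explained only 65% of variance.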
Affiliation(s)
- Adrianna C Shembel
- Department of Speech, Language, and Hearing, University of Texas at Dallas, Dallas, Texas; Department of Otolaryngology-Head and Neck Surgery, University of Texas at Southwestern Medical Center, Dallas, Texas; Department of Otolaryngology-Head and Neck Surgery, New York University School of Medicine, New York, New York.
- Jeon Lee
- Lyda Hill Department of Bioinformatics, University of Texas at Southwestern, Dallas, Texas
- Joshua R Sacher
- Center for the Development of Therapeutics, Broad Institute, Cambridge, Massachusetts
- Aaron M Johnson
- Department of Otolaryngology-Head and Neck Surgery, New York University School of Medicine, New York, New York
11. Rameau A, Cox SR, Sussman SH, Odigie E. Addressing disparities in speech-language pathology and laryngology services with telehealth. J Commun Disord 2023;105:106349. [PMID: 37321106] [PMCID: PMC10239150] [DOI: 10.1016/j.jcomdis.2023.106349]
Abstract
The COVID-19 pandemic disproportionately affected the health and well-being of marginalized communities, and it brought greater awareness to disparities in health care access and utilization. Addressing these disparities is difficult because of their multidimensional nature. Predisposing factors (demographic information, social structure, and beliefs), enabling factors (family and community), and illness levels (perceived and evaluated illness) are thought to jointly contribute to such disparities. Research has demonstrated that disparities in access to and utilization of speech-language pathology and laryngology services result from racial and ethnic differences, geographic factors, sex, gender, educational background, income level, and insurance status. For example, persons from diverse racial and ethnic backgrounds have been found to be less likely to attend or adhere to voice rehabilitation, and they are more likely to delay health care due to language barriers, longer wait times, a lack of transportation, and difficulties contacting their physician. The purpose of this paper is to summarize existing research on telehealth, discuss how telehealth offers the potential to eliminate some disparities in access to and utilization of voice care, review its limitations, and encourage continued research in this area. A clinical perspective from a large-volume laryngology clinic in a major city in the northeastern United States highlights the use of telehealth in the provision of voice care by a laryngologist and speech-language pathologist during and after the COVID-19 pandemic.
Affiliation(s)
- Anaïs Rameau
- Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, NY, United States of America.
- Steven R Cox
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY, United States of America
- Scott H Sussman
- Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, NY, United States of America
- Eseosa Odigie
- Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, NY, United States of America
12. van der Woerd B, Chen Z, Flemotomos N, Oljaca M, Sund LT, Narayanan S, Johns MM. A Machine-Learning Algorithm for the Automated Perceptual Evaluation of Dysphonia Severity. J Voice 2023:S0892-1997(23)00179-0. [PMID: 37429808] [DOI: 10.1016/j.jvoice.2023.06.006]
Abstract
OBJECTIVES Auditory-perceptual assessments are the gold standard for assessing voice quality. This project aims to develop a machine-learning model for measuring the perceptual dysphonia severity of audio samples, consistent with assessments by expert raters. METHODS The Perceptual Voice Qualities Database samples were used, including sustained vowels and Consensus Auditory-Perceptual Evaluation of Voice sentences, which were previously expertly rated on a 0-100 scale. The OpenSMILE (audEERING GmbH, Gilching, Germany) toolkit was used to extract acoustic (Mel-Frequency Cepstral Coefficient-based, n = 1428) and prosodic (n = 152) features, pitch onsets, and recording duration. We used a support vector machine and these features (n = 1582) for automated assessment of dysphonia severity. Recordings were separated into vowels (V) and sentences (S), and features were extracted separately from each. Final voice quality predictions were made by combining the features extracted from the individual components with the whole audio (WA) sample (three file sets: S, V, WA). RESULTS The algorithm's estimates correlated highly (r = 0.847) with those of expert raters. The root mean square error was 13.36. Increasing signal complexity resulted in better estimation of dysphonia, with the combined feature set outperforming the WA, S, and V sets individually. CONCLUSION A novel machine-learning algorithm was able to produce perceptual estimates of dysphonia severity from standardized audio samples on a 100-point scale, and these estimates were highly correlated with expert ratings. This suggests that machine-learning algorithms could offer an objective method for evaluating voice samples for dysphonia severity. LEVEL OF EVIDENCE: 4
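The pipeline this abstract describes (acoustic and prosodic features fed to a support vector machine that estimates severity on a 0-100 scale, evaluated by correlation and RMSE) can be sketched as follows. The feature matrix and severity ratings below are synthetic stand-ins, and the SVR hyperparameters are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Stand-in for the paper's n = 1582 acoustic + prosodic features per recording.
n_samples, n_features = 120, 50
X = rng.normal(size=(n_samples, n_features))
# Simulated expert severity ratings on a 0-100 scale.
y = np.clip(50 + 20 * X[:, 0] + 5 * rng.normal(size=n_samples), 0, 100)

# Standardize features, then fit a support vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[:100], y[:100])
pred = model.predict(X[100:])

# Evaluate with Pearson correlation and RMSE, as in the abstract.
r = np.corrcoef(pred, y[100:])[0, 1]
rmse = np.sqrt(np.mean((pred - y[100:]) ** 2))
print(f"r = {r:.2f}, RMSE = {rmse:.1f}")
```

With real data, the held-out split and hyperparameters would need proper cross-validation; this sketch only shows the shape of the regression-and-evaluate loop.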
Affiliation(s)
- Benjamin van der Woerd: Department of Surgery, Division of Otolaryngology-Head & Neck Surgery, McMaster University, Hamilton, Ontario, Canada
- Zhuohao Chen: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California
- Nikolaos Flemotomos: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California
- Maria Oljaca: Keck School of Medicine, University of Southern California, Los Angeles, California
- Lauren Timmons Sund: Department of Otolaryngology-Head & Neck Surgery, University of Southern California, Los Angeles, California
- Shrikanth Narayanan: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California; Department of Otolaryngology-Head & Neck Surgery, University of Southern California, Los Angeles, California
- Michael M Johns: Department of Otolaryngology-Head & Neck Surgery, University of Southern California, Los Angeles, California
|
13
|
Weed E, Fusaroli R, Simmons E, Eigsti IM. Different in different ways: A network-analysis approach to voice and prosody in Autism Spectrum Disorder. LANGUAGE LEARNING AND DEVELOPMENT : THE OFFICIAL JOURNAL OF THE SOCIETY FOR LANGUAGE DEVELOPMENT 2023; 20:40-57. [PMID: 38486613 PMCID: PMC10936700 DOI: 10.1080/15475441.2023.2196528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/17/2024]
Abstract
The current study investigated whether the difficulty in finding group differences in prosody between speakers with autism spectrum disorder (ASD) and neurotypical (NT) speakers might be explained by identifying different acoustic profiles of speakers which, while still perceived as atypical, might be characterized by different acoustic qualities. We modelled the speech of a selection of speakers (N = 26), with and without ASD, as a network of nodes defined by acoustic features. We used a community-detection algorithm to identify clusters of speakers who were acoustically similar and compared these clusters with atypicality ratings by naïve and expert human raters. Results identified three clusters: one primarily composed of speakers with ASD, one of mostly NT speakers, and one an even mixture of ASD and NT speakers. The human raters were highly reliable at distinguishing speakers with and without ASD, regardless of which cluster a speaker was in. These results suggest that community-detection methods using a network approach may complement commonly employed human ratings to improve our understanding of intonation profiles in ASD.
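A minimal sketch of the clustering step described above, under two stated assumptions: spectral clustering on a precomputed speaker-similarity graph stands in for the paper's community-detection algorithm, and the acoustic features per speaker are synthetic.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)

# 26 speakers, each a node described by a small vector of acoustic features
# (the study used pitch- and voice-quality-based measures; these are made up).
centers = np.array([[0.0, 0.0], [3.0, 0.0], [1.5, 2.5]])
features = np.vstack([c + 0.4 * rng.normal(size=(9, 2)) for c in centers])[:26]

# Edge weights = pairwise acoustic similarity between speakers.
affinity = rbf_kernel(features, gamma=1.0)

# Partition the similarity graph into three speaker clusters.
labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
).fit_predict(affinity)
print(np.bincount(labels))
```

The cluster assignments could then be cross-tabulated against diagnostic group and rater judgments, as the study does with its detected communities.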
Affiliation(s)
- Ethan Weed: Linguistics, Cognitive Science, and Semiotics, Aarhus University, Aarhus, Denmark
- Riccardo Fusaroli: Linguistics, Cognitive Science, and Semiotics, Aarhus University, Aarhus, Denmark
- Elizabeth Simmons: Communication Disorders, Sacred Heart University, Fairfield, Connecticut, USA
- Inge-Marie Eigsti: Psychological Sciences, University of Connecticut, Storrs, Connecticut, USA
|
14
|
Alfano LN, James MK, Ramdharry GM, Lowes LP. 266th ENMC International Workshop: Remote delivery of clinical care and validation of remote clinical outcome assessments in neuromuscular disorders: A response to COVID-19 and proactive planning for the future. Hoofddorp, The Netherlands, 1-3 April 2022. Neuromuscul Disord 2023; 33:339-348. [PMID: 36965197 DOI: 10.1016/j.nmd.2023.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2023] [Accepted: 02/22/2023] [Indexed: 03/07/2023]
Affiliation(s)
- Lindsay N Alfano: The Abigail Wexner Research Institute at Nationwide Children's Hospital, Center for Gene Therapy, Columbus, OH, United States; The Ohio State University College of Medicine, Department of Pediatrics, Columbus, OH, United States
- Meredith K James: The John Walton Muscular Dystrophy Research Centre, Newcastle University and Newcastle Hospitals NHS Foundation Trust, Newcastle Upon Tyne, United Kingdom
- Gita M Ramdharry: Queen Square Centre for Neuromuscular Diseases, National Hospital for Neurology and Neurosurgery, University College London Hospitals NHS Trust, London, United Kingdom; Department of Neuromuscular Diseases, UCL Institute of Neurology, London, United Kingdom
- Linda P Lowes: The Abigail Wexner Research Institute at Nationwide Children's Hospital, Center for Gene Therapy, Columbus, OH, United States; The Ohio State University College of Medicine, Department of Pediatrics, Columbus, OH, United States
|
15
|
An iOS-based VoiceScreen application: feasibility for use in clinical settings-a pilot study. Eur Arch Otorhinolaryngol 2023; 280:277-284. [PMID: 35906420 PMCID: PMC9811036 DOI: 10.1007/s00405-022-07546-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Accepted: 07/06/2022] [Indexed: 01/07/2023]
Abstract
OBJECTIVES To develop a smartphone application for estimating the Acoustic Voice Quality Index (AVQI) and to evaluate its usability in the clinical setting. METHODS AVQI automation and background-noise monitoring functions were implemented in the mobile "VoiceScreen" application running on the iOS operating system. The study group consisted of 103 adult individuals: 30 with normal voices and 73 patients with pathological voices. Voice recordings were performed in the clinical setting with the "VoiceScreen" app using the iPhone 8 microphone. Voices of 30 patients were recorded before and 1 month after phonosurgical intervention. To evaluate diagnostic accuracy in differentiating normal and pathological voices, receiver operating characteristic statistics were used, i.e., area under the curve (AUC), sensitivity and specificity, and correct classification rate (CCR). RESULTS The AVQI discriminated between normal and dysphonic voices with a high level of precision (AUC = 0.937). An AVQI cutoff score of 3.4 demonstrated a sensitivity of 86.3% and specificity of 95.6%, with a CCR of 89.2%. The mean preoperative AVQI of 6.01 (SD 2.39) in the phonosurgical follow-up group decreased to 2.00 (SD 1.08) at follow-up. No statistically significant differences (p = 0.216) were found between AVQI measurements in the normal-voice group and the 1-month post-phonosurgery group. CONCLUSIONS The "VoiceScreen" app represents an accurate and robust tool for voice quality measurement and demonstrates the potential to be used in clinical settings as a sensitive measure of voice changes across phonosurgical treatment outcomes.
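The cutoff-based screening evaluation reported here (AUC, then sensitivity, specificity, and CCR at an AVQI cutoff of 3.4) can be computed as follows. The AVQI values below are simulated stand-ins for the study's measurements; only the 3.4 cutoff and the group sizes come from the abstract.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Simulated AVQI scores: lower values indicate better voice quality.
avqi_normal = rng.normal(2.0, 0.8, size=30)      # vocally healthy group
avqi_dysphonic = rng.normal(6.0, 2.4, size=73)   # pathological voices

scores = np.concatenate([avqi_normal, avqi_dysphonic])
truth = np.concatenate([np.zeros(30), np.ones(73)])  # 1 = pathological

# Threshold-free discrimination: area under the ROC curve.
auc = roc_auc_score(truth, scores)

# Metrics at the reported cutoff of 3.4.
cutoff = 3.4
pred = scores >= cutoff
sensitivity = np.mean(pred[truth == 1])        # true positive rate
specificity = np.mean(~pred[truth == 0])       # true negative rate
ccr = np.mean(pred == truth.astype(bool))      # correct classification rate
print(f"AUC = {auc:.3f}, Se = {sensitivity:.1%}, Sp = {specificity:.1%}")
```

In practice the cutoff itself is chosen from the ROC curve (e.g., by maximizing Youden's index) rather than fixed in advance.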
|
16
|
Compton EC, Cruz T, Andreassen M, Beveridge S, Bosch D, Randall DR, Livingstone D. Developing an Artificial Intelligence Tool to Predict Vocal Cord Pathology in Primary Care Settings. Laryngoscope 2022. [PMID: 36226791 DOI: 10.1002/lary.30432] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Revised: 08/16/2022] [Accepted: 09/09/2022] [Indexed: 11/12/2022]
Abstract
OBJECTIVES Diagnostic tools for voice disorders are lacking for primary care physicians. Artificial intelligence (AI) tools may add to the armamentarium for physicians, decreasing the time to diagnosis and limiting the burden of dysphonia. METHODS Voice recordings of patients were collected from 2019 to 2021 using smartphones. The Saarbruecken dataset was included for comparison. Audio files were converted to mel-spectrograms using TensorFlow. Diagnostic categories were created to group pathology, including neurological and muscular disorders, inflammatory disorders, mass lesions, and normal voices. The samples were further separated into the sustained /a/ and the rainbow passage. RESULTS Two hundred three prospective samples were collected, and 1131 samples were used from the Saarbruecken database. The AI detected abnormal pathology with an F1 score of 98%. The artificial neural network (ANN) differentiated key pathologies, including unilateral paralysis, laryngitis, adductor spasmodic dysphonia (ADSD), mass lesions, and normal samples, with F1 scores of 39%-87%. The Calgary database models had higher F1 scores in a head-to-head comparison with the Saarbruecken and combined datasets (87% vs. 58% and 50%). The AI outperformed otolaryngologists on a standardized test set of recordings (83% compared to 55% ± 15%). CONCLUSION An AI tool was created to differentiate pathology by individual or categorical diagnosis with high evaluation metrics. Prospective data should be collected in a controlled fashion to reduce intrinsic variability between recordings. Multi-center data collaborations are imperative to increase the prediction capability of AI tools for detecting vocal cord pathology. We provide proof of concept for an AI tool to assist primary care physicians in managing dysphonic patients. LEVEL OF EVIDENCE 3 Laryngoscope, 2022.
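The mel-spectrogram conversion mentioned in the methods can be illustrated with a plain-numpy sketch. The abstract states the authors used TensorFlow, so everything here (window, hop, filterbank sizes, and the synthetic tone used as input) is an illustrative assumption showing the transform itself, not their pipeline.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(x, sr, n_fft=512, hop=128, n_mels=40):
    # Short-time Fourier transform via framed FFTs with a Hann window.
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, power.shape[1]))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return power @ fbank.T  # shape: (n_frames, n_mels)

# A 1-second synthetic 220 Hz tone at 16 kHz stands in for a voice recording.
sr = 16000
t = np.arange(sr) / sr
audio = 0.5 * np.sin(2 * np.pi * 220 * t)
spec = mel_spectrogram(audio, sr)
print(spec.shape)  # (121, 40): 121 frames x 40 mel bands
```

The resulting 2-D array is what a convolutional classifier would consume, typically after a log compression step.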
Affiliation(s)
- Evan C Compton: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Tim Cruz: Department of Data Science and Analytics, Faculty of Science, University of Calgary, Calgary, Alberta, Canada
- Meri Andreassen: Section of Otolaryngology-Head and Neck Surgery, Calgary Voice Program, Alberta Health Services, Calgary, Alberta, Canada
- Shari Beveridge: Section of Otolaryngology-Head and Neck Surgery, Calgary Voice Program, Alberta Health Services, Calgary, Alberta, Canada
- Doug Bosch: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Derrick R Randall: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Devon Livingstone: Section of Otolaryngology-Head and Neck Surgery, Department of Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
|
17
|
Pierce JL, Tanner K, Merrill RM, Shnowske L, Roy N. Acoustic Variability in the Healthy Female Voice Within and Across Days: How Much and Why? JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:3015-3031. [PMID: 34269598 DOI: 10.1044/2021_jslhr-21-00018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Purpose The aims of this study were (1) to quantify variability in voice production (as measured acoustically) within and across consecutive days in vocally healthy female speakers, (2) to identify which acoustic measures are sensitive to this variability, and (3) to identify participant characteristics related to such voice variability. Method Participants included 45 young women with normal voices who were stratified by age, specifically 18-23, 24-29, and 30-35 years. Following an initial acoustic and auditory-perceptual voice assessment, participants performed standardized field voice recordings 3 times daily across a 7-day period. Acoustic analyses involved 32 cepstral-, spectral-, and time-based measures of connected speech and sustained vowels. Relationships among acoustic data and select demographic, health, and lifestyle (i.e., participant-based) factors were also examined. Results Significant time-of-day effects were observed for acoustic analyses within speakers (p < .05), with voices generally being worse in the morning. No significant differences were observed across consecutive days. Variations in voice production were associated with several participant factors: improved voice was associated with increased voice use, self-perceived poor voice function, minimal or no alcohol consumption, and extroverted personality; worse voice was associated with regular or current menstruation, depression, and anxiety. Conclusions This acoustic study provides essential information regarding the nature and extent to which healthy voices vary throughout the day and week. Participant-based factors associated with improved voice over time included increased voice use, self-perceived poor voice function, minimal or no alcohol consumption, and extroverted personality. Factors associated with worse voice production over time included regular or current menstruation, depression, and anxiety.
Affiliation(s)
- Jenny L Pierce: Department of Surgery, The University of Utah, Salt Lake City; Department of Communication Sciences and Disorders, The University of Utah, Salt Lake City
- Kristine Tanner: Department of Communication Disorders, Brigham Young University, Provo, UT
- Ray M Merrill: Department of Public Health, Brigham Young University, Provo, UT
- Lauren Shnowske: Department of Communication Sciences and Disorders, The University of Utah, Salt Lake City; Department of Communication Sciences and Disorders, University of Kentucky, Lexington
- Nelson Roy: Department of Communication Sciences and Disorders, The University of Utah, Salt Lake City
|
18
|
Madruga M, Campos-Roca Y, Pérez CJ. Impact of noise on the performance of automatic systems for vocal fold lesions detection. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|