1
Stone TC, Erickson ML. Experienced and Inexperienced Listeners' Perception of Vocal Strain. J Voice 2024:S0892-1997(24)00024-9. [PMID: 38443265 DOI: 10.1016/j.jvoice.2024.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 02/01/2024] [Accepted: 02/02/2024] [Indexed: 03/07/2024]
Abstract
OBJECTIVE The ability to perceive strain or tension in a voice is critical for both speech-language pathologists and singing teachers. Research on voice quality has focused primarily on the perception of breathiness or roughness. The perception of vocal strain has not been extensively researched and is poorly understood. METHODS/DESIGN This study employs a group and a within-subject design. Synthetic female sung stimuli were created that varied in source slope and vocal tract transfer function. Two groups of listeners, inexperienced listeners and experienced vocal pedagogues, listened to the stimuli and rated the perceived strain using a visual analog scale. Synthetic female stimuli were constructed on the vowel /ɑ/ at 2 pitches, A3 and F5, using glottal source slopes that drop in amplitude at constant rates varying from -6 dB/octave to -18 dB/octave. All stimuli were filtered using three vocal tract transfer functions, one derived from a lyric/coloratura soprano, one derived from a mezzo-soprano, and a third that has resonance frequencies mid-way between the two. Listeners heard the stimuli over headphones and rated them on a scale from "no strain" to "very strained" using a visual analog scale. RESULTS Spectral source slope was strongly related to the perception of strain in both groups of listeners. Experienced listeners' perception of strain was also related to formant pattern, while inexperienced listeners' perception of strain was also related to pitch. CONCLUSION This study has shown that spectral source slope can be a powerful cue to the perception of strain. However, inexperienced and experienced listeners also differ from each other in how strain is perceived across speaking and singing pitches. These differences may be based on both experience and the goals of the listener.
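As an illustration of what a source slope "dropping at a constant rate per octave" implies, the following minimal sketch computes harmonic levels relative to the fundamental for the two pitches named in the abstract. It is not the authors' synthesis procedure: the function name, the 5 kHz upper limit, the intermediate -12 dB/octave slope, and the equal-tempered pitch frequencies (A4 = 440 Hz) are assumptions made for the example.

```python
import math

def harmonic_levels_db(f0_hz, slope_db_per_octave, max_freq_hz=5000.0):
    """Return (frequency_hz, level_db) pairs for each harmonic up to max_freq_hz.

    Levels are in dB relative to the first harmonic and fall off at a
    constant rate per octave (hypothetical helper, for illustration only).
    """
    levels = []
    n = 1
    while n * f0_hz <= max_freq_hz:
        # Harmonic n lies log2(n) octaves above the fundamental.
        levels.append((n * f0_hz, slope_db_per_octave * math.log2(n)))
        n += 1
    return levels

A3_HZ = 220.0    # A3, assuming A4 = 440 Hz
F5_HZ = 698.46   # F5 in equal temperament, rounded (assumption)

if __name__ == "__main__":
    for slope in (-6.0, -12.0, -18.0):  # -12 is an assumed intermediate value
        for name, f0 in (("A3", A3_HZ), ("F5", F5_HZ)):
            levels = harmonic_levels_db(f0, slope)
            freq, level = levels[1]  # second harmonic, one octave up
            print(f"{name} slope {slope:+.0f} dB/oct: {len(levels)} harmonics, "
                  f"H2 at {freq:.1f} Hz = {level:+.1f} dB")
```

A steeper slope leaves less energy in the upper harmonics, and the higher pitch places far fewer harmonics below any fixed frequency limit, which is one reason slope and pitch can interact perceptually.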
Affiliation(s)
- Taylor Colton Stone
  - Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, Tennessee
- Molly L Erickson
  - Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, Tennessee
2
Dragicevic DA, Dahl KL, Perkins Z, Abur D, Stepp CE. Effects of a Concurrent Working Memory Task on Speech Acoustics in Parkinson's Disease. Am J Speech Lang Pathol 2024; 33:418-434. [PMID: 38081054 PMCID: PMC11001185 DOI: 10.1044/2023_ajslp-23-00214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/30/2023] [Accepted: 10/26/2023] [Indexed: 01/05/2024]
Abstract
PURPOSE The purpose of this study was to determine the effect of a concurrent working memory task on acoustic measures of speech in individuals with Parkinson's disease (PD). METHOD Individuals with PD and age- and sex-matched controls performed a speaking task with and without a Stroop-like concurrent working memory task. Cepstral peak prominence, low-to-high spectral energy ratio, fundamental frequency (fo) standard deviation, articulation rate, pause duration, articulatory-acoustic vowel space, relative fo, mean voice onset time (VOT), and VOT variability were calculated for each condition. Mixed-model analyses of variance were performed to determine the effects of group, condition (presence of the concurrent working memory task), and their interaction on the acoustic measures. RESULTS All measures except for VOT variability, mean pause duration, and relative fo offset differed between people with and without PD. Cepstral peak prominence, articulation rate, and relative fo offset differed as a function of condition. However, no measures indicated disparate effects of condition as a function of group. CONCLUSION Although differentially impactful on limb motor function in PD, here a concurrent working memory task was not found to be differentially disruptive to speech acoustics in PD. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.24759648.
Affiliation(s)
- Kimberly L. Dahl
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
- Zoe Perkins
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
- Defne Abur
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
  - Center for Language and Cognition Groningen, University of Groningen, the Netherlands
- Cara E. Stepp
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
  - Department of Biomedical Engineering, Boston University, MA
  - Department of Otolaryngology—Head and Neck Surgery, Boston University School of Medicine, MA
3
Sauder CL, Kapsner-Smith MR, Simmons E, Meyer T, Doyle PC, Eadie TL. The Effect of Rating Method on Reliability of Judgments of Strain Across Populations. Am J Speech Lang Pathol 2024; 33:393-405. [PMID: 38060689 PMCID: PMC11000812 DOI: 10.1044/2023_ajslp-23-00174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 08/17/2023] [Accepted: 10/17/2023] [Indexed: 01/05/2024]
Abstract
PURPOSE Variability in auditory-perceptual ratings of voice limits their utility, with the poorest reliability often noted for vocal strain. The purpose of this study was to determine whether an experimental method, called visual sort and rate (VSR), promoted stronger rater reliability than visual analog scale (VAS), for ratings of strain in two clinical populations: adductor laryngeal dystonia (ADLD) and vocal hyperfunction (VH). METHOD Connected speech samples from speakers with ADLD and VH as well as age- and sex-matched controls were selected from a database. Fifteen inexperienced listeners rated strain for two speaker sets (25 ADLD speakers and five controls; 25 VH speakers and five controls) across four rating blocks: VAS-ADLD, VSR-ADLD, VAS-VH, and VSR-VH. For the VAS task, listeners rated each speaker for strain using a vertically oriented 100-mm VAS. For the VSR task, stimuli were distributed into sets of samples with a range of severities in each set. Listeners sorted and ranked samples for strain within each set, and final ratings were captured on a vertically oriented 100-mm VAS. Intrarater reliability (Pearson's r) and interrater variability (mean of the squared differences between a listener's ratings and group mean ratings) were compared across rating methods and populations using two repeated-measures analyses of variance. RESULTS Intrarater reliability of strain was significantly stronger when listeners used VSR compared to VAS; listeners also showed significantly better intrarater reliability in ADLD than VH. Listeners demonstrated significantly less interrater variability (better reliability) when using VSR compared to VAS. No significant effect of population or interactions was found between listeners for measures of interrater variability. CONCLUSIONS VSR increases intrarater reliability for ratings of vocal strain in speakers with VH and ADLD. VSR decreases variability of auditory-perceptual judgments of strain between inexperienced listeners in these clinical populations. Future research should determine whether benefits of VSR extend to voice clinicians and/or clinical settings.
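The two reliability measures named in the method (Pearson's r for intrarater reliability, and the mean squared difference from the group mean for interrater variability) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and data layout are hypothetical, and including each listener's own ratings in the group mean is an assumption, since the abstract does not say either way.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings,
    e.g. a listener's first and repeated ratings of the same samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def interrater_variability(ratings):
    """ratings[listener][sample] -> one value per listener.

    Each value is the mean of the squared differences between that
    listener's ratings and the group mean rating of each sample.
    Assumption: the group mean includes the listener's own ratings.
    """
    n_listeners = len(ratings)
    n_samples = len(ratings[0])
    group_mean = [sum(r[j] for r in ratings) / n_listeners
                  for j in range(n_samples)]
    return [sum((r[j] - group_mean[j]) ** 2 for j in range(n_samples)) / n_samples
            for r in ratings]

if __name__ == "__main__":
    # Made-up ratings in mm on a 100-mm scale; not data from the study.
    first_pass = [12.0, 40.0, 55.0, 80.0]
    second_pass = [15.0, 35.0, 60.0, 78.0]
    print("intrarater r:", round(pearson_r(first_pass, second_pass), 3))
    print("interrater variability:",
          interrater_variability([[10.0, 50.0, 90.0], [20.0, 40.0, 70.0]]))
```

Lower interrater variability values mean a listener's ratings sit closer to the group consensus, which is why the abstract equates less variability with better reliability.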
Affiliation(s)
- Cara L. Sauder
  - Department of Speech & Hearing Sciences, University of Washington, Seattle
- Emily Simmons
  - Department of Speech & Hearing Sciences, University of Washington, Seattle
- Tanya Meyer
  - Department of Otolaryngology—Head & Neck Surgery, University of Washington School of Medicine, Seattle
- Philip C. Doyle
  - Division of Laryngology, Department of Otolaryngology—Head & Neck Surgery, Stanford University School of Medicine, CA
- Tanya L. Eadie
  - Department of Speech & Hearing Sciences, University of Washington, Seattle
  - Department of Otolaryngology—Head & Neck Surgery, University of Washington School of Medicine, Seattle
4
Cacace AT, Berri B. Blast Overpressures as a Military and Occupational Health Concern. Am J Audiol 2023; 32:779-792. [PMID: 37713532 DOI: 10.1044/2023_aja-23-00125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/17/2023]
Abstract
PURPOSE This tutorial reviews effects of environmental stressors like blast overpressures and other well-known acoustic contaminants (continuous, intermittent, and impulsive noise) on hearing, tinnitus, vestibular, and balance-related functions. Based on the overall outcome of these effects, detailed consideration is given to the health and well-being of individuals. METHOD Because hearing loss and tinnitus are consequential in affecting quality of life, novel neuromodulation paradigms are reviewed for their positive abatement and treatment-related effects. Examples of clinical data, research strategies, and methodological approaches focus on repetitive transcranial magnetic stimulation (rTMS) and electrical stimulation of the vagus nerve paired with tones (VNSt) for their unique contributions to this area. RESULTS Acoustic toxicants transmitted through the atmosphere are noteworthy for their propensity to induce hearing loss and tinnitus. Mounting evidence also indicates that high-level rapid onset changes in atmospheric sound pressure can significantly impact vestibular and balance function. Indeed, the risk of falling secondary to loss of, or damage to, sensory receptor cells in otolith organs (utricle and saccule) is a primary reason for this concern. As part of the complexities involved in VNSt treatment strategies, vocal dysfunction may also manifest. In addition, evaluation of temporospatial gait parameters is worthy of consideration based on their ability to detect and monitor incipient neurological disease, cognitive decline, and mortality. CONCLUSION Highlighting these respective areas underscores the need to enhance information exchange among scientists, clinicians, and caregivers on the benefits and complications of these outcomes.
Affiliation(s)
- Anthony T Cacace
  - Department of Communication Sciences & Disorders, Wayne State University, Detroit, MI
- Batoul Berri
  - Department of Communication Sciences & Disorders, Wayne State University, Detroit, MI
  - Department of Otolaryngology, University of Michigan, Ann Arbor
5
Fujiki RB, Thibeault SL. Are Children with Cleft Palate at Increased Risk for Laryngeal Pathology? Cleft Palate Craniofac J 2023; 60:1385-1394. [PMID: 35912443 DOI: 10.1177/10556656221104027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
OBJECTIVE To determine the prevalence of laryngeal pathology in children presenting with cleft palate with or without cleft lip (CP ± L) who underwent nasoendoscopy to assess palatal function. A secondary aim was to determine the relationship between patient demographics, resonance, articulation, and prevalence of laryngeal pathology in this population. DESIGN Retrospective, observational cohort study. SETTING Outpatient pediatric cranio-facial anomalies clinic. PARTICIPANTS Children ≤18 years of age presenting with CP ± L (N = 215) who underwent nasoendoscopy, speech language pathology, plastic surgery, and otolaryngological evaluations between 2009 and 2020. MAIN OUTCOME MEASURE Laryngeal diagnosis by pediatric otolaryngologists. RESULTS 21.9% of children presented with laryngeal pathology. Diagnoses included benign vocal fold lesions and laryngeal edema sufficiently severe to alter vocal fold edge contour. Likelihood of laryngeal pathology increased by approximately 12% with every increase of 1 year in age (P = .001, OR = 1.12). Children with laryngeal pathology were 50% more likely to have undergone palatal repair (P < .001, OR = 1.50). In addition, children with severely hypernasal resonance were 78% less likely to present with laryngeal pathology (P = .046, OR = 0.22). CONCLUSIONS This population is at increased risk for laryngeal pathologies as determined by nasoendoscopy. This finding underscores the importance of careful laryngeal imaging in assessing these children. Additional research is warranted to identify the mechanisms underlying the increased risk for morphological vocal fold changes.
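The percentages quoted alongside the odds ratios follow from simple arithmetic on the OR values, sketched below. The helper names are hypothetical. Note that an odds ratio describes a change in odds rather than in probability, and that compounding the per-year ratio over several years assumes a logistic model with a constant per-year effect, which the abstract does not state explicitly.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio to a percent change in odds.

    OR = 1.12 -> about +12%; OR = 0.22 -> about -78%.
    """
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_over_units(odds_ratio_per_unit, units):
    """Odds ratio implied by a change of several units, assuming the
    per-unit odds ratio is constant so that effects multiply."""
    return odds_ratio_per_unit ** units

if __name__ == "__main__":
    for label, odds_ratio in (("age, per year", 1.12),
                              ("palatal repair", 1.50),
                              ("severe hypernasality", 0.22)):
        print(f"{label}: OR = {odds_ratio:.2f}, "
              f"change in odds = {percent_change_in_odds(odds_ratio):+.0f}%")
    # Illustrative only: odds ratio implied by a 5-year age difference.
    print("5-year age difference: OR =",
          round(odds_ratio_over_units(1.12, 5), 2))
```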
Affiliation(s)
- Susan L Thibeault
  - Department of Surgery, University of Wisconsin Madison, Madison, WI, USA
6
Park Y, Baker Brehm S, Kelchner L, Weinrich B, McElfresh K, Anand S, Shrivastav R, de Alarcon A, Eddins DA. Effects of Vibratory Source on Auditory-Perceptual and Bio-Inspired Computational Measures of Pediatric Voice Quality. J Voice 2023:S0892-1997(23)00254-0. [PMID: 37739862 PMCID: PMC10950844 DOI: 10.1016/j.jvoice.2023.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 08/11/2023] [Accepted: 08/14/2023] [Indexed: 09/24/2023]
Abstract
OBJECTIVE The vibratory source for voicing in children with dysphonia is classified into three categories including a glottal vibratory source (GVS) observed in those with vocal lesions or hyperfunction; supraglottal vibratory sources (SGVS) observed secondary to laryngeal airway injuries, malformations, or reconstruction surgeries; and a combination of both glottal and supraglottal vibratory sources called mixed vibratory source (MVS). This study evaluated the effects of vibratory source on three primary dimensions of voice quality (breathiness, roughness, and strain) in children with GVS, SGVS, and MVS using single-variable matching tasks and computational measures obtained from bio-inspired auditory models. METHODS A total of 44 dysphonic voice samples from children aged 4-11 years were selected. Seven listeners rated breathiness, roughness, and strain of 1000-ms /ɑ/ samples using single-variable matching tasks. Computational estimates of pitch strength, amplitude modulation filterbank output, and sharpness were obtained through custom-designed MATLAB algorithms. RESULTS Perceived roughness and strain were significantly higher in children with SGVS and MVS compared to children with GVS. Among the computational measures, only the modulation filterbank output resulted in significant differences among vibratory sources; a posthoc test revealed that children with SGVS had greater amplitude modulation than children with GVS, as expected from their rougher voice quality. CONCLUSIONS The results indicate that the output of an auditory amplitude modulation filterbank model may capture characteristics of SGVS that are strongly related to the rough voice quality.
Affiliation(s)
- Yeonggwang Park
  - Department of Communication Sciences and Disorders, University of Central Florida, Orlando, Florida
- Susan Baker Brehm
  - Department of Speech Pathology and Audiology, Miami University, Oxford, Ohio; Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Lisa Kelchner
  - Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio
- Barbara Weinrich
  - Department of Speech Pathology and Audiology, Miami University, Oxford, Ohio; Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Kevin McElfresh
  - Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Supraja Anand
  - Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Rahul Shrivastav
  - Office of the Provost & Executive Vice President, Indiana University, Bloomington, Indiana
- Alessandro de Alarcon
  - Pediatric Otolaryngology Head & Neck Surgery, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- David A Eddins
  - Department of Communication Sciences and Disorders, University of Central Florida, Orlando, Florida
7
Nguyen DD, Madill C. Auditory-perceptual Parameters as Predictors of Voice Acoustic Measures. J Voice 2023:S0892-1997(23)00088-7. [PMID: 37003863 DOI: 10.1016/j.jvoice.2023.02.030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/15/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 04/03/2023]
Abstract
BACKGROUND Much research has examined the relationship between perceptual and acoustic measures. However, little is known about the prediction values of perceptual measures on an acoustic parameter. AIMS This study utilized simulated and disordered voice samples to investigate the prediction values of breathiness, roughness, and strain ratings on the selection of some time-based and spectral-based measures of voice quality. METHOD This study retrospectively analysed two sets of precollected data. The experimental data had been collected from nine trained speakers manipulating false vocal fold activity, true vocal fold mass, and larynx height. The voice-disordered data had been extracted from a clinical database for 68 patients with muscle tension voice disorders (MTVD). Both data sets had been perceptually rated for breathiness, roughness, and strain. Voice samples (prolonged vowel /ɑ/ and Rainbow Passage readings) had undergone acoustic analysis using Praat for harmonics-to-noise ratio (HNR) and the program "Analysis of Dysphonia in Speech and Voice" (ADSV) for cepstral peak prominence (CPP), Cepstral/Spectral Index of Dysphonia (CSID), and Low/High spectral ratio (L/H ratio). Perceptual parameters were regressed against these acoustic measures to test their prediction values. RESULTS Reliability data showed satisfactory intra- and inter-reliability of perceptual ratings for both data sets. Breathiness significantly predicted CPP (both vocal tasks) and CSID (Rainbow Passage) in experimental data and predicted all the acoustic measures in MTVD data. Roughness significantly predicted HNR, CPP, and CSID in experimental data, and CPP (Rainbow Passage) and CSID (both vocal tasks) in MTVD data. Strain (both vocal tasks) significantly predicted L/H ratio in both data sets. CONCLUSIONS Breathiness ratings predicted selection of HNR, CPP and CSID; roughness ratings predicted selection of CPP and CSID, and strain ratings predicted L/H ratio.
Affiliation(s)
- Duy Duong Nguyen
  - Voice Research Laboratory, Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Catherine Madill
  - Voice Research Laboratory, Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
8
Anand S. Perceptual and Computational Estimates of Vocal Breathiness and Roughness in Sustained Phonation and Connected Speech. J Voice 2023:S0892-1997(23)00069-3. [PMID: 36933971 DOI: 10.1016/j.jvoice.2023.02.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 03/18/2023]
Abstract
OBJECTIVE Clinical assessment of voice quality (VQ) often uses a combination of sustained phonations and more prolonged and more complex vocalizations. The purpose of this study was to compare the perceived vocal breathiness and vocal roughness of sustained phonations and connected speech over a wide range of dysphonia severity and to evaluate their relationship with acoustic measures and bioinspired models of breathiness and roughness. METHODS A VQ dimension-specific single-variable matching task (SVMT) was used to index the perceived breathiness or roughness of five male and five female talkers on the basis of a sustained /a/ phonation and the 5th CAPE-V sentence. Acoustic measures of cepstral peak and autocorrelation peak, and psychoacoustic measures of pitch strength and temporal envelope standard deviation (EnvSD), were used to predict perceived breathiness and roughness judgments obtained from 10 listeners, respectively. RESULTS High intra- and inter-listener reliability was observed for sustained phonations and connected speech. Perceived breathiness and roughness of sustained vowels and sentences obtained using SVMT were highly correlated for most dysphonic voices. The pitch strength model of breathiness was able to capture a larger amount of perceptual variance compared to cepstral peak in both vowels and sentences. Autocorrelation peak was strongly correlated to perceived roughness in sentences while EnvSD was strongly correlated to perceived roughness in vowels. CONCLUSIONS Results provide evidence that perception of VQ via SVMT can be successfully extended to connected speech. Computational models of VQ can be easily adapted to connected speech. Such automated models of VQ perception are valuable due to their computational efficiency and their ability to accurately capture the non-linearities of the human auditory system.
Affiliation(s)
- Supraja Anand
  - Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
9
Maffei MF, Green JR, Murton O, Yunusova Y, Rowe HP, Wehbe F, Diana K, Nicholson K, Berry JD, Connaghan KP. Acoustic Measures of Dysphonia in Amyotrophic Lateral Sclerosis. J Speech Lang Hear Res 2023; 66:872-887. [PMID: 36802910 PMCID: PMC10205101 DOI: 10.1044/2022_jslhr-22-00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 10/25/2022] [Accepted: 12/01/2022] [Indexed: 05/25/2023]
Abstract
PURPOSE Identifying efficacious measures to characterize dysphonia in complex neurodegenerative diseases is key to optimal assessment and intervention. This study evaluates the validity and sensitivity of acoustic features of phonatory disruption in amyotrophic lateral sclerosis (ALS). METHOD Forty-nine individuals with ALS (40-79 years old) were audio-recorded while producing a sustained vowel and continuous speech. Perturbation/noise-based (jitter, shimmer, and harmonics-to-noise ratio) and cepstral/spectral (cepstral peak prominence, low-high spectral ratio, and related features) acoustic measures were extracted. The criterion validity of each measure was assessed using correlations with perceptual voice ratings provided by three speech-language pathologists. Diagnostic accuracy of the acoustic features was evaluated using area-under-the-curve analysis. RESULTS Perturbation/noise-based and cepstral/spectral features extracted from /a/ were significantly correlated with listener ratings of roughness, breathiness, strain, and overall dysphonia. Fewer and smaller correlations between cepstral/spectral measures and perceptual ratings were observed for the continuous speech task, although post hoc analyses revealed stronger correlations in speakers with less perceptually impaired speech. Area-under-the-curve analyses revealed that multiple acoustic features, particularly from the sustained vowel task, adequately differentiated between individuals with ALS with and without perceptually dysphonic voices. CONCLUSIONS Our findings support using both perturbation/noise-based and cepstral/spectral measures of sustained /a/ to assess phonatory quality in ALS. Results from the continuous speech task suggest that multisubsystem involvement impacts cepstral/spectral analyses in complex motor speech disorders such as ALS. Further investigation of the validity and sensitivity of cepstral/spectral measures during continuous speech in ALS is warranted.
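The area-under-the-curve analysis mentioned in the method can be illustrated with the rank-based definition of AUC: the probability that a randomly chosen speaker from the dysphonic group has a higher value of the acoustic feature than a randomly chosen speaker from the non-dysphonic group, with ties counted as one half. This is a generic sketch, not the authors' procedure; the function name and the example values are made up.

```python
def auc(scores_positive, scores_negative):
    """Rank-based area under the ROC curve.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties contribute one half.
    """
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

if __name__ == "__main__":
    # Made-up feature values, oriented so that higher means more dysphonic.
    dysphonic = [2.0, 4.0, 5.5]
    non_dysphonic = [1.0, 3.0]
    print("AUC:", auc(dysphonic, non_dysphonic))
```

For a feature where lower values indicate dysphonia (cepstral peak prominence is typically lower in dysphonic voices), the two groups would be swapped or the feature negated before computing the AUC.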
Affiliation(s)
- Marc F. Maffei
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA
- Jordan R. Green
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA
  - Speech and Hearing Bioscience and Technology Program, Harvard University, Cambridge, MA
- Olivia Murton
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA
- Yana Yunusova
  - Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
  - Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, Ontario, Canada
  - Toronto Rehabilitation Institute, University Health Network, Ontario, Canada
- Hannah P. Rowe
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA
- Farah Wehbe
  - Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
  - Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, Ontario, Canada
- Kathleen Diana
  - Department of Neurology, Neurological Clinical Research Institute, Massachusetts General Hospital, Boston
- Katharine Nicholson
  - Department of Neurology, Neurological Clinical Research Institute, Massachusetts General Hospital, Boston
- James D. Berry
  - Department of Neurology, Neurological Clinical Research Institute, Massachusetts General Hospital, Boston
- Kathryn P. Connaghan
  - Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA
10
Park Y, Anand S, Gifford SM, Shrivastav R, Eddins DA. Development and Validation of a Single-Variable Comparison Stimulus for Matching Strained Voice Quality Using a Psychoacoustic Framework. J Speech Lang Hear Res 2023; 66:16-29. [PMID: 36516473 PMCID: PMC10023177 DOI: 10.1044/2022_jslhr-22-00280] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/16/2022] [Revised: 08/17/2022] [Accepted: 09/01/2022] [Indexed: 06/17/2023]
Abstract
PURPOSE Acoustic and perceptual quantification of vocal strain has been a vexing problem for years. To increase measurement rigor, a suitable single-variable matching stimulus for strain was developed and validated, based on the matching stimulus used previously for breathy and rough voice qualities. METHOD A set of 21 comparison stimuli for a single-variable matching task (SVMT) was synthesized based on a speech-shaped sawtooth waveform mixed with speech-shaped noise. Variable bandpass filter gain in mid-to-high frequencies achieved a wide range of computed sharpness (in constant sharpness steps) and served as the independent variable for the SVMT. Ten natural /ɑ/ stimuli with a wide range of the primary voice quality of strain and a minimum of breathiness or roughness were selected and assessed using the SVMT. Natural voice samples and synthetic comparison stimuli were also assessed using a perceptual magnitude estimation (ME) task. RESULTS ME data validated the correspondence of the set of comparison stimuli to varying perceived strain. Perceived strain magnitudes of the comparison stimuli increased significantly and linearly with computed sharpness (r² = .99). A linear regression revealed that strain matching values were significantly predicted by computed sharpness (r² = .96) and perceived strain magnitudes (r² = .95) of the natural voice stimuli. CONCLUSION The perception of vocal strain is strongly associated with computed sharpness and is captured accurately and precisely using an SVMT, in which the independent variable is the bandpass filter gain (in steps of equal sharpness) applied to the comparison stimuli.
Affiliation(s)
- Yeonggwang Park
  - Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Supraja Anand
  - Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Sophia M. Gifford
  - Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Rahul Shrivastav
  - Office of the Provost & Executive Vice President, Indiana University, Bloomington
- David A. Eddins
  - Department of Communication Sciences & Disorders, University of South Florida, Tampa
11
Kopf LM, Huh-Yoo J. A User-Centered Design Approach to Developing a Voice Monitoring System for Disorder Prevention. J Voice 2023; 37:48-59. [PMID: 33189486 DOI: 10.1016/j.jvoice.2020.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2020] [Revised: 10/22/2020] [Accepted: 10/23/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Many individuals will experience a voice disorder in their lifetime, especially occupational voice users. While a number of voice monitoring systems have been developed, most were designed with the clinician/researcher as the end user. For a patient to use these systems, they need field experts to help them interpret data from the system to understand its meaning. Most of these systems would have challenges in being used in a preventative context with the occupational voice user as the sole system user. OBJECTIVE The current study introduces a novel design approach: user-centered design (UCD) with paper prototypes in the creation of a voice monitoring system for voice disorder prevention (VDP). The goal of this design approach is to design systems that are engaging and intuitive for users so they will be interested in interacting with the system and be able to benefit from the system without the need of external support. METHODS The current study was conducted in two phases: an iterative design phase and a test phase. In the iterative design phase, 15 participants gave their opinions on the measures and feedback designs they felt would be the most beneficial to users. In the test phase, the researchers collected real voice data over multiple sessions for 18 additional participants and provided this data using the final feedback displays from the design phase. RESULTS By engaging in UCD, the researchers identified key design challenges for VDP: (1) educating the user, (2) balancing contextualization and granularity, and (3) addressing disconnection between user and system goals. CONCLUSION UCD holds promise for designing VDP systems that are both engaging and intuitive for occupational voice users.
Affiliation(s)
- Lisa M Kopf
  - Department of Communication Sciences and Disorders, University of Northern Iowa, Cedar Falls, Iowa; Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Jina Huh-Yoo
  - College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania
12
Nagle KF. Clinical Use of the CAPE-V Scales: Agreement, Reliability and Notes on Voice Quality. J Voice 2022:S0892-1997(22)00366-6. [PMID: 36543606 DOI: 10.1016/j.jvoice.2022.11.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 11/09/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022]
Abstract
OBJECTIVES The CAPE-V is a widely used protocol developed to help standardize the evaluation of voice. Variability of voice quality ratings has prevented development of training protocols that might themselves improve interrater agreement among new clinicians. As part of a larger mixed methods project, this study examines agreement and reliability for experienced clinicians using the CAPE-V scales. STUDY DESIGN Observational. METHODS Experienced voice clinicians (N=20) provided ratings of recordings from 12 speakers representing a range of overall voice quality. Participants were instructed to rate the voices as they normally would, using the CAPE-V scales. Descriptive data were recorded and two levels of agreement were calculated. Single rater reliability was calculated using a 2-way random model of absolute agreement for intraclass correlations (ICC [2,1]). RESULTS Participants' use of the CAPE-V scales varied considerably, although most rated overall severity, breathiness, roughness and strain. Data from one participant did not meet a priori agreement criteria. Because outcomes were significantly different without their data, agreement and reliability were analyzed based on the reduced data set from 19 participants. Interrater agreement and reliability were comparable to previous research; the mean range of ratings was at least 47 mm for all dimensions of voice quality. CONCLUSIONS Results indicated differential use of the components of the CAPE-V form and scales in evaluating voice quality and severity of dysphonia, including categorical variability among ratings of all of the primary CAPE-V dimensions of voice quality that may complicate the clinical description of a voice as mildly, moderately or severely dysphonic.
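The ICC(2,1) statistic named in the abstract above can be illustrated with a short sketch. It computes the two-way random-effects, absolute-agreement, single-rater intraclass correlation from a speakers-by-raters table using the usual mean-square formulation. The function name `icc_2_1` and the toy ratings are illustrative assumptions, not the study's code or data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per rated speaker, one column per rater.
    """
    n = len(ratings)          # number of speakers (rows)
    k = len(ratings[0])       # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for speakers, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical toy table: 6 speakers rated by 4 raters.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
toy_icc = icc_2_1(toy)
```

Because the absolute-agreement form penalizes systematic offsets between raters, a rater who is consistently harsher than the others lowers ICC(2,1) even when all raters rank the voices identically.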
Affiliation(s)
- Kathleen F Nagle
- Department of Speech-Language Pathology, School of Health & Medical Science, Seton Hall University, Nutley, New Jersey.
|
13
|
Hidaka S, Lee Y, Nakanishi M, Wakamiya K, Nakagawa T, Kaburagi T. Automatic GRBAS Scoring of Pathological Voices using Deep Learning and a Small Set of Labeled Voice Data. J Voice 2022:S0892-1997(22)00347-2. [PMID: 36437171 DOI: 10.1016/j.jvoice.2022.10.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 10/27/2022] [Accepted: 10/27/2022] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Auditory-perceptual evaluation frameworks, such as the grade-roughness-breathiness-asthenia-strain (GRBAS) scale, are the gold standard for the quantitative evaluation of pathological voice quality. However, the evaluation is subjective; thus, the ratings lack reproducibility due to inter- and intra-rater variation. Prior researchers have proposed deep-learning-based automatic GRBAS score estimation to address this problem. However, these methods require large amounts of labeled voice data. Therefore, this study investigates the potential of automatic GRBAS estimation using deep learning with smaller amounts of data. METHODS A dataset consisting of 300 pathological sustained /a/ vowel samples was created and rated by eight experts (200 for training, 50 for validation, and 50 for testing). A neural network model that predicts the probability distribution of GRBAS scores from an onset-to-offset waveform was proposed. Random speed perturbation, random crop, and frequency masking were investigated as data augmentation techniques, and power, instantaneous frequency, and group delay were investigated as time-frequency representations. RESULTS Five-fold cross-validation was conducted, and the automatic scoring performance was evaluated using the quadratic weighted Cohen's kappa. The results showed that the kappa values of the automatic scoring performance were comparable to those of the inter-rater reliability of experts for all GRBAS items and the intra-rater reliability of experts for items G, B, A, and S. Random speed perturbation was the most effective data augmentation technique overall. When data augmentation was applied, power was the most effective for items G, R, A, and S; for Item B, combining group delay and power yielded additional performance gains. CONCLUSION The automatic GRBAS scoring achieved by the proposed model using scant labeled data was comparable to that of experts. 
This suggests that the challenges resulting from insufficient data can be alleviated. The findings of this study can also contribute to performance improvements in other tasks such as automatic voice disorder detection.
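The quadratic weighted Cohen's kappa used as the evaluation metric in this abstract can be sketched as follows. This is a minimal illustration, assuming each GRBAS item is scored as an integer from 0 to 3 (four categories); the function name and the example scores are hypothetical and do not come from the study.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories=4):
    """Cohen's kappa with quadratic disagreement weights.

    rater_a, rater_b: equal-length lists of integer scores in
    range(n_categories), e.g. 0..3 for one GRBAS item (an assumption here).
    """
    n = len(rater_a)
    k = n_categories

    # Observed confusion matrix between the two score sequences.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2   # 0 on the diagonal
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    # Undefined (division by zero) if both raters use a single category.
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for one GRBAS item from a model and an expert.
model_scores = [0, 1, 2, 3, 2, 1]
expert_scores = [0, 1, 2, 3, 1, 1]
kappa = quadratic_weighted_kappa(model_scores, expert_scores)
```

The quadratic weights mean that a one-step disagreement (2 versus 1) costs far less than a three-step disagreement (3 versus 0), which suits ordinal severity scales.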
Affiliation(s)
- Shunsuke Hidaka
- Graduate School of Design, Kyushu University, Fukuoka, Japan.
- Yogaku Lee
- Graduate School of Design, Kyushu University, Fukuoka, Japan; Department of Otorhinolaryngology, Faculty of Medicine, Kyushu University, Fukuoka, Japan
- Moe Nakanishi
- Graduate School of Design, Kyushu University, Fukuoka, Japan
- Takashi Nakagawa
- Department of Otorhinolaryngology, Faculty of Medicine, Kyushu University, Fukuoka, Japan
|
14
|
de Abreu SR, Sousa ESDS, de Moraes RM, Lopes LW. Performance of Acoustic Measures for the Discrimination Among Healthy, Rough, Breathy, and Strained Voices Using the Feedforward Neural Network. J Voice 2022:S0892-1997(22)00203-X. [PMID: 36028370 DOI: 10.1016/j.jvoice.2022.07.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/27/2021] [Revised: 07/03/2022] [Accepted: 07/05/2022] [Indexed: 10/15/2022]
Abstract
OBJECTIVE To identify and evaluate the best set of acoustic measures to discriminate among healthy, rough, breathy, and strained voices. METHODS This study used vocal samples of the sustained /ε/ vowel from 251 patients with vocal complaints, among which 51, 80, 63, and 57 patients exhibited healthy, rough, breathy, and strained voices, respectively. Twenty-two acoustic measures were extracted, and feature selection was applied to reduce the number of combinations of acoustic measures and obtain an optimal subset of measures according to the information gain attribute ranking algorithm. To classify signals as a function of predominant voice quality, a feedforward neural network was applied using a Levenberg-Marquardt supervised learning algorithm. RESULTS The best results were obtained from 11 combinations, with each combination presenting six acoustic measures. Kappa indices ranged from 0.7527 to 0.7743, overall hit rates were 81.67%-83.27%, and the hit rates for healthy, rough, breathy, and strained voices were 74.51%-84.31%, 78.75%-90.00%, 85.71%-98.41%, and 68.42%-82.46%, respectively. CONCLUSIONS We obtained the best results from 11 combinations, with each combination exhibiting six acoustic measures for discriminating among healthy, rough, breathy, and strained voices. These sets exhibited good Kappa performance and a good overall hit rate. The hit rate varied between acceptable and good for healthy voices, acceptable and excellent for rough voices, good and excellent for breathy voices, and poor and good for strained voices.
Affiliation(s)
- Samuel Ribeiro de Abreu
- Graduate Program in Decision Models and Health, Statistics Department, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil
- Estevão Silvestre da Silva Sousa
- Graduate Program in Decision Models and Health, Statistics Department, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil
- Leonardo Wanderley Lopes
- Department of Speech-Language and Hearing Sciences, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil.
|
15
|
Park Y, Anand S, Ozmeral EJ, Shrivastav R, Eddins DA. Predicting Perceived Vocal Roughness Using a Bio-Inspired Computational Model of Auditory Temporal Envelope Processing. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:2748-2758. [PMID: 35867607 PMCID: PMC9911094 DOI: 10.1044/2022_jslhr-22-00101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 04/14/2022] [Accepted: 04/25/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE Vocal roughness is present in many voice disorders, but the assessment of roughness mainly depends on subjective auditory-perceptual evaluation and lacks acoustic correlates. This study aimed to apply the concept of roughness in general sound quality perception to vocal roughness assessment and to characterize the relationship between vocal roughness and temporal envelope fluctuation measures obtained from an auditory model. METHOD Ten /ɑ/ recordings with a wide range of roughness were selected from an existing database. Ten listeners rated the roughness of the recordings in a single-variable matching task. Temporal envelope fluctuations of the recordings were analyzed with an auditory processing model of amplitude modulation that utilizes a modulation filterbank of different modulation frequencies. Pitch strength and the smoothed cepstral peak prominence (CPPS) were also obtained for comparison. RESULTS Individual simple regression models yielded envelope standard deviation from a modulation filter with a low center frequency (64.3 Hz) as a statistically significant predictor of vocal roughness with a strong coefficient of determination (r² = .80). Pitch strength and CPPS were not significant predictors of roughness. CONCLUSION This result supports the possible utility of envelope fluctuation measures from an auditory model as objective correlates of vocal roughness.
Affiliation(s)
- Yeonggwang Park
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Supraja Anand
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Erol J. Ozmeral
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Rahul Shrivastav
- Office of the Provost & Executive Vice President, Indiana University Bloomington
- David A. Eddins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
|
16
|
Fujiki RB, Huber JE, Sivasankar MP. The effects of vocal exertion on lung volume measurements and acoustics in speakers reporting high and low vocal fatigue. PLoS One 2022; 17:e0268324. [PMID: 35551535 PMCID: PMC9098027 DOI: 10.1371/journal.pone.0268324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 04/26/2022] [Indexed: 12/02/2022] Open
Abstract
Purpose Vocal exertion is common and often results in reduced respiratory and laryngeal efficiency. It is unknown, however, whether the respiratory kinematic and acoustic adjustments employed during vocal exertion differ between speakers reporting vocal fatigue and those who do not. This study compared respiratory kinematics and acoustic measures in individuals reporting low and high levels of vocal fatigue during a vocal exertion task. Methods Individuals reporting low (N = 20) and high (N = 10) vocal fatigue participated in a repeated measures design study over 2 days. On each day, participants completed a 10-minute vocal exertion task consisting of repeated, loud vowel productions at elevated F0 sustained for maximum phonation time. Respiratory kinematic and acoustic measures were analyzed on the 1st vowel production (T0), and the vowels produced 2 minutes (T2), 5 minutes (T5), 7 minutes (T7), and 10 minutes (T10) into the vocal exertion task. Vowel durations were also measured at each time point. Results No differences in respiratory kinematics were observed between low and high vocal fatigue groups at T0. As the vocal exertion task progressed (T2-T10), individuals reporting high vocal fatigue initiated phonation at lower lung volumes while individuals with low vocal fatigue initiated phonation at higher lung volumes. As the exertion task progressed, total lung volume excursion decreased in both groups. Differences in acoustic measures were observed, as individuals reporting high vocal fatigue produced softer, shorter vowels from T0 through T10. Conclusions Individuals reporting high vocal fatigue employed less efficient respiratory strategies during periods of increased vocal demand when compared with individuals reporting low vocal fatigue. Individuals reporting high vocal fatigue had shorter maximum phonation time on loud vowels. 
Further study should examine the potential screening value of loud maximum phonation time, as well as the clinical implications of the observed respiratory patterns for managing vocal fatigue.
Affiliation(s)
- Robert Brinton Fujiki
- Department of Surgery, University of Wisconsin-Madison, Madison, WI, United States of America
- Jessica E Huber
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, United States of America
- M Preeti Sivasankar
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, United States of America
|
17
|
Kapsner-Smith MR, Díaz-Cádiz ME, Vojtech JM, Buckley DP, Mehta DD, Hillman RE, Tracy LF, Noordzij JP, Eadie TL, Stepp CE. Clinical Cutoff Scores for Acoustic Indices of Vocal Hyperfunction That Combine Relative Fundamental Frequency and Cepstral Peak Prominence. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:1349-1369. [PMID: 35263546 PMCID: PMC9499364 DOI: 10.1044/2021_jslhr-21-00466] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/25/2023]
Abstract
PURPOSE This study examined the discriminative ability of acoustic indices of vocal hyperfunction combining smoothed cepstral peak prominence (CPPS) and relative fundamental frequency (RFF). METHOD Demographic, CPPS, and RFF parameters were entered into logistic regression models trained on two 1:1 case-control groups: individuals with and without nonphonotraumatic vocal hyperfunction (NPVH; n = 360) and phonotraumatic vocal hyperfunction (PVH; n = 240). Equations from the final models were used to predict group membership in two independent test sets (n = 100 each). RESULTS Both CPPS and RFF parameters significantly improved model fits for NPVH and PVH after accounting for demographics. CPPS explained unique variance beyond RFF in both models. RFF explained unique variance beyond CPPS in the PVH model. Final models included CPPS and RFF offset parameters for both NPVH and PVH; RFF onset parameters were significant only in the PVH model. Area under the receiver operating characteristic curve analysis for the independent test sets revealed acceptable classification for NPVH (72%) and good classification for PVH (86%). CONCLUSIONS A combination of CPPS and RFF parameters showed better discriminative ability than either measure alone for PVH. Clinical cutoff scores for acoustic indices of vocal hyperfunction are proposed for assessment and screening purposes.
Affiliation(s)
- Jennifer M Vojtech
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Biomedical Engineering, Boston University, MA
- Daniel P Buckley
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Otolaryngology-Head & Neck Surgery, Boston University School of Medicine, MA
- Daryush D Mehta
- MGH Institute of Health Professions, Boston, MA
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston
- Department of Surgery, Harvard Medical School, Cambridge, MA
- Robert E Hillman
- MGH Institute of Health Professions, Boston, MA
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston
- Department of Surgery, Harvard Medical School, Cambridge, MA
- Lauren F Tracy
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Otolaryngology-Head & Neck Surgery, Boston University School of Medicine, MA
- J Pieter Noordzij
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Otolaryngology-Head & Neck Surgery, Boston University School of Medicine, MA
- Tanya L Eadie
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Cara E Stepp
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Biomedical Engineering, Boston University, MA
- Department of Otolaryngology-Head & Neck Surgery, Boston University School of Medicine, MA
|
18
|
Abur D, Perkell JS, Stepp CE. Impact of Vocal Effort on Respiratory and Articulatory Kinematics. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:5-21. [PMID: 34843405 PMCID: PMC9150749 DOI: 10.1044/2021_jslhr-21-00323] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 07/27/2021] [Accepted: 08/24/2021] [Indexed: 06/13/2023]
Abstract
PURPOSE The goal of this study was to examine the effects of increases in vocal effort, without changing speech intensity, on respiratory and articulatory kinematics in young adults with typical voices. METHOD A total of 10 participants completed a reading task under three speaking conditions: baseline, mild vocal effort, and maximum vocal effort. Respiratory inductance plethysmography bands around the chest and abdomen were used to estimate lung volumes during speech, and sensor coils for electromagnetic articulography were used to transduce articulatory movements, resulting in the following outcome measures: lung volume at speech initiation (LVSI) and at speech termination (LVST), articulatory kinematic vowel space (AKVS) of two points on the tongue dorsum (body and blade), and lip aperture. RESULTS With increases in vocal effort, and no statistical changes in speech intensity, speakers showed: (a) no statistically significant differences in LVST, (b) statistically significant increases in LVSI, (c) no statistically significant differences in AKVS measures, and (d) statistically significant reductions in lip aperture. CONCLUSIONS Speakers with typical voices exhibited larger lung volumes at speech initiation during increases in vocal effort, paired with reduced lip displacements. To our knowledge, this is the first study to demonstrate evidence that articulatory kinematics are impacted by modulations in vocal effort. However, the mechanisms underlying vocal effort may differ between speakers with and without voice disorders. Thus, future work should examine the relationship between articulatory kinematics, respiratory kinematics, and laryngeal-level changes during vocal effort in speakers with and without voice disorders. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.17065457.
Affiliation(s)
- Defne Abur
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Joseph S. Perkell
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge
- Cara E. Stepp
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Department of Biomedical Engineering, Boston University, MA
- Department of Otolaryngology-Head & Neck Surgery, Boston University School of Medicine, MA
|
19
|
Kapsner-Smith MR, Opuszynski A, Stepp CE, Eadie TL. The Effect of Visual Sort and Rate Versus Visual Analog Scales on the Reliability of Judgments of Dysphonia. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:1571-1580. [PMID: 33909472 PMCID: PMC8608224 DOI: 10.1044/2021_jslhr-20-00623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 05/04/2023]
Abstract
Purpose The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of visual analog scales (VAS) for ratings of overall severity (OS) and breathiness (BR) in speakers with voice disorders. Method Fifty speech samples were selected from a database of speakers with voice disorders. Twenty-two inexperienced listeners provided ratings of OS or BR in four rating blocks: VSR-OS, VSR-BR, VAS-OS, and VAS-BR. For the VAS task, listeners rated each speaker for BR or OS using a vertically oriented 100-mm VAS. For the VSR task, stimuli were distributed into sets of samples with a range of speaker severities in each set. Listeners sorted and ranked samples for OS or BR within each set, and final ratings were captured on a vertically oriented 100-mm VAS. Interrater variability, defined as the mean of the squared differences between a listener's ratings and group mean ratings, and intrarater reliability (Pearson r) were compared across rating tasks for OS and BR using paired t tests. Results Results showed that listeners had significantly less interrater variability (better reliability) when using VSR methods compared to VAS for judgments of both OS and BR. Intrarater reliability was high across rating tasks and dimensions; however, ratings of BR were significantly more consistent within individual listeners when using VAS than when using VSR. Conclusions VSR is an experimental method that decreases variability of auditory-perceptual judgments between inexperienced listeners when rating speakers with a range of dysphonic severities and disorders. Future research should determine whether a clinically viable tool may be developed based on VSR principles and whether such benefits extend to experienced listeners.
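The interrater variability measure defined in this abstract (the mean of the squared differences between one listener's ratings and the group mean ratings) can be sketched in a few lines. The function name and the ratings below are hypothetical; whether the study's group mean included the listener being scored is not stated in the abstract, so this sketch assumes it did.

```python
def interrater_variability(ratings):
    """Per-listener mean squared deviation from the group mean rating.

    ratings[listener][sample] holds one rating (e.g. millimetres on a
    100-mm visual analog scale) per listener and speech sample.
    Assumption: the group mean includes the listener being scored.
    """
    n_listeners = len(ratings)
    n_samples = len(ratings[0])

    # Mean rating of each sample across all listeners.
    group_means = [
        sum(ratings[l][s] for l in range(n_listeners)) / n_listeners
        for s in range(n_samples)
    ]

    # One variability value per listener; lower means closer to the group.
    return [
        sum((listener[s] - group_means[s]) ** 2 for s in range(n_samples))
        / n_samples
        for listener in ratings
    ]


# Hypothetical VAS ratings (mm) from three listeners on four samples.
vas_ratings = [
    [10, 40, 70, 90],
    [20, 50, 60, 80],
    [15, 45, 65, 85],
]
variability = interrater_variability(vas_ratings)
```

Comparing these per-listener values between two rating methods with a paired t test, as the abstract describes, would then show whether one method pulls listeners closer to the group consensus.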
Affiliation(s)
- Amanda Opuszynski
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Cara E. Stepp
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Otolaryngology – Head & Neck Surgery, Boston University School of Medicine, MA
- Department of Biomedical Engineering, Boston University, MA
- Tanya L. Eadie
- Department of Speech & Hearing Sciences, University of Washington, Seattle
|
20
|
Fujiki RB, Thibeault SL. The Relationship Between Auditory-Perceptual Rating Scales and Objective Voice Measures in Children With Voice Disorders. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2021; 30:228-238. [PMID: 33439742 DOI: 10.1044/2020_ajslp-20-00188] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/12/2023]
Abstract
Purpose The purpose of this study was to determine concurrent validity of the Grade, Roughness, Breathiness, Asthenia, and Strain (GRBAS) and Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) auditory-perceptual scales in children with voice disorders. A secondary purpose was to determine correlation between the GRBAS, CAPE-V, and objective voice measures. Method GRBAS and CAPE-V ratings and acoustic and aerodynamic measures were collected from the University of Wisconsin-Madison Voice and Swallow Outcomes Database. Correlations between CAPE-V and GRBAS ratings were calculated for overall severity of dysphonia, roughness, breathiness, and strain. Correlations between auditory-perceptual voice ratings and objective voice measures were also examined. Results One hundred thirty GRBAS and CAPE-V auditory-perceptual ratings were significantly correlated for overall severity, roughness, breathiness, and strain. r² values were highest for overall severity of dysphonia (r² = .75) and lowest for strain (r² = .54). CAPE-V and GRBAS ratings were largely associated with similar acoustic and aerodynamic measures. The highest correlations were observed for auditory-perceptual ratings of breathiness and jitter% (CAPE-V r² = .44, GRBAS r² = .44), shimmer% (CAPE-V r² = .45, GRBAS r² = .45), noise-to-harmonic ratio (CAPE-V r² = .42, GRBAS r² = .40), fundamental frequency (CAPE-V r² = .47, GRBAS r² = .44), and maximum phonation time (CAPE-V r² = .56, GRBAS r² = .51). Akaike information criterion values indicated that CAPE-V ratings were more strongly correlated with objective voice measures than GRBAS ratings. Conclusions CAPE-V and GRBAS scales have concurrent validity in children with voice disorders. CAPE-V ratings are more strongly correlated with acoustic and aerodynamic voice measures.
|
21
|
Park Y, Cádiz MD, Nagle KF, Stepp CE. Perceptual and Acoustic Assessment of Strain Using Synthetically Modified Voice Samples. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2020; 63:3897-3908. [PMID: 33151770 PMCID: PMC8608200 DOI: 10.1044/2020_jslhr-20-00294] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/28/2020] [Revised: 07/23/2020] [Accepted: 08/17/2020] [Indexed: 06/11/2023]
Abstract
Purpose Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method Stimuli were created using recordings of speakers producing /ifi/ with a comfortable voice and with maximum vocal effort. RFF values of the comfortable voice samples were synthetically lowered, and RFF values of the maximum vocal effort samples were synthetically raised. Mid-to-high frequency noise was added to the samples. Twenty listeners rated strain in a visual sort-and-rate task. The effects of RFF modification and added noise on strain were assessed using an analysis of variance; intra- and interrater reliability were compared with and without noise. Results Lowering RFF in the comfortable voice samples increased their perceived strain, whereas raising RFF in the maximum vocal effort samples decreased their strain. Adding noise increased strain and decreased intra- and interrater reliability relative to samples without added noise. Conclusions Both RFF and mid-to-high frequency noise contribute to the perception of strain. The presence of dysphonia may decrease the reliability of auditory-perceptual evaluation of strain, which supports the need for complementary objective assessments. Supplemental Material https://doi.org/10.23641/asha.13172252.
Affiliation(s)
- Yeonggwang Park
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Manuel Díaz Cádiz
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Kathleen F. Nagle
- Department of Speech-Language Pathology, Seton Hall University, South Orange, NJ
- Cara E. Stepp
- Department of Speech, Language and Hearing Sciences, Boston University, MA
- Department of Biomedical Engineering, Boston University, MA
- Department of Otolaryngology – Head and Neck Surgery, Boston University School of Medicine, MA
|
22
|
Barsties V Latoszek B, Kim GH, Delgado Hernández J, Hosokawa K, Englert M, Neumann K, Hetjens S. The validity of the Acoustic Breathiness Index in the evaluation of breathy voice quality: A Meta-Analysis. Clin Otolaryngol 2020; 46:31-40. [PMID: 32770718 DOI: 10.1111/coa.13629] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/05/2020] [Revised: 07/03/2020] [Accepted: 07/31/2020] [Indexed: 02/01/2023]
Abstract
BACKGROUND The evaluation of voice quality with acoustic measurements is useful to objectify the diagnostic process. Particularly, breathiness was highly evaluated and the Acoustic Breathiness Index (ABI) might have promising features. OBJECTIVE OF REVIEW The goal of the present meta-analysis is to quantify, from existing cross-validation studies, the evidence for the diagnostic accuracy of ABI, including its sensitivity and specificity. TYPE OF REVIEW Meta-analysis. SEARCH STRATEGY We searched MEDLINE, Google Scholar and Science Citation Index, and performed a manual search, for the term Acoustic Breathiness Index from inception to February 2020. Studies were included that used an equal proportion of continuous speech and sustained vowel segments, recording hardware of a sufficient standard for voice signal analyses, the software Praat for signal processing and the customised Praat script, and two groups of subjects (vocally healthy and voice-disordered). Furthermore, the diagnostic accuracy of ABI was measured. EVALUATION METHOD The primary outcome variable was ABI. The score ranged from 0 to 10 with varying thresholds according to different languages to determine the absence or presence of breathiness. A meta-analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses of diagnostic test accuracy study guidelines. Data were extracted, and the risk of bias was assessed using the QUADAS-2 tool. The pooled sensitivity and specificity of ABI were determined using a summary receiver operating characteristic (SROC) approach, which was also used to calculate a weighted threshold value of ABI with its sensitivity and specificity. RESULTS A total of 34 unique citations were screened, and 10 full-text articles were reviewed, including six studies. In total, 3603 voice samples were considered for further analysis, comprising 467 vocally healthy and 3136 voice-disordered voice samples. 
The pooled sensitivity was 0.84 (95% CI, 0.83-0.85), and the pooled specificity was 0.92 (95% CI, 0.89-0.94). The area under the curve of the SROC curve of this analysis showed an excellent value of 0.94. The weighted ABI threshold was determined at 3.40 (sensitivity: 0.86, 95% CI, 0.84-0.87; specificity: 0.90, 95% CI, 0.88-0.92). CONCLUSIONS The results confirm the ABI as a robust and valid objective measure for evaluating breathiness.
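How a single cutoff such as the weighted ABI threshold of 3.40 translates into sensitivity and specificity can be shown with a small sketch. It assumes that a score at or above the threshold is flagged as breathy (the abstract does not state whether the boundary is inclusive), and the scores and labels are invented for illustration only.

```python
def sensitivity_specificity(scores, is_disordered, threshold=3.40):
    """Sensitivity and specificity of a score cutoff.

    scores: index values on the 0-10 scale described in the abstract.
    is_disordered: True for voice-disordered samples, False for healthy.
    Assumption: score >= threshold counts as a positive (breathy) result.
    """
    tp = fn = tn = fp = 0
    for score, disordered in zip(scores, is_disordered):
        flagged = score >= threshold
        if disordered and flagged:
            tp += 1      # disordered voice correctly flagged
        elif disordered:
            fn += 1      # disordered voice missed
        elif flagged:
            fp += 1      # healthy voice wrongly flagged
        else:
            tn += 1      # healthy voice correctly passed
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores: three healthy voices, then three disordered voices.
example_scores = [1.2, 2.9, 3.6, 4.8, 5.5, 3.1]
example_labels = [False, False, False, True, True, True]
sens, spec = sensitivity_specificity(example_scores, example_labels)
```

Raising the threshold trades sensitivity for specificity, which is why the meta-analysis reports both values together with the chosen cutoff.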
Affiliation(s)
- Ben Barsties V Latoszek
- Department of Phoniatrics and Pediatric Audiology, University Hospital Münster, Westphalian Wilhelm University, Münster, Germany; Speech-Language Pathology, SRH University of Applied Health Sciences, Düsseldorf, Germany
- Geun-Hyo Kim
- Department of Otorhinolaryngology-Head and Neck Surgery and Biomedical Research Institute, Pusan National University Hospital, Busan, South Korea
- Kiyohito Hosokawa
- Department of Otorhinolaryngology, Japan Community Health Care Organization, Osaka Hospital, Osaka, Japan; Department of Otorhinolaryngology, Osaka Police Hospital, Osaka, Japan; Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Osaka, Japan
- Marina Englert
- Department of Communication Disorders, UNIFESP - Universidade Federal de São Paulo, São Paulo, Brazil; CEV, Centro de Estudos da Voz, São Paulo, Brazil
- Katrin Neumann
- Speech-Language Pathology, SRH University of Applied Health Sciences, Düsseldorf, Germany
- Svetlana Hetjens
- Department of Statistics, Medical Faculty Mannheim, Ruprecht Karls University of Heidelberg, Mannheim, Germany
|
23
|
Kochilas HL, Cacace AT, Arnold A, Seidman MD, Tarver WB. Vagus nerve stimulation paired with tones for tinnitus suppression: Effects on voice and hearing. Laryngoscope Investig Otolaryngol 2020; 5:286-296. [PMID: 32337360 PMCID: PMC7178458 DOI: 10.1002/lio2.364] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/19/2019] [Revised: 01/23/2020] [Accepted: 02/08/2020] [Indexed: 12/16/2022] Open
Abstract
OBJECTIVE In individuals with chronic tinnitus, our interest was to determine whether daily low-level electrical stimulation of the vagus nerve paired with tones (paired-VNSt) for tinnitus suppression had any adverse effects on motor-speech production and physiological acoustics of sustained vowels. Similarly, we were also interested in evaluating for changes in pure-tone thresholds, word-recognition performance, and minimum-masking levels (MMLs). Both voice and hearing functions were measured repeatedly over a period of 1 year. STUDY DESIGN Longitudinal with repeated-measures. METHODS Digitized samples of sustained frontal, midline, and back vowels (/e/, /o/, /ah/) were analyzed with computer software to quantify the degree of jitter, shimmer, and harmonic-to-noise ratio contained in these waveforms. Pure-tone thresholds, monosyllabic word-recognition performance, and MMLs were also evaluated for VNS alterations. Linear-regression analysis was the benchmark statistic used to document change over time in voice and hearing status from a baseline condition. RESULTS Most of the regression functions for the vocal samples and audiometric variables had slope values that were not significantly different from zero. Four of the nine vocal functions showed a significant improvement over time, whereas three of the pure-tone regression functions at 2-4 kHz showed some degree of decline; all changes observed were for the left ear, all were at adjacent frequencies, and all were ipsilateral to the side of VNS. However, mean pure-tone threshold changes did not exceed 4.29 dB from baseline and therefore would not be considered clinically significant. In some individuals, larger threshold shifts were observed. No significant regression/slope effects were observed for word-recognition or MMLs. CONCLUSION Quantitative voice analysis and assessment of audiometric variables showed minimal if any evidence of adverse effects using paired-VNSt over a treatment period of 1 year. Therefore, we conclude that paired-VNSt is a safe tool for tinnitus abatement in humans without significant side effects. LEVEL OF EVIDENCE Level IV.
Affiliation(s)
- Helen L. Kochilas
- North Atlanta Ears, Nose, Throat & Allergy, Alpharetta, Georgia
- Present address: North Atlanta Ears, Nose, Throat & Allergy, Alpharetta, Georgia
- Anthony T. Cacace
- Department of Communication Sciences & Disorders, Wayne State University, Detroit, Michigan
- Amy Arnold
- The Hearing Clinic, Brighton, Michigan
- Present address: The Hearing Clinic, Brighton, Michigan
- Michael D. Seidman
- Florida ENT Surgical Specialists, Florida Hospital Medical Group, Head & Neck Surgery Center of Florida, Celebration, Florida
- Present address: Florida Hospital Medical Group, Head & Neck Surgery Center of Florida, Celebration, Florida
|
24
|
Vojtech JM, Segina RK, Buckley DP, Kolin KR, Tardif MC, Noordzij JP, Stepp CE. Refining algorithmic estimation of relative fundamental frequency: Accounting for sample characteristics and fundamental frequency estimation method. The Journal of the Acoustical Society of America 2019; 146:3184. [PMID: 31795681 PMCID: PMC6847943 DOI: 10.1121/1.5131025] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/30/2019] [Revised: 10/07/2019] [Accepted: 10/08/2019] [Indexed: 05/26/2023]
Abstract
Relative fundamental frequency (RFF) is a promising acoustic measure for evaluating voice disorders. Yet, the accuracy of the current RFF algorithm varies across a broad range of vocal signals. The authors investigated how fundamental frequency (fo) estimation and sample characteristics impact the relationship between manual and semi-automated RFF estimates. Acoustic recordings were collected from 227 individuals with and 256 individuals without voice disorders. Common fo estimation techniques were compared to the autocorrelation method currently implemented in the RFF algorithm. Pitch strength-based categories were constructed using a training set (1158 samples), and algorithm thresholds were tuned to each category. RFF was then computed on an independent test set (291 samples) using category-specific thresholds and compared against manual RFF via mean bias error (MBE) and root-mean-square error (RMSE). Auditory-SWIPE' for fo estimation led to the greatest correspondence with manual RFF and was implemented in concert with category-specific thresholds. Refining fo estimation and accounting for sample characteristics led to increased correspondence with manual RFF [MBE = 0.01 semitones (ST), RMSE = 0.28 ST] compared to the unmodified algorithm (MBE = 0.90 ST, RMSE = 0.34 ST), reducing the MBE and RMSE of semi-automated RFF estimates by 88.4% and 17.3%, respectively.
Affiliation(s)
- Jennifer M Vojtech
- Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, Massachusetts 02215, USA
- Roxanne K Segina
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Daniel P Buckley
- Department of Otolaryngology-Head and Neck Surgery, Boston University School of Medicine, 72 East Concord Street, Boston, Massachusetts 02118, USA
- Katharine R Kolin
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Monique C Tardif
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- J Pieter Noordzij
- Department of Otolaryngology-Head and Neck Surgery, Boston University School of Medicine, 72 East Concord Street, Boston, Massachusetts 02118, USA
- Cara E Stepp
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
|