1
Anand S, Park Y, Shrivastav R, Eddins DA. Evaluating the Effect of Voice Quality Covariance on Auditory-Perceptual Evaluation Using a Novel Two-Dimensional Magnitude Estimation Task. Journal of Speech, Language, and Hearing Research (JSLHR) 2023; 66:4849-4859. [PMID: 37902504 PMCID: PMC11001379 DOI: 10.1044/2023_jslhr-23-00226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 08/12/2023] [Accepted: 09/03/2023] [Indexed: 10/31/2023]
Abstract
PURPOSE Most people with dysphonia present with voices that vary along more than one voice quality (VQ) dimension. This study sought to examine the effect of covariance between breathy and rough VQ in natural voices. METHOD A two-dimensional matrix of 16 /a/ vowels was selected such that two VQ dimensions (breathiness and roughness) were sampled on a 4-point severity scale (none, mild, moderate, and severe). Ten listeners evaluated 480 stimuli (16 stimuli × 10 repetitions × 3 blocks) on one-dimensional magnitude estimation (1DME) tasks and a novel two-dimensional magnitude estimation (2DME) task that allowed for simultaneous measurement of breathiness and roughness. RESULTS Data indicated high intra- and interrater reliabilities for both breathiness and roughness in the 2DME and 1DME tasks. Correlation analyses revealed a strong correlation between 2DME and 1DME judgments for breathiness and roughness (r > .95). There was also a minimal correlation between breathy and rough VQ in the 2DME task (r < .10). CONCLUSIONS Covarying roughness or breathiness had less impact on the perception of the other VQ in natural dysphonic voices in 2DME compared to 1DME. An understanding and quantification of the perceptual interactions among the dimensions will aid in the refinement of computational models and in the establishment of the validity of clinical scales for VQ perception.
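The correlation analyses reported in this abstract can be illustrated with a minimal sketch of a Pearson correlation between per-stimulus judgments from two tasks. The function name and the example numbers are hypothetical, and the abstract does not say how judgments were averaged or transformed before correlation, so this shows only the statistic itself.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-stimulus mean judgments (not data from the study).
breathiness_1dme = [10.0, 22.0, 41.0, 78.0]
breathiness_2dme = [12.0, 20.0, 45.0, 80.0]
r = pearson_r(breathiness_1dme, breathiness_2dme)
```

A value of r near 1 would correspond to the close agreement between tasks that the abstract describes, while a value near 0 would correspond to the weak breathy-rough relation it reports for the 2DME task.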
Affiliation(s)
- Supraja Anand: Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Yeonggwang Park: Department of Communication Sciences and Disorders, University of Central Florida, Orlando
- Rahul Shrivastav: Office of the Provost & Executive Vice President, Indiana University Bloomington
- David A. Eddins: Department of Communication Sciences and Disorders, University of Central Florida, Orlando
2
Park Y, Baker Brehm S, Kelchner L, Weinrich B, McElfresh K, Anand S, Shrivastav R, de Alarcon A, Eddins DA. Effects of Vibratory Source on Auditory-Perceptual and Bio-Inspired Computational Measures of Pediatric Voice Quality. J Voice 2023:S0892-1997(23)00254-0. [PMID: 37739862 PMCID: PMC10950844 DOI: 10.1016/j.jvoice.2023.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 08/11/2023] [Accepted: 08/14/2023] [Indexed: 09/24/2023]
Abstract
OBJECTIVE The vibratory source for voicing in children with dysphonia is classified into three categories including a glottal vibratory source (GVS) observed in those with vocal lesions or hyperfunction; supraglottal vibratory sources (SGVS) observed secondary to laryngeal airway injuries, malformations, or reconstruction surgeries; and a combination of both glottal and supraglottal vibratory sources called mixed vibratory source (MVS). This study evaluated the effects of vibratory source on three primary dimensions of voice quality (breathiness, roughness, and strain) in children with GVS, SGVS, and MVS using single-variable matching tasks and computational measures obtained from bio-inspired auditory models. METHODS A total of 44 dysphonic voice samples from children aged 4-11 years were selected. Seven listeners rated breathiness, roughness, and strain of 1000-ms /ɑ/ samples using single-variable matching tasks. Computational estimates of pitch strength, amplitude modulation filterbank output, and sharpness were obtained through custom-designed MATLAB algorithms. RESULTS Perceived roughness and strain were significantly higher in children with SGVS and MVS compared to children with GVS. Among the computational measures, only the modulation filterbank output resulted in significant differences among vibratory sources; a posthoc test revealed that children with SGVS had greater amplitude modulation than children with GVS, as expected from their rougher voice quality. CONCLUSIONS The results indicate that the output of an auditory amplitude modulation filterbank model may capture characteristics of SGVS that are strongly related to the rough voice quality.
Affiliation(s)
- Yeonggwang Park: Department of Communication Sciences and Disorders, University of Central Florida, Orlando, Florida
- Susan Baker Brehm: Department of Speech Pathology and Audiology, Miami University, Oxford, Ohio; Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Lisa Kelchner: Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio
- Barbara Weinrich: Department of Speech Pathology and Audiology, Miami University, Oxford, Ohio; Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Kevin McElfresh: Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Supraja Anand: Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Rahul Shrivastav: Office of the Provost & Executive Vice President, Indiana University, Bloomington, Indiana
- Alessandro de Alarcon: Pediatric Otolaryngology Head & Neck Surgery, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- David A Eddins: Department of Communication Sciences and Disorders, University of Central Florida, Orlando, Florida
3
Anand S. Perceptual and Computational Estimates of Vocal Breathiness and Roughness in Sustained Phonation and Connected Speech. J Voice 2023:S0892-1997(23)00069-3. [PMID: 36933971 DOI: 10.1016/j.jvoice.2023.02.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 03/18/2023]
Abstract
OBJECTIVE Clinical assessment of voice quality (VQ) often uses a combination of sustained phonations and more prolonged and more complex vocalizations. The purpose of this study was to compare the perceived vocal breathiness and vocal roughness of sustained phonations and connected speech over a wide range of dysphonia severity and to evaluate their relationship with acoustic measures and bioinspired models of breathiness and roughness. METHODS A VQ dimension-specific single-variable matching task (SVMT) was used to index the perceived breathiness or roughness of five male and five female talkers on the basis of a sustained /a/ phonation and the 5th CAPE-V sentence. Acoustic measures of cepstral peak and autocorrelation peak and psychoacoustic measures of pitch strength and temporal envelope standard deviation (EnvSD) were used to predict the perceived breathiness and roughness judgments obtained from 10 listeners. RESULTS High intra- and inter-listener reliability was observed for sustained phonations and connected speech. Perceived breathiness and roughness of sustained vowels and sentences obtained using SVMT were highly correlated for most dysphonic voices. The pitch strength model of breathiness was able to capture a larger amount of perceptual variance compared to cepstral peak in both vowels and sentences. Autocorrelation peak was strongly correlated with perceived roughness in sentences, while EnvSD was strongly correlated with perceived roughness in vowels. CONCLUSIONS Results provide evidence that perception of VQ via SVMT can be successfully extended to connected speech. Computational models of VQ can be easily adapted to connected speech. Such automated models of VQ perception are valuable due to their computational efficiency and their ability to accurately capture the non-linearities of the human auditory system.
Affiliation(s)
- Supraja Anand: Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
4
Park Y, Anand S, Gifford SM, Shrivastav R, Eddins DA. Development and Validation of a Single-Variable Comparison Stimulus for Matching Strained Voice Quality Using a Psychoacoustic Framework. Journal of Speech, Language, and Hearing Research (JSLHR) 2023; 66:16-29. [PMID: 36516473 PMCID: PMC10023177 DOI: 10.1044/2022_jslhr-22-00280] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/16/2022] [Revised: 08/17/2022] [Accepted: 09/01/2022] [Indexed: 06/17/2023]
Abstract
PURPOSE Acoustic and perceptual quantification of vocal strain has been a vexing problem for years. To increase measurement rigor, a suitable single-variable matching stimulus for strain was developed and validated, based on the matching stimulus used previously for breathy and rough voice qualities. METHOD A set of 21 comparison stimuli for a single-variable matching task (SVMT) was synthesized based on a speech-shaped sawtooth waveform mixed with speech-shaped noise. Variable bandpass filter gain in mid-to-high frequencies achieved a wide range of computed sharpness (in constant sharpness steps) and served as the independent variable for the SVMT. Ten natural /ɑ/ stimuli with a wide range of the primary voice quality of strain and a minimum of breathiness or roughness were selected and assessed using the SVMT. Natural voice samples and synthetic comparison stimuli were also assessed using a perceptual magnitude estimation (ME) task. RESULTS ME data validated the correspondence of the set of comparison stimuli to varying perceived strain. Perceived strain magnitudes of the comparison stimuli increased significantly and linearly with computed sharpness (r² = .99). A linear regression revealed that strain matching values were significantly predicted by computed sharpness (r² = .96) and perceived strain magnitudes (r² = .95) of the natural voice stimuli. CONCLUSION The perception of vocal strain is strongly associated with computed sharpness and is captured accurately and precisely using an SVMT, in which the independent variable is the bandpass filter gain (in steps of equal sharpness) applied to the comparison stimuli.
Affiliation(s)
- Yeonggwang Park: Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Supraja Anand: Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Sophia M. Gifford: Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Rahul Shrivastav: Office of the Provost & Executive Vice President, Indiana University, Bloomington
- David A. Eddins: Department of Communication Sciences & Disorders, University of South Florida, Tampa
5
Kopf LM, Huh-Yoo J. A User-Centered Design Approach to Developing a Voice Monitoring System for Disorder Prevention. J Voice 2023; 37:48-59. [PMID: 33189486 DOI: 10.1016/j.jvoice.2020.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2020] [Revised: 10/22/2020] [Accepted: 10/23/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Many individuals will experience a voice disorder in their lifetime, especially occupational voice users. While a number of voice monitoring systems have been developed, most were designed with the clinician/researcher as the end user. For a patient to use these systems, they need field experts to help them interpret data from the system to understand its meaning. Most of these systems would have challenges in being used in a preventative context with the occupational voice user as the sole system user. OBJECTIVE The current study introduces a novel design approach: user-centered design (UCD) with paper prototypes in the creation of a voice monitoring system for voice disorder prevention (VDP). The goal of this design approach is to design systems that are engaging and intuitive for users so they will be interested in interacting with the system and be able to benefit from the system without the need of external support. METHODS The current study was conducted in two phases: an iterative design phase and a test phase. In the iterative design phase, 15 participants gave their opinions on the measures and feedback designs they felt would be the most beneficial to users. In the test phase, the researchers collected real voice data over multiple sessions for 18 additional participants and provided this data using the final feedback displays from the design phase. RESULTS By engaging in UCD, the researchers identified key design challenges for VDP: (1) educating the user, (2) balancing contextualization and granularity, and (3) addressing disconnection between user and system goals. CONCLUSION UCD holds promise for designing VDP systems that are both engaging and intuitive for occupational voice users.
Affiliation(s)
- Lisa M Kopf: Department of Communication Sciences and Disorders, University of Northern Iowa, Cedar Falls, Iowa; Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Jina Huh-Yoo: College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania
6
Shen J, Heller Murray E, Kulick ER. The Effect of Breathy Vocal Quality on Speech Intelligibility and Listening Effort in Background Noise. Trends Hear 2023; 27:23312165231206925. [PMID: 37817666 PMCID: PMC10566269 DOI: 10.1177/23312165231206925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 09/06/2023] [Accepted: 09/25/2023] [Indexed: 10/12/2023]
Abstract
Speech perception is challenging under adverse conditions. However, there is limited evidence regarding how multiple adverse conditions affect speech perception. The present study investigated two conditions that are frequently encountered in real-life communication: background noise and breathy vocal quality. The study first examined the effects of background noise and breathiness on speech perception as measured by intelligibility. Secondly, the study tested the hypothesis that both noise and breathiness affect listening effort, as indicated by linear and nonlinear changes in pupil dilation. Low-context sentences were resynthesized to create three levels of breathiness (original, mild-moderate, and severe). The sentences were presented in a fluctuating nonspeech noise with two signal-to-noise ratios (SNRs) of -5 dB (favorable) and -9 dB (adverse) SNR. Speech intelligibility and pupil dilation data were collected from young listeners with normal hearing thresholds. The results demonstrated that a breathy vocal quality presented in noise negatively affected speech intelligibility, with the degree of breathiness playing a critical role. Listening effort, as measured by the magnitude of pupil dilation, showed significant effects with both severe and mild-moderate breathy voices that were independent of noise level. The findings contributed to the literature by demonstrating the impact of vocal quality on the perception of speech in noise. They also highlighted the complex dynamics between overall task demand and processing resources in understanding the combined impact of multiple adverse conditions.
Affiliation(s)
- Jing Shen: Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, PA, USA
- Elizabeth Heller Murray: Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, PA, USA
- Erin R. Kulick: Department of Epidemiology and Biostatistics, College of Public Health, Temple University, Philadelphia, PA, USA
7
Nagle KF. Clinical Use of the CAPE-V Scales: Agreement, Reliability and Notes on Voice Quality. J Voice 2022:S0892-1997(22)00366-6. [PMID: 36543606 DOI: 10.1016/j.jvoice.2022.11.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 11/09/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022]
Abstract
OBJECTIVES The CAPE-V is a widely used protocol developed to help standardize the evaluation of voice. Variability of voice quality ratings has prevented development of training protocols that might themselves improve interrater agreement among new clinicians. As part of a larger mixed methods project, this study examines agreement and reliability for experienced clinicians using the CAPE-V scales. STUDY DESIGN Observational. METHODS Experienced voice clinicians (N = 20) provided ratings of recordings from 12 speakers representing a range of overall voice quality. Participants were instructed to rate the voices as they normally would, using the CAPE-V scales. Descriptive data were recorded and two levels of agreement were calculated. Single rater reliability was calculated using a 2-way random model of absolute agreement for intraclass correlations (ICC [2,1]). RESULTS Participants' use of the CAPE-V scales varied considerably, although most rated overall severity, breathiness, roughness and strain. Data from one participant did not meet a priori agreement criteria. Because outcomes were significantly different without their data, agreement and reliability were analyzed based on the reduced data set from 19 participants. Interrater agreement and reliability were comparable to previous research; the mean range of ratings was at least 47 mm for all dimensions of voice quality. CONCLUSIONS Results indicated differential use of the components of the CAPE-V form and scales in evaluating voice quality and severity of dysphonia, including categorical variability among ratings of all of the primary CAPE-V dimensions of voice quality that may complicate the clinical description of a voice as mildly, moderately or severely dysphonic.
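The single-rater reliability index named in the methods, ICC(2,1), can be sketched from a speakers-by-raters table using the standard two-way ANOVA mean squares. This is a minimal illustration assuming a complete table with no missing ratings; the function name and example ratings are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows: one row per rated speaker,
    one column per rater, with no missing cells.
    """
    n = len(ratings)       # number of speakers (targets)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for speakers (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical ratings: the second rater is always one unit higher.
example = [[1, 2], [2, 3], [3, 4]]
icc = icc_2_1(example)
```

Because the absolute-agreement form includes the rater mean square in the denominator, a constant offset between raters lowers the coefficient even when the rank order of speakers is identical, which is why it is a stricter index than a consistency-type ICC.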
Affiliation(s)
- Kathleen F Nagle: Department of Speech-Language Pathology, School of Health & Medical Science, Seton Hall University, Nutley, New Jersey
8
Park Y, Anand S, Kopf LM, Shrivastav R, Eddins DA. Interactions Between Breathy and Rough Voice Qualities and Their Contributions to Overall Dysphonia Severity. Journal of Speech, Language, and Hearing Research (JSLHR) 2022; 65:4071-4084. [PMID: 36260821 PMCID: PMC9940885 DOI: 10.1044/2022_jslhr-22-00012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
PURPOSE Dysphonic voices typically present multiple voice quality dimensions. This study investigated potential interactions between perceived breathiness and roughness and their contributions to overall dysphonia severity. METHOD Synthetic stimuli based on four talkers were created to systematically map out potential interactions. For each talker, a stimulus matrix composed of 49 stimuli (seven breathiness steps × seven roughness steps) was created by varying aspiration noise and open quotient to manipulate breathiness and superimposing amplitude modulation of varying depths to simulate roughness. One-dimensional matching (1DMA) and magnitude estimation (1DME) tasks were used to measure perceived breathiness, roughness, their potential interactions, and overall dysphonia severity. Additional 1DME tasks were used to assess a set of natural stimuli that varied along both breathiness and roughness. RESULTS For the synthetic stimuli, the 1DMA task indicated little interaction between the two voice qualities. For the 1DME task, breathiness magnitude was influenced by roughness step to a greater extent than roughness magnitude was influenced by breathiness step. The additive contributions of breathiness and roughness to overall severity gradually diminished with increasing breathiness and roughness steps, possibly reflecting a ceiling effect in the 1DME task. For the natural stimuli, little consistent interaction was observed between breathiness and roughness. CONCLUSIONS The matching task revealed minimal interaction between perceived breathiness and roughness, whereas the magnitude estimation task revealed some interaction between the two qualities and their cumulative contributions to overall dysphonia severity. Task differences are discussed in terms of differences in response bias and the role of perceptual anchors. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21313701.
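The 7 × 7 stimulus matrix described in the method can be sketched as a grid of parameter pairs, with roughness simulated by sinusoidal amplitude modulation of varying depth. Everything specific here is an assumption for illustration: a sine carrier stands in for the synthetic vowel, additive uniform noise stands in for aspiration noise (open quotient is not modelled), and the carrier frequency, modulation frequency, depths, and noise gains are placeholder values rather than those used in the study.

```python
import math
import random

def am_stimulus(depth, noise_gain, f0=120.0, fm=25.0, sr=16000, dur=0.05, seed=0):
    """Return samples of a noisy tone with sinusoidal amplitude modulation.

    depth      : modulation depth in [0, 1] (stand-in for roughness step)
    noise_gain : gain of additive noise (stand-in for breathiness step)
    f0, fm     : carrier and modulation frequencies in Hz (assumed values)
    """
    rng = random.Random(seed)
    n = round(sr * dur)
    samples = []
    for i in range(n):
        t = i / sr
        carrier = math.sin(2 * math.pi * f0 * t)
        noise = noise_gain * rng.uniform(-1.0, 1.0)
        # Envelope swings between (1 - depth) and (1 + depth).
        envelope = 1.0 + depth * math.sin(2 * math.pi * fm * t)
        samples.append(envelope * (carrier + noise))
    return samples

STEPS = 7  # seven breathiness steps x seven roughness steps
grid = {
    (b, r): am_stimulus(depth=r / (STEPS - 1), noise_gain=b / (STEPS - 1))
    for b in range(STEPS)
    for r in range(STEPS)
}
```

Keying the grid by (breathiness step, roughness step) makes it straightforward to hold one dimension fixed while stepping through the other, which is the comparison the interaction analysis depends on.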
Affiliation(s)
- Yeonggwang Park: Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Supraja Anand: Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Lisa M. Kopf: Department of Speech, Language and Hearing Sciences, The George Washington University, Washington, DC
- Rahul Shrivastav: Office of the Provost and Executive Vice President, Indiana University, Bloomington; Department of Communicative Sciences and Disorders, Michigan State University, East Lansing
- David A. Eddins: Department of Communication Sciences and Disorders, University of South Florida, Tampa; Department of Communicative Sciences and Disorders, Michigan State University, East Lansing
9
Park Y, Anand S, Ozmeral EJ, Shrivastav R, Eddins DA. Predicting Perceived Vocal Roughness Using a Bio-Inspired Computational Model of Auditory Temporal Envelope Processing. Journal of Speech, Language, and Hearing Research (JSLHR) 2022; 65:2748-2758. [PMID: 35867607 PMCID: PMC9911094 DOI: 10.1044/2022_jslhr-22-00101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 04/14/2022] [Accepted: 04/25/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE Vocal roughness is present in many voice disorders, but the assessment of roughness mainly depends on subjective auditory-perceptual evaluation and lacks acoustic correlates. This study aimed to apply the concept of roughness in general sound quality perception to vocal roughness assessment and to characterize the relationship between vocal roughness and temporal envelope fluctuation measures obtained from an auditory model. METHOD Ten /ɑ/ recordings with a wide range of roughness were selected from an existing database. Ten listeners rated the roughness of the recordings in a single-variable matching task. Temporal envelope fluctuations of the recordings were analyzed with an auditory processing model of amplitude modulation that utilizes a modulation filterbank of different modulation frequencies. Pitch strength and the smoothed cepstral peak prominence (CPPS) were also obtained for comparison. RESULTS Individual simple regression models yielded envelope standard deviation from a modulation filter with a low center frequency (64.3 Hz) as a statistically significant predictor of vocal roughness with a strong coefficient of determination (r² = .80). Pitch strength and CPPS were not significant predictors of roughness. CONCLUSION This result supports the possible utility of envelope fluctuation measures from an auditory model as objective correlates of vocal roughness.
Affiliation(s)
- Yeonggwang Park: Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Supraja Anand: Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Erol J. Ozmeral: Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Rahul Shrivastav: Office of the Provost & Executive Vice President, Indiana University Bloomington
- David A. Eddins: Department of Communication Sciences and Disorders, University of South Florida, Tampa
10
Angelakis E, Kotsani N, Georgaki A. Towards a Singing Voice Multi-Sensor Analysis Tool: System Design, and Assessment Based on Vocal Breathiness. Sensors 2021; 21:s21238006. [PMID: 34884019 PMCID: PMC8659512 DOI: 10.3390/s21238006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Revised: 11/14/2021] [Accepted: 11/19/2021] [Indexed: 11/16/2022]
Abstract
Singing voice is a human quality that requires the precise coordination of numerous kinetic functions and results in a perceptually variable auditory outcome. The use of multi-sensor systems can facilitate the study of correlations between the vocal mechanism kinetic functions and the voice output. This is directly relevant to vocal education, rehabilitation, and prevention of vocal health issues in educators; professionals; and students of singing, music, and acting. In this work, we present the initial design of a modular multi-sensor system for singing voice analysis, and describe its first assessment experiment on the ‘vocal breathiness’ qualitative characteristic. A system case study with two professional singers was conducted, utilizing signals from four sensors. Participants sang a protocol of vocal trials in various degrees of intended vocal breathiness. Their (i) vocal output, (ii) phonatory function, and (iii) respiratory behavior-per-condition were recorded through a condenser microphone (CM), an Electroglottograph (EGG), and thoracic and abdominal respiratory effort transducers (RET), respectively. Participants’ individual respiratory management strategies were studied through qualitative analysis of RET data. The breathiness degree of the microphone audio samples was rated perceptually, and correlation analysis was performed between sample ratings and parameters extracted from CM and EGG data. Smoothed Cepstral Peak Prominence (CPPS) and vocal folds’ Open Quotient (OQ), as computed with the Howard method (HOQ), demonstrated the highest correlation coefficients when analyzed individually. DECOM method-computed OQ (DOQ) was also examined. Interestingly, the Pearson correlation coefficient of pitch difference between estimates from CM and EGG signals appeared to be statistically insignificant (a result that warrants investigation in larger populations).
The study of multi-variate models revealed even higher correlation coefficients. Models studied were the Acoustic Breathiness Index (ABI) and the proposed multiple regression model CDH (CPPS, DOQ, and HOQ), which was attempted in order to combine analysis results from microphone and EGG signals. The model combination of ABI and the proposed CDH appeared to yield the highest correlation with perceptual breathiness ratings. Study results suggest potential for the use of a completed system version in vocal pedagogy and research, as the case study indicated system practicality, a number of pertinent correlations, and introduced topics with further research possibilities.
11
Using Pitch Height and Pitch Strength to Characterize Type 1, 2, and 3 Voice Signals. J Voice 2021; 35:181-193. [DOI: 10.1016/j.jvoice.2019.08.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/03/2019] [Revised: 08/05/2019] [Accepted: 08/08/2019] [Indexed: 11/19/2022]
12
Anand S, Bottalico P, Gray C. Vocal Fatigue in Prospective Vocal Professionals. J Voice 2021; 35:247-258. [DOI: 10.1016/j.jvoice.2019.08.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/22/2019] [Revised: 08/15/2019] [Accepted: 08/16/2019] [Indexed: 11/30/2022]
13
Rubin AD, Jackson-Menaldi C, Kopf LM, Marks K, Skeffington J, Skowronski MD, Shrivastav R, Hunter EJ. Comparison of Pitch Strength With Perceptual and Other Acoustic Metric Outcome Measures Following Medialization Laryngoplasty. J Voice 2020; 33:795-800. [PMID: 29773324 DOI: 10.1016/j.jvoice.2018.03.019] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/26/2017] [Accepted: 03/27/2018] [Indexed: 11/15/2022]
Abstract
INTRODUCTION The diagnoses of voice disorders, as well as treatment outcomes, are often tracked using visual (eg, stroboscopic images), auditory (eg, perceptual ratings), objective (eg, from acoustic or aerodynamic signals), and patient report (eg, Voice Handicap Index and Voice-Related Quality of Life) measures. However, many of these measures are known to have low to moderate sensitivity and specificity for detecting changes in vocal characteristics, including vocal quality. OBJECTIVE The objective of this study was to compare changes in estimated pitch strength (PS) with other conventionally used acoustic measures based on the cepstral peak prominence (smoothed cepstral peak prominence, cepstral spectral index of dysphonia, and acoustic voice quality index), and clinical judgments of voice quality (GRBAS [grade, roughness, breathiness, asthenia, strain] scale) following laryngeal framework surgery. METHODS This study involved post hoc analysis of recordings from 22 patients pretreatment and post treatment (thyroplasty and behavioral therapy). Sustained vowels and connected speech were analyzed using objective measures (PS, smoothed cepstral peak prominence, cepstral spectral index of dysphonia, and acoustic voice quality index), and these results were compared with mean auditory-perceptual ratings by expert clinicians using the GRBAS scale. RESULTS All four acoustic measures changed significantly in the direction that usually indicates improved voice quality following treatment (P < 0.005). Grade and breathiness correlated the strongest with the acoustic measures (|r| ~ 0.7) with strain being the least correlated. CONCLUSIONS Acoustic analysis on running speech highly correlates with judged ratings. PS is a robust, easily obtained acoustic measure of voice quality that could be useful in the clinical environment to follow treatment of voice disorders.
Affiliation(s)
- Adam D Rubin: Lakeshore Ear, Nose, and Throat Center, Lakeshore Professional Voice Center, St. Clair Shores, Michigan; Department of Surgery, Oakland University William Beaumont School of Medicine, Rochester, Michigan
- Cristina Jackson-Menaldi: Lakeshore Ear, Nose, and Throat Center, Lakeshore Professional Voice Center, St. Clair Shores, Michigan; Department of Otolaryngology, School of Medicine, Wayne State University, Detroit, Michigan
- Lisa M Kopf: Department of Communication Sciences and Disorders, University of Northern Iowa, Cedar Falls, Iowa
- Katherine Marks: Lakeshore Ear, Nose, and Throat Center, Lakeshore Professional Voice Center, St. Clair Shores, Michigan
- Jean Skeffington: Lakeshore Ear, Nose, and Throat Center, Lakeshore Professional Voice Center, St. Clair Shores, Michigan
- Mark D Skowronski: Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Rahul Shrivastav: Office of the Vice President for Instruction, University of Georgia, Athens, Georgia
- Eric J Hunter: Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
14
|
Anand S, Kopf LM, Shrivastav R, Eddins DA. Objective Indices of Perceived Vocal Strain. J Voice 2019; 33:838-845. [DOI: 10.1016/j.jvoice.2018.06.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/22/2018] [Revised: 06/06/2018] [Accepted: 06/07/2018] [Indexed: 10/28/2022]
|
15
|
Vojtech JM, Segina RK, Buckley DP, Kolin KR, Tardif MC, Noordzij JP, Stepp CE. Refining algorithmic estimation of relative fundamental frequency: Accounting for sample characteristics and fundamental frequency estimation method. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3184. [PMID: 31795681 PMCID: PMC6847943 DOI: 10.1121/1.5131025] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/30/2019] [Revised: 10/07/2019] [Accepted: 10/08/2019] [Indexed: 05/26/2023]
Abstract
Relative fundamental frequency (RFF) is a promising acoustic measure for evaluating voice disorders. Yet, the accuracy of the current RFF algorithm varies across a broad range of vocal signals. The authors investigated how fundamental frequency (fo) estimation and sample characteristics impact the relationship between manual and semi-automated RFF estimates. Acoustic recordings were collected from 227 individuals with and 256 individuals without voice disorders. Common fo estimation techniques were compared to the autocorrelation method currently implemented in the RFF algorithm. Pitch strength-based categories were constructed using a training set (1158 samples), and algorithm thresholds were tuned to each category. RFF was then computed on an independent test set (291 samples) using category-specific thresholds and compared against manual RFF via mean bias error (MBE) and root-mean-square error (RMSE). Auditory-SWIPE' for fo estimation led to the greatest correspondence with manual RFF and was implemented in concert with category-specific thresholds. Refining fo estimation and accounting for sample characteristics led to increased correspondence with manual RFF [MBE = 0.01 semitones (ST), RMSE = 0.28 ST] compared to the unmodified algorithm (MBE = 0.90 ST, RMSE = 0.34 ST), reducing the MBE and RMSE of semi-automated RFF estimates by 88.4% and 17.3%, respectively.
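The two agreement metrics named in this abstract, mean bias error (MBE) and root-mean-square error (RMSE), can be illustrated with a minimal sketch. The sign convention (semi-automated minus manual), the function names, and the sample values below are assumptions made for illustration; they are not taken from the study.

```python
import math

def mean_bias_error(estimates, references):
    """Average signed difference (estimate - reference); assumed sign convention."""
    diffs = [e - r for e, r in zip(estimates, references)]
    return sum(diffs) / len(diffs)

def root_mean_square_error(estimates, references):
    """Square root of the mean squared difference between paired values."""
    diffs = [e - r for e, r in zip(estimates, references)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RFF values in semitones (ST); NOT data from the study.
manual = [-0.5, -1.0, -2.0, -3.0]
semi_automated = [-0.4, -1.2, -1.7, -3.1]

mbe = mean_bias_error(semi_automated, manual)
rmse = root_mean_square_error(semi_automated, manual)
print(f"MBE = {mbe:.3f} ST, RMSE = {rmse:.3f} ST")
```

MBE keeps the sign of each error, so over- and underestimates can cancel, whereas RMSE penalizes every deviation regardless of direction. Reporting both, as the abstract does, separates systematic offset from overall spread.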
Affiliation(s)
- Jennifer M Vojtech
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, Massachusetts 02215, USA
- Roxanne K Segina
  - Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Daniel P Buckley
  - Department of Otolaryngology-Head and Neck Surgery, Boston University School of Medicine, 72 East Concord Street, Boston, Massachusetts 02118, USA
- Katharine R Kolin
  - Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Monique C Tardif
  - Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- J Pieter Noordzij
  - Department of Otolaryngology-Head and Neck Surgery, Boston University School of Medicine, 72 East Concord Street, Boston, Massachusetts 02118, USA
- Cara E Stepp
  - Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
|
16
|
Park Y, Perkell JS, Matthies ML, Stepp CE. Categorization in the Perception of Breathy Voice Quality and Its Relation to Voice Production in Healthy Speakers. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:3655-3666. [PMID: 31525305 PMCID: PMC7201331 DOI: 10.1044/2019_jslhr-s-19-0048] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/01/2019] [Revised: 04/12/2019] [Accepted: 06/12/2019] [Indexed: 05/24/2023]
Abstract
Purpose Previous studies of speech articulation have shown that individuals who can perceive smaller differences between similar-sounding phonemes showed larger contrasts in their productions of those phonemes. Here, a similar relationship was examined between the perception and production of breathy voice quality. Method Twenty females with healthy voices were recruited to participate in both a voice production and a perception experiment. Each participant produced repetitions of a sustained vowel, and acoustic correlates of breathiness were calculated. Identification and discrimination tasks were performed with a series of synthetic stimuli along a breathiness continuum. Categorical boundary location and boundary width were obtained from the identification task as a measurement of perception of breathiness. Spearman's correlation analysis was performed to estimate associations between values of boundary location and width and the acoustic correlates of breathiness from the participants' voices. Results Significant correlations between boundary width (r = -.53 to -.6) and some acoustic correlates were found, but no significant relationships were observed between boundary location and the acoustic correlates. Conclusions Speakers with small boundary widths, which suggest higher perceptual precision in differentiating breathiness, had typical voices that were less breathy, as estimated with acoustic measures, compared to speakers with large boundary widths. Our findings may support a link between perception and production of breathy voice quality. Supplemental Material https://doi.org/10.23641/asha.9808478.
Affiliation(s)
- Yeonggwang Park
  - Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Joseph S. Perkell
  - Department of Speech, Language, and Hearing Sciences, Boston University, MA
  - Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge
- Cara E. Stepp
  - Department of Speech, Language, and Hearing Sciences, Boston University, MA
  - Department of Biomedical Engineering, Boston University, MA
  - Department of Otolaryngology–Head and Neck Surgery, Boston University School of Medicine, MA
|
17
|
Aaen M, McGlashan J, Thu KT, Sadolin C. Assessing and Quantifying Air Added to the Voice by Means of Laryngostroboscopic Imaging, EGG, and Acoustics in Vocally Trained Subjects. J Voice 2019; 35:326.e1-326.e11. [PMID: 31628046 DOI: 10.1016/j.jvoice.2019.09.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/29/2019] [Revised: 08/31/2019] [Accepted: 09/04/2019] [Indexed: 11/26/2022]
Abstract
OBJECTIVE To assess and quantify singers' strategies for adding air to phonation to sound "breathy" in a healthy manner. STUDY DESIGN Case-control study with 20 professional singers. METHODS Twenty singers performing sustained vowels in the Complete Vocal Technique Neutral vocal mode, with and without audible air added to the voice, were assessed by means of laryngostroboscopic imaging using a videonasoendoscopic camera system, electroglottography, long-term average spectrum, acoustic signals, and audio perception. Singers completed Voice Handicap Index and Reflux Symptom Index questionnaires prior to examination. RESULTS Air added to the voice resulted in an expected glottal gap along the length of the vocal folds, with little to no further difference in the supraglottic area, as compared with the Neutral phonation. Air added resulted in lowered Qx, mean Sound Pressure Level, and Cepstral Peak Prominence, but higher Harmonics-to-Noise Ratio, Jitter, and Shimmer, with decreased energy at the fundamental frequency. Adding audible air to the phonation did not exhibit similar effects on acoustics for males and females. Also, for females, H1-H2 difference decreased with air added, while it increased for males. CONCLUSION Singers produce an audible airy phonation that is similar to, yet significantly different from, the breathy phonation reported for both healthy and pathological speakers.
Affiliation(s)
- Mathias Aaen
  - Complete Vocal Institute, Copenhagen K, Denmark
- Julian McGlashan
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, UK
- Khaing Thu Thu
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, UK
|
18
|
Anand S, Skowronski MD, Shrivastav R, Eddins DA. Perceptual and Quantitative Assessment of Dysphonia Across Vowel Categories. J Voice 2019; 33:473-481. [DOI: 10.1016/j.jvoice.2017.12.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/17/2017] [Accepted: 12/21/2017] [Indexed: 10/16/2022]
|
19
|
Ferrer CA, Haderlein T, Maryn Y, de Bodt MS, Nöth E. Collinearity and Sample Coverage Issues in the Objective Measurement of Vocal Quality: The Case of Roughness and Breathiness. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:1-24. [PMID: 29222538 DOI: 10.1044/2017_jslhr-s-17-0136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/13/2017] [Accepted: 07/27/2017] [Indexed: 06/07/2023]
Abstract
PURPOSE The aim of the study was to address the reported inconsistencies in the relationship between objective acoustic measures and perceptual ratings of vocal quality. METHOD This tutorial moves away from the more widely examined problems related to obtaining the perceptual ratings and the acoustic measures and centers on less scrutinized issues regarding the procedure used to establish the correspondence. Expressions for the most common measure of association between perceptual and acoustic measures (Pearson's r) are derived using a multiple linear regression model. The particular case where the multiple linear regression involves only roughness and breathiness is discussed to illustrate the issues. RESULTS Most problems reported regarding inconsistent findings in the relationship between given acoustic measures and particular perceptual ratings could be linked to sample properties not directly related to the actual relationship. The influential sample properties are the collinearity between the regressors in the multiple linear regression and their relative variances. Recommendations on how to rule out this possible cause of inconsistency are given, ranging in scope from data collection and reporting to data manipulation and interpretation of results. CONCLUSIONS The problems described can be extended to more general cases than the exemplified coverage of roughness and breathiness in the sample. Ruling out this possible cause of inconsistency would increase the validity of the results reported.
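To illustrate how collinearity and relative variances can change Pearson's r without any change in the underlying relationship, the following is a minimal sketch under a simple assumed model: an acoustic measure y = a·R + b·B + noise, where R and B are roughness and breathiness ratings with standard deviations sd_r and sd_b and correlation rho. The model, the function name, and the parameter values are illustrative assumptions and are not the expressions derived in the tutorial.

```python
import math

def pearson_r_with_roughness(a, b, sd_r, sd_b, rho, sd_noise=0.0):
    """Pearson's r between y = a*R + b*B + noise and R (assumed linear model).

    cov(y, R) = a*sd_r^2 + b*rho*sd_r*sd_b
    var(y)    = a^2*sd_r^2 + b^2*sd_b^2 + 2*a*b*rho*sd_r*sd_b + sd_noise^2
    """
    cov_yr = a * sd_r**2 + b * rho * sd_r * sd_b
    var_y = (a**2 * sd_r**2 + b**2 * sd_b**2
             + 2 * a * b * rho * sd_r * sd_b + sd_noise**2)
    return cov_yr / (sd_r * math.sqrt(var_y))

# Same regression weights (a = b = 1) in three hypothetical samples.
r_uncorrelated = pearson_r_with_roughness(1.0, 1.0, 1.0, 1.0, rho=0.0)
r_collinear = pearson_r_with_roughness(1.0, 1.0, 1.0, 1.0, rho=0.8)
r_wide_breathiness = pearson_r_with_roughness(1.0, 1.0, 1.0, 3.0, rho=0.0)

print(round(r_uncorrelated, 3), round(r_collinear, 3), round(r_wide_breathiness, 3))
```

In this sketch the weights a and b never change, yet the correlation with roughness rises when the two ratings are collinear in the sample and falls when breathiness spans a wider range than roughness. This matches the kind of sample-dependent inconsistency the abstract describes.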
Affiliation(s)
- Carlos A Ferrer
  - Electrical Engineering Faculty, Central University of Las Villas, Santa Clara, Cuba
  - Department of Computer Science, University Erlangen-Nuremberg
- Tino Haderlein
  - Department of Computer Science, University Erlangen-Nuremberg
- Youri Maryn
  - Department of Otorhinolaryngology and Head and Neck Surgery, European Institute for ORL, Sint-Augustinus General Hospital, Antwerp, Belgium
- Marc S de Bodt
  - Department of Communication Disorders, Antwerp University Hospital, Edegem, Oost Vlaanderen, Belgium
- Elmar Nöth
  - Department of Computer Science, University Erlangen-Nuremberg
|
20
|
Kopf LM, Skowronski MD, Anand S, Eddins DA, Shrivastav R. The Perception of Breathiness in the Voices of Pediatric Speakers. J Voice 2017; 33:204-213. [PMID: 29162356 DOI: 10.1016/j.jvoice.2017.09.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/19/2017] [Revised: 09/27/2017] [Accepted: 09/28/2017] [Indexed: 10/18/2022]
Abstract
BACKGROUND The perception of pediatric voice quality has been investigated using clinical protocols developed for adult voices and acoustic analyses designed to identify important physical parameters associated with normal and dysphonic pediatric voices. Laboratory investigations of adult dysphonia have included sophisticated methods, including a psychoacoustic approach that involves a single-variable matching task (SVMT), characterized by high inter- and intra-listener reliability, and analyses that include bio-inspired models of auditory perception that have provided valuable information regarding adult voice quality. OBJECTIVES To establish the utility of a psychoacoustic approach to the investigation of voice quality perception in the context of pediatric voices. METHODS Six listeners judged the breathiness of 20 synthetic vowel stimuli using an SVMT. To support comparisons with previous data, stimuli were modeled after four pediatric speakers and synthesized using Klatt with five parameter settings that influence the perception of breathiness. The population average breathiness judgments were modeled with acoustic measures of loudness ratio, pitch strength, and cepstral peak. RESULTS Listeners reliably judged the perceived breathiness of pediatric voices, as in previous investigations of breathiness in adult dysphonic voices. Breathiness judgments were accurately modeled by loudness ratio (r² = 0.93), pitch strength (r² = 0.91), and cepstral peak (r² = 0.82). Model accuracy was not affected significantly by including stimulus fundamental frequency and was slightly higher for pediatric than for adult voices. CONCLUSIONS The SVMT proved robust for pediatric voices spanning a wide range of breathiness. The data indicate that this is a promising approach for future investigation of pediatric voice quality.
Affiliation(s)
- Lisa M Kopf
  - Department of Communication Sciences and Disorders, University of Northern Iowa, Cedar Falls, Iowa
- Supraja Anand
  - Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- David A Eddins
  - Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Rahul Shrivastav
  - Office of the Vice President for Instruction, University of Georgia, Athens, Georgia
|
21
|
Pitch Strength as an Outcome Measure for Treatment of Dysphonia. J Voice 2017; 31:691-696. [PMID: 28318967 DOI: 10.1016/j.jvoice.2017.01.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/05/2016] [Revised: 01/27/2017] [Accepted: 01/30/2017] [Indexed: 11/22/2022]
Abstract
BACKGROUND Measurement of treatment outcomes is critical for the spectrum of voice treatments (ie, surgical, behavioral, or pharmacological). Outcome measures typically include visual measures (eg, stroboscopic data), auditory measures (eg, Consensus Auditory-Perceptual Evaluation of Voice; Grade, Roughness, Breathiness, Asthenia, Strain), objective correlates of vocal fold vibratory characteristics such as acoustic signals (eg, harmonics-to-noise ratio, cepstral peak prominence), and patient self-reported questionnaires (eg, Voice Handicap Index, Voice-Related Quality of Life). Subjective measures often show high variability, whereas most acoustic measures of voice are only valid for signals where some degree of periodicity can be assumed. However, this assumption is often invalid for dysphonic voices where signal periodicity is suspect. Furthermore, many of these measures are not useful in isolation for diagnostic purposes. OBJECTIVE We evaluated a recently developed algorithm (Auditory Sawtooth Waveform Inspired Pitch Estimator-Prime [Auditory-SWIPE']) for estimating pitch and pitch strength for dysphonic voices. Whereas fundamental frequency is a physical attribute of a signal, pitch is its psychophysical correlate. As such, the perception of pitch can extend to most signals irrespective of their periodicity. METHODS Post hoc analyses were conducted for three groups of patients evaluated and treated for voice problems at a major voice center: (1) muscle tension dysphonia/functional dysphonia, (2) vocal fold mass(es), and (3) presbyphonia. All patients were recorded before and after surgical/behavioral treatment for voice disorders. Pitch and pitch strength for each speaker were computed with the Auditory-SWIPE' algorithm. RESULTS Comparison of pre- and posttreatment data provides support for pitch strength as a measure of treatment outcomes for dysphonic voices.
|