1
van Leer E, Lewis B, Porcaro N. Effect of an iOS App on Voice Therapy Adherence and Motivation. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2021; 30:210-227. [PMID: 33476177 PMCID: PMC8740599 DOI: 10.1044/2020_ajslp-19-00213] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/17/2018] [Revised: 12/17/2019] [Accepted: 09/28/2020] [Indexed: 05/22/2023]
Abstract
Purpose Patients commonly report difficulties adhering to voice therapy. An iOS app was developed in our lab that assists practice via reminder notifications, instructional recordings, and cepstral peak prominence analysis results. The purpose of this study was to assess the effect of such homework support modality on adherence behavior and associated motivation in a comparison of app support and written homework instructions and to assess the usability and utility of the app. Method Thirty-four individuals exhibiting adducted hyperfunction were randomized to receive either written homework instructions or the app when practicing resonant voice exercises for 3 weeks. All patients digitally audio-recorded all home practice, provided self-reported estimates of generalization, and completed weekly motivation scales. Results App support significantly increased practice frequency but did not affect self-reported generalization or motivation. Practice was significantly predicted by System Usability Scale scores. Utility of reminders and instructions were good, but cepstral peak prominence feedback was considered useful to only a subset of participants. Conclusion Interactive mobile therapy support can significantly increase practice of resonant voice homework without influencing motivation.
Affiliation(s)
- Eva van Leer
- Department of Communication Sciences and Disorders, Georgia State University, Atlanta
- Brittney Lewis
- Autonomous Reanimation and Evacuation Program, The Geneva Foundation, San Antonio, TX
2
Labuschagne IB, Ciocca V. The effect of vocal tract parameters on aspiration noise discrimination. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1239. [PMID: 32113289 DOI: 10.1121/10.0000756] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/29/2019] [Accepted: 01/30/2020] [Indexed: 06/10/2023]
Abstract
Previous research showed that aspiration noise difference limens in moderately breathy /a/ vowels decreased as the spectral slope of the glottal source spectrum became increasingly steep [Kreiman and Gerratt, J. Acoust. Soc. Am. 131(1), 492-500 (2012)]. The current study investigated whether discrimination of aspiration noise levels was affected by differences in spectral shape due to vowel quality (/æ/ and /i/) and speaker identity (three male speakers) when the slope of the glottal source spectrum was fixed. The results showed that discrimination performance was worse overall for /i/ than /æ/, but this outcome may have resulted from relatively poor performance for the /i/ vowel of one speaker. Acoustic analyses of the stimuli were performed to estimate the association between acoustic properties and the perceptual outcomes. The results showed that both the smoothed cepstral peak prominence and the harmonic energy level between 2 and 5 kHz may account for the observed differences in aspiration noise discrimination among speakers within each vowel, but not for differences between vowel categories. It is possible that the relationship between the aspiration noise discrimination and aforementioned acoustic properties may be modulated by the spectral distribution of energy across frequency.
Affiliation(s)
- Ilse B Labuschagne
- School of Audiology and Speech Sciences, The University of British Columbia, 2177 Wesbrook Mall, Vancouver, British Columbia V6T 1Z3, Canada
- Valter Ciocca
- School of Audiology and Speech Sciences, The University of British Columbia, 2177 Wesbrook Mall, Vancouver, British Columbia V6T 1Z3, Canada
3
Ferrer CA, Haderlein T, Maryn Y, de Bodt MS, Nöth E. Collinearity and Sample Coverage Issues in the Objective Measurement of Vocal Quality: The Case of Roughness and Breathiness. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:1-24. [PMID: 29222538 DOI: 10.1044/2017_jslhr-s-17-0136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/13/2017] [Accepted: 07/27/2017] [Indexed: 06/07/2023]
Abstract
PURPOSE The aim of the study was to address the reported inconsistencies in the relationship between objective acoustic measures and perceptual ratings of vocal quality. METHOD This tutorial moves away from the more widely examined problems related to obtaining the perceptual ratings and the acoustic measures and centers on less scrutinized issues regarding the procedure to establish the correspondence. Expressions for the most common measure of association between perceptual and acoustic measures (Pearson's r) are derived using a multiple linear regression model. The particular case where the multiple linear regression involves only roughness and breathiness is discussed to illustrate the issues. RESULTS Most problems reported regarding inconsistent findings in the relationship between given acoustic measures and particular perceptual ratings could be linked to sample properties not directly related to the actual relationship. The influential sample properties are the collinearity between the regressors in the multiple linear regression and their relative variances. Recommendations on how to rule out this possible cause of inconsistency are given, ranging in scope from data collection and reporting to manipulation and interpretation of results. CONCLUSIONS The problems described can be extended to more general cases than the exemplified roughness and breathiness sample's coverage. Ruling out this possible cause of inconsistency would increase the validity of the results reported.
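As a rough illustration of the collinearity issue described in this abstract, the sketch below computes the Pearson's r one would expect between an acoustic measure and roughness ratings. It assumes, for illustration only, that the acoustic measure is a linear combination of roughness and breathiness plus independent noise; the function name, parameter names, and example values are hypothetical and are not taken from the paper, whose notation may differ.

```python
import math


def expected_pearson_r(beta_rough, beta_breathy, sd_rough, sd_breathy,
                       rho, sd_noise=0.0):
    """Population Pearson's r between y and roughness R, under the assumed
    model y = beta_rough * R + beta_breathy * B + noise.

    rho is the correlation between R and B in the sample (collinearity);
    sd_rough and sd_breathy are their standard deviations.
    """
    # Covariance of y with R picks up a breathiness term whenever rho != 0.
    cov_y_r = (beta_rough * sd_rough ** 2
               + beta_breathy * rho * sd_rough * sd_breathy)
    var_y = (beta_rough ** 2 * sd_rough ** 2
             + beta_breathy ** 2 * sd_breathy ** 2
             + 2 * beta_rough * beta_breathy * rho * sd_rough * sd_breathy
             + sd_noise ** 2)
    return cov_y_r / (sd_rough * math.sqrt(var_y))


# Hypothetical measure that depends on breathiness only (beta_rough = 0).
r_uncorrelated_sample = expected_pearson_r(0.0, 1.0, 1.0, 1.0, rho=0.0)
r_collinear_sample = expected_pearson_r(0.0, 1.0, 1.0, 1.0, rho=0.8)
print(r_uncorrelated_sample, r_collinear_sample)
```

Under these assumptions, a measure with no dependence on roughness still correlates with roughness ratings as strongly as the two perceptual dimensions correlate with each other in the sample, which is one way sample properties alone could produce inconsistent findings across studies.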
Affiliation(s)
- Carlos A Ferrer
- Electrical Engineering Faculty, Central University of Las Villas, Santa Clara, Cuba
- Department of Computer Science, University Erlangen-Nuremberg
- Tino Haderlein
- Department of Computer Science, University Erlangen-Nuremberg
- Youri Maryn
- Department of Otorhinolaryngology and Head and Neck Surgery, European Institute for ORL, Sint-Augustinus General Hospital, Antwerp, Belgium
- Marc S de Bodt
- Department of Communication Disorders, Antwerp University Hospital, Edegem, Oost Vlaanderen, Belgium
- Elmar Nöth
- Department of Computer Science, University Erlangen-Nuremberg
4
Kopf LM, Skowronski MD, Anand S, Eddins DA, Shrivastav R. The Perception of Breathiness in the Voices of Pediatric Speakers. J Voice 2017; 33:204-213. [PMID: 29162356 DOI: 10.1016/j.jvoice.2017.09.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/19/2017] [Revised: 09/27/2017] [Accepted: 09/28/2017] [Indexed: 10/18/2022]
Abstract
BACKGROUND The perception of pediatric voice quality has been investigated using clinical protocols developed for adult voices and acoustic analyses designed to identify important physical parameters associated with normal and dysphonic pediatric voices. Laboratory investigations of adult dysphonia have included sophisticated methods, including a psychoacoustic approach that involves a single-variable matching task (SVMT), characterized by high inter- and intra-listener reliability, and analyses that include bio-inspired models of auditory perception that have provided valuable information regarding adult voice quality. OBJECTIVES To establish the utility of a psychoacoustic approach to the investigation of voice quality perception in the context of pediatric voices. METHODS Six listeners judged the breathiness of 20 synthetic vowel stimuli using an SVMT. To support comparisons with previous data, stimuli were modeled after four pediatric speakers and synthesized using Klatt with five parameter settings that influence the perception of breathiness. The population average breathiness judgments were modeled with acoustic measures of loudness ratio, pitch strength, and cepstral peak. RESULTS Listeners reliably judged the perceived breathiness of pediatric voices, as with previous investigations of breathiness in adult dysphonic voices. Breathiness judgments were accurately modeled by loudness ratio (r² = 0.93), pitch strength (r² = 0.91), and cepstral peak (r² = 0.82). Model accuracy was not affected significantly by including stimulus fundamental frequency and was slightly higher for pediatric than for adult voices. CONCLUSIONS The SVMT proved robust for pediatric voices spanning a wide range of breathiness. The data indicate that this is a promising approach for future investigation of pediatric voice quality.
Affiliation(s)
- Lisa M Kopf
- Department of Communication Sciences and Disorders, University of Northern Iowa, Cedar Falls, Iowa
- Supraja Anand
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- David A Eddins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida.
- Rahul Shrivastav
- Office of the Vice President for Instruction, University of Georgia, Athens, Georgia
5
Araújo F, Filho J, Klautau A. Genetic algorithm to estimate the input parameters of Klatt and HLSyn formant-based speech synthesizers. Biosystems 2016; 150:190-193. [PMID: 27769749 DOI: 10.1016/j.biosystems.2016.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2015] [Revised: 10/09/2016] [Accepted: 10/10/2016] [Indexed: 10/20/2022]
Abstract
Voice imitation basically consists in estimating a synthesizer's input parameters to mimic a target speech signal. This is a difficult inverse problem because the mapping is time-varying, non-linear and from many to one. It typically requires a considerable amount of time to be done manually. This work presents the evolution of a system based on a genetic algorithm (GA) to automatically estimate the input parameters of the Klatt and HLSyn formant synthesizers using an analysis-by-synthesis process. Results are presented for natural (human-generated) speech for three male speakers. The results obtained with the GA-based system outperform those obtained with the baseline Winsnoori with respect to four objective figures of merit and a subjective test. The GA with Klatt synthesizer generated similar voices to the target and the subjective tests indicate an improvement in the quality of the synthetic voices when compared to the ones produced by the baseline.
Affiliation(s)
- Fabíola Araújo
- Signal Processing Laboratory (LaPS) - Federal University of Pará, Rua Augusto Corrêa 01, Belém, PA, Brazil.
- José Filho
- Signal Processing Laboratory (LaPS) - Federal University of Pará, Rua Augusto Corrêa 01, Belém, PA, Brazil.
- Aldebaro Klautau
- Signal Processing Laboratory (LaPS) - Federal University of Pará, Rua Augusto Corrêa 01, Belém, PA, Brazil.
6
Garellek M, Samlan R, Gerratt BR, Kreiman J. Modeling the voice source in terms of spectral slopes. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 139:1404-10. [PMID: 27036277 PMCID: PMC4818273 DOI: 10.1121/1.4944474] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/16/2015] [Revised: 12/24/2015] [Accepted: 03/03/2016] [Indexed: 05/20/2023]
Abstract
A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1-H2), the second and fourth harmonics (H2-H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4-2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz-5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters. In experiment 1, the model was fit to a large number of voice sources. Results showed that parameters are predictable from one another, but that these relationships are due to overall spectral roll-off. Two additional experiments addressed the perceptual independence of the source parameters. Listener sensitivity to H1-H2, H2-H4, and H4-2 kHz did not change as a function of the slope of an adjacent component, suggesting that sensitivity to these components is robust. Listener sensitivity to changes in spectral slope from 2 kHz to 5 kHz depended on complex interactions between spectral slope, spectral noise levels, and H4-2 kHz. It is concluded that the four parameters represent non-redundant acoustic and perceptual aspects of voice quality.
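The four slope parameters named in this abstract can be illustrated with a minimal sketch. It assumes that harmonic amplitudes of the source spectrum have already been measured in dB and are supplied as a list indexed by harmonic number; the function names and the toy spectrum are hypothetical and do not reproduce the authors' analysis procedure.

```python
import math


def nearest_harmonic(f0_hz, target_hz):
    """1-based index of the harmonic whose frequency is closest to target_hz."""
    return max(1, round(target_hz / f0_hz))


def slope_parameters(f0_hz, harmonic_db):
    """Return the four slope parameters in dB.

    harmonic_db[k - 1] holds the amplitude (dB) of harmonic k.
    """
    k_2k = nearest_harmonic(f0_hz, 2000.0)
    k_5k = nearest_harmonic(f0_hz, 5000.0)
    if len(harmonic_db) < max(4, k_5k):
        raise ValueError("harmonic amplitudes must extend to about 5 kHz")

    def amp(k):
        return harmonic_db[k - 1]

    return {
        "H1-H2": amp(1) - amp(2),
        "H2-H4": amp(2) - amp(4),
        "H4-2kHz": amp(4) - amp(k_2k),
        "2kHz-5kHz": amp(k_2k) - amp(k_5k),
    }


# Toy source (made up): f0 of 100 Hz, harmonic amplitudes falling as 1/k.
f0 = 100.0
toy_spectrum_db = [-20.0 * math.log10(k) for k in range(1, 61)]
params = slope_parameters(f0, toy_spectrum_db)
for name, value in params.items():
    print(f"{name}: {value:.2f} dB")
```

With a 100 Hz fundamental, the harmonics nearest 2 kHz and 5 kHz are the 20th and 50th, so each parameter is simply the level difference between two specific harmonics.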
Affiliation(s)
- Marc Garellek
- Department of Linguistics, University of California, San Diego, 9500 Gilman Drive #0108, La Jolla, California 92023-0108, USA
- Robin Samlan
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Bruce R Gerratt
- Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, California 90095-1794, USA
- Jody Kreiman
- Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, California 90095-1794, USA
7
Erickson ML. Acoustic Properties of the Voice Source and the Vocal Tract: Are They Perceptually Independent? J Voice 2016; 30:772.e9-772.e22. [PMID: 26822389 DOI: 10.1016/j.jvoice.2015.11.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/20/2015] [Accepted: 11/13/2015] [Indexed: 10/22/2022]
Abstract
OBJECTIVE/HYPOTHESIS This study sought to determine whether the properties of the voice source and vocal tract are perceptually independent. STUDY DESIGN Within-subjects design. METHODS This study employed a paired-comparison paradigm where listeners heard synthetic voices and rated them as same or different using a visual analog scale. Stimuli were synthesized using three different source slopes and two different formant patterns (mezzo-soprano and soprano) on the vowel /a/ at four pitches: A3, C4, B4, and F5. RESULTS Whereas formant pattern was the strongest effect, difference in source slope also affected perceived quality difference. Source slope and formant pattern were not independently perceived. CONCLUSION These results suggest that when judging laryngeal adduction using perceptual information, judgments may not be accurate when the stimuli are of differing formant patterns.
Affiliation(s)
- Molly L Erickson
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, Tennessee.
8
Schoentgen J, Fraj S, Lucero JC. Testing the reliability of Grade, Roughness and Breathiness scores by means of synthetic speech stimuli. LOGOP PHONIATR VOCO 2013; 40:5-13. [PMID: 24117123 DOI: 10.3109/14015439.2013.837502] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
This article describes a synthesizer of disordered voices and reports a test of the reliability of Grade, Roughness, and Breathiness scores assigned to synthetic stimuli by eight expert listeners in two sessions. Speech stimuli [a], [i], [u], [ai], and [ia] were synthesized with three values of vocal frequency and four levels of vocal jitter and pulsatile additive noise each. The agreement and correlation of scores assigned by the same rater in different sessions, or by different raters in the same session, accord with published data. Only a small part of the variance of the arithmetic differences between the scores that are assigned to the same stimulus is explained by the stimuli properties. The conclusion is that differences between scores that are assigned to the same stimulus are not attributable to biases of individual raters; such biases would shift all the scores assigned on a scale, and the shift would be interpretable in terms of the properties of the stimuli.
Affiliation(s)
- Jean Schoentgen
- Department of Signals, Images and Telecommunication Devices, Faculty of Applied Sciences, CP 165/51, Université Libre de Bruxelles, 50, Av. F.-D. Roosevelt, B-1050 Brussels, Belgium
9
Garellek M, Keating P, Esposito CM, Kreiman J. Voice quality and tone identification in White Hmong. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:1078-89. [PMID: 23363123 PMCID: PMC3574099 DOI: 10.1121/1.4773259] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 02/09/2012] [Revised: 11/18/2012] [Accepted: 11/29/2012] [Indexed: 05/24/2023]
Abstract
This study investigates the importance of source spectrum slopes in the perception of phonation by White Hmong listeners. In White Hmong, nonmodal phonation (breathy or creaky voice) accompanies certain lexical tones, but its importance in tonal contrasts is unclear. In this study, native listeners participated in two perceptual tasks, in which they were asked to identify the word they heard. In the first task, participants heard natural stimuli with manipulated F0 and duration (phonation unchanged). Results indicate that phonation is important in identifying the breathy tone, but not the creaky tone. Thus, breathiness can be viewed as contrastive in White Hmong. Next, to understand which parts of the source spectrum listeners use to perceive contrastive breathy phonation, source spectrum slopes were manipulated in the second task to create stimuli ranging from modal to breathy sounding, with F0 held constant. Results indicate that changes in H1-H2 (difference in amplitude between the first and second harmonics) and H2-H4 (difference in amplitude between the second and fourth harmonics) are independently important for distinguishing breathy from modal phonation, consistent with the view that the percept of breathiness is influenced by a steep drop in harmonic energy in the lower frequencies.
Affiliation(s)
- Marc Garellek
- Phonetics Laboratory, Department of Linguistics, University of California, Los Angeles, California 90095-1543, USA.
10
Anand S, Shrivastav R, Wingate JM, Chheda NN. An Acoustic-Perceptual Study of Vocal Tremor. J Voice 2012; 26:811.e1-7. [DOI: 10.1016/j.jvoice.2012.02.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/19/2011] [Accepted: 02/27/2012] [Indexed: 10/28/2022]
11
Fraj S, Schoentgen J, Grenez F. Development and perceptual assessment of a synthesizer of disordered voices. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:2603-2615. [PMID: 23039453 DOI: 10.1121/1.4751536] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 06/01/2023]
Abstract
A synthesizer is based on a nonlinear wave-shaping model of the glottal area, an algebraic model of the glottal aerodynamics as well as concatenated-tube models of the trachea and vocal tract. Voice disorders are simulated by way of models of vocal frequency jitter and tremor, vocal amplitude shimmer and tremor, as well as pulsatile additive noise. Six experiments have been carried out to assess the synthesizer perceptually. Three experiments involve the perceptual categorization of male synthetic and human stimuli and one the auditory discrimination between synthetic and human tokens. A fifth experiment reports the auditory discrimination between synthetic tokens with different levels of additive and modulation noise. A sixth experiment reports the scoring by expert listeners of male synthetic stimuli on equal-appearing interval scales grade-roughness-breathiness (GRB). A first objective is to demonstrate the ability of the synthesizer to simulate vowel sounds that are valid exemplars of speech sounds produced by humans with voice disorders. A second objective is to learn how human expert raters perceptually map vocal frequency, additive and modulation noise as well as vowel categories into scores on GRB scales.
Affiliation(s)
- Samia Fraj
- Laboratory of Signals Images and Telecommunication Devices, CP 165/51, Faculty of Applied Sciences, Université Libre de Bruxelles, 50 Avenue F.-D. Roosevelt, B-1050 Brussels, Belgium
12
Brunelle M. Dialect experience and perceptual integrality in phonological registers: fundamental frequency, voice quality and the first formant in Cham. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:3088-3102. [PMID: 22501082 DOI: 10.1121/1.3693651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
The perceptual integrality of f0, F1 and voice quality is investigated by looking at register, a phonological contrast that relies on these three properties in three dialects of Cham, an Austronesian language of Mainland Southeast Asia. The results of a Garner classification experiment confirm that the three acoustic properties integrate perceptually and that their patterns of integrality are similar in the three dialects. Moreover, they show that dialect-specific sensitivity to acoustic properties can cause salient dimensions to override weaker ones. Finally, the patterns of integrality found in Cham suggest that auditory integrality is not limited to acoustically similar properties.
Affiliation(s)
- Marc Brunelle
- Department of Linguistics, University of Ottawa, 70 Laurier East, office 429, Ottawa K1N 6N5, Canada.
13
Patel S, Shrivastav R, Eddins DA. Developing a single comparison stimulus for matching breathy voice quality. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2012; 55:639-47. [PMID: 22215034 PMCID: PMC3612287 DOI: 10.1044/1092-4388(2011/10-0337)] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 05/19/2023]
Abstract
PURPOSE In this experiment, a single comparison stimulus was developed as a reference in a perceptual matching task for the quantification of breathy voice quality. Perceptual judgments of a set of synthetic voice samples were compared to previous data obtained using multiple comparison stimuli "customized" for different voices (Patel, Shrivastav, & Eddins, 2010). METHOD Five male and 5 female samples of the vowel /a/ were selected from the Kay Elemetrics Disordered Voice Database and resynthesized using a Klatt synthesizer. Eleven samples were created for each base voice by manipulating the aspiration noise level. Five samples from each continuum were evaluated in a perceptual matching task in which a single sawtooth and noise comparison stimulus was used to obtain breathiness judgments. Linear regression was used to compare measurements obtained using the new comparison stimulus against the customized comparison stimuli. RESULTS Results indicated that the noncustomized sawtooth comparison provides reliability and perceptual distances between stimuli similar to those obtained using customized comparison stimuli. CONCLUSION A single-variable matching task using a single comparison stimulus can be used to obtain perceptual estimates of breathiness across voices and experiments in a laboratory setting. This technique will help develop models of voice-quality perception.
Affiliation(s)
- Sona Patel
- University of Florida, Gainesville, FL, USA.
14
Kreiman J, Gerratt BR. Perceptual interaction of the harmonic source and noise in voice. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:492-500. [PMID: 22280610 PMCID: PMC3283904 DOI: 10.1121/1.3665997] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 05/09/2023]
Abstract
Although the amount of inharmonic energy (noise) present in a human voice is an important determinant of vocal quality, little is known about the perceptual interaction between harmonic and inharmonic aspects of the voice source. This paper reports three experiments investigating this issue. Results indicate that perception of the harmonic slope and of noise levels are both influenced by complex interactions between the spectral shape and relative levels of harmonic and noise energy in the voice source. Just-noticeable differences (JNDs) for the noise-to-harmonics ratio (NHR) varied significantly with the NHR and harmonic spectral slope, but NHR had no effect on JNDs for NHR when harmonic slopes were steepest, and harmonic slope had no effect when NHRs were highest. Perception of changes in the harmonic source slope depended on NHR and on the harmonic source slope: JNDs increased when spectra rolled off steeply, with this effect in turn depending on NHR. Finally, all effects were modulated by the shape of the noise spectrum. It thus appears that, beyond masking, understanding perception of individual parameters requires knowledge of the acoustic context in which they function, consistent with the view that voices are integral patterns that resist decomposition.
Affiliation(s)
- Jody Kreiman
- Division of Head and Neck Surgery, UCLA School of Medicine, 31-24 Rehab Center, Los Angeles, California 90095-1794, USA.
15
Esposito CM. The perception of pathologically-disordered phonation by Gujarati, English, and Spanish listeners. LANGUAGE AND SPEECH 2011; 54:415-430. [PMID: 22070046 DOI: 10.1177/0023830911402605] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
This study investigates the influence of linguistic experience on the perception of pathologically-disordered voices using 18 listeners of American English, which has allophonic breathiness, 12 listeners of Gujarati, which contrasts breathy and modal vowels, and 18 listeners of Spanish, which has neither allophonic nor phonemic breathiness. Listeners rated the similarity of pairs of pathologically-disordered voices. Multidimensional scaling was used to determine the properties that were most correlated with perception for each listener group. Results showed that Gujaratis' perception was correlated with the difference between the amplitude of the first (H1*) and second (H2*) harmonic (H1*-H2*), which is associated with the production of phonation in Gujarati. English listeners' judgments were correlated with the measure H1*-H2* and cepstral peak prominence, and Spanish listeners' judgments were correlated with H1*-H2* and H1*-A1* (the amplitude of the principal harmonic near the first formant). When compared to Esposito (2006), which asked the same listeners to rate the similarity of breathy and modal vowels from Mazatec, results showed that Gujarati listeners classified the pathologically-disordered stimuli in the same way that they classified the Mazatec stimuli, while English and Spanish listeners perceived the pathologically-disordered stimuli and the Mazatec stimuli in slightly different ways.
Affiliation(s)
- Christina M Esposito
- Department of Linguistics, Macalester College, 1600 Grand Ave., St. Paul, MN 55105, USA.
16
Shrivastav R, Camacho A, Patel S, Eddins DA. A model for the prediction of breathiness in vowels. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:1605-15. [PMID: 21428523 PMCID: PMC3077964 DOI: 10.1121/1.3543993] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 03/02/2010] [Revised: 10/29/2010] [Accepted: 12/29/2010] [Indexed: 05/19/2023]
Abstract
The perception of breathiness in vowels is cued by multiple acoustic cues, including changes in aspiration noise (AH) and the open quotient (OQ) [Klatt and Klatt, J. Acoust. Soc. Am. 87(2), 820-857 (1990)]. A loudness model can be used to determine the extent to which AH masks the harmonic components in voice. The resulting "partial loudness" (PL) and loudness of AH ["noise loudness" (NL)] have been shown to be good predictors of perceived breathiness [Shrivastav and Sapienza, J. Acoust. Soc. Am. 114(1), 2217-2224 (2003)]. The levels of AH and OQ were systematically manipulated for ten synthetic vowels. Perceptual judgments of breathiness were obtained and regression functions to predict breathiness from the ratio of NL to PL (η) were derived. Results show that breathiness can be modeled as a power function of η. The power parameter of this function appears to be affected by the fundamental frequency of the vowel. A second experiment was conducted to determine if the resulting power function could estimate breathiness in a different set of voices. The breathiness of these stimuli, both natural and synthetic, was determined in a listening test. The model estimates of breathiness were highly correlated with perceptual data but the absolute predicted values showed some discrepancies.
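To make the idea of modeling breathiness as a power function of η concrete, here is a minimal sketch that fits a function of the assumed form breathiness = a · η^b by ordinary least squares on log-transformed values. The functional form with a single scale factor, the fitting method, the function name, and all numbers are assumptions for illustration; the data are synthetic and are not results from the study.

```python
import math


def fit_power_function(eta_values, breathiness_values):
    """Fit breathiness = a * eta**b via linear regression in log-log space.

    All inputs must be strictly positive. Returns the tuple (a, b).
    """
    xs = [math.log(e) for e in eta_values]
    ys = [math.log(v) for v in breathiness_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log space -> scale
    return a, b


# Synthetic, noise-free data generated from a = 2.0 and b = 0.5 (made up).
eta = [0.1, 0.5, 1.0, 2.0, 4.0]
judged = [2.0 * e ** 0.5 for e in eta]
a_hat, b_hat = fit_power_function(eta, judged)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Fitting in log-log space is convenient because a power function becomes a straight line there; a dependence of the exponent on fundamental frequency, as the abstract reports, could be explored by fitting separate exponents per voice.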
Affiliation(s)
- Rahul Shrivastav
- Department of Communication Sciences and Disorders, University of Florida and Malcom Randall VAMC, Dauer Hall, P.O. Box 117420, Gainesville, Florida 32611, USA.
17
Kreiman J, Gerratt BR, Khan SUD. Effects of native language on perception of voice quality. JOURNAL OF PHONETICS 2010; 38:588-593. [PMID: 21152109 PMCID: PMC2997695 DOI: 10.1016/j.wocn.2010.08.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/20/2023]
Abstract
Little is known about how listeners judge phonemic versus allophonic (or freely varying) versus post-lexical variations in voice quality, or about which acoustic attributes serve as perceptual cues in specific contexts. To address this issue, native speakers of Gujarati, Thai, and English discriminated among pairs of voices that differed only in the relative amplitudes of the first versus second harmonics (H1-H2). Results indicate that speakers of Gujarati (which contrasts H1-H2 phonemically) were more sensitive to changes than were speakers of Thai or English. Further, sensitivity was not affected by the overall source spectral slope for Gujarati speakers, unlike Thai and English speakers, who were most sensitive when the spectrum fell away steeply. In combination with previous findings from Mandarin speakers, these results suggest a continuum of sensitivity to H1-H2. In Gujarati, the independence of sensitivity and spectral context is consistent with use of H1-H2 as a cue to the language's phonemic phonation contrast. Speakers of Mandarin, in which creaky phonation occurs in conjunction with the low dipping Tone 3, apparently also learn to hear these contrasts, but sensitivity is conditioned by spectral context. Finally, for Thai and English speakers, who vary phonation only post-lexically, sensitivity is both lower and contextually-determined, reflecting the smaller role of H1-H2 in these languages.
|
18
|
Kreiman J, Gerratt BR. Perceptual sensitivity to first harmonic amplitude in the voice source. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:2085-9. [PMID: 20968379 PMCID: PMC2981120 DOI: 10.1121/1.3478784] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 01/12/2010] [Revised: 06/18/2010] [Accepted: 07/19/2010] [Indexed: 05/21/2023]
Abstract
Little is known about the perceptual importance of changes in the shape of the source spectrum, although many measures have been proposed and correlations with different vocal qualities (breathiness, roughness, nasality, strain...) have frequently been reported. This study investigated just-noticeable differences in the relative amplitudes of the first two harmonics (H1-H2) for speakers of Mandarin and English. Listeners heard pairs of vowels that differed only in the amplitude of the first harmonic and judged whether or not the voice tokens were identical in voice quality. Across voices and listeners, just-noticeable-differences averaged 3.18 dB. This value is small relative to the range of values across voices, indicating that H1-H2 is a perceptually valid acoustic measure of vocal quality. For both groups of listeners, differences in the amplitude of the first harmonic were easier to detect when the source spectral slope was steeply falling so that F0 dominated the spectrum. Mandarin speakers were significantly more sensitive (by about 1 dB) to differences in first harmonic amplitudes than were English speakers. Two explanations for these results are possible: Mandarin speakers may have learned to hear changes in harmonic amplitudes due to changes in voice quality that are correlated with the tones of Mandarin; or Mandarin speakers' experience with tonal contrasts may increase their sensitivity to small differences in the amplitude of F0 (which is also the first harmonic).
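As an illustration of the H1-H2 measure discussed in the abstract above, the following minimal Python sketch estimates the level difference between the first two harmonics of a synthetic two-component tone. It is not the authors' analysis procedure: the function names, the sampling rate, the test signal, and the assumption that F0 is already known are all hypothetical choices made for this example.

```python
import cmath
import math


def harmonic_amplitude(signal, sample_rate, freq):
    """Estimate the amplitude of one sinusoidal component at `freq` Hz.

    Projects the signal onto a complex exponential (a single-bin DFT).
    The estimate is exact only when the window holds a whole number of
    periods of `freq`; otherwise spectral leakage biases it.
    """
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq * i / sample_rate)
        for i, x in enumerate(signal)
    )
    return 2.0 * abs(acc) / n


def h1_h2_db(signal, sample_rate, f0):
    """Return H1-H2 in dB, given a known fundamental frequency f0."""
    h1 = harmonic_amplitude(signal, sample_rate, f0)
    h2 = harmonic_amplitude(signal, sample_rate, 2.0 * f0)
    return 20.0 * math.log10(h1 / h2)


def synth_two_harmonics(sample_rate, f0, a1, a2, n_samples):
    """Hypothetical test signal: two harmonics with amplitudes a1 and a2."""
    return [
        a1 * math.sin(2.0 * math.pi * f0 * i / sample_rate)
        + a2 * math.sin(2.0 * math.pi * 2.0 * f0 * i / sample_rate)
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # 0.1 s at 16 kHz holds exactly 20 periods of a 200 Hz fundamental.
    tone = synth_two_harmonics(16000, 200.0, 1.0, 0.5, 1600)
    print(round(h1_h2_db(tone, 16000, 200.0), 2))
```

With a first harmonic twice the amplitude of the second, the expected difference is 20·log10(2), about 6.02 dB, which is roughly twice the 3.18 dB average just-noticeable difference reported in the abstract. Real voice recordings would additionally need F0 tracking and, in many workflows, correction for formant influence, neither of which is attempted here.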
Affiliation(s)
- Jody Kreiman
- Division of Head and Neck Surgery, School of Medicine, University of California, Los Angeles, 31-24 Rehab Center, Los Angeles, California 90095-1794, USA.
|
19
|
Helou LB, Solomon NP, Henry LR, Coppit GL, Howard RS, Stojadinovic A. The role of listener experience on Consensus Auditory-perceptual Evaluation of Voice (CAPE-V) ratings of postthyroidectomy voice. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2010; 19:248-258. [PMID: 20484704 DOI: 10.1044/1058-0360(2010/09-0012)] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 05/28/2023]
Abstract
PURPOSE To determine whether experienced and inexperienced listeners rate postthyroidectomy voice samples similarly using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). METHOD Prospective observational study of voice quality ratings of randomized and blinded voice samples was performed. Twenty-one postthyroidectomy patients' voices, representing a wide range of severities, were rated using a custom-automated version of the CAPE-V. Ten male and 11 female voices were rated by 10 experienced and 10 inexperienced listeners. Experienced listeners consisted of 5 otolaryngologists (ENTs) and 5 speech-language pathologists (SLPs); inexperienced listeners were medical professionals with no formal training or experience in voice disorders. RESULTS Inexperienced listeners rated voices as more severely impaired than experienced listeners for all CAPE-V parameters (p ≤ .003). Those without experience in voice disorders had lower intra- and interrater reliability (e.g., r = .838 and .528, respectively, for overall severity) than those with experience in voice disorders (e.g., r = .911 and .722, respectively, for overall severity) for all parameters. Among experienced listeners, ENTs and SLPs rated voices similarly for most parameters. CONCLUSIONS Experienced and inexperienced listeners judged voice quality differently given minimal training with the use of the CAPE-V. SLPs and ENTs rated postthyroidectomy voice quality similarly. These findings indicate that the CAPE-V can be used reliably and similarly by professionals who specialize in voice disorders.
Affiliation(s)
- Leah B Helou
- Walter Reed Army Medical Center, Washington, DC, USA.
|
20
|
The Effect of Musical Background on Judgments of Dysphonia. J Voice 2010; 24:93-101. [DOI: 10.1016/j.jvoice.2008.04.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/17/2008] [Accepted: 04/30/2008] [Indexed: 11/22/2022]
|
21
|
Shrivastav R, Camacho A. A computational model to predict changes in breathiness resulting from variations in aspiration noise level. J Voice 2009; 24:395-405. [PMID: 19896328 DOI: 10.1016/j.jvoice.2008.12.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/06/2008] [Accepted: 12/01/2008] [Indexed: 10/20/2022]
Abstract
Perception of breathy voice quality is cued by a number of acoustic changes including an increase in aspiration noise level (AH) and spectral slope. Changes in AH in a vowel may be evaluated through measures such as the harmonic-to-noise ratio, cepstral peak prominence (CPP), or via auditory measures such as the partial loudness of harmonic energy and loudness of aspiration noise. Although a number of experiments have reported high correlation between such measures and ratings of perceived breathiness, a formal model to predict breathiness of a vowel has not been proposed. This research describes two computational models to predict changes in breathiness resulting from variations in AH. One model uses auditory measures, whereas the other uses CPP as independent variables to predict breathiness. For both cases, a translated and truncated power function is required to predict breathiness. Some parameters in both of these models were observed to be pitch dependent. The "unified" model based on auditory measures was observed to be more accurate than one based on CPP.
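The abstract above names a "translated and truncated power function" without giving its form or fitted values. The Python sketch below shows one generic reading of that phrase, offered only as an assumption: the input is shifted by a threshold (translated), outputs below the threshold are clipped to zero (truncated), and the remainder is raised to a power. The function name, parameter names, and example numbers are hypothetical and are not taken from the paper.

```python
def translated_truncated_power(x, threshold, gain, exponent):
    """Generic translated, truncated power function (illustrative only).

    x         -- predictor value (for example, a noise-related measure)
    threshold -- translation: the value of x at which the output leaves zero
    gain      -- multiplicative scale factor
    exponent  -- power applied to the shifted predictor
    """
    shifted = x - threshold
    if shifted <= 0.0:
        # Truncation: no predicted response at or below the threshold.
        return 0.0
    return gain * shifted ** exponent


if __name__ == "__main__":
    # Arbitrary example parameters, not fitted to any data.
    for x in (0.0, 1.0, 5.0, 10.0):
        print(x, translated_truncated_power(x, threshold=1.0, gain=2.0, exponent=0.5))
```

In a model of this general shape, the abstract's remark that some parameters were pitch dependent would correspond to quantities such as the threshold or exponent varying with fundamental frequency; which parameters actually vary, and how, is not stated in the abstract.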
Affiliation(s)
- Rahul Shrivastav
- Department of Communication Sciences and Disorders, University of Florida, Gainesville, Florida 32611, USA.
|
22
|
Castillo-Guerra E, Ruiz A. Automatic Modeling of Acoustic Perception of Breathiness in Pathological Voices. IEEE Trans Biomed Eng 2009; 56:932-40. [DOI: 10.1109/tbme.2008.2007910] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
|
23
|
Kreiman J, Gerratt BR, Ito M. When and why listeners disagree in voice quality assessment tasks. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 122:2354-64. [PMID: 17902870 DOI: 10.1121/1.2770547] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.7] [Indexed: 05/17/2023]
Abstract
Modeling sources of listener variability in voice quality assessment is the first step in developing reliable, valid protocols for measuring quality, and provides insight into the reasons that listeners disagree in their quality assessments. This study examined the adequacy of one such model by quantifying the contributions of four factors to interrater variability: instability of listeners' internal standards for different qualities, difficulties isolating individual attributes in voice patterns, scale resolution, and the magnitude of the attribute being measured. One hundred twenty listeners in six experiments assessed vocal quality in tasks that differed in scale resolution, in the presence/absence of comparison stimuli, and in the extent to which the comparison stimuli (if present) matched the target voices. These factors accounted for 84.2% of the variance in the likelihood that listeners would agree exactly in their assessments. Providing listeners with comparison stimuli that matched the target voices doubled the likelihood that they would agree exactly. Listeners also agreed significantly better when assessing quality on continuous versus six-point scales. These results indicate that interrater variability is an issue of task design, not of listener unreliability.
Affiliation(s)
- Jody Kreiman
- Division of Head and Neck Surgery, UCLA School of Medicine, 31-24 Rehab Center, Los Angeles, California 90095, USA.
|