1
Crocker C, Toles LE, Morrison RA, Shembel AC. Relationships Between Vocal Fold Adduction Patterns, Vocal Acoustic Quality, and Vocal Effort in Individuals With and Without Hyperfunctional Voice Disorders. J Voice 2024:S0892-1997(23)00405-8. [PMID: 38195336 DOI: 10.1016/j.jvoice.2023.12.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 01/11/2024]
Abstract
OBJECTIVES/HYPOTHESIS Increased vocal effort and aberrant vocal quality are often attributed to vocal fold hyperadduction in hyperfunctional voice disorders. However, there are currently no established methods to quantify vocal fold adduction beyond subjective descriptors in this clinical population. Furthermore, relationships between vocal fold adduction patterns, vocal effort severity, and vocal quality are not well characterized. Therefore, the objectives of this study were to (1) quantify vocal fold adduction, applying a previously validated method developed for patients with vocal fold paralysis, and (2) correlate these measures with acoustic vocal quality and self-perceived measures of vocal effort severity. METHODS A deep learning program, Automated Glottic Action Tracking using artificial Intelligence, was used to track glottic angle configurations and vocal fold adduction velocities on laryngoscopic videos across 60 laryngoscopies (20 primary muscle tension dysphonia [pMTD], 20 phonotraumatic lesions, and 20 healthy controls). Voice samples were also acquired, and cepstral peak prominence (CPP) and H1-H2 acoustic measures were used to quantify vocal quality. Participants were also asked to rate their vocal effort on a 100 mm visual analog scale. RESULTS There were no significant group differences in glottic angle configurations or vocal fold adduction velocities, although there were trends toward increased peak vocal fold adduction velocities in patients with hyperfunctional voice disorders compared to controls. Vocal effort was significantly higher in the two hyperfunctional groups compared to controls. CPP was significantly lower in the pMTD group, but there were no group differences in acoustic parameters between any of the other groups or for H1-H2 values. CONCLUSION Despite significantly more vocal effort reported in patients with hyperfunctional voice disorders, there were no significant group differences in vocal fold adduction patterns. 
These findings suggest other physiologic mechanisms may also be responsible for the symptoms and genesis of pMTD and benign vocal fold lesions.
Affiliation(s)
- Caroline Crocker
  - School of Behavioral and Brain Sciences, Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, Texas
- Laura E Toles
  - Department of Otolaryngology-Head and Neck Surgery, Voice Center, University of Texas Southwestern Medical Center, Dallas, Texas
- Robert A Morrison
  - School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas
- Adrianna C Shembel
  - School of Behavioral and Brain Sciences, Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, Texas; Department of Otolaryngology-Head and Neck Surgery, Voice Center, University of Texas Southwestern Medical Center, Dallas, Texas
2
Baker CP, Brockmann-Bauser M, Purdy SC, Rakena TO. High and Wide: An In Silico Investigation of Frequency, Intensity, and Vibrato Effects on Widely Applied Acoustic Voice Perturbation and Noise Measures. J Voice 2023:S0892-1997(23)00316-8. [PMID: 37925330 DOI: 10.1016/j.jvoice.2023.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/05/2023] [Accepted: 10/05/2023] [Indexed: 11/06/2023]
Abstract
OBJECTIVES This in silico study explored the effects of a wide range of fundamental frequency (fo), source-spectrum tilt (SST), and vibrato extent (VE) on commonly used frequency and amplitude perturbation and noise measures. METHOD Using 53 synthesized tones produced in Madde, the effects of stepwise increases in fo, intensity (modeled by decreasing SST), and VE on the PRAAT parameters jitter % (local), relative average perturbation (RAP) %, shimmer % (local), amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB were investigated. A secondary experiment was conducted to determine whether any fo effects on jitter, RAP, shimmer, APQ3, and HNR were stable. A total of 10 sinewaves were synthesized in Sopran from 100 to 1000 Hz using formant frequencies for /a/, /i/, and /u/-like vowels, respectively. All effects were statistically assessed with Kendall's tau-b and partial correlation. RESULTS Increasing fo resulted in an overall increase in jitter, RAP, shimmer, and APQ3 values, respectively (P < 0.01). Oscillations of the data across the explored fo range were observed in all measurement outputs. In the Sopran tests, the oscillatory pattern seen in the Madde fo condition remained and showed differences between vowel conditions. Increasing intensity (decreasing SST) led to reduced pitch and amplitude perturbation and HNR (P < 0.05). Increasing VE led to lower HNR and an almost linear increase of all other measures (P < 0.05). CONCLUSION These novel data offer a controlled demonstration for the behavior of jitter (local) %, RAP %, shimmer (local) %, APQ3 %, and HNR (dB) when varying fo, SST, and VE in synthesized tones. Since humans will vary in all of these aspects in spoken language and vowel phonation, researchers should take potential resonance-harmonics type effects into account when comparing intersubject or preintervention and postintervention data using these measures.
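For readers unfamiliar with the perturbation measures named in this abstract, the following is a minimal sketch of jitter (local) % and RAP % computed from a list of glottal cycle durations, using the commonly documented definitions. It is not the study's code (the study used PRAAT), and the function names and the example period values are hypothetical. Shimmer (local) % and APQ3 % follow the same pattern when applied to cycle peak amplitudes instead of durations.

```python
def jitter_local_percent(periods):
    """Mean absolute difference between consecutive periods, relative to the mean period (%)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_period = sum(periods) / len(periods)
    return 100.0 * (sum(diffs) / len(diffs)) / mean_period


def rap_percent(periods):
    """Mean absolute deviation of each period from the 3-point average around it, relative to the mean period (%)."""
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    mean_period = sum(periods) / len(periods)
    return 100.0 * (sum(devs) / len(devs)) / mean_period


# Hypothetical cycle durations in seconds (about 100 Hz, alternating long and short cycles).
example_periods = [0.010, 0.011, 0.010, 0.011]
print(jitter_local_percent(example_periods))
print(rap_percent(example_periods))
```

Because both measures divide by the mean period, the same absolute cycle-to-cycle deviation yields a larger percentage at higher fo, which is one reason an fo dependence of the kind reported above is plausible.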
Affiliation(s)
- Calvin Peter Baker
  - Speech Science, School of Psychology, University of Auckland, Auckland, New Zealand; School of Music, University of Auckland, Auckland, New Zealand
- Meike Brockmann-Bauser
  - Department of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Suzanne C Purdy
  - Speech Science, School of Psychology, University of Auckland, Auckland, New Zealand
- Te Oti Rakena
  - School of Music, University of Auckland, Auckland, New Zealand
3
Baker CP, Sundberg J, Purdy SC, Rakena TO. Female adolescent singing voice characteristics: an exploratory study using LTAS and inverse filtering. Logop Phoniatr Voco 2022:1-13. [PMID: 36322641 DOI: 10.1080/14015439.2022.2140455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 10/18/2022] [Accepted: 10/19/2022] [Indexed: 06/16/2023]
Abstract
Background and Aim: To date, little research is available that objectively quantifies female adolescent singing-voice characteristics in light of the physiological and functional developments that occur from puberty to adulthood. This exploratory study sought to augment the pool of data available that offers objective voice analysis of female singers in late adolescence. Methods: Using long-term average spectra (LTAS) and inverse filtering techniques, dynamic range and voice-source characteristics were determined in a cohort of vocally healthy cis-gender female adolescent singers (17 to 19 years) from high-school choirs in Aotearoa New Zealand. Non-parametric statistics were used to determine associations and significant differences. Results: Wide intersubject variation was seen between dynamic range, spectral measures of harmonic organisation (formant cluster prominence, FCP), noise components in the spectrum (high-frequency energy ratio, HFER), and the normalised amplitude quotient (NAQ), suggesting great variability in ability to control phonatory mechanisms such as subglottal pressure (Psub), glottal configuration and adduction, and vocal tract shaping. A strong association between the HFER and NAQ suggests that these non-invasive measures may offer complementary insights into vocal function, specifically with regard to glottal adduction and turbulent noise in the voice signal. Conclusion: Knowledge of the range of variation within healthy adolescent singers is necessary for the development of effective and inclusive pedagogical practices, and for vocal-health professionals working with singers of this age. LTAS and inverse filtering are useful non-invasive tools for determining such characteristics.
Affiliation(s)
- Calvin P Baker
  - Speech Science, School of Psychology, University of Auckland, Auckland, New Zealand
  - School of Music, University of Auckland, Auckland, New Zealand
- Johan Sundberg
  - Division of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH (Royal Institute of Technology), Stockholm, Sweden
  - Department of Linguistics, Stockholm University, Stockholm, Sweden
  - University College of Music Education Stockholm, Stockholm, Sweden
- Suzanne C Purdy
  - School of Psychology, University of Auckland, Auckland Central, Auckland, New Zealand
- Te Oti Rakena
  - School of Music, University of Auckland, Auckland, New Zealand
4
Chai Y, Garellek M. On H1-H2 as an acoustic measure of linguistic phonation type. The Journal of the Acoustical Society of America 2022; 152:1856. [PMID: 36182308 DOI: 10.1121/10.0014175] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/29/2022] [Accepted: 09/01/2022] [Indexed: 06/16/2023]
Abstract
The measure H1-H2, the difference in amplitude between the first and second harmonics, is frequently used to distinguish phonation types and to characterize differences across voices and genders. While H1-H2 can differentiate voices and is used by listeners to perceive changes in voice quality, its relation to voice articulation is less straightforward. Its calculation also involves practical issues with error propagation. This paper highlights some developments in the use of H1-H2 and proposes a new measure that we call "residual H1." In residual H1, the amplitude of the first harmonic is normalized against the overall sound energy (as measured by root mean square energy) instead of against H2. Residual H1 may mitigate some of the issues with using H1-H2. The current study tests the correlation between residual H1 and electroglottographic contact quotient (CQ) and compares the ability of residual H1 vs H1-H2 to differentiate statistically across phonation types in !Xóõ and utterance-level changes in phonatory quality in Mandarin. The results show that residual H1 has a stronger correlation with CQ and differentiates contrastive and allophonic phonatory quality better than H1-H2, particularly for more constricted phonation types.
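To make the two normalizations concrete, here is a minimal sketch that estimates H1 and H2 from a synthetic two-harmonic signal and then expresses H1 either relative to H2 or relative to the signal's RMS energy. The second function is only a simplified illustration of the idea of normalizing H1 against overall energy; the abstract does not give the exact procedure behind residual H1, so this should not be read as the authors' method. All function names, the sample rate, and the toy signal are hypothetical, and fo is assumed to be known in advance.

```python
import math


def harmonic_amplitude(signal, sample_rate, freq):
    """Amplitude of the sinusoidal component at `freq`, via a single-bin DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def to_db(value):
    return 20.0 * math.log10(value)


def h1_minus_h2_db(signal, sample_rate, f0):
    """Difference in dB between the first and second harmonic amplitudes."""
    h1 = harmonic_amplitude(signal, sample_rate, f0)
    h2 = harmonic_amplitude(signal, sample_rate, 2 * f0)
    return to_db(h1) - to_db(h2)


def h1_re_rms_db(signal, sample_rate, f0):
    """H1 in dB relative to RMS energy (a simplified stand-in, not the paper's residual H1)."""
    h1 = harmonic_amplitude(signal, sample_rate, f0)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return to_db(h1) - to_db(rms)


# Toy signal: exactly 10 cycles of a 100 Hz fundamental plus a half-amplitude second harmonic.
SAMPLE_RATE = 8000
F0 = 100.0
toy = [
    1.0 * math.sin(2 * math.pi * F0 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * F0 * i / SAMPLE_RATE)
    for i in range(800)
]
print(h1_minus_h2_db(toy, SAMPLE_RATE, F0))
print(h1_re_rms_db(toy, SAMPLE_RATE, F0))
```

The energy-normalized value depends on only one estimated harmonic, so an error in locating H2 does not enter it, which is consistent with the error-propagation concern the abstract raises about H1-H2.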
Affiliation(s)
- Yuan Chai
  - Department of Linguistics, University of California San Diego, La Jolla, California 92093, USA
- Marc Garellek
  - Department of Linguistics, University of California San Diego, La Jolla, California 92093, USA
5
Lee SH, Lee GS. Long-term Average Spectrum and Nasal Accelerometry in Sentences of Differing Nasality and Forward-Focused Vowel Productions Under Altered Auditory Feedback. J Voice 2022:S0892-1997(22)00228-4. [PMID: 36050247 DOI: 10.1016/j.jvoice.2022.07.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 07/27/2022] [Accepted: 07/27/2022] [Indexed: 10/14/2022]
Abstract
OBJECTIVES AND BACKGROUND To investigate whether voice focus adjustments can alter the audio-vocal feedback and consequently modulate speech/voice motor control. Speaking with a forward-focused voice was expected to enhance audio-vocal feedback and thus decrease the variability of vocal fundamental frequency (F0). MATERIALS AND METHOD Twenty-two healthy, untrained adults (10 males and 12 females) were requested to sustain vowel /a/ with their natural focus and a forward focus and to naturally read the nasal, oral, and mixed oral-nasal sentences in normal and noise-masked auditory conditions. Meanwhile, a miniature accelerometer was externally attached to the nose to detect the nasal vibrations during vocalization. Audio recordings were made and analyzed using the long-term average spectrum (LTAS) and power spectral analysis of F0. RESULTS Compared with naturally-focused vowel production and oral sentences, forward-focused vowel productions and nasal sentences both showed significant increases in nasal accelerometric amplitude and the spectral power within the range of 200∼300 Hz, and significantly decreased the F0 variability below 3 Hz, which has been reported to be associated with enhanced auditory feedback in our previous research. The auditory masking not only significantly increased the low-frequency F0 variability, but also significantly decreased the ratio of the spectral power within 200∼300 Hz to the power within 300∼1000 Hz for the vowel and sentence productions. Gender differences were found in the correlations between the degree of nasal coupling and F0 stability as well as in the LTAS characteristics in response to noise. CONCLUSIONS Variations in nasal-oral acoustic coupling not only change the formant features of speech signals, but involuntarily influence the auditory feedback control of vocal fold vibrations. Speakers tend to show improved F0 stability in response to a forward-focused voice adjustment.
Affiliation(s)
- Shao-Hsuan Lee
  - Department of Audiology and Speech Language Pathology, Mackay Medical College, New Taipei City, and Department of Speech Language Pathology and Audiology, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan
- Guo-She Lee
  - School of Medicine, College of Medicine, Yangming Campus, National Yang Ming Chiao Tung University, and Department of Otorhinolaryngology, Taipei City Hospital Renai Branch, Taipei, Taiwan
6
Motie-Shirazi M, Zañartu M, Peterson SD, Mehta DD, Hillman RE, Erath BD. Collision Pressure and Dissipated Power Dose in a Self-Oscillating Silicone Vocal Fold Model With a Posterior Glottal Opening. Journal of Speech, Language, and Hearing Research : JSLHR 2022; 65:2829-2845. [PMID: 35914018 PMCID: PMC9911124 DOI: 10.1044/2022_jslhr-21-00471] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/02/2021] [Revised: 01/24/2022] [Accepted: 05/04/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE The goal of this study was to experimentally evaluate how compensating for the adverse acoustic effects of a posterior glottal opening (PGO) by increasing subglottal pressure and changing supraglottal compression, as have been associated with vocal hyperfunction, influences the risk of vocal fold (VF) trauma. METHOD A self-oscillating synthetic silicone model of the VFs with an airflow bypass that modeled a PGO was investigated in a hemilaryngeal flow facility. The influence of compensatory mechanisms on collision pressure and dissipated collision power was investigated for different PGO areas and supraglottal compression. Compensatory behaviors were mimicked by increasing the subglottal pressure to achieve a target sound pressure level (SPL). RESULTS Increasing the subglottal pressure to compensate for decreased SPL due to a PGO produced higher values for both collision pressure and dissipated collision power. Whereas a 10-mm² PGO area produced a 12% increase in the peak collision pressure, the dissipated collision power increased by 122%, mainly due to an increase in the magnitude of the collision velocity. This suggests that the value of peak collision pressure may not fully capture the mechanisms by which phonotrauma occurs. It was also found that an optimal value of supraglottal compression exists that maximizes the radiated SPL, indicating the potential utility of supraglottal compression as a compensatory mechanism. CONCLUSIONS Larger PGO areas are expected to increase the risk of phonotrauma due to the concomitant increase in dissipated collision power associated with maintaining SPL. Furthermore, the risk of VF damage may not be fully characterized by only the peak collision pressure.
Affiliation(s)
- Mohsen Motie-Shirazi
  - Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, NY
- Matías Zañartu
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Sean D. Peterson
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario, Canada
- Daryush D. Mehta
  - Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston
- Robert E. Hillman
  - Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston
- Byron D. Erath
  - Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, NY
7
Loakes D, Gregory A. Voice quality in Australian English. JASA Express Letters 2022; 2:085201. [PMID: 37311190 DOI: 10.1121/10.0012994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 07/08/2022] [Indexed: 06/15/2023]
Abstract
This study is an acoustic investigation of voice quality in Australian English. The speech of 33 Indigenous Australians (Aboriginal English speakers) is compared to that of 28 Anglo Australians [Mainstream Australian English (MAE) speakers] from two rural locations in Victoria. Analysis of F0 and H1*-H2* reveals that pitch and voice quality differ significantly for male speakers according to dialect and for female speakers according to location. This study highlights previously undescribed phonetic and sociophonetic variability in voice quality in Australian English.
Affiliation(s)
- Debbie Loakes
  - ARC Centre of Excellence for the Dynamics of Language, Research Hub for Language in Forensic Evidence, School of Languages and Linguistics, University of Melbourne, Parkville, Victoria 3010, Australia
- Adele Gregory
  - School of Languages and Linguistics, University of Melbourne, Parkville, Victoria 3010, Australia
8
Soleimanifar S, Staisloff HE, Aronoff JM. The effect of simulated insertion depth differences on the vocal pitches of cochlear implant users. JASA Express Letters 2022; 2:044401. [PMID: 36154233 DOI: 10.1121/10.0010243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Cochlear implant (CI) users often produce different vocal pitches when using their left versus right CI. One possible explanation for this is that insertion depth differs across the two CIs. The goal of this study was to investigate the role of electrode insertion depth in the production of vocal pitch. Eleven individuals with bilateral CIs used maps simulating differences in insertion depth. Participants produced a sustained vowel and sang Happy Birthday. Approximately half the participants significantly shifted the pitch of their voice in response to different simulated insertion depths. The results suggest insertion depth differences can alter produced vocal pitch.
Affiliation(s)
- Simin Soleimanifar
  - Speech and Hearing Science Department, University of Illinois at Urbana-Champaign, 901 South 6th Street, Champaign, Illinois 61801, USA
- Hannah E Staisloff
  - Speech and Hearing Science Department, University of Illinois at Urbana-Champaign, 901 South 6th Street, Champaign, Illinois 61801, USA
- Justin M Aronoff
  - Speech and Hearing Science Department, University of Illinois at Urbana-Champaign, 901 South 6th Street, Champaign, Illinois 61801, USA
9
A Longitudinal Study of Speech Acoustics in Older French Females: Analysis of the Filler Particle euh across Utterance Positions. Languages 2021. [DOI: 10.3390/languages6040211] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Aging in speech production is a multidimensional process. Biological, cognitive, social, and communicative factors can change over time, stay relatively stable, or may even compensate for each other. In this longitudinal work, we focus on stability and change at the laryngeal and supralaryngeal levels in the discourse particle euh produced by 10 older French-speaking females at two times, 10 years apart. Recognizing the multiple discourse roles of euh, we divided out occurrences according to utterance position. We quantified the frequency of euh, and evaluated acoustic changes in formants, fundamental frequency, and voice quality across time and utterance position. Results showed that euh frequency was stable with age. The only acoustic measure that revealed an age effect was harmonics-to-noise ratio, showing less noise at older ages. Other measures mostly varied with utterance position, sometimes in interaction with age. Some voice quality changes could reflect laryngeal adjustments that provide for airflow conservation utterance-finally. The data suggest that aging effects may be evident in some prosodic positions (e.g., utterance-final position), but not others (utterance-initial position). Thus, it is essential to consider the interactions among these factors in future work and not assume that vocal aging is evident throughout the signal.
10
Li J, Hasegawa-Johnson M, McElwain NL. Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations. Speech Communication 2021; 133:41-61. [PMID: 36062214 PMCID: PMC9435967 DOI: 10.1016/j.specom.2021.07.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/27/2023]
Abstract
Classification of infant and parent vocalizations, particularly emotional vocalizations, is critical to understanding how infants learn to regulate emotions in social dyadic processes. This work is an experimental study of classifiers, features, and data augmentation strategies applied to the task of classifying infant and parent vocalization types. Our data were recorded both in the home and in the laboratory. Infant vocalizations were manually labeled as cry, fus (fuss), lau (laugh), bab (babble) or scr (screech), while parent (mostly mother) vocalizations were labeled as ids (infant-directed speech), ads (adult-directed speech), pla (playful), rhy (rhythmic speech or singing), lau (laugh) or whi (whisper). Linear discriminant analysis (LDA) was selected as a baseline classifier, because it gave the highest accuracy in a previously published study covering part of this corpus. LDA was compared to two neural network architectures: a two-layer fully-connected network (FCN), and a convolutional neural network with self-attention (CNSA). Baseline features extracted using the OpenSMILE toolkit were augmented by extra voice quality, phonetic, and prosodic features, each targeting perceptual features of one or more of the vocalization types. Three web data augmentation and transfer learning methods were tested: pre-training of network weights for a related task (adult emotion classification), augmentation of under-represented classes using data uniformly sampled from other corpora, and augmentation of under-represented classes using data selected by a minimum cross-corpus information difference criterion. Feature selection using Fisher scores and experiments of using weighted and unweighted samplers were also tested. Two datasets were evaluated: a benchmark dataset (CRIED) and our own corpus. In terms of unweighted-average recall of CRIED dataset, the CNSA achieved the best UAR compared with previous studies. 
In terms of classification accuracy, weighted F1, and macro F1 of our own dataset, the neural networks both significantly outperformed LDA; the FCN slightly (but not significantly) outperformed the CNSA. Cross-examining features selected by different feature selection algorithms permits a type of post-hoc feature analysis, in which the most important acoustic features for each binary type discrimination are listed. Examples of each vocalization type of overlapped features were selected, and their spectrograms are presented, and discussed with respect to the type-discriminative acoustic features selected by various algorithms. MFCC, log Mel Frequency Band Energy, LSP frequency, and F1 are found to be the most important spectral envelope features; F0 is found to be the most important prosodic feature.
Affiliation(s)
- Jialu Li
  - Beckman Institute, University of Illinois, Urbana, IL 61801, USA
  - Department of Electrical and Computer Engineering, USA
- Mark Hasegawa-Johnson
  - Beckman Institute, University of Illinois, Urbana, IL 61801, USA
  - Department of Electrical and Computer Engineering, USA
- Nancy L. McElwain
  - Beckman Institute, University of Illinois, Urbana, IL 61801, USA
  - Department of Human Development and Family Studies, USA
11
Nip ISB, Garellek M. Voice Quality of Children With Cerebral Palsy. Journal of Speech, Language, and Hearing Research : JSLHR 2021; 64:3051-3059. [PMID: 34260269 PMCID: PMC8740668 DOI: 10.1044/2021_jslhr-20-00633] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/30/2020] [Revised: 02/08/2021] [Accepted: 03/29/2021] [Indexed: 05/19/2023]
Abstract
Purpose Many children with cerebral palsy (CP) are described as having altered vocal quality. The current study utilizes psychoacoustic measures, namely, low-amplitude (H1*-H2*) and high-amplitude (H1*-A2*) spectral tilt and cepstral peak prominence (CPP), to identify the vocal fold articulation characteristics in this population. Method Eight children with CP and eight typically developing (TD) peers produced vowel singletons [i, ɑ, u] and a story retell task with the same vowels in the words "beets, Bobby, boots." H1*-H2*, H1*-A2*, and CPP were extracted from each vowel. Results were analyzed with mixed linear models to identify the effect of Group (CP, TD), Task (vowel singleton, story retell), and Vowel [i, ɑ, u] on the dependent variables. Results Children with CP have lower spectral tilt values (H1*-H2* and H1*-A2*) and lower CPP values than their TD peers. For both groups, vowel singletons were associated with lower CPP values as compared to story retell. Finally, the vowel [ɑ] was associated with higher spectral tilt and higher CPP values as compared to [i, u]. Conclusions Children with CP have more constricted and creaky vocal quality due to lower spectral tilt and greater noise. Unlike adults, children demonstrate poorer vocal fold articulation when producing vowel singletons as compared to story retell. Finally, low vowels like [ɑ] seem to be produced with less constriction and noise as compared to high vowels.
12
Labuschagne IB, Ciocca V. Noise thresholds in harmonic series maskers. The Journal of the Acoustical Society of America 2021; 149:2492. [PMID: 33940897 DOI: 10.1121/10.0004130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2020] [Accepted: 03/15/2021] [Indexed: 06/12/2023]
Abstract
The presence of noise is a salient cue to the perception of breathiness and aspiration in speech sounds. The detection of noise within harmonic series (maskers) composed of unresolved components was found to depend on the fundamental frequency (fo) and the overall level of the masker [Gockel, Moore, and Patterson (2002). J. Acoust. Soc. Am., 111 (6), 2759-2770]. In the present study, noise detection thresholds were measured as a function of the frequency range, the fo, and the overall level of harmonic maskers. Frequency range was specified in equivalent rectangular bandwidth (ERB) units (3-13, 13-23, 23-33, or 3-33 ERBs). The results were consistent with the idea that listeners rely on spectral cues when maskers comprise only resolved components (3-13 ERBs), and on temporal (dip listening) cues when maskers contain only unresolved components (23-33 ERBs). Noise detection thresholds were generally lower when masker level was high (70 dBA) than when it was low (50 dBA). Masker fo affected thresholds only when listeners relied on spectral cues for noise detection. With the wideband (3-33 ERBs) masker, listeners likely detected noise by focusing on the frequency band (23-33 ERBs) with the most advantageous noise-to-harmonic ratio.
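As a rough guide to what the ERB ranges above mean in hertz, the sketch below converts between frequency and ERB-number using the widely cited Glasberg and Moore (1990) formula. The abstract does not state which ERB formula the study used, so this choice is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_erb_number(freq_hz):
    """ERB-number (Cams) for a frequency in Hz, assuming the Glasberg and Moore (1990) formula."""
    return 21.4 * math.log10(0.00437 * freq_hz + 1.0)


def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) / 0.00437


# Frequency ranges named in the abstract, expressed in ERB units.
for low, high in [(3, 13), (13, 23), (23, 33), (3, 33)]:
    print(f"{low}-{high} ERBs: {erb_number_to_hz(low):.0f}-{erb_number_to_hz(high):.0f} Hz")
```

Under this assumed formula the scale is roughly logarithmic above a few hundred hertz, so each 10-ERB band in the study spans a progressively wider range in hertz.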
Affiliation(s)
- Ilse B Labuschagne
  - School of Audiology and Speech Sciences, The University of British Columbia, 2177 Wesbrook Mall, Vancouver, British Columbia, V6T 1Z3, Canada
- Valter Ciocca
  - School of Audiology and Speech Sciences, The University of British Columbia, 2177 Wesbrook Mall, Vancouver, British Columbia, V6T 1Z3, Canada
13
Xue Y, Marxen M, Akagi M, Birkholz P. Acoustic and articulatory analysis and synthesis of shouted vowels. Comput Speech Lang 2021. [DOI: 10.1016/j.csl.2020.101156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
14
Using Pitch Height and Pitch Strength to Characterize Type 1, 2, and 3 Voice Signals. J Voice 2021; 35:181-193. [DOI: 10.1016/j.jvoice.2019.08.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/03/2019] [Revised: 08/05/2019] [Accepted: 08/08/2019] [Indexed: 11/19/2022]
15
Circumspection in using automated measures: Talker gender and addressee affect error rates for adult speech detection in the Language ENvironment Analysis (LENA) system. Behav Res Methods 2020; 53:113-138. [DOI: 10.3758/s13428-020-01419-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/08/2022]
16
Shi M, Chen Y, Mous M. Tonal split and laryngeal contrast of onset consonant in Lili Wu Chinese. The Journal of the Acoustical Society of America 2020; 147:2901. [PMID: 32359279 DOI: 10.1121/10.0001000] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2019] [Accepted: 11/26/2019] [Indexed: 06/11/2023]
Abstract
This study examines the acoustic properties concerning tonal split and stop onsets in an under-documented Wu Chinese variety, Lili Wu, using speech production data collected from field research. Lili Wu Chinese has been reported to demonstrate an unusual tonal split phenomenon known as "aspiration-induced tonal split" (ATS). ATS refers to the distinct lowering of f0 of a lexical tone over syllables beginning with a voiceless aspirated obstruent, compared to that of syllables beginning with an unaspirated obstruent. Two debates lingering in the existing literature are discussed: (i) is ATS an on-going change or a completed change? and (ii) is it onset aspiration or vowel breathiness that directly triggers ATS? Results suggest that ATS is a completed change, which, however, is conditioned by tonal contexts. Regarding the second debate, results suggest that neither aspiration nor breathiness serves as the direct trigger for tonal split. Moreover, one unexpected on-going sound change was observed: The breathiness of vowels after voiced onsets seems to be disappearing among the younger generation. These findings extend the understanding of the acoustic properties of tonal development in a complex system and highlight the importance of experimental methods in understanding the sound structure and changes of under-documented languages.
Affiliation(s)
- Menghui Shi
- Leiden University Centre for Linguistics, Leiden University, Van Wijkplaats 4, Witte Singel-complex, Leiden, 2300 RA, The Netherlands
| | - Yiya Chen
- Leiden Institute for Brain and Cognition, Leiden University, Wassenaarseweg 52, Leiden, 2300 RC, The Netherlands
| | - Maarten Mous
- Leiden University Centre for Linguistics, Leiden University, Van Wijkplaats 4, Witte Singel-complex, Leiden, 2300 RA, The Netherlands
| |
|
17
|
Kelterer A, Schuppler B. Phonation type contrasts and tone in Chichimec. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:3043. [PMID: 32359325 DOI: 10.1121/10.0001015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/03/2019] [Accepted: 03/16/2020] [Indexed: 06/11/2023]
Abstract
Chichimec (Otomanguean) has two tones, high and low, and a phonological three-way phonation contrast: modal /V/, breathy /V¨/, and creaky /Ṽ/. Tone and phonation type contrasts are used independently. This paper investigates the acoustic realization of modal, breathy, and creaky vowels; the timing of phonation in non-modal vowels; and the production of tone in combination with different phonation types. The results of cepstral peak prominence and three spectral tilt measures showed that phonation type contrasts are not distinguished by the same acoustic measures for women and men. In line with expectations for laryngeally complex languages, phonetic modal and non-modal phonation are sequenced in phonological breathy and creaky vowels. With respect to the timing pattern, however, the results show that non-modal phonation is not, as previously reported, mainly located in the middle of the vowel. Non-modal phonation is, instead, predominantly realized in the second half of phonological breathy and creaky vowels. Tone is distinguished in all three phonation types, and non-modal vowels do not exhibit distinct F0 ranges except for creaky vowels produced by women in which F0 declines in the creaky portion. The results of the acoustic analysis provide additional insights to phonological accounts of laryngeal complexity in Chichimec.
Affiliation(s)
- Anneliese Kelterer
- Department of Linguistics, University of Graz, Merangasse 70/III, 8010 Graz, Austria
| | - Barbara Schuppler
- Signal Processing and Speech Communication Laboratory, Graz University of Technology, Inffeldgasse 16c, 8010 Graz, Austria
| |
|
18
|
Oren L, Khosla S, Farbos de Luzan C, Gutmark E. Effects of False Vocal Folds on Intraglottal Velocity Fields. J Voice 2020; 35:695-702. [PMID: 32147314 DOI: 10.1016/j.jvoice.2020.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2019] [Revised: 01/31/2020] [Accepted: 02/03/2020] [Indexed: 10/24/2022]
Abstract
Previous models have theorized that, during phonation, skewing of the glottal waveform (which is correlated with acoustic intensity) occurred because of inertance of the vocal tract. Later, we reported that skewing of the flow rate waveform can occur without the presence of a vocal tract in an excised canine larynx. We hypothesized that in the absence of a vocal tract, the skewing formed when dynamic pressures acted on the glottal wall during the closing phase; such pressures were greatly affected by formation of intraglottal vortices. In this study, we aim to identify how changes in false vocal folds constriction can affect the acoustics and intraglottal flow dynamics. The intraglottal flow measurements were made using particle image velocimetry in an excised canine larynx where a vocal tract model was placed above the larynx and the constriction between the false vocal folds was varied. Our results show that for similar values of subglottal pressures, the skewing of the glottal waveform, strength of the intraglottal vortices, and acoustic energy increased as the constriction between the false vocal folds was increased. These preliminary findings suggest that acoustic intensity during phonation can be increased by the addition of a vocal tract with false fold constriction.
Affiliation(s)
- Liran Oren
- Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio
| | - Sid Khosla
- Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio
| | - Charles Farbos de Luzan
- Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio
| | - Ephraim Gutmark
- Department of Aerospace Engineering and Engineering Mechanics, University of Cincinnati, Cincinnati, Ohio
| |
|
19
|
Barreira RR, Ling LL. Kullback–Leibler divergence and sample skewness for pathological voice quality assessment. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101697] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
20
|
Labuschagne IB, Ciocca V. The effect of vocal tract parameters on aspiration noise discrimination. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1239. [PMID: 32113289 DOI: 10.1121/10.0000756] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/29/2019] [Accepted: 01/30/2020] [Indexed: 06/10/2023]
Abstract
Previous research showed that aspiration noise difference limens in moderately breathy /a/ vowels decreased as the spectral slope of the glottal source spectrum became increasingly steep [Kreiman and Gerratt, J. Acoust. Soc. Am. 131(1), 492-500 (2012)]. The current study investigated whether discrimination of aspiration noise levels was affected by differences in spectral shape due to vowel quality (/æ/ and /i/) and speaker identity (three male speakers) when the slope of the glottal source spectrum was fixed. The results showed that discrimination performance was worse overall for /i/ than /æ/, but this finding may have stemmed from relatively poor performance for the /i/ vowel of one speaker. Acoustic analyses of the stimuli were performed to estimate the association between acoustic properties and the perceptual outcomes. The results showed that both the smoothed cepstral peak prominence and the harmonic energy level between 2 and 5 kHz may account for the observed differences in aspiration noise discrimination among speakers within each vowel, but not for differences between vowel categories. It is possible that the relationship between aspiration noise discrimination and the aforementioned acoustic properties may be modulated by the spectral distribution of energy across frequency.
Affiliation(s)
- Ilse B Labuschagne
- School of Audiology and Speech Sciences, The University of British Columbia, 2177 Wesbrook Mall, Vancouver, British Columbia V6T 1Z3, Canada
| | - Valter Ciocca
- School of Audiology and Speech Sciences, The University of British Columbia, 2177 Wesbrook Mall, Vancouver, British Columbia V6T 1Z3, Canada
| |
|
21
|
Hejná M, Šturm P, Tylečková L, Bořil T. Normophonic Breathiness in Czech and Danish: Are Females Breathier Than Males? J Voice 2020; 35:498.e1-498.e22. [PMID: 31902679 DOI: 10.1016/j.jvoice.2019.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2019] [Revised: 10/30/2019] [Accepted: 10/31/2019] [Indexed: 10/25/2022]
Abstract
The present study compares the voice quality of female and male speech in two languages: Czech, a Slavic language, and Danish, a Germanic language. For both languages, the results based on a total of 120 vocally healthy speakers are in line with the claim that females are universally breathier than males. This was supported by the Cepstral Peak Prominence (CPP) and H1*-H2* measures, which are generally known as the most robust correlates of breathiness, and also by the H1*-A3* measure. However, the sex distinction was unsupported or even contradictory when using some other measures suggested to reflect breathiness, which provides an incentive to insist on employing a number of acoustic measures in future voice research. The perceptual component of the study nevertheless suggests that these contradictory findings are due to differences in perceived roughness rather than breathiness, and that CPP and H1*-H2* do reflect breathiness differences, and CPP in particular. We therefore conclude that it is indeed the case that female speakers are breathier than male speakers. Finally, in terms of the two robust measures (CPP and H1*-H2*), no language-specific differences in the magnitude of the effect of sex on breathiness were found.
Affiliation(s)
- Míša Hejná
- Department of English, Aarhus University, Aarhus C, Denmark
| | - Pavel Šturm
- Institute of Phonetics, Charles University, Praha 1, Czech Republic
| | - Lea Tylečková
- Institute of Phonetics, Charles University, Praha 1, Czech Republic
| | - Tomáš Bořil
- Institute of Phonetics, Charles University, Praha 1, Czech Republic
| |
|
22
|
Park Y, Perkell JS, Matthies ML, Stepp CE. Categorization in the Perception of Breathy Voice Quality and Its Relation to Voice Production in Healthy Speakers. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:3655-3666. [PMID: 31525305 PMCID: PMC7201331 DOI: 10.1044/2019_jslhr-s-19-0048] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/01/2019] [Revised: 04/12/2019] [Accepted: 06/12/2019] [Indexed: 05/24/2023]
Abstract
Purpose Previous studies of speech articulation have shown that individuals who can perceive smaller differences between similar-sounding phonemes showed larger contrasts in their productions of those phonemes. Here, a similar relationship was examined between the perception and production of breathy voice quality. Method Twenty females with healthy voices were recruited to participate in both a voice production and a perception experiment. Each participant produced repetitions of a sustained vowel, and acoustic correlates of breathiness were calculated. Identification and discrimination tasks were performed with a series of synthetic stimuli along a breathiness continuum. Categorical boundary location and boundary width were obtained from the identification task as a measurement of perception of breathiness. Spearman's correlation analysis was performed to estimate associations between values of boundary location and width and the acoustic correlates of breathiness from the participants' voices. Results Significant correlations between boundary width (r = -.53 to -.6) and some acoustic correlates were found, but no significant relationships were observed between boundary location and the acoustic correlates. Conclusions Speakers with small boundary widths, which suggest higher perceptual precision in differentiating breathiness, had typical voices that were less breathy, as estimated with acoustic measures, compared to speakers with large boundary widths. Our findings may support a link between perception and production of breathy voice quality. Supplemental Material https://doi.org/10.23641/asha.9808478.
Affiliation(s)
- Yeonggwang Park
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
| | - Joseph S. Perkell
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge
| | | | - Cara E. Stepp
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Department of Biomedical Engineering, Boston University, MA
- Department of Otolaryngology–Head and Neck Surgery, Boston University School of Medicine, MA
| |
|
23
|
Mehta DD, Espinoza VM, Van Stan JH, Zañartu M, Hillman RE. The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface accelerometer signals during phonation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:EL386. [PMID: 31153299 PMCID: PMC6520097 DOI: 10.1121/1.5100909] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
Miniature high-bandwidth accelerometers on the anterior neck surface are used in laboratory and ambulatory settings to obtain vocal function measures. This study compared the widely applied L1-L2 measure (historically, H1-H2), defined as the difference between the log-magnitudes of the first and second harmonics, as computed from the glottal airflow waveform with L1-L2 derived from the raw neck-surface acceleration signal in 79 vocally healthy female speakers. Results showed a significant correlation (r = 0.72) between L1-L2 values estimated from both airflow and accelerometer signals, suggesting that raw accelerometer-based estimates of L1-L2 may be interpreted as reflecting glottal physiological parameters and voice quality attributes during phonation.
Affiliation(s)
- Daryush D Mehta
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Víctor M Espinoza
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso
| | - Jarrad H Van Stan
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| | - Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso
| | - Robert E Hillman
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
| |
|
24
|
On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.12.024] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
25
|
Buder EH, McDaniel VF, Bene ER, Ladmirault J, Oller DK. Registers in Infant Phonation. J Voice 2019; 33:382.e21-382.e32. [DOI: 10.1016/j.jvoice.2017.12.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2017] [Accepted: 12/20/2017] [Indexed: 11/29/2022]
|
26
|
Zhang J, Honda K, Wei J, Kitamura T. Morphological characteristics of male and female hypopharynx: A magnetic resonance imaging-based study. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:734. [PMID: 30823823 DOI: 10.1121/1.5089220] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/29/2018] [Accepted: 01/15/2019] [Indexed: 06/09/2023]
Abstract
Studies with three-dimensional (3D) vocal tract visualization using magnetic resonance imaging (MRI) have suggested that the hypopharyngeal cavities, i.e., the laryngeal cavity and bilateral piriform fossa, may be the acoustic loci that express speaker characteristics in male speech sounds. Previous studies mainly investigated the hypopharynx in males, and few examined females. This study explored the morphological characteristics of the hypopharynx in subjects of both genders using MRI. 3D numerical vocal tracts at vowels were reconstructed from the MRI datasets of three male and four female Chinese subjects. Geometrical measurements were conducted for the hypopharyngeal cavities. Morphological observations and statistical analyses revealed both commonalities and differences between the male and female subjects. The laryngeal cavity shapes in females were found to be similar to those in males, resembling a Helmholtz resonator rather than a simple straight closed tube, and the bilateral piriform fossa cavities showed an asymmetry: the right is longer and wider than the left in both genders. As for the cavity size across vowels, for both the male and female subjects the laryngeal cavity and piriform fossa in /i/ were observed to be larger than those in /a/. To summarize gender characteristics, the female subjects were characterized by a smaller laryngeal cavity and piriform fossa compared with the males.
Affiliation(s)
- Ju Zhang
- School of Computer Science and Technology, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin 300350, China
| | - Kiyoshi Honda
- School of Computer Science and Technology, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin 300350, China
| | - Jianguo Wei
- School of Computer Software, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin 300350, China
| | - Tatsuya Kitamura
- Faculty of Intelligence and Informatics, Konan University, 8-9-1 Okamoto, Higashinada, Kobe 658-8501, Japan
| |
|
27
|
The Effect of Supraclavicular Radiotherapy on Acoustic Voice Quality Index (AVQI), Spectral Amplitude and Perturbation Values. J Voice 2019; 34:649.e7-649.e13. [PMID: 30686632 DOI: 10.1016/j.jvoice.2019.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2018] [Revised: 11/24/2018] [Accepted: 01/03/2019] [Indexed: 12/24/2022]
Abstract
OBJECTIVE The Acoustic Voice Quality Index (AVQI), spectral amplitude, and voice perturbation parameters are objective assessment methods that are used in clinical settings and for research purposes. The aim of this study was to demonstrate the effect of supraclavicular radiotherapy (RT) on the physiology and function of the vocal fold. METHODS A total of 29 female patients were included in the study. The voices of the patients, who were diagnosed with breast cancer and underwent supraclavicular RT, were recorded before and after the treatment (1 and 6 months). AVQI, spectral amplitude (H1-H2, H1-A1, H1-A2, H1-A3) and acoustic analyses of the voice perturbation parameters were performed. RESULTS AVQI was significantly higher in the first month (P < 0.05). Of the voice perturbation parameters, shimmer was found to be significantly higher in the first month (P < 0.05). However, not all spectral amplitude values showed a significant change (P > 0.05). CONCLUSION In this study, AVQI and shimmer values were found to be higher following the application of supraclavicular RT. These results showed that nonlaryngeal RT might cause changes in the acoustic values of the voice in the early stage.
|
28
|
Gangamohan P, Gangashetty SV, Yegnanarayana B. Subsegmental level analysis of high arousal speech using the zero-time windowing method. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:551. [PMID: 30710923 DOI: 10.1121/1.5087816] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/21/2018] [Accepted: 01/01/2019] [Indexed: 06/09/2023]
Abstract
Speech produced by a speaker in emotionally charged situations, such as anger, happiness, and shouting, corresponds to high arousal speech. Changes in the production characteristics such as increase in the subglottal air pressure, increase in the glottal closed phase in each cycle, and increase in the rate of glottal vibration are observed in high arousal speech. Acoustic parameters such as glottal closed quotient and fundamental frequency (F0) are used to characterize high arousal speech. In this paper, high arousal is characterized by features extracted using the zero-time windowing (ZTW) method. The spectrum derived from the ZTW method emphasizes the instantaneous spectral characteristics in the speech signal. In the glottal open region, changes are clearly observed in the lower frequency range of the spectrum. Distinctive spectral features are observed during the glottal open region in the case of high arousal speech, when compared to neutral speech. These features are used to develop a method for identification of high arousal speech. Simple and perhaps somewhat ad hoc rules based on these features seem to give good performance in the identification of high arousal speech, even without using neutral speech as reference.
Affiliation(s)
- P Gangamohan
- Speech Processing Laboratory, International Institute of Information Technology-Hyderabad (IIIT-H), India
| | - Suryakanth V Gangashetty
- Speech Processing Laboratory, International Institute of Information Technology-Hyderabad (IIIT-H), India
| | - B Yegnanarayana
- Speech Processing Laboratory, International Institute of Information Technology-Hyderabad (IIIT-H), India
| |
|
29
|
Yanushevskaya I, Gobl C, Ní Chasaide A. Cross-language differences in how voice quality and f0 contours map to affect. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:2730. [PMID: 30522326 DOI: 10.1121/1.5066448] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2018] [Accepted: 10/12/2018] [Indexed: 06/09/2023]
Abstract
The relationship between prosody and perceived affect involves multiple variables. This paper explores the interplay of three: voice quality, f0 contour, and the hearer's language background. Perception tests were conducted with speakers of Irish English, Russian, Spanish, and Japanese using three types of synthetic stimuli: (1) stimuli varied in voice quality, (2) stimuli of uniform (modal) voice quality incorporating affect-related f0 contours, and (3) stimuli combining specific non-modal voice qualities with the affect-related f0 contours of (2). The participants rated the stimuli for the presence/strength of affective colouring on six bipolar scales, e.g., happy-sad. The results suggest that stimuli incorporating non-modal voice qualities, with or without f0 variation, are generally more effective in affect cueing than stimuli varying only in f0. Along with similarities in the affective responses across these languages, many points of divergence were found, both in terms of the range and strength of affective responses overall and in terms of specific stimulus-to-affect associations. The f0 contour may play a more important role, and tense voice a lesser role, in affect signalling in Japanese and Spanish than in Irish English and Russian. The greatest cross-language differences emerged for the affects intimate, formal, stressed, and relaxed.
Affiliation(s)
- Irena Yanushevskaya
- Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences, Trinity College Dublin, Dublin, Ireland
| | - Christer Gobl
- Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences, Trinity College Dublin, Dublin, Ireland
| | - Ailbhe Ní Chasaide
- Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences, Trinity College Dublin, Dublin, Ireland
| |
|
30
|
Davidson L. The Effects of Pitch, Gender, and Prosodic Context on the Identification of Creaky Voice. PHONETICA 2018; 76:235-262. [PMID: 30016778 DOI: 10.1159/000490948] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/16/2017] [Accepted: 06/15/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND/AIMS Creaky voice in American English plays both a prosodic role, as a phrase-final marker, and a sociolinguistic one, but it is unclear how accurately naïve listeners can identify creak, and what factors facilitate or hinder its identification. METHODS In this study, American listeners are presented with 2 experiments containing stimuli from both high- and low-pitched male and female speakers. Other manipulations include whether the auditory stimulus is a full sentence or a sentence fragment, and whether it is completely modally voiced, completely creaky, or partially creaky (final 40-50% of the utterance). RESULTS Accuracy is lowest on partial creak, suggesting that creaky voice is least salient when it serves as an utterance-final marker. There are no strong gender effects aside from a weak tendency to identify creak more often in females than males in the whole creak condition in one experiment. In contrast, when no creak is present, listeners false alarm on the low-pitched males. CONCLUSION Rates of identifying creak in male and female speakers are similar, suggesting that listeners have a comparable ability to hear creaky voice in all speakers.
|
31
|
Park SJ, Yeung G, Vesselinova N, Kreiman J, Keating PA, Alwan A. Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:375. [PMID: 30075658 PMCID: PMC6062421 DOI: 10.1121/1.5045323] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/08/2018] [Revised: 05/21/2018] [Accepted: 06/17/2018] [Indexed: 05/29/2023]
Abstract
Little is known about human and machine speaker discrimination ability when utterances are very short and the speaking style is variable. This study compares text-independent speaker discrimination ability of humans and machines based on utterances shorter than 2 s in two different speaking styles (read sentences and speech directed towards pets, characterized by exaggerated prosody). Recordings of 50 female speakers drawn from the UCLA Speaker Variability Database were used as stimuli. Performance of 65 human listeners was compared to i-vector-based automatic speaker verification systems using mel-frequency cepstral coefficients, voice quality features, which were inspired by a psychoacoustic model of voice perception, or their combination by score-level fusion. Humans always outperformed machines, except in the case of style-mismatched pairs from perceptually-marked speakers. Speaker representations by humans and machines were compared using multi-dimensional scaling (MDS). Canonical correlation analysis showed a weak correlation between machine and human MDS spaces. Multiple regression showed that means of voice quality features could represent the most important human MDS dimension well, but not the dimensions from machines. These results suggest that speaker representations by humans and machines are different, and machine performance might be improved by better understanding how different acoustic features relate to perceived speaker identity.
Affiliation(s)
- Soo Jin Park
- Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Gary Yeung
- Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Neda Vesselinova
- Department of Head and Neck Surgery, School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Jody Kreiman
- Department of Head and Neck Surgery, School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Patricia A Keating
- Department of Linguistics, University of California, Los Angeles, Los Angeles, California 90095, USA
| | - Abeer Alwan
- Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, California 90095, USA
| |
|
32
|
Khattab G, Al-Tamimi J, Alsiraih W. Nasalisation in the Production of Iraqi Arabic Pharyngeals. PHONETICA 2018; 75:310-348. [PMID: 29966129 DOI: 10.1159/000487806] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/15/2015] [Accepted: 01/08/2018] [Indexed: 06/08/2023]
Abstract
AIM This paper presents the auditory and acoustic investigations of pharyngeal consonants in Iraqi Arabic (IA). While the contested place and manner of articulation of these sounds have been the subject of investigation in many studies, the focus here is novel: we set out to investigate the extent to which pharyngeals in IA are accompanied by auditory nasalisation and how widespread the effect is across oral and nasal contexts. METHODS Auditory and acoustic properties of nasalisation, as produced by nine male speakers of IA, were investigated in target words with oral, nasal, and pharyngeal environments. RESULTS When combined with oral consonants, pharyngeals exhibit little or no nasalisation; however, when pharyngeals are combined with nasals, they exhibit various degrees of nasalisation, sometimes beyond what is found for a nasal environment alone. This is especially so for voiced pharyngeals, which display more nasalisation than their voiceless counterparts. A principal component analysis combining all the acoustic correlates examined demonstrates a definite contribution of pharyngeals to the presence of nasalisation. CONCLUSION The epilaryngeal constriction and variability in the articulation of pharyngeals are thought to be responsible for the nasalisation effect and may act as potential drivers for sound change in IA pharyngeals.
|
33
|
V Latoszek BB, Maryn Y, Gerrits E, De Bodt M. A Meta-Analysis: Acoustic Measurement of Roughness and Breathiness. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:298-323. [PMID: 29392295 DOI: 10.1044/2017_jslhr-s-16-0188] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/09/2016] [Accepted: 10/25/2017] [Indexed: 06/07/2023]
Abstract
PURPOSE Over the last 5 decades, many acoustic measures have been created to measure roughness and breathiness. The aim of this study is to present a meta-analysis of correlation coefficients (r) between auditory-perceptual judgment of roughness and breathiness and various acoustic measures in both sustained vowels and continuous speech. METHOD Scientific literature reporting perceptual-acoustic correlations on roughness and breathiness were sought in 28 databases. Weighted average correlation coefficients (rw) were calculated when multiple r-values were available for a specific acoustic marker. An rw ≥ .60 was the threshold for an acoustic measure to be considered acceptable. RESULTS From 103 studies of roughness and 107 studies of breathiness that were investigated, only 33 studies and 34 studies, respectively, met the inclusion criteria of the meta-analysis on sustained vowels. Eighty-six acoustic measures were identified for roughness and 85 acoustic measures for breathiness on sustained vowels, in which 43 and 39 measures, respectively, yielded multiple r-values. Finally, only 14 measures for roughness and 12 measures for breathiness produced rw ≥ .60. On continuous speech, 4 measures for roughness and 21 measures for breathiness were identified, yielding 3 and 6 measures, respectively, with multiple r-values in which only 1 and 2, respectively, had rw ≥ .60. CONCLUSION This meta-analysis showed that only a few acoustic parameters were determined as the best estimators for roughness and breathiness.
Affiliation(s)
- Ben Barsties V Latoszek
- Faculty of Medicine and Health Sciences, University of Antwerp, Belgium
- Institute of Health Studies, HAN University of Applied Sciences, Nijmegen, the Netherlands
| | - Youri Maryn
- Faculty of Medicine and Health Sciences, University of Antwerp, Belgium
- European Institute for ORL, Sint-Augustinus Hospital, Antwerp, Belgium
- Faculty of Education, Health & Social Work, University College Ghent, Belgium
| | - Ellen Gerrits
- Faculty of Health Care, HU University of Applied Sciences Utrecht, the Netherlands
- Faculty of Humanities, University of Utrecht, the Netherlands
- Department of Otolaryngology, University Medical Center Utrecht, the Netherlands
| | - Marc De Bodt
- Faculty of Medicine and Health Sciences, University of Antwerp, Belgium
- Department of Otorhinolaryngology and Head & Neck Surgery, Antwerp University Hospital, Belgium
- Faculty of Medicine & Health Sciences, University of Ghent, Belgium
| |
Collapse
|
34
|
Rilliard A, d'Alessandro C, Evrard M. Paradigmatic variation of vowels in expressive speech: Acoustic description and dimensional analysis. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:109. [PMID: 29390730 DOI: 10.1121/1.5018433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Acoustic variation in expressive speech at the syllable level is studied. As emotions or attitudes can be conveyed by short spoken words, analysis of paradigmatic variations in vowels is an important issue to characterize the expressive content of such speech segments. The corpus contains 160 sentences produced under seven expressive conditions (Neutral, Anger, Fear, Surprise, Sensuality, Joy, Sadness) acted by a French female speaker (a total of 1120 sentences, 13 140 vowels). Eleven base acoustic parameters are selected for voice source and vocal tract related feature analysis. An acoustic description of the expressions is drawn, using the dimensions of melodic range, intensity, noise, spectral tilt, vocalic space, and dynamic features. The first three functions of a discriminant analysis explain 95% of the variance in the data. These statistical dimensions are consistently associated with acoustic dimensions. Covariation of intensity and F0 explains over 80% of the variance, followed by noise features (8%), covariation of spectral tilt, and F0 (7%). On the basis of isolated vowels alone, expressions are classified with a mean accuracy of 78%.
Affiliation(s)
- Albert Rilliard
  - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, Centre National de la Recherche Scientifique, Université Paris-Saclay, F-91405 Orsay, France
- Christophe d'Alessandro
  - Sorbonne Universités, Université Pierre-et-Marie-Curie, University Paris 06, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 7190, Institut Jean Le Rond d'Alembert, 4 Place Jussieu, F-75005 Paris, France
- Marc Evrard
  - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, Centre National de la Recherche Scientifique, Université Paris Sud, Université Paris-Saclay, F-91405 Orsay, France
|
35
|
Cernak M, Orozco-Arroyave JR, Rudzicz F, Christensen H, Vásquez-Correa JC, Nöth E. Characterisation of voice quality of Parkinson’s disease using differential phonological posterior features. COMPUT SPEECH LANG 2017. [DOI: 10.1016/j.csl.2017.06.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 11/16/2022]
|
36
|
Lee J, Ali H, Ziaei A, Tobey EA, Hansen JHL. The Lombard effect observed in speech produced by cochlear implant users in noisy environments: A naturalistic study. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:2788. [PMID: 28464686 PMCID: PMC5398925 DOI: 10.1121/1.4979927] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/28/2015] [Revised: 03/25/2017] [Accepted: 03/27/2017] [Indexed: 06/02/2023]
Abstract
The Lombard effect is an involuntary response speakers experience in the presence of noise during voice communication. This phenomenon is known to cause changes in speech production such as an increase in intensity, pitch structure, formant characteristics, etc., for enhanced audibility in noisy environments. Although well studied for normal hearing listeners, the Lombard effect has received little, if any, attention in the field of cochlear implants (CIs). The objective of this study is to analyze speech production of CI users who are postlingually deafened adults with respect to environmental context. A total of six adult CI users were recruited to produce spontaneous speech in various realistic environments. Acoustic-phonetic analysis was then carried out to characterize their speech production in these environments. The Lombard effect was observed in the speech production of all CI users who participated in this study in adverse listening environments. The results indicate that both suprasegmental (e.g., F0, glottal spectral tilt and vocal intensity) and segmental (e.g., F1 for /i/ and /u/) features were altered in such environments. The analysis from this study suggests that modification of speech production of CI users under the Lombard effect may contribute to some degree an intelligible communication in adverse noisy environments.
Affiliation(s)
- Jaewook Lee
  - Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- Hussnain Ali
  - Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- Ali Ziaei
  - Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- Emily A Tobey
  - Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- John H L Hansen
  - Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
|
37
|
Matheron D, Stathopoulos ET, Huber JE, Sussman JE. Laryngeal Aerodynamics in Healthy Older Adults and Adults With Parkinson's Disease. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2017; 60:507-524. [PMID: 28241225 PMCID: PMC5544190 DOI: 10.1044/2016_jslhr-s-14-0314] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/10/2014] [Revised: 08/07/2015] [Accepted: 06/23/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE The present study compared laryngeal aerodynamic function of healthy older adults (HOA) to adults with Parkinson's disease (PD) while speaking at a comfortable and increased vocal intensity. METHOD Laryngeal aerodynamic measures (subglottal pressure, peak-to-peak flow, minimum flow, and open quotient [OQ]) were compared between HOAs and individuals with PD who had a diagnosis of hypophonia. Increased vocal intensity was elicited via monaurally presented multitalker background noise. RESULTS At a comfortable speaking intensity, HOAs and individuals with PD produced comparable vocal intensity, rates of vocal fold closure, and minimum flow. HOAs used smaller OQs, higher subglottal pressure, and lower peak-to-peak flow than individuals with PD. Both groups increased speaking intensity when speaking in noise to the same degree. However, HOAs produced increased intensity with greater driving pressure, faster vocal fold closure rates, and smaller OQs than individuals with PD. CONCLUSIONS Monaural background noise elicited equivalent vocal intensity increases in HOAs and individuals with PD. Although both groups used laryngeal mechanisms as expected to increase sound pressure level, they used these mechanisms to different degrees. The HOAs appeared to have better control of the laryngeal mechanism to make changes to their vocal intensity.
Affiliation(s)
- Deborah Matheron
  - Department of Communicative Disorders and Sciences, SUNY University at Buffalo, NY
  - Department of Communication Disorders and Sciences, SUNY College at Cortland, NY
- Jessica E. Huber
  - Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN
- Joan E. Sussman
  - Department of Communicative Disorders and Sciences, SUNY University at Buffalo, NY
|
38
|
Kunduk M, Vansant MB, Ikuma T, McWhorter A. The Effects of the Menstrual Cycle on Vibratory Characteristics of the Vocal Folds Investigated With High-Speed Digital Imaging. J Voice 2017; 31:182-187. [DOI: 10.1016/j.jvoice.2016.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/06/2016] [Revised: 07/29/2016] [Accepted: 08/01/2016] [Indexed: 11/16/2022]
|
39
|
Samlan RA, Story BH. Influence of Left-Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2017; 60:306-321. [PMID: 28199505 DOI: 10.1044/2016_jslhr-s-16-0076] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/24/2016] [Accepted: 05/31/2016] [Indexed: 05/25/2023]
Abstract
PURPOSE The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. METHOD A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric settings of parameters controlling glottal configuration. These parameters were then altered individually to determine their effect on maximum flow declination rate, spectral slope, cepstral peak prominence, harmonics-to-noise ratio, and perceived voice quality. RESULTS Asymmetry of each of the 5 vocal fold parameters influenced vocal function and voice quality; measured change was greatest for adduction and bulging. Increasing the symmetry of all parameters improved voice, and the best voice occurred with overcorrection of adduction, followed by bulging, nodal point ratio, starting phase, and amplitude of vibration. CONCLUSIONS Although vocal process adduction and edge bulging asymmetries are most influential in voice quality for simulated vocal fold motion impairment, amplitude of vibration and starting phase asymmetries are also perceptually important. These findings are consistent with the current surgical approach to vocal fold motion impairment, where goals include medializing the vocal process and straightening concave edges. The results also explain many of the residual postoperative voice limitations.
Affiliation(s)
- Robin A Samlan
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson
- Brad H Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson
|
40
|
Matar N, Portes C, Lancia L, Legou T, Baider F. Voice Quality and Gender Stereotypes: A Study of Lebanese Women With Reinke's Edema. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2016; 59:S1608-S1617. [PMID: 28002841 DOI: 10.1044/2016_jslhr-s-15-0047] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/02/2015] [Accepted: 02/11/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE Women with Reinke's edema (RW) report being mistaken for men during telephone conversations. For this reason, their masculine-sounding voices are interesting for the study of gender stereotypes. The study's objective is to verify their complaint and to understand the cues used in gender identification. METHOD Using a self-evaluation study, we verified RW's perception of their own voices. We compared the acoustic parameters of vowels produced by 10 RW to those produced by 10 men and 10 women with healthy voices (hereafter referred to as NW) in Lebanese Arabic. We conducted a perception study for the evaluation of RW, healthy men's, and NW voices by naïve listeners. RESULTS RW self-evaluated their voices as masculine and their gender identities as feminine. The acoustic parameters that distinguish RW from NW voices concern fundamental frequency, spectral slope, harmonicity of the voicing signal, and complexity of the spectral envelope. Naïve listeners very often rate RW as surely masculine. CONCLUSIONS Listeners may rate RW's gender incorrectly. These incorrect gender ratings are correlated with acoustic measures of fundamental frequency and voice quality. Further investigations will reveal the contribution of each of these parameters to gender perception and guide the treatment plan of patients complaining of a gender ambiguous voice.
Affiliation(s)
- Nayla Matar
  - Department of Otolaryngology Head and Neck Surgery, Hôtel-Dieu de France Hospital, Faculty of Medicine, Saint-Joseph University, Beirut, Lebanon
  - Aix-Marseille Université, Aix-en-Provence, France
|
41
|
Gerratt BR, Kreiman J, Garellek M. Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2016; 59:994-1001. [PMID: 27626612 PMCID: PMC5345563 DOI: 10.1044/2016_jslhr-s-15-0307] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 09/02/2015] [Revised: 12/10/2015] [Accepted: 03/24/2016] [Indexed: 05/21/2023]
Abstract
PURPOSE The question of what type of utterance (a sustained vowel or continuous speech) is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. METHOD Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a 3rd set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. RESULTS Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. CONCLUSIONS Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasisteady portion of the vowels. Approaches to voice quality assessment by using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.
|
42
|
Lee SJ, Cho Y, Song JY, Lee D, Kim Y, Kim H. Aging Effect on Korean Female Voice: Acoustic and Perceptual Examinations of Breathiness. Folia Phoniatr Logop 2016; 67:300-7. [PMID: 27160514 DOI: 10.1159/000445290] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]
Abstract
OBJECTIVE This paper sought to examine perceptual and acoustic characteristics in Korean female voices, focusing on the 'breathy' quality as a function of aging. In addition, we aimed to investigate if the three selected measures, H1-H2, H1-A1, and H1-A3, demonstrated any changes along a sustained vowel production. PARTICIPANTS AND METHODS A total of 42 participants were assigned to two age groups, young women and elderly women. All participants were asked to sustain /a/ as long and as steadily as possible. Perceptual judgments of breathiness were made on the GRBAS scale and by a direct magnitude estimation technique, while three acoustic parameters, H1-H2, H1-A1, and H1-A3, were measured at five measurement time points during the sustained vowel test. RESULTS Results indicated that the H1-H2 and H1-A1 values were significantly lower for elderly women compared to young women, although no difference in the perceptual estimation of breathiness was found between the age groups. Among the acoustic measures, only H1-A1 was significantly regressed against the perceptual estimate of breathiness. In addition, no significant acoustic difference in the measures was found across the five measurement points. CONCLUSION Our findings suggest that the aging voice might not be universally characterized by the breathy quality, which hints at the need for further research on ethnic diversity in vocal quality.
Affiliation(s)
- Seung Jin Lee
  - Graduate Program in Speech and Language Pathology, Yonsei University, Seoul, Korea
|
43
|
Hadwin PJ, Galindo GE, Daun KJ, Zañartu M, Erath BD, Cataldo E, Peterson SD. Non-stationary Bayesian estimation of parameters from a body cover model of the vocal folds. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 139:2683. [PMID: 27250162 PMCID: PMC10423076 DOI: 10.1121/1.4948755] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/10/2015] [Revised: 04/15/2016] [Accepted: 04/22/2016] [Indexed: 05/09/2023]
Abstract
The evolution of reduced-order vocal fold models into clinically useful tools for subject-specific diagnosis and treatment hinges upon successfully and accurately representing an individual patient in the modeling framework. This, in turn, requires inference of model parameters from clinical measurements in order to tune a model to the given individual. Bayesian analysis is a powerful tool for estimating model parameter probabilities based upon a set of observed data. In this work, a Bayesian particle filter sampling technique capable of estimating time-varying model parameters, as occur in complex vocal gestures, is introduced. The technique is compared with time-invariant Bayesian estimation and least squares methods for determining both stationary and non-stationary parameters. The current technique accurately estimates the time-varying unknown model parameter and maintains tight credibility bounds. The credibility bounds are particularly relevant from a clinical perspective, as they provide insight into the confidence a clinician should have in the model predictions.
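The idea of tracking a time-varying parameter with a particle filter, together with credibility bounds, can be sketched on a toy problem. This is not the authors' body-cover model or their sampler: the example assumes a single scalar parameter that drifts as a random walk and is observed directly with Gaussian noise, and every name and default value below is hypothetical.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_sd=0.05,
                    obs_sd=0.05, init_range=(0.0, 3.0), seed=0):
    """Bootstrap particle filter for one drifting scalar parameter.

    ASSUMPTIONS (toy model, not the paper's): the parameter follows a
    Gaussian random walk, and each observation equals the parameter plus
    Gaussian noise. Returns one (mean, lower, upper) tuple per observation,
    where lower/upper are the 2.5th and 97.5th particle percentiles.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: let every particle drift by one random-walk step.
        particles = [p + rng.gauss(0.0, process_sd) for p in particles]
        # Weight: Gaussian likelihood of the observation under each particle.
        weights = [math.exp(-0.5 * ((y - p) / obs_sd) ** 2) for p in particles]
        if sum(weights) == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        ordered = sorted(particles)
        mean = sum(ordered) / n_particles
        lower = ordered[int(0.025 * (n_particles - 1))]
        upper = ordered[int(0.975 * (n_particles - 1))]
        estimates.append((mean, lower, upper))
    return estimates


if __name__ == "__main__":
    # A parameter ramping from 1.0 to 2.0, observed without noise here
    # purely to keep the illustration simple.
    ramp = [1.0 + 0.02 * t for t in range(51)]
    mean, lower, upper = particle_filter(ramp)[-1]
    print(f"final estimate {mean:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

The interval width plays the role of the credibility bounds mentioned in the abstract: a narrow interval indicates that the particles agree on the parameter value at that time step.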
Affiliation(s)
- Paul J Hadwin
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Gabriel E Galindo
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Kyle J Daun
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Matías Zañartu
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Byron D Erath
  - Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, New York 13699, USA
- Edson Cataldo
  - Applied Mathematics Department, Graduate Program in Electrical and Telecommunications Engineering (PPGEET), Universidade Federal Fluminense, Niteroi, Rio de Janeiro, CEP24020-140, Brazil
- Sean D Peterson
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
|
44
|
Relationship of Various Open Quotients With Acoustic Property, Phonation Types, Fundamental Frequency, and Intensity. J Voice 2016; 30:145-57. [DOI: 10.1016/j.jvoice.2015.01.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/17/2014] [Accepted: 01/30/2015] [Indexed: 10/23/2022]
|
45
|
Garellek M, Samlan R, Gerratt BR, Kreiman J. Modeling the voice source in terms of spectral slopes. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 139:1404-10. [PMID: 27036277 PMCID: PMC4818273 DOI: 10.1121/1.4944474] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/16/2015] [Revised: 12/24/2015] [Accepted: 03/03/2016] [Indexed: 05/20/2023]
Abstract
A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1-H2), the second and fourth harmonics (H2-H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4-2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz-5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters. In experiment 1, the model was fit to a large number of voice sources. Results showed that parameters are predictable from one another, but that these relationships are due to overall spectral roll-off. Two additional experiments addressed the perceptual independence of the source parameters. Listener sensitivity to H1-H2, H2-H4, and H4-2 kHz did not change as a function of the slope of an adjacent component, suggesting that sensitivity to these components is robust. Listener sensitivity to changes in spectral slope from 2 kHz to 5 kHz depended on complex interactions between spectral slope, spectral noise levels, and H4-2 kHz. It is concluded that the four parameters represent non-redundant acoustic and perceptual aspects of voice quality.
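The four slope parameters named in the abstract are amplitude differences between specific harmonics, which a short sketch can make concrete. It assumes the harmonic amplitudes of the source spectrum (in dB) and the fundamental frequency have already been measured; the function names and input layout are hypothetical, and the sketch is not the authors' analysis code.

```python
def nearest_harmonic(f0_hz, target_hz):
    """Return the 1-based number of the harmonic closest to target_hz."""
    return max(1, round(target_hz / f0_hz))


def spectral_slope_parameters(harmonic_db, f0_hz):
    """Compute H1-H2, H2-H4, H4-2 kHz, and 2 kHz-5 kHz in dB.

    harmonic_db: list of harmonic amplitudes in dB, where harmonic_db[0]
    is the first harmonic (H1). ASSUMPTION: these amplitudes describe the
    voice source spectrum and were estimated elsewhere.
    """
    k_2k = nearest_harmonic(f0_hz, 2000.0)
    k_5k = nearest_harmonic(f0_hz, 5000.0)
    if len(harmonic_db) < max(4, k_5k):
        raise ValueError("not enough harmonics to reach 5 kHz")

    def amp(k):
        # Harmonic numbers are 1-based; list indices are 0-based.
        return harmonic_db[k - 1]

    return {
        "H1-H2": amp(1) - amp(2),
        "H2-H4": amp(2) - amp(4),
        "H4-2kHz": amp(4) - amp(k_2k),
        "2kHz-5kHz": amp(k_2k) - amp(k_5k),
    }


if __name__ == "__main__":
    # Synthetic spectrum falling 2 dB per harmonic at f0 = 200 Hz
    # (illustrative values only, not data from the study).
    spectrum = [-2.0 * k for k in range(1, 26)]
    print(spectral_slope_parameters(spectrum, 200.0))
```

Because each parameter covers an adjacent frequency band, the four values sum to the total drop from H1 to the harmonic nearest 5 kHz, which is one way to see why they covary with overall spectral roll-off.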
Affiliation(s)
- Marc Garellek
  - Department of Linguistics, University of California, San Diego, 9500 Gilman Drive #0108, La Jolla, California 92023-0108, USA
- Robin Samlan
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Bruce R Gerratt
  - Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, California 90095-1794, USA
- Jody Kreiman
  - Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, California 90095-1794, USA
|
46
|
Modeling of Breathy Voice Quality Using Pitch-strength Estimates. J Voice 2016; 30:774.e1-774.e7. [PMID: 26775221 DOI: 10.1016/j.jvoice.2015.11.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 08/18/2015] [Accepted: 11/20/2015] [Indexed: 11/23/2022]
Abstract
BACKGROUND The characteristic voice quality of a speaker conveys important linguistic, paralinguistic, and vocal health-related information. Pitch strength refers to the salience of pitch sensation in a sound and was recently reported to be strongly correlated with the magnitude of perceived breathiness based on a small number of voice stimuli. OBJECTIVE The current study examined the relationship between perceptual judgments of breathiness and computational estimates of pitch strength based on the Aud-SWIPE (P-NP) algorithm for a large number of voice stimuli (330 synthetic and 57 natural). METHODS AND RESULTS Similar to the earlier study, the current results confirm a strong relationship between estimated pitch strength and listener judgments of breathiness such that low pitch-strength values are associated with voices that have high perceived breathiness. Based on this result, a model was developed for the perception of breathy voice quality using a pitch-strength estimator. Regression functions derived between the pitch-strength estimates and perceptual judgments of breathiness obtained from matching task revealed a linear relationship for a subset of the natural stimuli. We then used this function to obtain predicted breathiness values for the synthetic and the remaining natural stimuli. CONCLUSIONS Predicted breathiness values from our model were highly correlated with the perceptual data for both types of stimuli. Systematic differences between the breathiness of natural and synthetic stimuli are discussed.
|
47
|
Jesus LMT, Martinez J, Hall A, Ferreira A. Acoustic Correlates of Compensatory Adjustments to the Glottic and Supraglottic Structures in Patients with Unilateral Vocal Fold Paralysis. BIOMED RESEARCH INTERNATIONAL 2015; 2015:704121. [PMID: 26557690 PMCID: PMC4628731 DOI: 10.1155/2015/704121] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/22/2015] [Accepted: 04/24/2015] [Indexed: 11/17/2022]
Abstract
The goal of this study was to analyse perceptually and acoustically the voices of patients with Unilateral Vocal Fold Paralysis (UVFP) and compare them to the voices of normal subjects. These voices were analysed perceptually with the GRBAS scale and acoustically using the following parameters: mean fundamental frequency (F0), standard-deviation of F0, jitter (ppq5), shimmer (apq11), mean harmonics-to-noise ratio (HNR), mean first (F1) and second (F2) formants frequency, and standard-deviation of F1 and F2 frequencies. Statistically significant differences were found in all of the perceptual parameters. Also the jitter, shimmer, HNR, standard-deviation of F0, and standard-deviation of the frequency of F2 were statistically different between groups, for both genders. In the male data differences were also found in F1 and F2 frequencies values and in the standard-deviation of the frequency of F1. This study allowed the documentation of the alterations resulting from UVFP and addressed the exploration of parameters with limited information for this pathology.
Affiliation(s)
- Luis M. T. Jesus
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
  - School of Health Sciences (ESSUA), University of Aveiro, 3810-193 Aveiro, Portugal
- Joana Martinez
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
- Andreia Hall
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
  - Department of Mathematics (DMat), University of Aveiro, 3810-193 Aveiro, Portugal
- Aníbal Ferreira
  - Department of Electrical and Computer Engineering, University of Porto, 4200-465 Porto, Portugal
|
48
|
Awan SN, Krauss AR, Herbst CT. An Examination of the Relationship Between Electroglottographic Contact Quotient, Electroglottographic Decontacting Phase Profile, and Acoustical Spectral Moments. J Voice 2015; 29:519-29. [DOI: 10.1016/j.jvoice.2014.10.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/11/2014] [Accepted: 10/23/2014] [Indexed: 10/23/2022]
|
49
|
Kreiman J, Garellek M, Chen G, Alwan A, Gerratt BR. Perceptual evaluation of voice source models. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2015; 138:1-10. [PMID: 26233000 PMCID: PMC4491021 DOI: 10.1121/1.4922174] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 05/09/2023]
Abstract
Models of the voice source differ in their fits to natural voices, but it is unclear which differences in fit are perceptually salient. This study examined the relationship between the fit of five voice source models to 40 natural voices, and the degree of perceptual match among stimuli synthesized with each of the modeled sources. Listeners completed a visual sort-and-rate task to compare versions of each voice created with the different source models, and the results were analyzed using multidimensional scaling. Neither fits to pulse shapes nor fits to landmark points on the pulses predicted observed differences in quality. Further, the source models fit the opening phase of the glottal pulses better than they fit the closing phase, but at the same time similarity in quality was better predicted by the timing and amplitude of the negative peak of the flow derivative (part of the closing phase) than by the timing and/or amplitude of peak glottal opening. Results indicate that simply knowing how (or how well) a particular source model fits or does not fit a target source pulse in the time domain provides little insight into what aspects of the voice source are important to listeners.
Affiliation(s)
- Jody Kreiman
  - Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, 31-24 Rehabilitation Center, Los Angeles, California 90095-1794, USA
- Marc Garellek
  - Department of Linguistics, University of California-San Diego, 9500 Gilman Drive #0108, La Jolla, California 92093-0108, USA
- Gang Chen
  - Department of Electrical Engineering, University of California-Los Angeles, 66-147 G Engineering IV, Los Angeles, California 90095-1594, USA
- Abeer Alwan
  - Department of Electrical Engineering, University of California-Los Angeles, 66-147 G Engineering IV, Los Angeles, California 90095-1594, USA
- Bruce R Gerratt
  - Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, 31-24 Rehabilitation Center, Los Angeles, California 90095-1794, USA
|
50
|
Abramson AS, Tiede MK, Luangthongkum T. Voice Register in Mon: Acoustics and Electroglottography. PHONETICA 2015; 72:237-56. [PMID: 26636544 PMCID: PMC4751869 DOI: 10.1159/000441728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/11/2014] [Accepted: 10/14/2015] [Indexed: 05/20/2023]
Abstract
Mon is spoken in villages in Thailand and Myanmar. The dialect of Ban Nakhonchum, Thailand, has 2 voice registers, modal and breathy; these phonation types, along with other phonetic properties, distinguish minimal pairs. Four native speakers of this dialect recorded repetitions of 14 randomized words (7 minimal pairs) for acoustic analysis. We used a subset of these pairs in a listening test to verify the perceptual robustness of the register distinction. Acoustic analysis found significant differences in noise component, spectral slope and fundamental frequency. In a subsequent session 4 speakers were also recorded using electroglottography, which showed systematic differences in the contact quotient. The salience of these properties in maintaining the register distinction is discussed in the context of possible tonogenesis for this language.
Affiliation(s)
- Arthur S. Abramson
  - Haskins Laboratories, New Haven, Conn., U.S.A
  - Department of Linguistics, University of Connecticut, Storrs, Conn., U.S.A
|