1
Näger C, Kniesburges S, Tur B, Schoder S, Becker S. An Investigation of Acoustic Back-Coupling in Human Phonation on a Synthetic Larynx Model. Bioengineering (Basel) 2023; 10:1343. [PMID: 38135934 PMCID: PMC10740801 DOI: 10.3390/bioengineering10121343] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/28/2023] [Revised: 11/12/2023] [Accepted: 11/19/2023] [Indexed: 12/24/2023] [Open Access]
Abstract
In the human phonation process, acoustic standing waves in the vocal tract can influence the fluid flow through the glottis as well as vocal fold oscillation. To investigate the amount of acoustic back-coupling, the supraglottal flow field has been recorded via high-speed particle image velocimetry (PIV) in a synthetic larynx model for several configurations with different vocal tract lengths. Based on the obtained velocity fields, acoustic source terms were computed. Additionally, the sound radiation into the far field was recorded via microphone measurements and the vocal fold oscillation via high-speed camera recordings. The PIV measurements revealed that near a vocal tract resonance frequency fR, the vocal fold oscillation frequency fo (and therefore also the flow field's fundamental frequency) jumps onto fR. This is accompanied by a substantial relative increase in aeroacoustic sound generation efficiency. Furthermore, the measurements show that fo-fR-coupling increases vocal efficiency, signal-to-noise ratio, harmonics-to-noise ratio and cepstral peak prominence. At the same time, the glottal volume flow needed for stable vocal fold oscillation decreases strongly. All of this results in an improved voice quality and phonation efficiency so that a person phonating with fo-fR-coupling can phonate longer and with better voice quality.
Affiliation(s)
- Christoph Näger
  - Institute of Fluid Mechanics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
- Stefan Kniesburges
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Bogac Tur
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Stefan Schoder
  - Aeroacoustics and Vibroacoustics Group, Institute of Fundamentals and Theory in Electrical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Austria
- Stefan Becker
  - Institute of Fluid Mechanics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
2
Jakubaß B, Peters G, Kniesburges S, Semmler M, Kirsch A, Gerstenberger C, Gugatschka M, Döllinger M. Effect of functional electric stimulation on phonation in an ex vivo aged ovine model. The Journal of the Acoustical Society of America 2023; 153:2803. [PMID: 37154554 DOI: 10.1121/10.0017923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 04/07/2023] [Indexed: 05/10/2023]
Abstract
With age, the atrophy of the thyroarytenoid muscle (TAM), and thus atrophy of the vocal folds, leads to decreased glottal closure, increased breathiness, and a loss in voice quality, which results in a reduced quality of life. A method to counteract the atrophy of the TAM is to induce hypertrophy in the muscle by functional electric stimulation (FES). In this study, phonation experiments were performed with ex vivo larynges of six stimulated and six unstimulated ten-year-old sheep to investigate the impact of FES on phonation. Electrodes were implanted bilaterally near the cricothyroid joint. FES treatment was provided for nine weeks before harvesting. The multimodal measurement setup simultaneously recorded high-speed video of the vocal fold oscillation, the supraglottal acoustic signal, and the subglottal pressure signal. Results of 683 measurements show a 65.6% lower glottal gap index, a 22.7% higher tissue flexibility (measured by the amplitude to length ratio), and a 473.7% higher coefficient of determination (R²) of the regression of subglottal and supraglottal cepstral peak prominence during phonation for the stimulated group. These results suggest that FES improves the phonatory process for aged larynges or presbyphonia.
Affiliation(s)
- Bernhard Jakubaß
  - Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Gregor Peters
  - Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Stefan Kniesburges
  - Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Marion Semmler
  - Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Andrijana Kirsch
  - Division of Phoniatrics, ENT University Hospital Graz, Medical University of Graz, Auenbruggerplatz 26, Graz 8036, Austria
- Claus Gerstenberger
  - Division of Phoniatrics, ENT University Hospital Graz, Medical University of Graz, Auenbruggerplatz 26, Graz 8036, Austria
- Markus Gugatschka
  - Division of Phoniatrics, ENT University Hospital Graz, Medical University of Graz, Auenbruggerplatz 26, Graz 8036, Austria
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
3
Echternach M, Nusseck M, Strasding M, Richter B. Differences of Electroglottographical Contact Quotients between Connected Speech and Sustained Phonation in Clinical Measurement of Voice. J Voice 2023:S0892-1997(23)00077-2. [PMID: 36941166 DOI: 10.1016/j.jvoice.2023.02.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/15/2023] [Accepted: 02/15/2023] [Indexed: 03/23/2023]
Abstract
INTRODUCTION In clinical practice, sustained phonation is mostly used for acoustic voice measurements, while perceptual evaluation is based on connected speech. Since sustained phonation could be associated with the use of the singing voice, and since vocal registers are more relevant for singing rather than speech, it is unclear if vocal registers contribute to observable vocal fold contact differences between sustained phonation and speech. MATERIAL AND METHODS Sustained phonation (vowel [a] on comfortable pitch and loudness) and connected speech (German text: Der Nordwind und die Sonne) were analyzed for 1216 subjects (426 with and 790 without dysphonia) using the Laryngograph system (combining electroglottography and audio recordings). From these samples, fundamental frequency (ƒo), contact quotient (CQ), sound pressure level (SPL) and frequency perturbation (jitter first for sustained and cFx for connected speech) were evaluated. RESULTS Compared to connected speech, the values of ƒo and SPL were higher for sustained phonation. For female voices, ƒo difference was greater than for male voices. At the same time, and only for the females, CQ was lower for the sustained phonation, indicating a register difference. CONCLUSION In order to achieve a better comparability, sustained phonation should be standardized regarding the ƒo and SPL values in correspondence to the ƒo and SPL range of reading a text. This should also reduce the risk of using a different register for different types of phonation.
Affiliation(s)
- Matthias Echternach
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Manfred Nusseck
  - Institute of Musicians' Medicine, University of Freiburg Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Malin Strasding
  - Division of Fixed Prosthodontics and Biomaterials, Université de Genève, Geneve, Switzerland
- Bernhard Richter
  - Institute of Musicians' Medicine, University of Freiburg Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
4
Anand S. Perceptual and Computational Estimates of Vocal Breathiness and Roughness in Sustained Phonation and Connected Speech. J Voice 2023:S0892-1997(23)00069-3. [PMID: 36933971 DOI: 10.1016/j.jvoice.2023.02.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 03/18/2023]
Abstract
OBJECTIVE Clinical assessment of voice quality (VQ) often uses a combination of sustained phonations and more prolonged and more complex vocalizations. The purpose of this study was to compare the perceived vocal breathiness and vocal roughness of sustained phonations and connected speech over a wide range of dysphonia severity and to evaluate their relationship with acoustic measures and bioinspired models of breathiness and roughness. METHODS VQ dimension-specific single-variable matching task (SVMT) was used to index the perceived breathiness or roughness of five male and five female talkers on the basis of a sustained /a/ phonation and the 5th CAPE-V sentence. Acoustic measures of cepstral peak and autocorrelation peak, and psychoacoustic measures of pitch strength and temporal envelope standard deviation (EnvSD), were used to predict perceived breathiness and roughness judgments obtained from 10 listeners, respectively. RESULTS High intra- and inter-listener reliability was observed for sustained phonations and connected speech. Perceived breathiness and roughness of sustained vowels and sentences obtained using SVMT were highly correlated for most dysphonic voices. The pitch strength model of breathiness was able to capture a larger amount of perceptual variance compared to cepstral peak in both vowels and sentences. Autocorrelation peak was strongly correlated to perceived roughness in sentences while EnvSD was strongly correlated to perceived roughness in vowels. CONCLUSIONS Results provide evidence that perception of VQ via SVMT can be successfully extended to connected speech. Computational models of VQ can be easily adapted to connected speech. Such automated models of VQ perception are valuable due to their computational efficiency and their ability to accurately capture the non-linearities of the human auditory system.
Affiliation(s)
- Supraja Anand
  - Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
5
Lee Y, Park H, Lim D, Kim G. Usefulness of Direct Magnitude Estimation (DME) in Auditory Perceptual Assessments Measuring Dysphonia Severity. J Voice 2022. [DOI: 10.1016/j.jvoice.2022.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
6
Ghasemzadeh H, Doyle PC, Searl J. Image representation of the acoustic signal: An effective tool for modeling spectral and temporal dynamics of connected speech. The Journal of the Acoustical Society of America 2022; 152:580. [PMID: 35931551 PMCID: PMC9458292 DOI: 10.1121/10.0012734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 06/09/2022] [Accepted: 06/30/2022] [Indexed: 06/15/2023]
Abstract
Recent studies have advocated for the use of connected speech in clinical voice and speech assessment. This suggestion is based on the presence of clinically relevant information within the onset, offset, and variation in connected speech. Existing works on connected speech utilize methods originally designed for analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric approach to analysis based on a two-dimensional, temporal-spectral representation of speech. Variations along horizontal and vertical axes corresponding to the temporal and spectral dynamics of speech were quantified using two statistical models. The first, a spectral model, was defined as the probability of changes between the energy of two consecutive frequency sub-bands at a fixed time segment. The second, a temporal model, was defined as the probability of changes in the energy of a sub-band between consecutive time segments. As the first step of demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. Data obtained revealed that the proposed method has (at minimum) significant discriminatory power over the existing alternative approaches.
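The abstract defines its two models only in words, so the following is one possible reading rather than the authors' implementation: treat the time-frequency representation as a matrix of sub-band energies, take differences along the frequency axis (spectral model) or the time axis (temporal model), and estimate the probability of each size of change with a normalized histogram. The function name, the dB scale, and the bin edges are all assumptions made for illustration.

```python
import numpy as np

def change_probabilities(energy_db, axis, bin_edges):
    """Histogram-based probability of energy changes along one axis.

    energy_db : 2-D array, rows = frequency sub-bands, columns = time segments
                (assumed layout; energies assumed to be in dB).
    axis      : 0 -> change between consecutive sub-bands at a fixed time
                     segment (a "spectral model" in the abstract's sense),
                1 -> change of a sub-band between consecutive time segments
                     (a "temporal model").
    bin_edges : edges used to quantize the size of a change (hypothetical).
    """
    energy_db = np.asarray(energy_db, dtype=float)
    changes = np.diff(energy_db, axis=axis).ravel()
    counts, _ = np.histogram(changes, bins=bin_edges)
    total = counts.sum()
    # Normalize counts to probabilities; an empty histogram stays all zeros.
    return counts / total if total > 0 else counts.astype(float)

# Toy representation: energy rises by 1 dB per time segment, identical in both bands.
toy = np.array([[0.0, 1.0, 2.0],
                [0.0, 1.0, 2.0]])
edges = np.array([-1.5, -0.5, 0.5, 1.5])
spectral_model = change_probabilities(toy, axis=0, bin_edges=edges)
temporal_model = change_probabilities(toy, axis=1, bin_edges=edges)
```

In this toy case every between-band change is 0 dB and every between-segment change is +1 dB, so each model puts all of its probability mass in a single bin.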
Affiliation(s)
- Hamzeh Ghasemzadeh
  - Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, One Bowdoin Square, 11th Floor, Boston, Massachusetts 02114, USA
- Philip C Doyle
  - Department of Otolaryngology Head and Neck Surgery, Division of Laryngology, Stanford University School of Medicine, Stanford University, 801 Welch Road, Stanford, California 94305, USA
- Jeff Searl
  - Department of Communicative Sciences and Disorders, Michigan State University, 1026 Red Cedar Road, Oyer Speech & Hearing Building, East Lansing, Michigan 48824, USA
7
Gómez-García J, Moro-Velázquez L, Arias-Londoño J, Godino-Llorente J. On the design of automatic voice condition analysis systems. Part III: review of acoustic modelling strategies. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102049] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]
8
Schlegel P, Kist AM, Kunduk M, Dürr S, Döllinger M, Schützenberger A. Interdependencies between acoustic and high-speed videoendoscopy parameters. PLoS One 2021; 16:e0246136. [PMID: 33529244 PMCID: PMC7853476 DOI: 10.1371/journal.pone.0246136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2020] [Accepted: 01/13/2021] [Indexed: 02/06/2023] [Open Access]
Abstract
In voice research, uncovering relations between the oscillating vocal folds, being the sound source of phonation, and the resulting perceived acoustic signal is of great interest. This is especially the case in the context of voice disorders, such as functional dysphonia (FD). We investigated 250 high-speed videoendoscopy (HSV) recordings with simultaneously recorded acoustic signals (124 healthy females, 60 FD females, 44 healthy males, 22 FD males). 35 glottal area waveform (GAW) parameters and 14 acoustic parameters were calculated for each recording. Linear and non-linear relations between GAW and acoustic parameters were investigated using Pearson correlation coefficients (PCC) and distance correlation coefficients (DCC). Further, norm values for parameters obtained from 250 ms long sustained phonation data (vowel /i/) were provided. 26 PCCs in females (5.3%) and 8 in males (1.6%) were found to be statistically significant (|corr.| ≥ 0.3). Only minor differences were found between PCCs and DCCs, indicating the presence of weak non-linear dependencies between parameters. Fundamental frequency was involved in the majority of all relevant PCCs between GAW and acoustic parameters (19 in females and 7 in males). The most distinct difference between correlations in females and males was found for the parameter Period Variability Index. The study shows only weak relations between investigated acoustic and GAW-parameters. This indicates that the reduction of the complex 3D glottal dynamics to the 1D-GAW may erase laryngeal dynamic characteristics that are reflected within the acoustic signal. Hence, other GAW parameters, 2D-, 3D-laryngeal dynamics and vocal tract parameters should be further investigated towards potential correlations to the acoustic signal.
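To make the PCC/DCC comparison concrete, here is a minimal NumPy sketch of both coefficients for two one-dimensional parameter vectors. It uses the standard sample distance correlation (double-centered pairwise distance matrices); it is not the study's code, and the function names are chosen here for illustration.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation coefficient of two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def _double_centered_distances(v):
    # Pairwise absolute distances, with row means and column means removed
    # and the grand mean added back.
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_corr(x, y):
    """Sample distance correlation of two 1-D samples (0 = no dependence found)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

# A purely non-linear (quadratic) dependence: invisible to Pearson, visible to DCC.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pcc = pearson_corr(x, y)
dcc = distance_corr(x, y)
```

For the quadratic toy data the Pearson coefficient is essentially zero while the distance correlation is clearly positive, which is the kind of gap the abstract reports as being small in the real data.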
Affiliation(s)
- Patrick Schlegel
  - Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California, United States of America
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Andreas M. Kist
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Melda Kunduk
  - Dep. of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Stephan Dürr
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Michael Döllinger
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
9
Schlegel P, Kniesburges S, Dürr S, Schützenberger A, Döllinger M. Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings. Sci Rep 2020; 10:10517. [PMID: 32601277 PMCID: PMC7324600 DOI: 10.1038/s41598-020-66405-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/28/2020] [Accepted: 05/20/2020] [Indexed: 11/13/2022] [Open Access]
Abstract
In voice research and clinical assessment, many objective parameters are in use. However, there is no commonly used set of parameters that reflect certain voice disorders, such as functional dysphonia (FD); i.e. disorders with no visible anatomical changes. Hence, 358 high-speed videoendoscopy (HSV) recordings (159 normal females (NF), 101 FD females (FDF), 66 normal males (NM), 32 FD males (FDM)) were analyzed. We investigated 91 quantitative HSV parameters for their significance. First, 25 highly correlated parameters were discarded. Second, a further 54 parameters were discarded by using a LogitBoost decision stumps approach. This yielded a subset of 12 parameters sufficient to reflect functional dysphonia. These parameters separated groups NF vs. FDF and NM vs. FDM with fair accuracy of 0.745 or 0.768, respectively. Parameters solely computed from the changing glottal area waveform (1D-function called GAW) between the vocal folds were less important than parameters describing the oscillation characteristics along the vocal folds (2D-function called Phonovibrogram). Regularity of GAW phases and peak shape, harmonic structure and Phonovibrogram-based vocal fold open and closing angles were mainly important. This study showed the high degree of redundancy of HSV-voice-parameters but also affirms the need for multidimensional assessment of clinical data.
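The first reduction step (discarding highly correlated parameters) can be sketched as a greedy pairwise filter. The abstract does not state the correlation threshold or the order in which parameters were examined, so the threshold of 0.9 and the left-to-right order below are assumptions; the LogitBoost step is not reproduced here.

```python
import numpy as np

def drop_highly_correlated(features, threshold=0.9):
    """Greedy filter: keep a column only if its absolute Pearson correlation
    with every column already kept is below `threshold`.

    features : 2-D array, rows = recordings, columns = parameters
               (columns are assumed to be non-constant).
    Returns the list of kept column indices.
    """
    features = np.asarray(features, dtype=float)
    corr = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in range(features.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: column 1 is nearly a rescaled copy of column 0, column 2 is independent.
rng = np.random.default_rng(0)
base = rng.standard_normal(200)
toy_features = np.column_stack([
    base,
    2.0 * base + 0.01 * rng.standard_normal(200),
    rng.standard_normal(200),
])
kept_columns = drop_highly_correlated(toy_features, threshold=0.9)
```

The redundant second column is removed and the other two survive; which member of a correlated pair is kept depends on column order, one reason a study may prefer a more principled criterion.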
Affiliation(s)
- Patrick Schlegel
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Stefan Kniesburges
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Stephan Dürr
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Michael Döllinger
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
10
On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed Signal Process Control 2019. [DOI: 10.1016/j.bspc.2018.12.024] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
11
Kacha A, Grenez F, Schoentgen J. Multiband vocal dysperiodicities analysis using empirical mode decomposition in the log-spectral domain. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2014.08.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
12
An Examination of Variations in the Cepstral Spectral Index of Dysphonia Across a Single Breath Group in Connected Speech. J Voice 2015; 29:26-34. [DOI: 10.1016/j.jvoice.2014.04.012] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 02/13/2014] [Revised: 04/23/2014] [Accepted: 04/28/2014] [Indexed: 11/23/2022]
13
Leong K, Hawkshaw MJ, Dentchev D, Gupta R, Lurie D, Sataloff RT. Reliability of Objective Voice Measures of Normal Speaking Voices. J Voice 2013; 27:170-6. [DOI: 10.1016/j.jvoice.2012.07.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 07/06/2011] [Accepted: 07/10/2012] [Indexed: 11/27/2022]
14
Choi SH, Zhang Y, Jiang JJ, Bless DM, Welham NV. Nonlinear dynamic-based analysis of severe dysphonia in patients with vocal fold scar and sulcus vocalis. J Voice 2012; 26:566-76. [PMID: 22516315 DOI: 10.1016/j.jvoice.2011.09.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 04/11/2011] [Accepted: 09/15/2011] [Indexed: 11/24/2022]
Abstract
OBJECTIVE The primary goal of this study was to evaluate a nonlinear dynamic approach to the acoustic analysis of dysphonia associated with vocal fold scar and sulcus vocalis. STUDY DESIGN Case-control study. METHODS Acoustic voice samples from scar/sulcus patients and age-/sex-matched controls were analyzed using correlation dimension (D2) and phase plots, time-domain based perturbation indices (jitter, shimmer, signal-to-noise ratio [SNR]), and an auditory-perceptual rating scheme. Signal typing was performed to identify samples with bifurcations and aperiodicity. RESULTS Type 2 and 3 acoustic signals were highly represented in the scar/sulcus patient group. When data were analyzed irrespective of signal type, all perceptual and acoustic indices successfully distinguished scar/sulcus patients from controls. Removal of type 2 and 3 signals eliminated the previously identified differences between experimental groups for all acoustic indices except D2. The strongest perceptual-acoustic correlation in our data set was observed for SNR and the weakest correlation was observed for D2. CONCLUSIONS These findings suggest that D2 is inferior to time-domain based perturbation measures for the analysis of dysphonia associated with scar/sulcus; however, time-domain based algorithms are inherently susceptible to inflation under highly aperiodic (ie, type 2 and 3) signal conditions. Auditory-perceptual analysis, unhindered by signal aperiodicity, is therefore a robust strategy for distinguishing scar/sulcus patient voices from normal voices. Future acoustic analysis research in this area should consider alternative (e.g., frequency- and quefrency-domain based) measures alongside additional nonlinear approaches.
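For readers unfamiliar with the time-domain perturbation indices named above, the sketch below computes the common "local" forms of jitter and shimmer from per-cycle measurements. It assumes glottal cycles have already been segmented (cycle periods and peak amplitudes are given as inputs); the abstract does not specify which jitter and shimmer variants or which software the authors used, so this illustrates the general idea only.

```python
import numpy as np

def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean."""
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        raise ValueError("need at least two cycles")
    return float(np.mean(np.abs(np.diff(values))) / np.mean(values))

def jitter_local(periods_s):
    """Cycle-to-cycle variation of the period (input: cycle durations in seconds)."""
    return local_perturbation(periods_s)

def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the amplitude (input: per-cycle peak amplitudes)."""
    return local_perturbation(peak_amplitudes)

# Toy cycle data: periods alternating between 10 ms and 11 ms, constant amplitude.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [1.0, 1.0, 1.0, 1.0]
jitter = jitter_local(periods)        # 0.001 / 0.0105, about 9.5 %
shimmer = shimmer_local(amplitudes)   # 0.0
```

Both indices depend on reliable cycle boundaries, which is why they become hard to interpret for the strongly aperiodic type 2 and 3 signals discussed in the abstract.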
Affiliation(s)
- Seong Hee Choi
  - Department of Audiology and Speech-Language Pathology, Catholic University of Daegu, Gyeongsan, Korea
15
Watts CR, Awan SN. Use of spectral/cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel and continuous speech contexts. Journal of Speech, Language, and Hearing Research: JSLHR 2011; 54:1525-1537. [PMID: 22180020 DOI: 10.1044/1092-4388(2011/10-0209)] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Indexed: 05/31/2023]
Abstract
PURPOSE In this study, the authors evaluated the diagnostic value of spectral/cepstral measures to differentiate dysphonic from nondysphonic voices using sustained vowels and continuous speech samples. METHODOLOGY Thirty-two age- and gender-matched individuals (16 participants with dysphonia and 16 controls) were recorded reading a standard passage (The Rainbow Passage; Fairbanks, 1960) and sustaining the vowel /α/. Recorded voices were analyzed with custom software that calculated 4 spectral/cepstral measures. RESULTS Measures of cepstral peak prominence (CPP) and low-high spectral ratio (L/H ratio) were significantly different between groups in both speaking conditions; the standard deviation of the CPP was significantly different between groups in continuous speech only. In differentiating dysphonic individuals with a hypofunctional etiology from nondysphonic individuals, receiver operating characteristic (ROC) analyses demonstrated (a) high sensitivity and high specificity for the CPP in the sustained vowel condition and (b) high sensitivity and moderate specificity for the CPP in the speech condition. CONCLUSIONS In a sample of dysphonic speakers (hypofunctional etiologies) versus typical speakers, spectral/cepstral measures of CPP and L/H ratio were able to differentiate these groups from one another in both vowel prolongation and continuous speech contexts with high sensitivity and specificity. The results of this study support the growing body of literature documenting the significant value of cepstral and other spectral-based acoustic measures to the clinical evaluation and management processes.
16
Awan SN, Helou LB, Stojadinovic A, Solomon NP. Tracking voice change after thyroidectomy: application of spectral/cepstral analyses. Clinical Linguistics & Phonetics 2011; 25:302-320. [PMID: 21158501 DOI: 10.3109/02699206.2010.535646] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 05/30/2023]
Abstract
This study evaluates the utility of perioperative spectral and cepstral acoustic analyses to monitor voice change after thyroidectomy. Perceptual and acoustic analyses were conducted on speech samples (sustained vowel /α/ and CAPE-V sentences) provided by 70 participants (36 women and 34 men) at four study time points: prior to thyroid surgery and 2 weeks, 3 months and 6 months after thyroidectomy. Repeated measures analyses of variance focused on the relative amplitude of the dominant harmonic in the voice signal (cepstral peak prominence, CPP), the ratio of low-to-high spectral energy, and their respective standard deviations (SD). Data were also examined for relationships between acoustic measures and perceptual ratings of overall severity of voice quality. Results showed that perceived overall severity and the acoustic measures of the CPP and its SD (CPPsd) computed from sentence productions were significantly reduced at 2-week post-thyroidectomy for 20 patients (29% of the sample) who had self-reported post-operative voice change. For this same group of patients, the CPP and CPPsd computed from sentence productions improved significantly from 2-weeks post-thyroidectomy to 6-months post-surgery. CPP and CPPsd also correlated well with perceived overall severity (r = -0.68 and -0.79, respectively). Measures of CPP from sustained vowel productions were not as effective as those from sentence productions in reflecting voice deterioration in the post-thyroidectomy patients at the 2-week post-surgery time period, were weaker correlates with perceived overall severity, and were not as effective in discriminating negative voice outcome (NegVO) from normal voice outcome (NormVO) patients as compared to the results from the sentence-level stimuli. Results indicate that spectral/cepstral analysis methods can be used with continuous speech samples to provide important objective data to document the effects of dysphonia in a post-thyroidectomy patient sample. 
When used in conjunction with patient's self-report and other general measures of vocal dysfunction, the acoustic measures employed in this study contribute to a complete profile of the patient's vocal condition.
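Since cepstral peak prominence (CPP) recurs throughout these abstracts, a simplified single-frame sketch may help. It follows the general recipe (log-magnitude spectrum, cepstrum, peak search in a plausible pitch range, distance of the peak above a regression line), but the window, the 60-300 Hz search range, and fitting the regression line over the search range only are assumptions made here; published implementations differ in these details and typically average over many frames.

```python
import numpy as np

def cepstral_peak_prominence(frame, sample_rate, f0_min=60.0, f0_max=300.0):
    """Return (CPP in dB, f0 estimate in Hz) for one frame. Simplified sketch."""
    frame = np.asarray(frame, dtype=float)
    n = frame.size
    lo = int(np.ceil(sample_rate / f0_max))   # shortest period searched, in samples
    hi = int(np.floor(sample_rate / f0_min))  # longest period searched, in samples
    if hi >= n // 2:
        raise ValueError("frame too short for the requested f0 range")
    eps = 1e-12  # avoids log of zero
    # Log-magnitude spectrum (dB) of the Hann-windowed frame.
    log_spectrum = 20.0 * np.log10(np.abs(np.fft.fft(frame * np.hanning(n))) + eps)
    # Cepstrum: spectrum of the log spectrum, again expressed in dB.
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spectrum)) + eps)
    quefrency = np.arange(n) / sample_rate    # seconds
    idx = np.arange(lo, hi + 1)
    peak = idx[np.argmax(cepstrum_db[idx])]
    # Straight-line baseline fitted over the search range (assumption).
    slope, intercept = np.polyfit(quefrency[idx], cepstrum_db[idx], 1)
    cpp = cepstrum_db[peak] - (slope * quefrency[peak] + intercept)
    return float(cpp), float(sample_rate / peak)

# Toy comparison: a harmonic-rich 200 Hz tone versus white noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(1024) / fs
voiced = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 31))
voiced = voiced + 0.01 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)
cpp_voiced, f0_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise, _ = cepstral_peak_prominence(noise, fs)
```

A strongly periodic signal produces a cepstral peak that stands well above the baseline, whereas noise-like signals do not, which is the property the CPP-based studies listed here exploit.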
Affiliation(s)
- Shaheen N Awan
  - Department of Audiology & Speech Pathology, Bloomsburg University of Pennsylvania, Bloomsburg, PA 17815-1301, USA
17
Awan SN, Roy N, Jetté ME, Meltzner GS, Hillman RE. Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: Comparisons with auditory-perceptual judgements from the CAPE-V. Clinical Linguistics & Phonetics 2010; 24:742-58. [PMID: 20687828 DOI: 10.3109/02699206.2010.492446] [Citation(s) in RCA: 181] [Impact Index Per Article: 12.9] [Indexed: 05/12/2023]
Abstract
This study investigated the relationship between acoustic spectral/cepstral measures and listener severity ratings in normal and disordered voice samples. CAPE-V sentence samples and the vowel /a/ were elicited from eight normal speakers and 24 patients with varying degrees of dysphonia severity. Samples were analysed for measures of the cepstral peak prominence (CPP), the ratio of low-to-high spectral energy, and their respective standard deviations. Perceptual ratings of overall severity were also obtained for all samples. Results showed that all acoustic variables combined in a four-factor model which correlated with perceived severity with R = 0.81 (R² = 0.65). For the vowel /a/, a five-factor model incorporating all acoustic variables and gender correlated with perceived severity with R = 0.96 (R² = 0.91). Results indicate that a strong relationship between perceptual and acoustic estimates of dysphonia severity can be achieved in both continuous speech and vowel contexts using a model incorporating spectral/cepstral measures.
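As a quick check on how the reported multiple correlation coefficients relate to the explained variance, squaring the rounded values gives:

```latex
\begin{align*}
R = 0.81 &\;\Rightarrow\; R^2 = 0.81^2 = 0.6561 \quad (\text{reported: } 0.65)\\
R = 0.96 &\;\Rightarrow\; R^2 = 0.96^2 = 0.9216 \quad (\text{reported: } 0.91)
\end{align*}
```

The small gaps are what one would expect if both figures were rounded independently from unrounded values. For illustration only, an unrounded R of 0.807 gives R² ≈ 0.651 and an unrounded R of 0.956 gives R² ≈ 0.914; these example values are hypothetical and do not come from the paper. In either case the models account for roughly 65% and 91% of the variance in perceived severity.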
Affiliation(s)
- Shaheen N Awan
- Department of Audiology & Speech Pathology, Bloomsburg University of Pennsylvania, Bloomsburg, PA 17815-1301, USA.
|
18
|
Maryn Y, Corthals P, Van Cauwenberge P, Roy N, De Bodt M. Toward Improved Ecological Validity in the Acoustic Measurement of Overall Voice Quality: Combining Continuous Speech and Sustained Vowels. J Voice 2010; 24:540-55. [DOI: 10.1016/j.jvoice.2008.12.014] [Citation(s) in RCA: 229] [Impact Index Per Article: 16.4] [Received: 11/07/2008] [Accepted: 12/31/2008] [Indexed: 01/09/2023]
|
19
|
McDonald R, Parsa V, Doyle PC. Objective estimation of tracheoesophageal speech ratings using an auditory model. The Journal of the Acoustical Society of America 2010; 127:1032-1041. [PMID: 20136224 DOI: 10.1121/1.3270396] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/28/2023]
Abstract
Total laryngectomy is often the treatment of choice for many individuals diagnosed with advanced laryngeal cancer. This procedure alters the normal voice production mechanism, and tracheoesophageal (TE) speech is one alternative method of voicing postlaryngectomy. TE speech is created when pulmonary air is passed through the upper esophagus to create a vibratory source that is then articulated into speech. TE speech is often characterized by abnormal voice quality. Acoustic analysis of TE speech has the potential of quantifying the voice quality and assisting the speech language pathologist in facilitating rehabilitation. Motivated in part by the recent advances in telecommunication industry for speech quality estimation, this paper investigated the application of an auditory model in predicting the ratings of TE speech by normal hearing listeners. The Moore-Glasberg auditory model was employed to extract perceptually relevant features from the acoustic waveform, and these features were later combined to estimate the subjective ratings of TE speech. This approach was validated with a database of subjective ratings of speech samples recorded from 35 TE speakers. Results showed moderate correlations between the objective metrics and the subjective ratings, and these correlations were significantly better than those obtained with traditional methods used in the telecommunication applications.
Affiliation(s)
- Robert McDonald
- Department of Electrical and Computer Engineering, and National Centre for Audiology, University of Western Ontario, London, N6A 4B8 Ontario, Canada
|
20
|
Maryn Y, Roy N, De Bodt M, Van Cauwenberge P, Corthals P. Acoustic measurement of overall voice quality: a meta-analysis. The Journal of the Acoustical Society of America 2009; 126:2619-34. [PMID: 19894840 DOI: 10.1121/1.3224706] [Citation(s) in RCA: 193] [Impact Index Per Article: 12.9] [Indexed: 05/24/2023]
Abstract
Over the past several decades, many acoustic markers have been proposed to be sensitive to and measure overall voice quality. This meta-analysis presents a retrospective appraisal of scientific reports, which evaluated the relation between perceived overall voice quality and several acoustic-phonetic correlates. Twenty-five studies met the inclusion criteria and were evaluated using meta-analytic techniques. Correlation coefficients between perceptual judgments and acoustic measures were computed. Where more than one correlation coefficient for a specific acoustic marker was available, a weighted average correlation coefficient was calculated. This was the case in 36 acoustic measures on sustained vowels and in 3 measures on continuous speech. Acoustic measures were ranked according to the strength of the correlation with perceptual voice quality ratings. Acoustic markers with more than one correlation value available in literature and yielding a homogeneous weighted r of 0.60 or above were considered to be superior. The meta-analysis identified four measures that met these criteria in sustained vowels and three measures in continuous speech. Although acoustic measures are routinely utilized in clinical voice examinations, the results of this meta-analysis suggest that caution is warranted regarding the concurrent validity and thus the clinical utility of many of these measures.
Affiliation(s)
- Youri Maryn
- Department of Speech-Language Pathology and Audiology, Sint-Jan General Hospital, Ruddershove 10, 8000 Bruges, Belgium.
|
21
|
Awan SN, Roy N, Dromey C. Estimating dysphonia severity in continuous speech: application of a multi-parameter spectral/cepstral model. Clinical Linguistics & Phonetics 2009; 23:825-41. [PMID: 19891523 DOI: 10.3109/02699200903242988] [Citation(s) in RCA: 140] [Impact Index Per Article: 9.3] [Indexed: 05/24/2023]
Abstract
The purpose of the study was to identify a sub-set of spectral/cepstral-based analysis methods that would most effectively predict dysphonia severity (as estimated via auditory-perceptual analysis) in samples of continuous speech. Acoustic estimates of dysphonia severity were used as an objective treatment outcomes measure in a set of pre- vs post-treatment speech samples. Pre- and post-treatment continuous speech samples from 104 females with primary muscle tension dysphonia (MTD) were rated by listeners using a 100 point visual analogue scale (VAS) and analysed acoustically with spectral/cepstral-based measures. Stepwise linear regression produced a three-factor model consisting of the cepstral peak prominence (CPP); the mean ratio of low-to-high frequency spectral energy; and the standard deviation of the ratio of low-to-high frequency spectral energy that was strongly correlated with perceived dysphonia severity ratings (R = .85; R² = .73). Mean differences between predicted vs perceptual ratings for pre- and post-treatment speech samples were < 6 points on the 100 point VAS; mean absolute differences between predicted and perceived ratings were < 16 points on the 100 point VAS (equivalent to within one scale value on commonly used 7-point equal-appearing interval rating scales). A multi-parameter acoustic model consisting of spectral/cepstral-based measures shows considerable promise as an objective measure of dysphonia severity in continuous speech, even across the diverse voice types and severities observed in pre- and post-treatment MTD speech samples.
Affiliation(s)
- Shaheen N Awan
- Department of Audiology & Speech Pathology, Bloomsburg University of Pennsylvania, Bloomsburg, PA 17815-1301, USA.
|
22
|
|
23
|
Zhang Y, Jiang JJ. Acoustic Analyses of Sustained and Running Voices From Patients With Laryngeal Pathologies. J Voice 2008; 22:1-9. [PMID: 16978835 DOI: 10.1016/j.jvoice.2006.08.003] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Received: 06/22/2006] [Accepted: 08/03/2006] [Indexed: 10/24/2022]
Abstract
In this paper, we investigated the acoustic characteristics of sustained and running vowels from normal subjects and patients with laryngeal pathologies. Perturbation methods (including jitter and shimmer), signal-to-noise ratio (SNR), and nonlinear dynamic methods (such as correlation dimension and second-order entropy) were used to analyze sustained and running vowels. We found that the sustained vowels and running voices from normal subjects and patients with laryngeal pathologies had low-dimensional dynamic characteristics. For sustained vowels, the analyses of jitter, shimmer, correlation dimension, and second-order entropy revealed significant differences between normal and pathological voices. For running voices, jitter and shimmer did not statistically discriminate between normal and pathological voices, but a significant difference was found for SNR, correlation dimension, and second-order entropy. The results suggest that nonlinear dynamic analysis and traditional SNR analysis may be valuable for the analysis of sustained and running vowels; perturbation analysis may be applicable for the analysis of sustained vowels but should be applied with caution for running voice analysis.
Affiliation(s)
- Yu Zhang
- Department of Surgery, Division of Otolaryngology Head and Neck Surgery, University of Wisconsin Medical School, Madison, Wisconsin 53706, USA
|
24
|
Eadie TL, Baylor CR. The Effect of Perceptual Training on Inexperienced Listeners' Judgments of Dysphonic Voice. J Voice 2006; 20:527-44. [PMID: 16324823 DOI: 10.1016/j.jvoice.2005.08.007] [Citation(s) in RCA: 156] [Impact Index Per Article: 8.7] [Received: 05/24/2005] [Accepted: 08/20/2005] [Indexed: 10/25/2022]
Abstract
OBJECTIVES/HYPOTHESIS: The purpose of this study was (1) to determine whether changes in intra- and interrater reliability occur for inexperienced listeners' judgments of overall severity, roughness, and breathiness in dysphonic and normal speakers after 2 hours of listener training; and (2) to determine the acoustic bases of inexperienced listeners' judgments before and after training. STUDY DESIGN: Prospective, single group, pre- and postdesign. METHODS: Thirty adult dysphonic and six normal speaker samples were selected from a database. Samples included 21 test stimuli and 15 training stimuli of both sustained vowels and connected speech. Sixteen inexperienced listeners judged all samples for overall severity, roughness, and breathiness using visual analog scales. Each listener provided pretraining ratings at baseline. Listeners were then trained using 15 anchor voice samples and 15 training stimuli. During training, listeners were provided with definitions of rating dimensions, accuracy feedback, and anchor samples. Listeners then judged test stimuli in a posttraining session. Speaker samples also were analyzed acoustically. RESULTS: Intrarater reliability was least variable for judgments of overall severity, but improved further with training. Listener judgments of roughness and breathiness in vowels were least reliable at baseline, but they significantly improved between listeners after training. Finally, measures of cepstral peak prominence significantly predicted all voice quality judgments except roughness in vowels, which was predicted by shimmer. The acoustic bases of group perceptual judgments did not seem to change with training. CONCLUSIONS: These findings have implications for developing training programs in perceptual evaluation and mapping relationships between acoustic and perceptual characteristics of voice disorders.
Affiliation(s)
- Tanya L Eadie
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA 98105, USA.
|
25
|
Kacha A, Bettens F, Grenez F. Vocal dysperiodicities estimation by means of adaptive long-term prediction. Med Biol Eng Comput 2006; 44:61-8. [PMID: 16929922 DOI: 10.1007/s11517-005-0003-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
Abstract
An adaptive formulation of the long-term bidirectional linear predictive analysis is proposed in the context of the acoustic assessment of disordered speech. Vocal dysperiodicities are summarized by means of a signal-to-dysperiodicity ratio (SDR) marker. It is shown that performing an adaptive forward and backward long-term linear prediction of each speech sample and retaining the minimal prediction error energy as a cue of vocal dysperiodicity results in an SDR that correlates with the perceived degree of hoarseness. The coefficients of the time-varying long-term linear predictive model are estimated by means of the recursive least squares algorithm. The corpora comprise sustained vowels and French sentences produced by male and female normophonic and dysphonic speakers. A perceptual assessment of speech samples, which rests on comparative judgments, is used to evaluate the ability of the acoustic marker to predict subjective measures of voice quality. Experimental results show that the adaptive approach gives rise to high correlations for sustained vowels as well as for sentences.
Affiliation(s)
- Abdellah Kacha
- Service Ondes et Signaux, Faculté des Sciences Appliquées, Université Libre de Bruxelles, Av. F. D. Roosevelt 50, CP 165/51, 1050 Bruxelles, Belgium.
|
26
|
Kacha A, Grenez F, Schoentgen J. Multiband frame-based acoustic cues of vocal dysperiodicities in disordered connected speech. Biomed Signal Process Control 2006. [DOI: 10.1016/j.bspc.2006.07.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
|
27
|
Umapathy K, Krishnan S. Feature analysis of pathological speech signals using local discriminant bases technique. Med Biol Eng Comput 2005; 43:457-64. [PMID: 16255427 DOI: 10.1007/bf02344726] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
Abstract
Speech is an integral part of the human communication system. Various pathological conditions affect the vocal functions, inducing speech disorders. Acoustic parameters of speech are commonly used for the assessment of speech disorders and for monitoring the progress of the patient over the course of therapy. In the last two decades, signal-processing techniques have been successfully applied in screening speech disorders. In the paper, a novel approach is proposed to classify pathological speech signals using a local discriminant bases (LDB) algorithm and wavelet packet decompositions. The focus of the paper was to demonstrate the significance of identifying the signal subspaces that contribute to the discriminatory characteristics of normal and pathological speech signals in a computationally efficient way. Features were extracted from target subspaces for classification, and time-frequency decomposition was used to eliminate the need for segmentation of the speech signals. The technique was tested with a database of 212 speech signals (51 normal and 161 pathological) using the Daubechies wavelet (db4). Classification accuracies up to 96% were achieved for a two-group classification as normal and pathological speech signals, and 74% was achieved for a four-group classification as male normal, female normal, male pathological and female pathological signals.
Affiliation(s)
- K Umapathy
- Department of Electrical & Computer Engineering, The University of Western Ontario, London, Canada
|
28
|
Umapathy K, Krishnan S, Parsa V, Jamieson DG. Discrimination of pathological voices using a time-frequency approach. IEEE Trans Biomed Eng 2005; 52:421-30. [PMID: 15759572 DOI: 10.1109/tbme.2004.842962] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.9] [Indexed: 11/07/2022]
Abstract
Acoustical measures of vocal function are routinely used in the assessments of disordered voice, and for monitoring the patient's progress over the course of voice therapy. Typically, acoustic measures are extracted from sustained vowel stimuli where short-term and long-term perturbations in fundamental frequency and intensity, and the level of "glottal noise" are used to characterize the vocal function. However, acoustic measures extracted from continuous speech samples may well be required for accurate prediction of abnormal voice quality that is relevant to the client's "real world" experience. In contrast with sustained vowel research, there is relatively sparse literature on the effectiveness of acoustic measures extracted from continuous speech samples. This is partially due to the challenge of segmenting the speech signal into voiced, unvoiced, and silence periods before features can be extracted for vocal function characterization. In this paper we propose a joint time-frequency approach for classifying pathological voices using continuous speech signals that obviates the need for such segmentation. The speech signals were decomposed using an adaptive time-frequency transform algorithm, and several features such as the octave max, octave mean, energy ratio, length ratio, and frequency ratio were extracted from the decomposition parameters and analyzed using statistical pattern classification techniques. Experiments with a database consisting of continuous speech samples from 51 normal and 161 pathological talkers yielded a classification accuracy of 93.4%.
Affiliation(s)
- Karthikeyan Umapathy
- Department of Electrical and Computer Engineering, The University of Western Ontario, London, ON N6A 5B9, Canada
|
29
|
Eadie TL, Doyle PC. Classification of Dysphonic Voice: Acoustic and Auditory-Perceptual Measures. J Voice 2005; 19:1-14. [PMID: 15766846 DOI: 10.1016/j.jvoice.2004.02.002] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Accepted: 02/13/2004] [Indexed: 11/15/2022]
Abstract
The purpose of this study was (1) to determine the relationship between acoustic measures and auditory-perceptual dimensions of overall voice severity and pleasantness and (2) to evaluate the ability of acoustic and auditory-perceptual measures to discriminate normal from dysphonic voices. Thirty adult dysphonic speakers and six, age-matched normal control speakers were asked to provide oral reading samples of the Rainbow Passage. Acoustic analysis of the speech samples was used to identify abnormal phonatory events associated with dysphonia. The acoustic program calculated long-term average spectral measures, glottal noise measures, and those measures based on linear prediction (LP) modeling. Twelve adult listeners judged overall voice severity and pleasantness from the connected speech samples using direct magnitude estimation (DME) procedures. The acoustic measures accounted for 48% of overall voice severity and 40% of voice pleasantness for dysphonic speakers. The classification performance of the acoustic measures and auditory-perceptual measures was quantified using logistic regression analysis. When acoustic measures or auditory-perceptual measures were considered in isolation, classification was generally accurate and similar across measures. Classification accuracy improved to 100% when acoustic and auditory-perceptual measures were combined. These data provide further support for use of both auditory-perceptual evaluation and acoustic analyses for classifying and evaluating dysphonia.
Affiliation(s)
- Tanya L Eadie
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
|
30
|
Bettens F, Grenez F, Schoentgen J. Estimation of vocal dysperiodicities in disordered connected speech by means of distant-sample bidirectional linear predictive analysis. The Journal of the Acoustical Society of America 2005; 117:328-337. [PMID: 15704425 DOI: 10.1121/1.1835511] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
The article presents an analysis of vocal dysperiodicities in connected speech produced by dysphonic speakers. The processing is based on a comparison of the present speech fragment with future and past fragments. The size of the dysperiodicity estimate is zero for periodic speech signals. A feeble increase of the vocal dysperiodicity is guaranteed to produce a feeble increase of the estimate. No spurious noise boosting occurs owing to cycle insertion and omission errors, or phonetic segment boundary artifacts. Additional objectives of the study have been investigating whether deviations from periodicity are larger or more commonplace in connected speech than in sustained vowels, and whether sentences that comprise frequent voice onsets and offsets are noisier than sentences that comprise few. The corpora contain sustained vowels as well as grammatically- and phonetically matched sentences. An acoustic marker that correlates with the perceived degree of hoarseness summarizes the size of the dysperiodicities. The marker values for sustained vowels have been highly correlated with those for connected speech, and the marker values for sentences that comprise few voiced/unvoiced transients have been highly correlated with the marker values for sentences that comprise many.
Affiliation(s)
- Frédéric Bettens
- Department Signals and Waves, Université Libre de Bruxelles, B-1050 Brussels, Belgium
|
31
|
Eadie TL, Doyle PC. Direct magnitude estimation and interval scaling of pleasantness and severity in dysphonic and normal speakers. The Journal of the Acoustical Society of America 2002; 112:3014-3021. [PMID: 12509023 DOI: 10.1121/1.1518983] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 05/24/2023]
Abstract
The purpose of this study was to determine the validity of voice pleasantness and overall voice severity ratings of dysphonic and normal speakers using direct magnitude estimation (DME) and equal-appearing interval (EAI) auditory-perceptual scaling procedures. Twelve naive listeners perceptually evaluated voice pleasantness and severity from connected speech samples produced by 24 adult dysphonic speakers and 6 normal adult speakers. A statistical comparison of the two auditory-perceptual scales yielded a linear relationship representative of a metathetic continuum for voice pleasantness. A statistical relationship that is consistent with a prothetic continuum was revealed for ratings of voice severity. These data provide support for the use of either DME or EAI scales when making auditory-perceptual judgments of pleasantness, but only DME scales when judging overall voice severity for dysphonic speakers. These results suggest further psychophysical study of perceptual dimensions of voice and speech must be undertaken in order to avoid the inappropriate and invalid use of EAI scales used in the auditory-perceptual evaluation of the normal and dysphonic voice.
Affiliation(s)
- Tanya L Eadie
- Voice Production and Perception Laboratory, School of Communication Sciences and Disorders, Elborn College, University of Western Ontario, London, Ontario N6G 1H1, Canada.
|
32
|
Parsa V, Jamieson DG. Acoustic discrimination of pathological voice: sustained vowels versus continuous speech. Journal of Speech, Language, and Hearing Research: JSLHR 2001; 44:327-339. [PMID: 11324655 DOI: 10.1044/1092-4388(2001/027)] [Citation(s) in RCA: 154] [Impact Index Per Article: 6.7] [Indexed: 05/23/2023]
Abstract
We investigated the ability of acoustic measures to discriminate between normal and pathological talkers. Two groups of measures were compared: (a) those extracted from sustained vowels and (b) those based on continuous speech samples. Nine acoustic measures, which include fundamental frequency and amplitude perturbation measures, long term average spectral measures, and glottal noise measures were extracted from both sustained vowel and continuous speech samples. Our experiments were performed on a published database of 53 normal talkers and 175 talkers with a pathological voice. The classification performance of the nine acoustic measures was quantified using linear discriminant analysis and receiver operating characteristic (ROC) curve analysis. When individual measures were considered in isolation, classification was more accurate for measures extracted from sustained vowels than for those based on continuous speech samples. Classification accuracy improved when combinations of acoustic parameters were considered. For such combinations of measures, classification results were comparable for measures extracted from continuous speech samples and for those based on sustained vowels.
Affiliation(s)
- V Parsa
- National Centre for Audiology, The University of Western Ontario, London, Canada
|