1. Story BH, Bunton K. The relation of velopharyngeal coupling area and vocal tract scaling to identification of stop-nasal cognates. J Acoust Soc Am 2023; 154:3741-3759. PMID: 38099832. DOI: 10.1121/10.0023958.
Abstract
The purpose of this study was to determine whether the threshold of velopharyngeal (VP) coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English was different for speech produced by a model based on an adult male, an adult female, and a 4-year-old child. V1CV2 stimuli were generated with a speech production model that encodes phonetic segments as relative acoustic targets imposed on an underlying vocal tract and laryngeal structure that can be scaled according to sex and age. Each V1CV2 was synthesized with a set of VP coupling functions whose maximum area ranged from 0 to 0.1 cm². Results showed that scaling the vocal tract and vocal folds had essentially no effect on the VP coupling area at which listener identification shifted from stop to nasal. The range of coupling areas at which the crossover occurred was 0.037-0.049 cm² for the male model, 0.040-0.055 cm² for the female model, and 0.039-0.052 cm² for the 4-year-old child model; the overall mean was 0.044 cm². Calculations of band-limited peak nasalance indicated that 85% peak nasalance during the consonant was well aligned with listener responses.
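The nasalance metric used in this line of work is conventionally the nasal acoustic energy expressed as a percentage of the combined nasal-plus-oral energy. A minimal sketch of that ratio (the envelope values are hypothetical, and the band-limiting described in the paper is omitted):

```python
import numpy as np

def nasalance(nasal, oral):
    """Frame-wise nasalance (%): nasal energy / (nasal + oral energy) * 100,
    with energy taken as squared amplitude."""
    n = np.asarray(nasal, dtype=float) ** 2
    o = np.asarray(oral, dtype=float) ** 2
    return 100.0 * n / (n + o)

# Hypothetical amplitude envelopes across a V1CV2 utterance (oral vowel,
# transition, nasal consonant, transition, oral vowel):
nasal_env = np.array([0.1, 0.3, 0.9, 0.3, 0.1])
oral_env = np.array([1.0, 0.8, 0.3, 0.8, 1.0])
peak = nasalance(nasal_env, oral_env).max()  # peak nasalance during the consonant
```

A purely nasal frame yields 100%, a purely oral frame 0%; the "peak nasalance" reported in these studies is the maximum of this frame-wise curve during the consonant.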
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Kate Bunton
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
2. Kotzor S, Wetterlin A, Roberts AC, Reetz H, Lahiri A. Bengali nasal vowels: lexical representation and listener perception. Phonetica 2022; 79:115-150. PMID: 35619051. DOI: 10.1515/phon-2022-2017.
Abstract
This paper focuses on the representation of nasality, as well as speakers' awareness and perceptual use of phonetic nasalisation, by examining surface nasalisation in two types of vowels in Bengali: underlying nasal vowels (CṼC) and vowels nasalised before a nasal consonant (CVN). A series of three cross-modal forced-choice experiments investigated the hypothesis that only unpredictable nasalisation is stored and that this sparse representation governs how listeners interpret vowel nasality. Visual full-word targets were preceded by auditory primes consisting of CV segments of CVC words with nasal vowels ([tʃɑ̃] for [tʃɑ̃d] 'moon'), oral vowels ([tʃɑ] for [tʃɑl] 'unboiled rice') or nasalised oral vowels ([tʃɑ̃(n)] for [tʃɑ̃n] 'bath'), and reaction times and errors were measured. Some targets fully matched the prime, while others matched only the surface or underlying representation. Faster reaction times and fewer errors were observed after CṼC primes than after both CVC and CVN primes. Furthermore, any surface nasality was most frequently matched to a CṼC target unless no such target was available. Both the reaction time and error data indicate that nasal vowels are specified for nasality, leading to faster recognition than for underspecified oral vowels, which cannot be perfectly matched with incoming signals.
Affiliation(s)
- Sandra Kotzor
- Language and Brain Laboratory, Faculty of Linguistics, Philology and Phonetics, University of Oxford, Oxford, UK
- Henning Reetz
- Institut für Phonetik, Goethe Universität Frankfurt, Frankfurt, Germany
- Aditi Lahiri
- Language and Brain Laboratory, Faculty of Linguistics, Philology and Phonetics, University of Oxford, Oxford, UK
3. Story BH, Bunton K. The relation of velopharyngeal coupling area to the identification of stop versus nasal consonants in North American English based on speech generated by acoustically driven vocal tract modulations. J Acoust Soc Am 2021; 150:3618. PMID: 34852618. DOI: 10.1121/10.0007223.
Abstract
The purpose of this study was to determine the threshold of velopharyngeal coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English, based on V1CV2 stimuli generated with a speech production model that encodes phonetic segments as relative acoustic targets. Each V1CV2 was synthesized with a set of velopharyngeal coupling functions whose area ranged from 0 to 0.1 cm². Results show that consonants were identified by listeners as a stop when the coupling area was less than 0.035-0.057 cm², depending on place of articulation and final vowel. The smallest coupling area (0.035 cm²) at which the stop-to-nasal switch occurred was found for an alveolar consonant in the /ɑCi/ context, whereas the largest (0.057 cm²) was for a bilabial in /ɑCɑ/. For each stimulus, the balance of oral versus nasal acoustic energy was characterized by the peak nasalance during the consonant. Stimuli with peak nasalance below 40% were mostly identified by listeners as stops, whereas those above 40% were identified as nasals. This study was intended to be a precursor to further investigations using the same model but scaled to represent the developing speech production system of male and female talkers.
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Kate Bunton
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
4. Havel M, Sundberg J, Traser L, Burdumy M, Echternach M. Effects of Nasalization on Vocal Tract Response Curve. J Voice 2021; 37:339-347. PMID: 33773895. DOI: 10.1016/j.jvoice.2021.02.013.
Abstract
BACKGROUND Earlier studies have shown that nasalization affects the radiated spectrum by modifying the vocal tract transfer function in a complex manner. METHODS Here we study this phenomenon by measuring the sine-sweep responses of 3-D models of the vowels /u, a, æ, i/, derived from volumetric MR imaging, coupled by means of tubes of different lengths and diameters to a 3-D model of a nasal tract. RESULTS The coupling introduced a dip into the vocal tract transfer function. The dip frequency was close to the main resonance of the nasal tract, in agreement with the in vivo sweep-tone measurements of Fujimura and Lindqvist (1972). With increasing size of the coupling tube, the depth of the dip increased and the first formant peak either changed in frequency or was split by the dip. Only marginal effects of the paranasal sinuses were observed. For certain coupling tube sizes, the spectral balance was changed, boosting the formant peaks in the 2-4 kHz range. CONCLUSION A velopharyngeal opening introduces a dip in the transfer function at the main resonance of the nasal tract. Its depth increases with the area of the opening, and its frequency rises in some vowels.
Affiliation(s)
- Miriam Havel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany.
- Johan Sundberg
- Dept. of Speech Music Hearing, School of Computer Science and Communication, KTH (Royal Institute of Technology) Stockholm, Sweden; Dept. of Linguistics, Stockholm University, Stockholm, Sweden; University College of Music Education Stockholm, Stockholm, Sweden
- Louisa Traser
- Institute of Musicians' Medicine, Medical Center - University of Freiburg, Germany
- Michael Burdumy
- Dept. of Radiology, Medical Physics, Medical Center - University of Freiburg, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
5. Saxon M, Tripathi A, Jiao Y, Liss J, Berisha V. Robust Estimation of Hypernasality in Dysarthria with Acoustic Model Likelihood Features. IEEE/ACM Trans Audio Speech Lang Process 2020; 28:2511-2522. PMID: 33748328. PMCID: PMC7978228. DOI: 10.1109/taslp.2020.3015035.
Abstract
Hypernasality is a common characteristic symptom across many motor-speech disorders. For voiced sounds, hypernasality introduces an additional resonance in the lower frequencies and, for unvoiced sounds, there is reduced articulatory precision due to air escaping through the nasal cavity. However, the acoustic manifestation of these symptoms is highly variable, making hypernasality estimation very challenging, both for human specialists and automated systems. Previous work in this area relies on either engineered features based on statistical signal processing or machine learning models trained on clinical ratings. Engineered features often fail to capture the complex acoustic patterns associated with hypernasality, whereas metrics based on machine learning are prone to overfitting to the small disease-specific speech datasets on which they are trained. Here we propose a new set of acoustic features that capture these complementary dimensions. The features are based on two acoustic models trained on a large corpus of healthy speech. The first acoustic model aims to measure nasal resonance from voiced sounds, whereas the second acoustic model aims to measure articulatory imprecision from unvoiced sounds. To demonstrate that the features derived from these acoustic models are specific to hypernasal speech, we evaluate them across different dysarthria corpora. Our results show that the features generalize even when training on hypernasal speech from one disease and evaluating on hypernasal speech from another disease (e.g., training on Parkinson's disease, evaluation on Huntington's disease), and when training on neurologically disordered speech but evaluating on cleft palate speech.
Affiliation(s)
- Michael Saxon
- Arizona State Univ., Sch. of Elect., Comput., & Energy Eng., Tempe, Arizona, USA
- Ayush Tripathi
- Arizona State Univ., Sch. of Elect., Comput., & Energy Eng., Tempe, Arizona, USA
- Yishan Jiao
- Arizona State Univ., Sch. of Elect., Comput., & Energy Eng., Tempe, Arizona, USA
- Julie Liss
- Arizona State Univ., Sch. of Elect., Comput., & Energy Eng., Tempe, Arizona, USA
- Visar Berisha
- Arizona State Univ., Sch. of Elect., Comput., & Energy Eng., Tempe, Arizona, USA
6. de Boer G, Marino V, Berti L, Fabron E, Spazzapan EA, Bressmann T. Influence of Altered Auditory Feedback on Oral-Nasal Balance in Speakers of Brazilian Portuguese. J Speech Lang Hear Res 2019; 62:3752-3762. PMID: 31639320. DOI: 10.1044/2019_jslhr-s-18-0051.
Abstract
Purpose This study explored the role of auditory feedback in the regulation of oral-nasal balance in speakers of Brazilian Portuguese. Method Twenty typical speakers of Brazilian Portuguese (10 male, 10 female) wore a Nasometer headset and headphones while continuously repeating stimuli with oral and nasal sounds. Oral-nasal balance was quantified with nasalance scores. The signals from 2 additional oral and nasal microphones were played back to the participants through the headphones. The relative loudness of the nasal channel in the mix was gradually changed, so that the speakers heard themselves as more or less nasal. Results A repeated-measures analysis of variance of the mean nasalance scores of the stimuli at baseline, minimum, and maximum nasal feedback conditions demonstrated significant effects of nasal feedback condition (p < .0001) and stimuli (p < .0001). Post hoc analyses demonstrated that the mean nasalance scores were lowest for the maximum nasal feedback condition. The scores of the minimum nasal feedback condition were significantly higher than 2 of 3 baseline feedback conditions. The speaking amplitude of the participants did not change between the nasal feedback conditions. Conclusions Increased nasal signal level feedback led to a compensatory adjustment in the opposite direction, confirming that oral-nasal balance is regulated by auditory feedback. However, reduced nasal signal level feedback resulted in a compensatory response that was lower in magnitude. This suggests that, even in Brazilian Portuguese, a language with phonetic and phonological vowel nasalization, decreased nasality was not perceived as critically as increased nasality by the speakers.
Affiliation(s)
- Gillian de Boer
- Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
- Viviane Marino
- Department of Fonoaudiologia, Universidade Estadual Paulista, Marília, Brazil
- Larissa Berti
- Department of Fonoaudiologia, Universidade Estadual Paulista, Marília, Brazil
- Eliana Fabron
- Department of Fonoaudiologia, Universidade Estadual Paulista, Marília, Brazil
- Tim Bressmann
- Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
7. Reby D, Wyman MT, Frey R, Charlton BD, Dalmont JP, Gilbert J. Vocal tract modelling in fallow deer: are male groans nasalized? J Exp Biol 2018; 221:jeb179416. PMID: 29941611. DOI: 10.1242/jeb.179416.
Abstract
Males of several species of deer have a descended and mobile larynx, resulting in an unusually long vocal tract, which can be further extended by lowering the larynx during call production. Formant frequencies are lowered as the vocal tract is extended, as predicted when approximating the vocal tract as a uniform quarter wavelength resonator. However, formant frequencies in polygynous deer follow uneven distribution patterns, indicating that the vocal tract configuration may in fact be rather complex. We CT-scanned the head and neck region of two adult male fallow deer specimens with artificially extended vocal tracts and measured the cross-sectional areas of the supra-laryngeal vocal tract along the oral and nasal tracts. The CT data were then used to predict the resonances produced by three possible configurations, including the oral vocal tract only, the nasal vocal tract only, or combining the two. We found that the area functions from the combined oral and nasal vocal tracts produced resonances more closely matching the formant pattern and scaling observed in fallow deer groans than those predicted by the area functions of the oral vocal tract only or of the nasal vocal tract only. This indicates that the nasal and oral vocal tracts are both simultaneously involved in the production of a non-human mammal vocalization, and suggests that the potential for nasalization in putative oral loud calls should be carefully considered.
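The uniform quarter-wavelength approximation mentioned in this abstract predicts evenly spaced resonances, F_n = (2n - 1) * c / (4L); the paper's starting observation is that measured deer formants deviate from this pattern. A sketch of the baseline prediction (tract length and sound speed are illustrative values, not the paper's measurements):

```python
def quarter_wave_formants(length_m, n_formants=3, c=350.0):
    """Resonances of a uniform tube closed at the glottis and open at the lips:
    F_n = (2n - 1) * c / (4 * L), i.e. odd multiples of the first resonance."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

# A 17.5 cm tract predicts formants at 500, 1500, 2500 Hz; extending the
# tract (as a deer does by lowering its larynx) lowers every formant.
short_tract = quarter_wave_formants(0.175)
long_tract = quarter_wave_formants(0.35)
```

Doubling the tract length halves every predicted formant, which is why formant lowering is taken as evidence of vocal tract extension.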
Affiliation(s)
- D Reby
- School of Psychology, University of Sussex, Falmer, Brighton BN1 9QH, UK
- M T Wyman
- School of Psychology, University of Sussex, Falmer, Brighton BN1 9QH, UK; Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- R Frey
- Department of Reproduction Management, Leibniz Institute for Zoo and Wildlife Research (IZW), 10315 Berlin, Germany
- B D Charlton
- San Diego Zoo's Institute for Conservation Research, Escondido 92027, CA, USA
- J P Dalmont
- Laboratoire d'Acoustique de l'Université du Mans, CNRS, 72085 le Mans, France
- J Gilbert
- Laboratoire d'Acoustique de l'Université du Mans, CNRS, 72085 le Mans, France
8. Kent RD, Vorperian HK. Static measurements of vowel formant frequencies and bandwidths: A review. J Commun Disord 2018; 74:74-97. PMID: 29891085. PMCID: PMC6002811. DOI: 10.1016/j.jcomdis.2018.05.004.
Abstract
PURPOSE Data on vowel formants have been derived primarily from static measures representing an assumed steady state. This review summarizes data on formant frequencies and bandwidths for American English and also addresses (a) sources of variability (focusing on speech sample and time sampling point), and (b) methods of data reduction such as vowel area and dispersion. METHOD Searches were conducted with CINAHL, Google Scholar, MEDLINE/PubMed, SCOPUS, and other online sources including legacy articles and references. The primary search items were vowels, vowel space area, vowel dispersion, formants, formant frequency, and formant bandwidth. RESULTS Data on formant frequencies and bandwidths are available for both sexes over the lifespan, but considerable variability in results across studies affects even features of the basic vowel quadrilateral. Origins of variability likely include differences in speech sample and time sampling point. The data reveal the emergence of sex differences by 4 years of age, maturational reductions in formant bandwidth, and decreased formant frequencies with advancing age in some persons. It appears that a combination of data-reduction methods provides for optimal data interpretation. CONCLUSION The lifespan database on vowel formants shows considerable variability within specific age-sex groups, pointing to the need for standardized procedures.
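One of the data-reduction methods the review covers, vowel space area, is commonly computed as the area of the polygon formed by corner-vowel (F1, F2) means, using the shoelace formula. A sketch with illustrative formant values (invented for the example, not taken from the review):

```python
def vowel_space_area(points):
    """Polygon area via the shoelace formula from (F1, F2) points given in
    order around the polygon; result is in Hz^2."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Illustrative corner-vowel means for /i, ae, a, u/ as (F1, F2) in Hz:
corners = [(270, 2290), (660, 1720), (730, 1090), (300, 870)]
vsa = vowel_space_area(corners)
```

A shrinking vowel space area across conditions or talkers is one of the compact summary statistics the review discusses for comparing formant data.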
Affiliation(s)
- Raymond D Kent
- Waisman Center, University of Wisconsin-Madison, United States.
9. Story BH, Vorperian HK, Bunton K, Durtschi RB. An age-dependent vocal tract model for males and females based on anatomic measurements. J Acoust Soc Am 2018; 143:3079. PMID: 29857736. PMCID: PMC5966313. DOI: 10.1121/1.5038264.
Abstract
The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts from infancy to 12 years of age, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparing formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Houri K Vorperian
- Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
- Kate Bunton
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Reid B Durtschi
- Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
10. Rong P, Kuehn DP, Shosted RK. Modeling of oropharyngeal articulatory adaptation to compensate for the acoustic effects of nasalization. J Acoust Soc Am 2016; 140:2145. PMID: 27914422. DOI: 10.1121/1.4963065.
Abstract
Hypernasality is one of the most detrimental speech disturbances, leading to declines in speech intelligibility. Velopharyngeal inadequacy, which is associated with anatomic defects such as cleft palate or with neuromuscular disorders that affect velopharyngeal function, is the primary cause of hypernasality. A simulation study by Rong and Kuehn [J. Speech Lang. Hear. Res. 55(5), 1438-1448 (2012)] demonstrated that properly adjusted oropharyngeal articulation can reduce nasality for vowels synthesized with an articulatory model [Mermelstein, J. Acoust. Soc. Am. 53(4), 1070-1082 (1973)]. In this study, a speaker-adaptive articulatory model was developed to simulate speaker-customized oropharyngeal articulatory adaptation to compensate for the acoustic effects of nasalization on /a/, /i/, and /u/. The results demonstrated that (1) the oropharyngeal articulatory adaptation effectively counteracted the effects of nasalization on the second lowest formant frequency (F2) and partially compensated for the effects of nasalization on vowel space (e.g., shifting and constriction of vowel space), and (2) the articulatory adaptation strategies generated by the speaker-adaptive model might be more efficacious for counteracting the acoustic effects of nasalization than the adaptation strategies generated by the standard articulatory model in Rong and Kuehn. The findings of this study indicated the potential of using oropharyngeal articulatory adaptation as a means to correct maladaptive articulatory behaviors and to reduce nasality.
Affiliation(s)
- Panying Rong
- Department of Speech-Language-Hearing: Sciences and Disorders, University of Kansas, Lawrence, Kansas 66045, USA
- David P Kuehn
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820, USA
- Ryan K Shosted
- Department of Linguistics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
11. Story BH, Bunton K. Formant measurement in children's speech based on spectral filtering. Speech Commun 2016; 76:93-111. PMID: 26855461. PMCID: PMC4743040. DOI: 10.1016/j.specom.2015.11.001.
Abstract
Children's speech presents a challenging problem for formant frequency measurement. In part, this is because the high fundamental frequencies typical of children's speech production generate widely spaced harmonic components that may undersample the spectral shape of the vocal tract transfer function. In addition, there is often a weakening of upper harmonic energy and a noise component due to glottal turbulence. The purpose of this study was to develop a formant measurement technique based on cepstral analysis that does not require modification of the cepstrum itself or transformation back to the spectral domain. Instead, a narrow-band spectrum is low-pass filtered with a cutoff point (i.e., cutoff "quefrency" in the terminology of cepstral analysis) chosen to preserve only the spectral envelope. To test the method, speech representative of a 2-3 year-old child was simulated with an airway modulation model of speech production. The model, which includes physiologically scaled vocal folds and vocal tract, generates sound output analogous to a microphone signal. The vocal tract resonance frequencies can be calculated independently of the output signal and thus provide test cases for assessing the accuracy of the formant tracking algorithm. When applied to the simulated child-like speech, the spectral filtering approach was shown to provide a clear spectrographic representation of formant change over the time course of the signal and to facilitate tracking of formant frequencies for further analysis.
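The spectral-filtering idea can be sketched as smoothing a narrow-band magnitude spectrum along the frequency axis so that only the slowly varying envelope survives. The moving average below is a crude stand-in for the cutoff-quefrency low-pass filter the paper describes, and all signal parameters (f0, resonance, window) are invented for illustration:

```python
import numpy as np

def spectral_envelope(signal, fs, win_hz=800.0, nfft=2048):
    """Estimate the spectral envelope by low-pass filtering the magnitude
    spectrum along the frequency axis with a moving average whose width
    exceeds the harmonic spacing."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal)), nfft))
    k = max(1, int(win_hz / (fs / nfft)))  # smoothing width in FFT bins
    return np.convolve(spec, np.ones(k) / k, mode="same")

# Toy child-like voiced signal: widely spaced harmonics of a 300 Hz f0,
# amplitude-shaped by a single resonance near 1200 Hz.
fs, f0, res = 16000, 300.0, 1200.0
t = np.arange(1024) / fs
sig = sum(np.exp(-((h * f0 - res) / 600.0) ** 2) * np.sin(2 * np.pi * h * f0 * t)
          for h in range(1, 20))
env = spectral_envelope(sig, fs)
peak_hz = np.argmax(env) * fs / 2048  # envelope peak should sit near 1200 Hz
```

With a high f0, individual harmonics undersample the envelope; smoothing across neighboring harmonics recovers a peak near the underlying resonance rather than at any single harmonic.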
Affiliation(s)
- Brad H. Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, P.O. Box 210071, Tucson, AZ 85721
- Kate Bunton
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, P.O. Box 210071, Tucson, AZ 85721
12. Kreuzer W, Kasess CH. Tuning of vocal tract model parameters for nasals using sensitivity functions. J Acoust Soc Am 2015; 137:1021-1031. PMID: 25698033. DOI: 10.1121/1.4906158.
Abstract
Determining the cross-sectional areas of vocal tract models from linear predictive coding or autoregressive-moving-average analysis of vowel speech signals has been of research interest for several decades. To tune the shape of the vocal tract to given sets of formant frequencies, iterative methods using sensitivity functions have been developed. In this paper, the idea of sensitivity functions is extended to a three-tube model used in connection with nasals, and the energy-based sensitivity function is compared with a Jacobian-based sensitivity function for the branched-tube model. It is shown that the difference between the two functions is negligible if the sensitivity is taken with respect to the formant frequency only. Results are given for iteratively tuning a three-tube vocal tract model of a nasal (/m/) based on the sensitivity functions. It is shown that, besides the polar angle, the absolute value of the poles and zeros of the rational transfer function also needs to be considered in the tuning process. To test the effectiveness of the iterative solver, the steepest-descent method is compared with the Gauss-Newton method, and it is shown that the Gauss-Newton method converges faster when given a good starting value for the iteration.
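The solver comparison can be illustrated on a generic one-parameter nonlinear least-squares problem (a toy example, not the paper's branched-tube model): Gauss-Newton rescales the gradient by (J^T J)^-1 and typically needs far fewer iterations than plain steepest descent from a reasonable starting value:

```python
import numpy as np

# Toy nonlinear least-squares problem: fit y = exp(b * x) for scalar b.
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.5 * x)  # data generated with the true value b = 0.5

def residual(b):
    return y - np.exp(b * x)

def jacobian(b):
    # Derivative of the residual with respect to b: -x * exp(b * x).
    return -x * np.exp(b * x)

def gauss_newton(b, steps):
    for _ in range(steps):
        r, J = residual(b), jacobian(b)
        b -= (J @ r) / (J @ J)  # (J^T J)^-1 J^T r for a single parameter
    return b

def steepest_descent(b, steps, lr=0.05):
    for _ in range(steps):
        b -= lr * (jacobian(b) @ residual(b))  # step along -gradient
    return b

b_gn = gauss_newton(0.0, 5)       # converges essentially to 0.5
b_sd = steepest_descent(0.0, 5)   # still approaching 0.5 after 5 steps
```

For a zero-residual problem like this, Gauss-Newton converges roughly quadratically near the solution, while steepest descent contracts only linearly, which mirrors the convergence comparison reported in the paper.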
Affiliation(s)
- W Kreuzer
- Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, A-1040 Vienna, Austria
- C H Kasess
- Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, A-1040 Vienna, Austria
13. Bettens K, Wuyts FL, Van Lierde KM. Instrumental assessment of velopharyngeal function and resonance: a review. J Commun Disord 2014; 52:170-183. PMID: 24909583. DOI: 10.1016/j.jcomdis.2014.05.004.
Abstract
The purpose of this literature review is to describe and discuss instrumental techniques for assessing velopharyngeal function in order to diagnose velopharyngeal disorders and characterize resonance. Both direct and indirect assessment techniques are addressed, considering in turn nasopharyngoscopy, videofluoroscopy, magnetic resonance imaging (MRI), cephalometric radiographic analysis, computed tomography (CT), ultrasound, and acoustic and aerodynamic measurements. Despite the multiple instrumental assessments available to detect and define velopharyngeal dysfunction, an ideal technique is not yet available. A combination of different quantitative parameters may therefore allow a more reliable determination of resonance disorders; these multi-dimensional approaches are described and discussed. The combination of quantitative measurement techniques and perceptual evaluation of nasality will probably remain necessary to provide sufficient information for appropriate decisions concerning the diagnosis and treatment of resonance disorders. LEARNING OUTCOMES The reader will be able to describe and discuss currently available instrumental techniques for assessing the velopharyngeal mechanism and its functioning in order to diagnose velopharyngeal disorders, and to explain the possible advantages of combining several types of complementary measurement techniques.
Affiliation(s)
- Kim Bettens
- Department of Speech, Language and Hearing Sciences, Ghent University, Ghent, Belgium.
- Floris L Wuyts
- Department of Speech, Language and Hearing Sciences, Ghent University, Ghent, Belgium; Biomedical Physics, University of Antwerp, Antwerp, Belgium
14. Havel M, Hofmann G, Mürbe D, Sundberg J. Contribution of paranasal sinuses to the acoustic properties of the nasal tract. Folia Phoniatr Logop 2014; 66:109-114. PMID: 25342046. DOI: 10.1159/000363501.
Abstract
BACKGROUND The contribution of the nasal and paranasal cavities to the vocal tract resonator properties is unclear. Here we investigate these resonance phenomena of the sinonasal tract in isolation in a cadaver and compare the results with those gained in a simplified brass tube model. METHODS The resonance characteristics were measured as the response to sine sweep excitation from an earphone. In the brass model the earphone was placed at the closed end and in the cadaver in the epipharynx. The response was picked up by a microphone placed at the open end of the model and at the nostrils, respectively. A shunting cavity with varied volumes was connected to the model and the effects on the response curve were determined. In the cadaver, different conditions with blocked and unblocked middle meatus and sphenoidal ostium were tested. Additionally, infundibulotomy was performed allowing direct access to and selective occlusion of the maxillary ostium. RESULTS In both the brass model and the cadaver, a baseline condition with no cavities included produced response curves with clear resonance peaks separated by valleys. Marked dips occurred when shunting cavities were attached to the model. The frequencies of these dips decreased with increasing shunting volume. In the cadaver, a marked dip was observed after removing the unilateral occlusion of the middle meatus and the sphenoidal ostium. Another marked dip was detected at low frequency after removal of the occlusion of the maxillary ostium following infundibulotomy. CONCLUSION Combining measurements on a simplified nasal model with measurements in a cadaveric sinonasal tract seems a promising method for shedding light on the acoustic properties of the nasal resonator.
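The finding that dip frequency falls as shunting volume grows is what a Helmholtz-resonator treatment of a shunting cavity predicts: f = (c / 2*pi) * sqrt(A / (V * L)). A minimal numeric sketch (ostium and sinus dimensions are illustrative, not measurements from the study):

```python
import math

def helmholtz_freq(neck_area_m2, neck_len_m, volume_m3, c=350.0):
    """Resonance frequency of a Helmholtz resonator:
    f = (c / 2*pi) * sqrt(A / (V * L))."""
    return (c / (2.0 * math.pi)) * math.sqrt(neck_area_m2 / (volume_m3 * neck_len_m))

# Illustrative sinus-like dimensions: 5 mm^2 ostium, 5 mm neck, 10 vs 20 mL cavity.
f_small = helmholtz_freq(5e-6, 0.005, 10e-6)
f_large = helmholtz_freq(5e-6, 0.005, 20e-6)
```

Because f scales as 1/sqrt(V), doubling the shunting volume lowers the dip frequency by a factor of sqrt(2), matching the direction of the effect observed in both the brass model and the cadaver.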
Affiliation(s)
- Miriam Havel
- Department of Otorhinolaryngology, Head and Neck Surgery, Section Phoniatrics, University of Munich, Munich, Germany

15
Story BH. Phrase-level speech simulation with an airway modulation model of speech production. COMPUT SPEECH LANG 2013; 27:989-1010. [PMID: 23503742] [DOI: 10.1016/j.csl.2012.10.005]
Abstract
Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced and simulations of words and phrases are demonstrated.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Dept. of Speech, Language, and Hearing Sciences, University of Arizona, 1131 E. 2nd St., P.O. Box 210071, Tucson, AZ, 85721, United States

16
Shosted R, Hualde JI, Scarpace D. Palatal complexity revisited: an electropalatographic analysis of /see text for symbol/ in Brazilian Portuguese with comparison to Peninsular Spanish. LANGUAGE AND SPEECH 2012; 55:477-502. [PMID: 23420979] [DOI: 10.1177/0023830911434120]
Abstract
Are palatal consonants articulated by multiple tongue gestures (coronal and dorsal) or by a single gesture that brings the tongue into contact with the palate at several places of articulation? The lenition of palatal consonants (resulting in approximants) has been presented as evidence that palatals are simple, not complex: When reduced, they do not lose their coronal gesture and become dorsals; instead, they manifest reduced linguopalatal contact while retaining their anterior place of articulation. The frequently-reported deocclusivization of the Brazilian Portuguese (BP) palatal nasal may support this claim. However, the linguopalatal configuration of this sound has not been studied directly. Electropalatographic evidence from three speakers of BP (compared with data from three speakers of Peninsular Spanish) demonstrates that the palatal nasal is frequently realized as an approximant. There is no evidence of anterior occlusion in BP's post-palatal, lenited nasal. Under conditions of focus/hyperarticulation, there is no evidence of stronger/more anterior occlusion. We argue that the articulatory target of the BP palatal nasal is neither occluded nor anterior.
Affiliation(s)
- Ryan Shosted
- University of Illinois at Urbana-Champaign, Department of Linguistics, 4080 Foreign Languages Building, 707 S. Mathews Avenue, MC-168, Urbana, Illinois 61801, USA.

17
Rong P, Kuehn D. The effect of articulatory adjustment on reducing hypernasality. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2012; 55:1438-1448. [PMID: 22411285] [DOI: 10.1044/1092-4388(2012/11-0142)]
Abstract
PURPOSE: With the goal of using articulatory adjustments to reduce hypernasality, this study utilized an articulatory synthesis model (Childers, 2000) to simulate the adjustment of articulatory configurations with an open velopharynx to achieve the same acoustic goal as normal speech simulated with a closed velopharynx.
METHOD: To examine the effect of articulatory adjustment on perceived nasality, 18 oral /i/ vowels, 18 nasal /i/ vowels, and 18 nasal /i/ vowels with computer-generated articulatory adjustments were synthesized; these vowels were then presented to 7 listeners for perceptual ratings of nasality following the direct magnitude estimation method.
RESULTS: Comparisons of nasality ratings of nasal vowels showed a significant reduction of perceived nasality after articulatory adjustment. Moreover, the acoustic features associated with nasal resonances were attenuated and the oral formant structures changed by nasalization were restored after articulatory adjustment, which confirmed findings in Rong and Kuehn (2010).
CONCLUSION: Appropriate articulatory adjustments are able to reduce the nasality of synthetic nasal /i/ vowels by compensating for the acoustic deviations caused by excessive velopharyngeal opening. Such compensatory interarticulator coordination may have an application in using articulatory adjustments to reduce hypernasality in clinical speech therapies.
Affiliation(s)
- Panying Rong
- University of Illinois at Urbana–Champaign, USA.

18
Mehta DD, Rudoy D, Wolfe PJ. Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:1732-46. [PMID: 22978900] [DOI: 10.1121/1.4739462]
Abstract
Vocal tract resonance characteristics in acoustic speech signals are classically tracked using frame-by-frame point estimates of formant frequencies followed by candidate selection and smoothing using dynamic programming methods that minimize ad hoc cost functions. The goal of the current work is to provide both point estimates and associated uncertainties of center frequencies and bandwidths in a statistically principled state-space framework. Extended Kalman (K) algorithms take advantage of a linearized mapping to infer formant and antiformant parameters from frame-based estimates of autoregressive moving average (ARMA) cepstral coefficients. Error analysis of KARMA, WaveSurfer, and Praat is accomplished in the all-pole case using a manually marked formant database and synthesized speech waveforms. KARMA formant tracks exhibit lower overall root-mean-square error relative to the two benchmark algorithms with the ability to modify parameters in a controlled manner to trade off bias and variance. Antiformant tracking performance of KARMA is illustrated using synthesized and spoken nasal phonemes. The simultaneous tracking of uncertainty levels enables practitioners to recognize time-varying confidence in parameters of interest and adjust algorithmic settings accordingly.
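For contrast with the state-space approach, the classical frame-by-frame all-pole point estimate that KARMA is benchmarked against can be sketched as follows. This is an illustrative autocorrelation-method LPC, not the authors' algorithm; the frame length, model order, and test signal are assumed values:

```python
import numpy as np

def lpc_formants(frame, fs, order=12):
    """Classical frame-based all-pole point estimates: solve the autocorrelation
    (Yule-Walker) normal equations for AR coefficients, then convert the complex
    roots of the prediction polynomial to formant frequencies and 3 dB bandwidths."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, "full")[len(x) - 1:len(x) + order]  # r[0..order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]

# A single damped resonance at 500 Hz should yield a pole near 500 Hz.
fs = 8000
t = np.arange(1024) / fs
f, b = lpc_formants(np.exp(-50 * t) * np.sin(2 * np.pi * 500 * t), fs, order=4)
```

A point estimate of this kind carries no uncertainty information, which is exactly the gap the Kalman-based framework in the paper addresses.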
Affiliation(s)
- Daryush D Mehta
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA.

19
Shosted R, Carignan C, Rong P. Managing the distinctiveness of phonemic nasal vowels: articulatory evidence from Hindi. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:455-465. [PMID: 22280607] [DOI: 10.1121/1.3665998]
Abstract
There is increasing evidence that fine articulatory adjustments are made by speakers to reinforce and sometimes counteract the acoustic consequences of nasality. However, it is difficult to attribute the acoustic changes in nasal vowel spectra to either oral cavity configuration or to velopharyngeal opening (VPO). This paper takes the position that it is possible to disambiguate the effects of VPO and oropharyngeal configuration on the acoustic output of the vocal tract by studying the position and movement of the tongue and lips during the production of oral and nasal vowels. This paper uses simultaneously collected articulatory, acoustic, and nasal airflow data during the production of all oral and phonemically nasal vowels in Hindi (four speakers) to understand the consequences of the movements of oral articulators on the spectra of nasal vowels. For Hindi nasal vowels, the tongue body is generally lowered for back vowels, fronted for low vowels, and raised for front vowels (with respect to their oral congeners). These movements are generally supported by accompanying changes in the vowel spectra. In Hindi, the lowering of back nasal vowels may have originally served to enhance the acoustic salience of nasality, but has since engendered a nasal vowel chain shift.
Affiliation(s)
- Ryan Shosted
- Department of Linguistics, University of Illinois at Urbana-Champaign, 4080 Foreign Languages Building, 707 South Mathews Avenue, Urbana, Illinois 61801, USA.

20
Bae Y, Kuehn DP, Conway CA, Sutton BP. Real-time magnetic resonance imaging of velopharyngeal activities with simultaneous speech recordings. Cleft Palate Craniofac J 2010; 48:695-707. [PMID: 21214321] [DOI: 10.1597/09-158]
Abstract
OBJECTIVE: To examine the relationships between acoustic and physiologic aspects of the velopharyngeal mechanism during acoustically nasalized segments of speech in normal individuals by combining fast magnetic resonance imaging (MRI) with simultaneous speech recordings and subsequent acoustic analyses.
DESIGN: Ten normal Caucasian adult individuals participated in the study. Midsagittal dynamic MRI and simultaneous speech recordings were performed while participants produced repetitions of two rate-controlled nonsense syllables, /zanaza/ and /zunuzu/. Acoustic features of nasalization, represented as the peak amplitude and the bandwidth of the first resonant frequency (F1), were derived from speech at the rate of 30 sets per second. Physiologic information was based on velar and tongue positional changes measured from the dynamic MRI data, which were acquired at a rate of 21.4 images per second and resampled at a corresponding rate of 30 images per second. Each acoustic feature of nasalization was regressed on gender, vowel context, and velar and tongue positional variables.
RESULTS: Acoustic features of nasalization represented by F1 peak amplitude and bandwidth changes were significantly influenced by the vowel context surrounding the nasal consonant, velar elevated position, and tongue height at the tip.
CONCLUSIONS: Fast MRI combined with acoustic analysis was successfully applied to the investigation of acoustic-physiologic relationships of the velopharyngeal mechanism with the type of speech samples employed in the present study. Future applications are feasible to examine how anatomic and physiologic deviations of the velopharyngeal mechanism would be acoustically manifested in individuals with velopharyngeal incompetence.
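Aligning the 21.4 images/s MRI measurements with the 30/s acoustic features amounts to resampling one uniformly sampled series onto the other's rate. A minimal linear-interpolation sketch (an assumption for illustration; the abstract does not specify which resampling method was used):

```python
import numpy as np

def resample_uniform(samples, rate_in, rate_out):
    """Linearly interpolate a uniformly sampled series onto a new uniform rate,
    e.g. mapping measurements at 21.4 frames/s onto a 30/s analysis grid."""
    t_in = np.arange(len(samples)) / rate_in
    t_out = np.arange(0.0, t_in[-1] + 1e-9, 1.0 / rate_out)
    return np.interp(t_out, t_in, samples)

# 22 images at 21.4/s span ~0.98 s, i.e. 30 samples on a 30/s grid.
out = resample_uniform(np.ones(22), 21.4, 30.0)
```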
21
Riede T, Goller F. Peripheral mechanisms for vocal production in birds - differences and similarities to human speech and singing. BRAIN AND LANGUAGE 2010; 115:69-80. [PMID: 20153887] [PMCID: PMC2896990] [DOI: 10.1016/j.bandl.2009.11.003]
Abstract
Song production in songbirds is a model system for studying learned vocal behavior. As in humans, bird phonation involves three main motor systems (respiration, vocal organ and vocal tract). The avian respiratory mechanism uses pressure regulation in air sacs to ventilate a rigid lung. In songbirds sound is generated with two independently controlled sound sources, which reside in a uniquely avian vocal organ, the syrinx. However, the physical sound generation mechanism in the syrinx shows strong analogies to that in the human larynx, such that both can be characterized as myoelastic-aerodynamic sound sources. Similarities include active adduction and abduction, oscillating tissue masses which modulate flow rate through the organ and a layered structure of the oscillating tissue masses giving rise to complex viscoelastic properties. Differences in the functional morphology of the sound producing system between birds and humans require specific motor control patterns. The songbird vocal apparatus is adapted for high speed, suggesting that temporal patterns and fast modulation of sound features are important in acoustic communication. Rapid respiratory patterns determine the coarse temporal structure of song and maintain gas exchange even during very long songs. The respiratory system also contributes to the fine control of airflow. Muscular control of the vocal organ regulates airflow and acoustic features. The upper vocal tract of birds filters the sounds generated in the syrinx, and filter properties are actively adjusted. Nonlinear source-filter interactions may also play a role. The unique morphology and biomechanical system for sound production in birds presents an interesting model for exploring parallels in control mechanisms that give rise to highly convergent physical patterns of sound generation. More comparative work should provide a rich source for our understanding of the evolution of complex sound producing systems.
Affiliation(s)
- Tobias Riede
- Department of Biology and National Center for Voice and Speech, University of Utah, Salt Lake City, 84112, USA

22
Rong P, Kuehn DP. The effect of oral articulation on the acoustic characteristics of nasalized vowels. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 127:2543-2553. [PMID: 20370036] [DOI: 10.1121/1.3294486]
Abstract
To study the acoustic characteristics of nasalized vowels, the effects of velopharyngeal opening and oral articulation are considered. Based on vocal tract area functions for one American English speaker, spectral evolutions for the nasalization of three English vowels /a/, /i/, and /u/ were studied by simulating transfer functions for vowels with only velar movement, and for different nasal consonant-vowel utterances, which include both velar and oral movements. Simulations indicate extra nasal spectral poles and zeros and oral formant shifts as a result of the velopharyngeal opening and oral movements, respectively. In this sense, if oral articulation is coordinated with velar movement in such a way that nasal acoustic features are prominently attenuated, corresponding compensatory articulation can be developed to reduce hypernasality. This may be realized by (1) adjusting the articulatory placement for isolated nasalized vowels or by (2) changing the relative timing of coarticulatory movements for dynamic speech. The results demonstrate the effect of oral articulation on the acoustics of nasalized vowels. This effect allows oral articulation to compensate for velopharyngeal dysfunction, which may involve a constellation of speech production disorders resulting from anomalous velopharyngeal closure and which is usually accompanied by hypernasality and nasal emission of air.
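The transfer-function simulations described above start from vocal tract area functions. A minimal illustration of that style of computation is a lossless concatenated-tube model (a sketch under assumed areas and section lengths, not the paper's model, which also includes the nasal side branch and its poles and zeros):

```python
import numpy as np

def tube_formants(areas_cm2, sec_len_cm, c=35000.0, rho=0.00114):
    """Formants of a vocal tract given as an area function, modeled as
    concatenated lossless cylindrical sections (closed glottis end, open lips).
    Each section contributes an acoustic ABCD matrix; with zero pressure at the
    lips, the volume-velocity transfer function is 1/D, so formants appear as
    peaks of |1/D| over frequency."""
    freqs = np.arange(50.0, 5000.0, 2.0)
    gain = np.empty_like(freqs)
    for n, f in enumerate(freqs):
        kl = 2 * np.pi * f / c * sec_len_cm
        M = np.eye(2, dtype=complex)
        for A in areas_cm2:                      # glottis -> lips
            z = rho * c / A                      # section characteristic impedance
            M = M @ np.array([[np.cos(kl), 1j * z * np.sin(kl)],
                              [1j * np.sin(kl) / z, np.cos(kl)]])
        gain[n] = 1.0 / abs(M[1, 1])
    is_peak = (gain[1:-1] > gain[:-2]) & (gain[1:-1] > gain[2:])
    return freqs[1:-1][is_peak]

# Uniform 17.5 cm tube (35 sections of 0.5 cm): quarter-wave resonances
# near 500, 1500, 2500 Hz, the classic neutral-vowel pattern.
formants = tube_formants(np.full(35, 3.0), 0.5)
```

Perturbing the area values shifts these oral formants, which is the mechanism the abstract invokes when oral articulation compensates for the extra nasal poles and zeros.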
Affiliation(s)
- Panying Rong
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820, USA

23
Maier A, Hönig F, Bocklet T, Nöth E, Stelzle F, Nkenke E, Schuster M. Automatic detection of articulation disorders in children with cleft lip and palate. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:2589-2602. [PMID: 19894838] [DOI: 10.1121/1.3216913]
Abstract
Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy-to-apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such characteristics by a semiautomatic procedure and the second to evaluate a fully automatic, thus simple, procedure. The methods are based on a combination of speech processing algorithms. The semiautomatic method achieves moderate to good agreement (kappa approximately 0.6) for the detection of all phonetic disorders. On a speaker level, significant correlations between the perceptual evaluation and the automatic system of 0.89 are obtained. The fully automatic system yields a correlation on the speaker level of 0.81 to the perceptual evaluation. This correlation is in the range of the inter-rater correlation of the listeners. The automatic speech evaluation is able to detect phonetic disorders at an experts' level without any additional human postprocessing.
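The agreement figure quoted above (kappa approximately 0.6) follows the standard chance-corrected formula. A minimal sketch for two raters (illustrative only; the ratings below are invented, and the study computed kappa over its own perceptual annotations):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected inter-rater agreement: kappa = (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e the agreement expected by chance
    from each rater's marginal label frequencies."""
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2.get(label, 0) for label in c1) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Two raters marking a disorder present (1) / absent (0) per phoneme:
kappa = cohens_kappa([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```

Values around 0.6 are conventionally read as moderate-to-good agreement, which is how the abstract characterizes the semiautomatic method.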
Affiliation(s)
- Andreas Maier
- Lehrstuhl für Informatik 5, Universität Erlangen-Nürnberg, 91058 Erlangen, Germany.