1
Liu Y, Islam J, Radford K, Tkachman O, Gick B. Tonguedness in speech: Lateral bias in lingual bracing. JASA Express Lett 2024; 4:025203. PMID: 38341684; PMCID: PMC10848656; DOI: 10.1121/10.0024756.
Abstract
This study examines lateral bias in tongue movements during speech production. It builds on previous research on asymmetry in various aspects of human biology and behavior, focusing on the tongue's asymmetric behavior during speech. The findings reveal that speakers have a pronounced preference for one side of the tongue during lateral releases, with a majority displaying a left-side bias. This lateral bias in speech tongue movements is referred to as tonguedness. The research contributes to our understanding of the articulatory mechanisms involved in tongue movements and underscores the importance of considering lateral biases in speech production research.
Affiliation(s)
- Yadong Liu
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Jahurul Islam
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Kate Radford
- California Institute of Technology, Pasadena, California 91125, USA
- Oksana Tkachman
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Bryan Gick
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Haskins Laboratories, New Haven, Connecticut 06511, USA
2
Aoyama K, Hong L, Flege JE, Akahane-Yamada R, Yamada T. Relationships Between Acoustic Characteristics and Intelligibility Scores: A Reanalysis of Japanese Speakers' Productions of American English Liquids. Lang Speech 2023; 66:1030-1045. PMID: 36680472; DOI: 10.1177/00238309221140910.
Abstract
The primary purpose of this research report was to investigate the relationships between acoustic characteristics and perceived intelligibility for native Japanese speakers' productions of American English liquids. The report was based on a reanalysis of intelligibility scores and acoustic analyses reported in two previous studies. We examined which acoustic parameters were associated with higher perceived intelligibility scores for Japanese speakers' productions of /l/ and /ɹ/ in American English, and whether their productions of the two liquids were acoustically differentiated from each other. Results demonstrated that the second formant (F2) was strongly correlated with the perceived intelligibility scores for the Japanese adults' productions. Results also demonstrated that the Japanese adults' and children's productions of /l/ and /ɹ/ were indeed differentiated by some acoustic parameters, including the third formant (F3). In addition, some changes occurred in the Japanese children's productions over the course of 1 year. Overall, the present report shows that Japanese speakers of American English may be making a distinction between /l/ and /ɹ/ in production, although the distinction is made in a different way than in native English speakers' productions. These findings have implications for setting realistic goals for improving the intelligibility of English /l/ and /ɹ/ for Japanese speakers, as well as for the theoretical advancement of second-language speech learning.
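The correlation analysis described above can be illustrated with a minimal sketch. The F2 and intelligibility values below are invented placeholders, not data from the study, and the plain Pearson coefficient here is only a stand-in for the report's actual statistics.

```python
# Hypothetical sketch: Pearson correlation between an acoustic parameter
# (F2 at a measurement point, in Hz) and perceived intelligibility scores.
# All values are invented for illustration.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

f2 = [900.0, 1100.0, 1300.0, 1500.0, 1700.0]          # Hz, invented
intelligibility = [0.35, 0.42, 0.55, 0.70, 0.78]       # proportion correct, invented
print(round(pearson_r(f2, intelligibility), 3))
```

A coefficient near 1 for such toy data simply mirrors the abstract's claim that higher F2 went with higher intelligibility scores.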
Affiliation(s)
- Katsura Aoyama
- Department of Audiology & Speech-Language Pathology, University of North Texas, USA
- Lingzi Hong
- Department of Information Science, University of North Texas, USA
- James E Flege
- Speech and Hearing Sciences, University of Alabama at Birmingham, USA
- Tsuneo Yamada
- Department of Informatics, The Open University of Japan, Japan
3
Colantoni L, Kochetov A, Steele J. Articulatory Insights into the L2 Acquisition of English /l/ Allophony. Lang Speech 2023:238309231200629. PMID: 38031458; DOI: 10.1177/00238309231200629.
Abstract
In many English varieties, /l/ is produced differently in onsets and codas. Compared with "light" syllable-initial realizations, "dark" syllable-final variants involve reduced tongue tip-alveolar ridge contact and a raised/retracted tongue dorsum. We investigate whether native French and Spanish speakers, whose L1 lacks such positionally conditioned variation, can acquire English /l/ allophony, testing the hypotheses that (1) the allophonic pattern will be acquired by both groups, but (2) learners will differ from native speakers in their phonetic implementation, particularly in codas, and (3) French-speaking learners will outperform their Spanish-speaking counterparts. The production of syllable-initial and -final /l/ (singletons and clusters) in words read in isolation and in a carrier sentence by four French- and three Spanish-speaking learners as well as three native English speakers was analyzed via electropalatography (EPG) and acoustic analysis. While some learners produced distinct onset and coda variants and all learners had moved away to some extent from their L1 production, they differed from the native speakers in certain ways. Moreover, between- and within-group variability was observed, including greater target-like anterior and posterior contact reduction in codas in the L1 French versus the L1 Spanish group, and generally higher F2 values in both learner groups compared with their native-speaker peers. A comparison of the learners' L1 and L2 production revealed L1-based patterns of positional reduction of the tongue tip and dorsum gestures. We conclude by addressing the contributions of EPG to our understanding of L2 speech and highlight avenues for future research, including the study of both linguistic and speaker variables.
4
Weismer G. Oromotor Nonverbal Performance and Speech Motor Control: Theory and Review of Empirical Evidence. Brain Sci 2023; 13:768. PMID: 37239240; DOI: 10.3390/brainsci13050768.
Abstract
This position paper offers a perspective on the long-standing debate concerning the role of oromotor, nonverbal gestures in understanding typical and disordered speech motor control secondary to neurological disease. Oromotor nonverbal tasks are employed routinely in clinical and research settings, but a coherent rationale for their use is needed. The use of oromotor nonverbal performance to diagnose disease or dysarthria type, versus specific aspects of speech production deficits that contribute to loss of speech intelligibility, is argued to be an important part of the debate. Framing these issues are two models of speech motor control, the Integrative Model (IM) and Task-Dependent Model (TDM), which yield contrasting predictions of the relationship between oromotor nonverbal performance and speech motor control. Theoretical and empirical literature on task specificity in limb, hand, and eye motor control is reviewed to demonstrate its relevance to speech motor control. The IM rejects task specificity in speech motor control, whereas the TDM is defined by it. The theoretical claim of the IM proponents that the TDM requires a special, dedicated neural mechanism for speech production is rejected. Based on theoretical and empirical information, the utility of oromotor nonverbal tasks as a window into speech motor control is questionable.
Affiliation(s)
- Gary Weismer
- Department of Communication Sciences & Disorders, University of Wisconsin-Madison, Madison, WI 53706, USA
5
Moore S, Rong P. Articulatory Underpinnings of Reduced Acoustic-Phonetic Contrasts in Individuals With Amyotrophic Lateral Sclerosis. Am J Speech Lang Pathol 2022; 31:2022-2044. PMID: 35973111; DOI: 10.1044/2022_ajslp-22-00046.
Abstract
PURPOSE The aim of this study is to identify the articulatory underpinnings of the acoustic-phonetic correlates of functional speech decline in individuals with amyotrophic lateral sclerosis (ALS). METHOD Thirteen individuals with varying severities of speech impairment secondary to ALS and 10 neurologically healthy control speakers read 12 minimal word pairs 5 times each, targeting the contrasts in the height, advancement, and length of vowels; the manner and place of articulation for consonants and consonant clusters; and liquid and glide approximants. Sixteen acoustic features were extracted to characterize the phonetic contrasts of these minimal word pairs. These acoustic features were correlated with a functional speech index, intelligible speaking rate, using penalized regression, based on which the contributive features were identified as the acoustic-phonetic correlates of the functional speech outcome. Articulatory contrasts of the minimal word pairs were characterized by a set of dissimilarity indices derived by the dynamic time warping algorithm, which measured the differences in the displacement and velocity trajectories of the tongue tip, tongue dorsum, lower lip, and jaw between the minimal word pairs. The articulatory features contributing to the acoustic-phonetic correlates were identified by penalized regression. RESULTS A variety of acoustic-phonetic features were identified as contributing to the functional speech outcome, of which the contrasts in vowel height and advancement, [r]-[l], [r]-[w], and initial cluster-singleton were the most affected in individuals with ALS. Differential articulatory underpinnings were identified for these acoustic-phonetic features. Impairments of these articulatory underpinnings, especially of tongue tip and tongue dorsum velocities and tongue tip displacement, were associated with reduced acoustic-phonetic contrasts of the minimal word pairs in a context-specific manner.
CONCLUSION The findings established explanatory relationships between articulatory impairment and the acoustic-phonetic profile of functional speech decline in ALS, providing useful information for developing targeted management strategies to improve and prolong functional speech in individuals with ALS.
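The dissimilarity indices above are derived with dynamic time warping. As a rough, hypothetical sketch (not the authors' implementation, whose distance metric and normalization are not specified in the abstract), a classic DTW distance between two 1-D articulator trajectories can be computed as follows:

```python
# Hypothetical DTW-based dissimilarity between two articulator trajectories
# (e.g., tongue-tip displacement sampled over time for the two words of a
# minimal pair). Normalization choice is an assumption for illustration.

def dtw_dissimilarity(a, b):
    """Classic dynamic time warping distance between two 1-D trajectories."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    # Normalize by combined length so word pairs of different durations
    # remain comparable.
    return cost[n][m] / (n + m)

# Identical trajectories yield zero dissimilarity:
print(dtw_dissimilarity([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))  # -> 0.0
```

Larger indices would indicate word pairs whose articulator movements diverge more, which is the sense in which reduced indices track reduced contrast in the study.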
Affiliation(s)
- Sophie Moore
- Department of Speech-Language-Hearing: Sciences & Disorders, The University of Kansas, Lawrence
- Panying Rong
- Department of Speech-Language-Hearing: Sciences & Disorders, The University of Kansas, Lawrence
6
Chung H. Acoustic Characteristics of Pre- and Post-vocalic /l/: Patterns from One Southern White Vernacular English. Lang Speech 2022; 65:513-528. PMID: 34396801; DOI: 10.1177/00238309211037368.
Abstract
This study examined acoustic characteristics of the phoneme /l/ produced by young female and male adult speakers of Southern White Vernacular English (SWVE) from Louisiana. F1, F2, and F2-F1 values extracted at the /l/ midpoint were analyzed by word position (pre- vs. post-vocalic) and vowel context (/i, ɪ/ vs. /ɔ, a/). Descriptive analysis showed that SWVE /l/ exhibited characteristics of the dark /l/ variant. The formant patterns of /l/, however, differed significantly by word position and vowel context, with pre-vocalic /l/ showing significantly higher F2-F1 values than post-vocalic /l/, and /l/ in the high front vowel context showing significantly higher F2-F1 values than /l/ in the low back vowel context. Individual variation in the effects of word position and vowel context on /l/ patterns was also observed. Overall, the findings show the gradient nature of SWVE /l/ variants, whose F2-F1 patterns generally fell into the range of the dark /l/ variant while varying by word position and vowel context.
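The F2-F1 midpoint measure used above can be sketched in a few lines. The formant values and the 1000 Hz "darkness" cut-off below are assumptions for illustration (a commonly cited rough heuristic), not figures from this study, which treats darkness as gradient rather than categorical.

```python
# Illustrative computation of the F2-F1 "darkness" index for /l/.
# Formant values (Hz) are assumed to be measured at the /l/ midpoint;
# the 1000 Hz threshold is a rough heuristic, not a value from the study.

def f2_f1_index(f1_hz, f2_hz):
    return f2_hz - f1_hz

def is_dark_l(f1_hz, f2_hz, threshold_hz=1000.0):
    # Dark (velarized) /l/ tends to show a low F2 and hence a small F2-F1.
    return f2_f1_index(f1_hz, f2_hz) < threshold_hz

print(is_dark_l(450.0, 1100.0))  # -> True  (F2-F1 = 650 Hz)
print(is_dark_l(350.0, 1700.0))  # -> False (F2-F1 = 1350 Hz)
```

In practice the study compares F2-F1 continuously across positions and vowel contexts rather than binning tokens as light or dark.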
7
Coniglio EA, Chung H, Schellinger SK. Perception of Children's Productions of /l/: Acoustic Correlates and Effects of Listener Experience. Folia Phoniatr Logop 2022; 74:392-406. PMID: 35367979; DOI: 10.1159/000524395.
Abstract
INTRODUCTION The aim of the current study was to examine the effect of listeners' experience with child speech and phonetic training on perceptual judgments of children's word-initial /l/ productions. The acoustic correlates of acceptable and misarticulated productions of /l/, and their relation to listeners' experience with child speech, were also explored. METHODS Three listener groups listened to children's word-initial /l/ productions embedded in monosyllabic words and judged the "/l/-likeness" of each production using a Visual Analog Scale (VAS): (a) speech-language pathologists with at least 10 years of experience (SLP group), (b) graduate students in speech-language pathology (GS group), and (c) naïve listeners with no clinical phonetics experience (NL group). Acoustic correlates (both static and dynamic measures) of listeners' perception of /l/ were also investigated. RESULTS While mean VAS ratings did not differ significantly by listener group, the SLP group used a wider range of the VAS than the GS and NL groups. Correlational analysis between the static measure (F2-F1 values) and mean listener ratings showed that listeners tended to perceive sounds with the highest F2-F1 values more as /j/ than /l/, and those with the lowest F2-F1 values more as /w/ than /l/, especially for sounds in between phonemic categories. Listener ratings were not highly correlated with dynamic measures. CONCLUSION These results suggest that experienced listeners use the VAS more continuously than less experienced listeners to indicate perception of subphonemic features of children's /l/ productions, and that their ratings correlate with acoustic measures. Furthermore, listeners with experience with child speech and phonetic training are more sensitive to subphonemic features of children's /l/ productions, especially for misarticulated productions. This supports the clinical use of the VAS for perceptual judgments of children's /l/ productions.
Affiliation(s)
- Emily Ann Coniglio
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, USA
- Hyunju Chung
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, USA
- Sarah Kenney Schellinger
- Department of Communication Sciences and Disorders, Saint Xavier University, Chicago, Illinois, USA
8
Kim Y, Chung H, Thompson A. Acoustic and Articulatory Characteristics of English Semivowels /ɹ, l, w/ Produced by Adult Second-Language Speakers. J Speech Lang Hear Res 2022; 65:890-905. PMID: 35104414; DOI: 10.1044/2021_jslhr-21-00152.
Abstract
PURPOSE This study presents the results of acoustic and kinematic analyses of word-initial semivowels (/ɹ, l, w/) produced by second-language (L2) speakers of English whose native language is Korean. In addition, the relationship of acoustic and kinematic measures to ratings of foreign accent was examined by correlation analyses. METHOD Eleven L2 speakers and 10 native speakers (first language [L1]) of English read The Caterpillar passage. Acoustic and kinematic data were simultaneously recorded using an electromagnetic articulography system. In addition to speaking rate, two acoustic measures (ratio of third-formant [F3] frequency to second-formant [F2] frequency and duration of steady states of F2) and two kinematic measures (lip aperture and duration of lingual maximum hold) were obtained from individual target sounds. To examine the degree of contrast among the three sounds, acoustic and kinematic Euclidean distances were computed on the F2-F3 and x-y planes, respectively. RESULTS Compared with L1 speakers, L2 speakers exhibited a significantly slower speaking rate. For the three semivowels, L2 speakers showed a reduced F3/F2 ratio during constriction, increased lip aperture, and reduced acoustic Euclidean distances among semivowels. Additionally, perceptual ratings of foreign accent were significantly correlated with three measures: duration of steady F2, acoustic Euclidean distance, and kinematic Euclidean distance. CONCLUSIONS The findings provide acoustic and kinematic evidence for the challenges that L2 speakers experience in the production of English semivowels, especially /ɹ/ and /w/. The robust and consistent finding of reduced contrasts among semivowels, and the correlation of these contrasts with perceptual accent ratings, suggests that sound contrasts may be an effective focus for accent modification paradigms.
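The acoustic contrast measure described above is a pairwise Euclidean distance on the F2-F3 plane. A minimal sketch follows; the formant values are invented placeholders, not the study's data, and the same computation would apply to articulator positions on the x-y plane for the kinematic version.

```python
import math

# Hypothetical sketch of pairwise acoustic contrast among /ɹ/, /l/, /w/
# as Euclidean distances on the F2-F3 plane. Values are invented.

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

semivowels = {            # (F2 Hz, F3 Hz) -- illustrative values only
    "r": (1200.0, 1700.0),    # /ɹ/ is marked by a notably low F3
    "l": (1100.0, 2600.0),
    "w": (800.0, 2300.0),
}

for a, b in [("r", "l"), ("r", "w"), ("l", "w")]:
    print(a, b, round(euclidean(semivowels[a], semivowels[b]), 1))
```

Smaller distances among the three points correspond to the reduced semivowel contrasts the study reports for L2 speakers.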
Affiliation(s)
- Yunjung Kim
- School of Communication Science & Disorders, Florida State University, Tallahassee
- Hyunju Chung
- Department of Communication Sciences & Disorders, Louisiana State University, Baton Rouge
- Austin Thompson
- School of Communication Science & Disorders, Florida State University, Tallahassee
9
Abstract
Purpose Most acoustic and articulatory studies on /l/ have focused on either duration, formant frequencies, or tongue shape during the constriction interval. Only a limited set of data exists for the transition characteristics of /l/ to and from surrounding vowels. The aim of this study was to examine second formant (F2) transition characteristics of /l/ produced by young children and adults, in order to better understand articulatory behaviors in the production of /l/ and potential clinical applications of these data to typical and delayed /l/ development. Method Participants included 17 children with typically developing speech between the ages of 2 and 5 years, and 10 female adult speakers of Southern American English. Each subject produced single words containing pre- and postvocalic /l/ in two vowel contexts (/i, ɪ/ and /ɔ, ɑ/). F2 transitions, out of and into /l/ constriction intervals from the adjacent vowels, were analyzed for perceptually acceptable /l/ productions. The F2 transition extent, duration, and rate, as well as F2 loci data, were compared across age groups by vowel context for both pre- and postvocalic /l/. Results F2 transitions of adults' /l/ showed great similarity across and within speakers. Those of young children showed greater variability, but became increasingly similar to those of adults with age. The F2 loci data seemed consistent with greater coarticulation among children than adults. This conclusion, however, must be regarded as preliminary due to the possible influence of different vocal tract sizes across ages and variability in the data. Conclusions The results suggest that adult patterns can serve as a reliable reference against which children's /l/ productions can be evaluated. The articulatory configurations associated with the /l/ constriction interval and the vocal tract movements into and out of that interval may provide insight into the underlying difficulties related to misarticulated /l/.
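The F2 transition measures named above relate straightforwardly: rate is extent divided by duration. A hedged sketch with invented sample values (the study's segmentation criteria are not reproduced here):

```python
# Hypothetical illustration of F2 transition measures: extent (Hz) and
# rate (Hz/s) of the transition out of the /l/ constriction into the
# following vowel. Sample values are invented.

def f2_transition_measures(f2_onset_hz, f2_target_hz, duration_s):
    extent = f2_target_hz - f2_onset_hz   # signed Hz change over the transition
    rate = extent / duration_s            # Hz per second
    return extent, rate

extent, rate = f2_transition_measures(1100.0, 2200.0, 0.055)
print(extent)  # -> 1100.0 (Hz)
print(rate)    # about 20000 Hz/s
```

Greater variability in children would show up as wider spreads of extent and rate across repeated tokens, which is the pattern the study reports.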
Affiliation(s)
- Hyunju Chung
- Department of Communication Sciences & Disorders, Louisiana State University, Baton Rouge
- Gary Weismer
- Department of Communication Sciences & Disorders, University of Wisconsin-Madison
10
Tabain M, Kochetov A, Beare R. An ultrasound and formant study of manner contrasts at four coronal places of articulation. J Acoust Soc Am 2020; 148:3195. PMID: 33261411; DOI: 10.1121/10.0002486.
Abstract
This study examines consonant manner of articulation at four coronal places of articulation, using ultrasound and formant analyses of the Australian language Arrernte. Stop, nasal, and lateral articulations are examined at the dental, alveolar, retroflex, and alveo-palatal places of articulation: /t̪ n̪ l̪/ vs /t n l/ vs /ʈ ɳ ɭ/ vs /c ɲ ʎ/. Ultrasound data clearly show a more retracted tongue root for the lateral, and a more advanced tongue root for the nasal, as compared to the stop. However, the magnitude of the differences is much greater for the stop~lateral contrast than for the stop~nasal contrast. Acoustic results show clear effects on F1 in the adjacent vowels, in particular the preceding vowel, with F1 lower adjacent to nasals and higher adjacent to laterals, as compared to stops. Correlations between the articulatory and acoustic data are particularly strong for this formant. However, the retroflex place of articulation shows effects according to manner for higher formants as well, suggesting that a better understanding of retroflex acoustics for different manners of articulation is required. The study also suggests that articulatory symmetry and gestural economy are affected by the size of the phonemic inventory.
11
Abstract
Purpose The aim of the current study was to examine /l/ developmental patterns in young learners of Southern American English, especially in relation to the effect of word position and phonetic context. Method Eighteen children with typically developing speech, aged between 2 and 5 years, produced monosyllabic single words containing singleton /l/ in different word positions (pre- vs. postvocalic /l/) across different vowel contexts (high front vs. low back) and cluster /l/ in different consonant contexts (/pl, bl/ vs. /kl, gl/). Each production was analyzed for its accuracy and acoustic patterns as measured by the first two formant frequencies and their difference (F1, F2, and F2-F1). Results There was great individual variability in /l/ acquisition patterns, with some 2- and 3-year-olds reaching 100% accuracy for prevocalic /l/, while others were below 70%. Overall, accuracy of prevocalic /l/ was higher than that of postvocalic /l/. Acoustic patterns of pre- and postvocalic /l/ showed greater differences in younger children and less apparent differences in 5-year-olds. There were no statistically significant differences between the acoustic patterns of /l/ coded as perceptually acceptable and those coded as misarticulated. There was also no apparent effect of vowel and consonant context on /l/ patterns. Conclusion The accuracy patterns of this study suggest an earlier development of /l/, especially prevocalic /l/, than has been reported in previous studies. The differences in acoustic patterns between pre- and postvocalic /l/, which become less apparent with age, may suggest that children alter the way they articulate /l/ as they grow older. The absence of significant acoustic differences between acceptable and misarticulated /l/, especially postvocalic /l/, suggests a gradient, dialect-specific nature of /l/, which calls for careful consideration of a child's dialect/language background when studying /l/.
Affiliation(s)
- Hyunju Chung
- Department of Communication Sciences & Disorders, Louisiana State University, Baton Rouge
12
Abstract
Kalasha, a Northwestern Indo-Aryan language spoken in a remote mountainous region of Pakistan, is relatively unusual among languages of the region in that it has lateral approximants contrasting in secondary articulation: velarization and palatalization (/ɫ/ vs /lʲ/). Given the paucity of previous phonetic work on the language and some discrepancies between descriptive accounts, the nature of the Kalasha lateral contrast remains poorly understood. This paper presents an analysis of fieldwork recordings with laterals produced by 14 Kalasha speakers in a variety of lexical items and phonetic contexts. Acoustic analysis of formants measured during the lateral closure revealed that the contrast was most clearly distinguished by F2 (as well as by the F2-F1 difference), which was considerably higher for /lʲ/ than for /ɫ/. This confirms that the two laterals are primarily distinguished by secondary articulation and not by retroflexion, which is otherwise robustly represented in the language inventory. The laterals showed no positional differences but did show considerable fronting (higher F2) next to front vowels. Some inter-speaker variation was observed in the realization of /ɫ/, which was produced with little or no velarization by older speakers. This is indicative of a change in progress, resulting in an overall enhancement of an otherwise auditorily vulnerable contrast.
Affiliation(s)
- Alexei Kochetov
- Department of Linguistics, University of Toronto, 100 Saint George Street, Toronto, Ontario M5S 3G3, Canada
- Jan Heegård Petersen
- Department of Nordic Studies and Linguistics, University of Copenhagen, Emil Holms Kanal 2, 2300 København S, Denmark
- Paul Arsenault
- Department of Linguistics, Tyndale University, 3377 Bayview Avenue, Toronto, Ontario M2M 3S4, Canada
13
Charles S, Lulich SM. Articulatory-acoustic relations in the production of alveolar and palatal lateral sounds in Brazilian Portuguese. J Acoust Soc Am 2019; 145:3269. PMID: 31255144; DOI: 10.1121/1.5109565.
Abstract
Lateral approximant speech sounds are notoriously difficult to measure and describe due to their complex articulation and acoustics. This has prevented researchers from reaching a unifying description of the articulatory and acoustic characteristics of laterals. This paper examines articulatory and acoustic properties of Brazilian Portuguese alveolar and palatal lateral approximants (/l/ and /ʎ/) produced by six native speakers. The methodology for obtaining vocal tract area functions was based on three-dimensional/four-dimensional (3D/4D) ultrasound recordings and 3D digitized palatal impressions with simultaneously recorded audio signals. Area functions were used to calculate transfer function spectra, and predicted formant and anti-resonance frequencies were compared with the acoustic recordings. Mean absolute error in formant frequency prediction was 4% with a Pearson correlation of r = 0.987. Findings suggest anti-resonances from the interdental channels are less important than a prominent anti-resonance from the supralingual cavity but can become important in asymmetrical articulations. The use of 3D/4D ultrasound to study articulatory-acoustic relations is promising, but significant limitations remain and future work is needed to make better use of 3D/4D ultrasound data, e.g., by combining it with magnetic resonance imaging.
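The study predicts formants by computing transfer functions from measured vocal tract area functions; that full computation is beyond a short sketch, but the closed-open uniform-tube approximation gives a feel for the formant frequencies involved. This simplification is mine, not the authors' method.

```python
# Back-of-the-envelope vocal tract acoustics: a uniform tube closed at the
# glottis and open at the lips, of length L (m), has resonances
#   F_n = (2n - 1) * c / (4 * L),
# where c is the speed of sound (m/s). This ignores the side branches and
# anti-resonances central to lateral acoustics; it only sets the scale.

def tube_resonances(length_m, n=3, c=350.0):
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

print(tube_resonances(0.175))  # roughly [500, 1500, 2500] Hz
```

Deviations of measured lateral formants from such uniform-tube values, plus the anti-resonances from the supralingual cavity and interdental channels, are exactly what the area-function modeling in the study captures.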
Affiliation(s)
- Sherman Charles
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA
- Steven M Lulich
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA
14
15
Abstract
In this paper, we present a production study to explore the controversial question of /l/ velarisation. Measurements of first (F1), second (F2) and third (F3) formant frequencies and the slope of F2 were analysed to clarify /l/ velarisation behaviour in European Portuguese (EP). The acoustic data were collected from ten EP speakers producing trisyllabic words with a paroxytone stress pattern, with the liquid consonant at the middle of the word in onset, complex onset and coda positions. Results suggested that /l/ is produced on a continuum in EP. The consistently low F2 indicates that /l/ is velarised in all syllable positions, but variation especially in F1 and F3 revealed that /l/ could be "more velarised" or "less velarised" depending on syllable position and vowel context. These findings suggest that it is important to consider different acoustic measures to better understand /l/ velarisation in EP.
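The F2 slope mentioned above can be obtained with an ordinary least-squares line fit to F2 samples over the lateral. The sample values below are invented, and the study's exact measurement window is not specified here.

```python
# Hypothetical F2-slope measure (Hz/s): slope of a least-squares line
# fitted to F2 samples taken across the lateral. Values are invented.

def f2_slope(times_s, f2_hz):
    n = len(times_s)
    mt = sum(times_s) / n
    mf = sum(f2_hz) / n
    num = sum((t - mt) * (f - mf) for t, f in zip(times_s, f2_hz))
    den = sum((t - mt) ** 2 for t in times_s)
    return num / den  # Hz per second

slope = f2_slope([0.00, 0.01, 0.02, 0.03], [1000.0, 1050.0, 1100.0, 1150.0])
print(slope)  # about 5000 Hz/s for this rising toy trajectory
```

A flatter (near-zero) slope through the lateral would be consistent with a uniformly velarised articulation, whereas steep slopes indicate stronger transitions into or out of the velarised posture.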
Affiliation(s)
- Susana Rodrigues
- School of Health Sciences, University of Algarve (ESSUAlg), Faro, Portugal
- CLUL, School of Arts and Humanities, University of Lisbon, Lisbon, Portugal
- Fernando Martins
- CLUL, School of Arts and Humanities, University of Lisbon, Lisbon, Portugal
- Susana Silva
- Neurocognition and Language Research Group, Center for Psychology at University of Porto, Faculty of Psychology and Education Science at University of Porto, Porto, Portugal
- Luis M. T. Jesus
- School of Health Sciences (ESSUA), University of Aveiro, Aveiro, Portugal
- Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
16
Kirkham S, Nance C, Littlewood B, Lightfoot K, Groarke E. Dialect variation in formant dynamics: The acoustics of lateral and vowel sequences in Manchester and Liverpool English. J Acoust Soc Am 2019; 145:784. PMID: 30823785; DOI: 10.1121/1.5089886.
Abstract
This study analyses the time-varying acoustics of laterals and their adjacent vowels in Manchester and Liverpool English. Generalized additive mixed models (GAMMs) are used for quantifying time-varying formant data, which allows the modelling of non-linearities in acoustic time series while simultaneously modelling speaker- and word-level variability in the data. These models are compared to single time-point analyses of lateral and vowel targets in order to determine what analysing formant dynamics can reveal about dialect variation in speech acoustics. The results show that lateral targets exhibit robust differences between some positional contexts and also between dialects, with smaller differences present in vowel targets. The time-varying analysis shows that dialect differences frequently occur globally across the lateral and adjacent vowels. These results suggest a complex relationship between lateral and vowel targets and their coarticulatory dynamics, which problematizes straightforward claims about the realization of laterals and their adjacent vowels. These findings are further discussed in terms of hypotheses about positional and sociophonetic variation. In doing so, the utility of GAMMs for analysing time-varying multi-segmental acoustic signals is demonstrated, and the significance of the results for accounts of English lateral typology is highlighted.
Affiliation(s)
- Sam Kirkham, Department of Linguistics and English Language, Lancaster University, County South, Lancaster LA1 4YL, United Kingdom
- Claire Nance, Department of Linguistics and English Language, Lancaster University, County South, Lancaster LA1 4YL, United Kingdom
- Bethany Littlewood, Department of Linguistics and English Language, Lancaster University, County South, Lancaster LA1 4YL, United Kingdom
- Kate Lightfoot, Department of Linguistics and English Language, Lancaster University, County South, Lancaster LA1 4YL, United Kingdom
- Eve Groarke, Department of Linguistics and English Language, Lancaster University, County South, Lancaster LA1 4YL, United Kingdom
17
Kochetov A, Tabain M, Sreedevi N, Beare R. Manner and place differences in Kannada coronal consonants: Articulatory and acoustic results. J Acoust Soc Am 2018; 144:3221. [PMID: 30599639 DOI: 10.1121/1.5081686]
Abstract
This study investigated articulatory differences in the realization of Kannada coronal consonants of the same place but different manner of articulation. This was done by examining tongue positions and acoustic formant transitions for dentals and retroflexes of three manners of articulation: stops, nasals, and laterals. Ultrasound imaging data collected from ten speakers of the language revealed that the tongue body/root was more forward for the nasal manner of articulation compared to stop and lateral consonants of the same place of articulation. The dental nasal and lateral were also produced with a higher front part of the tongue compared to the dental stop. As a result, the place contrast was greater in magnitude for the stops (being the prototypical dental vs retroflex) than for the nasals and laterals (being apparently alveolar vs retroflex). Acoustic formant transition differences were found to reflect some of the articulatory differences, while also providing evidence for the more dynamic articulation of nasal and lateral retroflexes. Overall, the results of the study shed light on factors underlying manner requirements (aerodynamic or physiological) and how the factors interact with principles of gestural economy/symmetry, providing an empirical baseline for further cross-language investigations and articulation-to-acoustics modeling.
Affiliation(s)
- Alexei Kochetov, Department of Linguistics, University of Toronto, 100 Saint George Street, Toronto, Ontario, M5S 3G3, Canada
- Marija Tabain, Department of Languages and Linguistics, La Trobe University, Melbourne, Australia
- N Sreedevi, Clinical Services, All India Institute of Speech and Hearing, Mysore, Karnataka, India
- Richard Beare, Monash University and Murdoch Children's Research Institute, Melbourne, Australia
18
Lim Y, Zhu Y, Lingala SG, Byrd D, Narayanan S, Nayak KS. 3D dynamic MRI of the vocal tract during natural speech. Magn Reson Med 2018; 81:1511-1520. [PMID: 30390319 DOI: 10.1002/mrm.27570]
Abstract
PURPOSE To develop and evaluate a technique for 3D dynamic MRI of the full vocal tract at high temporal resolution during natural speech. METHODS We demonstrate 2.4 × 2.4 × 5.8 mm3 spatial resolution, 61-ms temporal resolution, and a 200 × 200 × 70 mm3 FOV. The proposed method uses 3D gradient-echo imaging with a custom upper-airway coil, a minimum-phase slab excitation, stack-of-spirals readout, pseudo golden-angle view order in kx-ky, linear Cartesian order along kz, and spatiotemporal finite difference constrained reconstruction, with 13-fold acceleration. This technique is evaluated using in vivo vocal tract airway data from 2 healthy subjects acquired on a 1.5T scanner, 1 with synchronized audio, performing 2 natural speech tasks, and via comparison with interleaved multislice 2D dynamic MRI. RESULTS This technique captured known dynamics of vocal tract articulators during natural speech tasks including tongue gestures during the production of consonants "s" and "l" and of consonant-vowel syllables, and was additionally consistent with 2D dynamic MRI. Coordination of lingual (tongue) movements for consonants is demonstrated via volume-of-interest analysis. Vocal tract area function dynamics revealed critical lingual constriction events along the length of the vocal tract for consonants and vowels. CONCLUSION We demonstrate feasibility of 3D dynamic MRI of the full vocal tract, with spatiotemporal resolution adequate to visualize lingual movements for consonants and vocal tract shaping during natural productions of consonant-vowel syllables, without requiring multiple repetitions.
Affiliation(s)
- Yongwan Lim, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Yinghua Zhu, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Sajan Goud Lingala, Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, Iowa
- Dani Byrd, Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California
- Shrikanth Narayanan, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Krishna Shrinivas Nayak, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
19
Hewer A, Wuhrer S, Steiner I, Richmond K. A multilinear tongue model derived from speech related MRI data of the human vocal tract. Comput Speech Lang 2018. [DOI: 10.1016/j.csl.2018.02.001]
20
Story BH, Vorperian HK, Bunton K, Durtschi RB. An age-dependent vocal tract model for males and females based on anatomic measurements. J Acoust Soc Am 2018; 143:3079. [PMID: 29857736 PMCID: PMC5966313 DOI: 10.1121/1.5038264]
Abstract
The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts ranging in age from infancy to 12 yrs, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived, and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparison of formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.
Affiliation(s)
- Brad H Story, Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Houri K Vorperian, Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
- Kate Bunton, Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Reid B Durtschi, Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
21
De Decker P, Mackenzie S. Tracking the phonological status of /l/ in Newfoundland English: Experiments in articulation and acoustics. J Acoust Soc Am 2017; 142:350. [PMID: 28764433 DOI: 10.1121/1.4991349]
Abstract
This paper investigates patterning of /l/ in Newfoundland English. Using acoustic and ultrasound methods, the reported displacement of the traditional Irish pattern of word-final light /l/ is assessed. Acoustic results show darker /l/'s in word-final position in both phrases and compounds. Although the standard allophonic pattern is widespread in Newfoundland English, dialectal variation arising from early settlement patterns continues to influence speech patterns with less distinction between initial and final /l/ in Irish-settled areas. Men show relatively less distinction between initial and final /l/, consistent with sociolinguistic patterns in which men retain local variants. Last, light /l/ in final position may be resurfacing among younger speakers. Ultrasound imaging also shows variable rates of distinction between word-final and initial /l/, but without significant main effects of region or gender. Articulatory analysis reveals a small effect of age, with older speakers being less likely to have significant differences in articulation across positions. An interaction between region and gender shows males from an Irish-settled community are less likely to employ distinct lingual shapes across positions. While some articulatory findings complement the acoustic results, it is suggested that differences between these domains result from lateralization or other aspects of articulation not captured in ultrasound imaging.
Affiliation(s)
- Paul De Decker, Department of Linguistics, Memorial University of Newfoundland, St. John's, Newfoundland, A1B 3X9, Canada
- Sara Mackenzie, Department of Linguistics, Memorial University of Newfoundland, St. John's, Newfoundland, A1B 3X9, Canada
22
Töger J, Sorensen T, Somandepalli K, Toutios A, Lingala SG, Narayanan S, Nayak K. Test-retest repeatability of human speech biomarkers from static and real-time dynamic magnetic resonance imaging. J Acoust Soc Am 2017; 141:3323. [PMID: 28599561 PMCID: PMC5436977 DOI: 10.1121/1.4983081]
Abstract
Static anatomical and real-time dynamic magnetic resonance imaging (RT-MRI) of the upper airway is a valuable method for studying speech production in research and clinical settings. The test-retest repeatability of quantitative imaging biomarkers is an important parameter, since it limits the effect sizes and intragroup differences that can be studied. Therefore, this study aims to present a framework for determining the test-retest repeatability of quantitative speech biomarkers from static MRI and RT-MRI, and apply the framework to healthy volunteers. Subjects (n = 8, 4 females, 4 males) are imaged in two scans on the same day, including static images and dynamic RT-MRI of speech tasks. The inter-study agreement is quantified using intraclass correlation coefficient (ICC) and mean within-subject standard deviation (σe). Inter-study agreement is strong to very strong for static measures (ICC: min/median/max 0.71/0.89/0.98, σe: 0.90/2.20/6.72 mm), poor to strong for dynamic RT-MRI measures of articulator motion range (ICC: 0.26/0.75/0.90, σe: 1.6/2.5/3.6 mm), and poor to very strong for velocities (ICC: 0.21/0.56/0.93, σe: 2.2/4.4/16.7 cm/s). In conclusion, this study characterizes repeatability of static and dynamic MRI-derived speech biomarkers using state-of-the-art imaging. The introduced framework can be used to guide future development of speech biomarkers. Test-retest MRI data are provided free for research use.
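The agreement statistics quoted in this abstract (ICC and mean within-subject standard deviation σe) can be computed directly from a subjects × sessions matrix. A minimal sketch under assumptions: the common ICC(2,1) form (two-way random effects, absolute agreement, single measurement) is used here, and the scan/rescan values are made up; the paper's exact ICC variant is not specified in the abstract.

```python
import numpy as np

def icc_2_1(x):
    # ICC(2,1): two-way random effects, absolute agreement, single rater.
    # x has shape (n_subjects, k_sessions).
    n, k = x.shape
    m = x.mean()
    row_means = x.mean(axis=1)   # per-subject means
    col_means = x.mean(axis=0)   # per-session means
    ss_total = ((x - m) ** 2).sum()
    ss_rows = k * ((row_means - m) ** 2).sum()
    ss_cols = n * ((col_means - m) ** 2).sum()
    ms_r = ss_rows / (n - 1)                               # between subjects
    ms_c = ss_cols / (k - 1)                               # between sessions
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

def within_subject_sd(x):
    # Mean within-subject standard deviation (sigma_e) across sessions.
    return np.sqrt(x.var(axis=1, ddof=1).mean())

# Hypothetical scan/rescan values of one biomarker (e.g. an articulator
# motion range in mm) for 8 subjects; session 2 adds small measurement noise.
scan1 = np.array([10.2, 12.5, 9.8, 14.1, 11.0, 13.3, 10.7, 12.0])
scan2 = scan1 + np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.2])
x = np.column_stack([scan1, scan2])

icc = icc_2_1(x)              # near 1: strong inter-study agreement
sigma_e = within_subject_sd(x)
```

With identical sessions the ICC is exactly 1, which is a convenient sanity check on the sums-of-squares bookkeeping.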
Affiliation(s)
- Johannes Töger, Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Tanner Sorensen, Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Krishna Somandepalli, Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Asterios Toutios, Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Sajan Goud Lingala, Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Shrikanth Narayanan, Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Krishna Nayak, Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
23
Recasens D, Rodríguez C. Lingual Articulation and Coarticulation for Catalan Consonants and Vowels: An Ultrasound Study. Phonetica 2017; 74:125-156. [PMID: 28268220 DOI: 10.1159/000452475]
Abstract
BACKGROUND The study investigates the tongue position and coarticulatory characteristics of a subset of Catalan consonants and vowels using ultrasound. METHOD Ultrasound data were recorded and analyzed for the Catalan front lingual consonants /t, d, n, l, ɾ, s, r, ʎ, ɲ, ʃ/ and vowels /i, e, a, o, u/ in symmetrical VCV sequences produced by 5 adult Catalan speakers. RESULTS Among other aspects, data show more tongue body fronting for palatal consonants and, among dentals and alveolars, for laminals than for apicals; manner-of-articulation demands account for considerable tongue body retraction and predorsum lowering during the trill /r/ and for some tongue body retraction during /l/ next to front vowels. Vowel and consonant coarticulation occurs mostly in lingual regions which are not primarily involved in closure or constriction formation. Differences in the relative prominence of the anticipatory and carryover consonant-to-vowel effects in tongue body position were found to hold clearly for /r/ in all vowel contexts and for palatal consonants next to /a, o, u/. CONCLUSIONS Place-dependent and manner-dependent articulatory characteristics for consonants and vowels account for the most relevant coarticulatory effects and may help explain several sound change patterns.
Affiliation(s)
- Daniel Recasens, Departament de Filologia Catalana, Universitat Autònoma de Barcelona, and Institut d'Estudis Catalans, Barcelona, Spain
24
Gick B, Allen B, Roewer-Després F, Stavness I. Speaking Tongues Are Actively Braced. J Speech Lang Hear Res 2017; 60:494-506. [PMID: 28196377 DOI: 10.1044/2016_jslhr-s-15-0141]
Abstract
PURPOSE Bracing of the tongue against opposing vocal-tract surfaces such as the teeth or palate has long been discussed in the context of biomechanical, somatosensory, and aeroacoustic aspects of tongue movement. However, previous studies have tended to describe bracing only in terms of contact (rather than mechanical support), and only in limited phonetic contexts, supporting a widespread view of bracing as an occasional state, peculiar to specific sounds or sound combinations. METHOD The present study tests the pervasiveness and effortfulness of tongue bracing in continuous English speech passages using electropalatography and 3-D biomechanical simulations. RESULTS The tongue remains in continuous contact with the upper molars during speech, with only rare exceptions. Use of the term bracing (rather than merely contact) is supported here by biomechanical simulations showing that lateral bracing is an active posture requiring dedicated muscle activation; further, loss of lateral contact for onset /l/ allophones is found to be consistently accompanied by contact of the tongue blade against the anterior palate. In the rare instances where direct evidence for contact is lacking (only in a minority of low vowel and postvocalic /l/ tokens), additional biomechanical simulations show that lateral contact is maintained against pharyngeal structures dorsal to the teeth. CONCLUSION Taken together, these results indicate that tongue bracing is both pervasive and active in running speech and essential in understanding tongue movement control.
Affiliation(s)
- Bryan Gick, Department of Linguistics, University of British Columbia, Vancouver, Canada
- Blake Allen, Department of Linguistics, University of British Columbia, Vancouver, Canada
- Ian Stavness, Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
25
Katz WF, Mehta S, Wood M, Wang J. Using electromagnetic articulography with a tongue lateral sensor to discriminate manner of articulation. J Acoust Soc Am 2017; 141:EL57. [PMID: 28147568 PMCID: PMC5724616 DOI: 10.1121/1.4973907]
Abstract
This study examined the contributions of the tongue tip (TT), tongue body (TB), and tongue lateral (TL) sensors in the electromagnetic articulography (EMA) measurement of American English alveolar consonants. Thirteen adults produced /ɹ/, /l/, /z/, and /d/ in /ɑCɑ/ syllables while being recorded with an EMA system. According to statistical analysis of sensor movement and the results of a machine classification experiment, the TT sensor contributed most to consonant differences, followed by TB. The TL sensor played a complementary role, particularly for distinguishing /z/.
Affiliation(s)
- William F Katz, Department of Communication Sciences and Disorders, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080-3021, USA
- Sonya Mehta, Department of Communication Sciences and Disorders, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080-3021, USA
- Matthew Wood, Department of Communication Sciences and Disorders, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080-3021, USA
- Jun Wang, Department of Communication Sciences and Disorders, Department of Bioengineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080-3021, USA
26
Sturman HW, Baker-Smemoe W, Carreño S, Miller BB. Learning the Marshallese Phonological System: The Role of Cross-language Similarity on the Perception and Production of Secondary Articulations. Lang Speech 2016; 59:462-487. [PMID: 28008802 DOI: 10.1177/0023830915614603]
Abstract
The current study determines the influence of cross-language similarity on native English speakers' perception and production of Marshallese consonant contrasts. Marshallese provides a unique opportunity to study this influence because all Marshallese consonants have a secondary articulation. Results of discrimination and production tasks indicate that learners more easily acquire sounds if they are perceptually less similar to native language phonemes. In addition, the degree of cross-language similarity seemed to affect perception and production and may also interact with the effect of orthography.
Affiliation(s)
- Wendy Baker-Smemoe, Department of Linguistics and English Language, Brigham Young University, USA
- Sofía Carreño, Department of Linguistics and English Language, Brigham Young University, USA
- Bradley B Miller, Department of Linguistics and English Language, Brigham Young University, USA
27
Prasad A, Ghosh PK. Information theoretic optimal vocal tract region selection from real time magnetic resonance images for broad phonetic class recognition. Comput Speech Lang 2016. [DOI: 10.1016/j.csl.2016.03.003]
28
Toutios A, Narayanan SS. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research. APSIPA Trans Signal Inf Process 2016; 5:e6. [PMID: 27833745 PMCID: PMC5100697 DOI: 10.1017/atsip.2016.5]
Abstract
Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development.
Affiliation(s)
- Asterios Toutios, Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
- Shrikanth S Narayanan, Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
29
Tabain M, Butcher A, Breen G, Beare R. An acoustic study of multiple lateral consonants in three Central Australian languages. J Acoust Soc Am 2016; 139:361-372. [PMID: 26827031 DOI: 10.1121/1.4937751]
Abstract
This study presents dental, alveolar, retroflex, and palatal lateral /l̪ l ɭ ʎ/ data from three Central Australian languages: Arrernte, Pitjantjatjara, and Warlpiri. Formant results show that the laminal laterals (dental /l̪/ and palatal /ʎ/) have a relatively low F1, presumably due to a high jaw position for these sounds, as well as higher F4. In addition, the palatal /ʎ/ has very high F2. There is relatively little difference in F3 between the four lateral places of articulation. However, the retroflex /ɭ/ appears to have slightly lower F3 and F4 in comparison to the other lateral sounds. Importantly, spectral moment analyses suggest that centre of gravity and standard deviation (first and second spectral moments) are sufficient to characterize the four places of articulation. The retroflex has a concentration of energy at slightly lower frequencies than the alveolar, while the palatal has a concentration of energy at higher frequencies. The dental is characterized by a more even spread of energy. These various results are discussed in light of different acoustic models of lateral production, and the possibility of spectral cues to place of articulation across manners of articulation is considered.
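The first two spectral moments used in this abstract (centre of gravity and standard deviation) are weighted statistics of the power spectrum, treating normalized power as a probability mass over frequency. A minimal sketch with made-up spectra: the Gaussian spectral shapes and their centre frequencies below are illustrative assumptions, not the paper's data; they merely mimic the reported ordering (retroflex energy concentrated lower than palatal energy).

```python
import numpy as np

def spectral_moments(freqs, power):
    # First two spectral moments of a power spectrum:
    # centre of gravity (weighted mean frequency) and standard deviation.
    p = power / power.sum()
    cog = (freqs * p).sum()
    sd = np.sqrt(((freqs - cog) ** 2 * p).sum())
    return cog, sd

# Toy spectra on a 0-8 kHz axis: a "retroflex-like" spectrum concentrates
# energy at lower frequencies than a "palatal-like" one (shapes invented).
freqs = np.linspace(0.0, 8000.0, 256)
retroflex = np.exp(-((freqs - 2000.0) / 800.0) ** 2)
palatal = np.exp(-((freqs - 4000.0) / 800.0) ** 2)

cog_r, sd_r = spectral_moments(freqs, retroflex)
cog_p, sd_p = spectral_moments(freqs, palatal)
# cog_r < cog_p reproduces the lower-frequency energy concentration
# attributed to the retroflex relative to the palatal.
```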
Affiliation(s)
- Marija Tabain, Linguistics, La Trobe University, Melbourne, Australia
- Gavan Breen, Institute for Aboriginal Development, Alice Springs, Australia
- Richard Beare, Monash University and Murdoch Children's Research Institute, Melbourne, Australia
30
Abstract
Magnetic Resonance Imaging (MRI) plays an increasing role in the study of speech. This article reviews the MRI literature of anatomical imaging, imaging for acoustic modelling and dynamic imaging. It describes existing imaging techniques attempting to meet the challenges of imaging the upper airway during speech and examines the remaining hurdles and future research directions.
Affiliation(s)
- Andrew D Scott, Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; NIHR Cardiovascular Biomedical Research Unit, The Royal Brompton Hospital, Sydney Street, London SW3 6NP, United Kingdom
- Marzena Wylezinska, Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom
- Malcolm J Birch, Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom
- Marc E Miquel, Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom
31
Perkell JS. Five decades of research in speech motor control: what have we learned, and where should we go from here? J Speech Lang Hear Res 2013; 56:S1857-S1874. [PMID: 24687442 DOI: 10.1044/1092-4388(2013/12-0382)]
Abstract
PURPOSE The author presents a view of research in speech motor control over the past 5 decades, as observed from within Ken Stevens's Speech Communication Group (SCG) in the Research Laboratory of Electronics at MIT. METHOD The author presents a limited overview of some important developments and discoveries. The perspective is based largely on the research interests of the Speech Motor Control Group (SMCG) within the SCG; thus, it is selective, focusing on normal motor control of the vocal tract in the production of sound segments and syllables. It also covers the particular theories and models that drove the research. Following a brief introduction, there are sections on methodological advances, scientific advances, and conclusions. RESULTS Scientific and methodological advances have been closely interrelated. Advances in instrumentation and computer hardware and software have made it possible to record and process increasingly large, multifaceted data sets; introduce new paradigms for feedback perturbation; image brain activity; and develop more sophisticated, computational physiological and neural models. Such approaches have led to increased understanding of the widespread variability in speech, motor-equivalent trading relations, sensory goals, and the nature of feedback and feedforward neural control mechanisms. CONCLUSIONS Some ideas about important future directions for speech research are presented.
32
Idemaru K, Holt LL. The developmental trajectory of children's perception and production of English /r/-/l/. J Acoust Soc Am 2013; 133:4232-46. [PMID: 23742374 PMCID: PMC3689790 DOI: 10.1121/1.4802905]
Abstract
The English /l-r/ distinction is difficult to learn for some second language learners as well as for native-speaking children. This study examines the use of the second (F2) and third (F3) formants in the production and perception of /l/ and /r/ sounds in 4-, 4.5-, 5.5-, and 8.5-yr-old English-speaking children. The children were tested with elicitation and repetition tasks as well as word recognition tasks. The results indicate that whereas young children's /l/ and /r/ in both production and perception show fairly high accuracy and were well defined along the primary acoustic parameter that differentiates them, F3 frequency, these children were still developing in regard to the integration of the secondary cue, F2 frequency. The pattern of development is consistent with the distribution of these features in the ambient input relative to the /l/ and /r/ category distinction: F3 is robust and reliable, whereas F2 is less reliable in distinguishing /l/ and /r/. With delayed development of F2, cue weighting of F3 and F2 for the English /l-r/ categorization seems to continue to develop beyond 8 or 9 yr of age. These data are consistent with a rather long trajectory of phonetic development whereby native categories are refined and tuned well into childhood.
Affiliation(s)
- Kaori Idemaru, Department of East Asian Languages and Literatures, University of Oregon, Eugene, Oregon 97405, USA
33
Zhou X, Woo J, Stone M, Prince JL, Espy-Wilson CY. Improved vocal tract reconstruction and modeling using an image super-resolution technique. J Acoust Soc Am 2013; 133:EL439-45. [PMID: 23742437 PMCID: PMC3656922 DOI: 10.1121/1.4802903]
Abstract
Magnetic resonance imaging has been widely used in speech production research. Often only one image stack (sagittal, axial, or coronal) is used for vocal tract modeling. As a result, complementary information from other available stacks is not utilized. To overcome this, a recently developed super-resolution technique was applied to integrate three orthogonal low-resolution stacks into one isotropic volume. The results on vowels show that the super-resolution volume produces better vocal tract visualization than any of the low-resolution stacks. Its derived area functions generally produce formant predictions closer to the ground truth, particularly for those formants sensitive to area perturbations at constrictions.
Affiliation(s)
- Xinhui Zhou
- Speech Communication Laboratory, Institute of Systems Research and Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA.
34
Abstract
Noninvasive imaging is widely used in speech research as a means to investigate the shaping and dynamics of the vocal tract during speech production. 3-D dynamic MRI would be a major advance, as it would provide 3-D dynamic visualization of the entire vocal tract. We present a novel method for the creation of 3-D dynamic movies of vocal tract shaping based on the acquisition of 2-D dynamic data from parallel slices and temporal alignment of the image sequences using audio information. Multiple sagittal 2-D real-time movies with synchronized audio recordings are acquired for English vowel-consonant-vowel stimuli /ala/, /aɹa/, /asa/, and /aʃa/. Audio data are aligned using mel-frequency cepstral coefficients (MFCC) extracted from windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using dynamic time warping (DTW). The aligned image sequences enable dynamic 3-D visualization by creating synthesized movies of the moving airway in the coronal planes, visualizing desired tissue surfaces and the tube-shaped vocal tract airway after manual segmentation of targeted articulators and smoothing. The resulting volumes allow for dynamic 3-D visualization of salient aspects of lingual articulation, including the formation of tongue grooves and sublingual cavities, with a temporal resolution of 78 ms.
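The alignment step can be sketched with a plain dynamic-time-warping routine. This is a minimal stand-in: in the study the sequences to align are MFCC feature vectors from the synchronized audio, whereas here they are toy 1-D features:

```python
import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping between two feature sequences.

    a, b: arrays of shape (T, D). Returns (total_cost, path), where path
    is a list of (i, j) index pairs aligning frames of a to frames of b.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between frames.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack the optimal warping path.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], path[::-1]

# Toy example: the second sequence is a time-stretched copy of the first.
seq_a = np.sin(2 * np.pi * np.linspace(0, 1, 20)).reshape(-1, 1)
seq_b = np.sin(2 * np.pi * np.linspace(0, 1, 30)).reshape(-1, 1)
cost, path = dtw_path(seq_a, seq_b)
```

The returned path is what lets frames from different repetitions (or, in the study, different slice acquisitions) be placed on a common timeline.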
Affiliation(s)
- Yinghua Zhu
- Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA.
35
Abstract
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
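The vowel-subspace mapping amounts to solving for barycentric weights of the context vowel over the corner vowels /a/, /i/, /u/, then averaging the measured reference shapes with those weights. The 2-D coordinates below are invented stand-ins for the model's actual vowel parameters:

```python
import numpy as np

# Hypothetical 2-D "vowel space" coordinates for the corner vowels;
# the real model works in its articulatory parameter space instead.
corner = {"a": np.array([0.8, 1.2]),
          "i": np.array([0.3, 2.3]),
          "u": np.array([0.3, 0.8])}

def corner_weights(vowel_xy):
    """Barycentric weights of a vowel w.r.t. the /a/-/i/-/u/ triangle.

    Solves  w_a*a + w_i*i + w_u*u = v  subject to  w_a + w_i + w_u = 1.
    """
    A = np.column_stack([corner["a"], corner["i"], corner["u"]])
    A = np.vstack([A, np.ones(3)])          # append the sum-to-one constraint
    rhs = np.append(vowel_xy, 1.0)
    return np.linalg.solve(A, rhs)

def consonant_shape(shapes_aiu, vowel_xy):
    """Context-dependent consonant target: weighted sum of the three
    measured reference shapes (each an area-function vector)."""
    w = corner_weights(vowel_xy)
    return w[0] * shapes_aiu[0] + w[1] * shapes_aiu[1] + w[2] * shapes_aiu[2]
```

A corner vowel maps to weight 1 on its own reference shape, and intermediate vowels interpolate smoothly between the three references, which is what produces the context-sensitive consonant targets.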
Affiliation(s)
- Peter Birkholz
- Department of Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Aachen, Germany.
36
Kim YC, Proctor MI, Narayanan SS, Nayak KS. Improved imaging of lingual articulation using real-time multislice MRI. J Magn Reson Imaging 2011; 35:943-8. [PMID: 22127935 DOI: 10.1002/jmri.23510] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2011] [Accepted: 10/24/2011] [Indexed: 11/09/2022] Open
Abstract
PURPOSE To develop a real-time imaging technique that allows for simultaneous visualization of vocal tract shaping in multiple scan planes, and provides dynamic visualization of complex articulatory features. MATERIALS AND METHODS Simultaneous imaging of multiple slices was implemented using a custom real-time imaging platform. Midsagittal, coronal, and axial scan planes of the human upper airway were prescribed and imaged in real-time using a fast spiral gradient-echo pulse sequence. Two native speakers of English produced voiceless and voiced fricatives /f/-/v/, /θ/-/ð/, /s/-/z/, /ʃ/-/ʒ/ in symmetrical maximally contrastive vocalic contexts /a_a/, /i_i/, and /u_u/. Vocal tract videos were synchronized with noise-cancelled audio recordings, facilitating the selection of frames associated with production of English fricatives. RESULTS Coronal slices intersecting the postalveolar region of the vocal tract revealed tongue grooving to be most pronounced during fricative production in back vowel contexts, and more pronounced for sibilants /s/-/z/ than for /ʃ/-/ʒ/. The axial slice best revealed differences in dorsal and pharyngeal articulation; voiced fricatives were observed to be produced with a larger cross-sectional area in the pharyngeal airway. Partial saturation of spins provided accurate location of imaging planes with respect to each other. CONCLUSION Real-time MRI of multiple intersecting slices can provide valuable spatial and temporal information about vocal tract shaping, including details not observable from a single slice.
Affiliation(s)
- Yoon-Chul Kim
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA.
37
Abstract
We tested the hypothesis that rapid shadowers imitate the articulatory gestures that structure acoustic speech signals-not just acoustic patterns in the signals themselves-overcoming highly practiced motor routines and phonological conditioning in the process. In a first experiment, acoustic evidence indicated that participants reproduced allophonic differences between American English /l/ types (light and dark) in the absence of the positional variation cues more typically present with lateral allophony. However, imitative effects were small. In a second experiment, varieties of /l/ with exaggerated light/dark differences were presented by ear. Acoustic measures indicated that all participants reproduced differences between /l/ types; larger average imitative effects obtained. Finally, we examined evidence for imitation in articulation. Participants ranged in behavior from one who did not imitate to another who reproduced distinctions among light laterals, dark laterals and /w/, but displayed a slight but inconsistent tendency toward enhancing imitation of lingual gestures through a slight lip protrusion. Overall, results indicated that most rapid shadowers need not substitute familiar allophones as they imitate reorganized gestural constellations even in the absence of explicit instruction to imitate, but that the extent of the imitation is small. Implications for theories of speech perception are discussed.
Affiliation(s)
- Douglas N. Honorof
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, USA
- Jeffrey Weihing
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, USA
- Department of Communication Sciences, University of Connecticut, Storrs, CT 06269, USA
- Carol A. Fowler
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, USA
- Department of Psychology, University of Connecticut, Storrs, CT 06269, USA
38
Bae Y, Kuehn DP, Conway CA, Sutton BP. Real-time magnetic resonance imaging of velopharyngeal activities with simultaneous speech recordings. Cleft Palate Craniofac J 2010; 48:695-707. [PMID: 21214321 DOI: 10.1597/09-158] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
OBJECTIVE To examine the relationships between acoustic and physiologic aspects of the velopharyngeal mechanism during acoustically nasalized segments of speech in normal individuals by combining fast magnetic resonance imaging (MRI) with simultaneous speech recordings and subsequent acoustic analyses. DESIGN Ten normal Caucasian adult individuals participated in the study. Midsagittal dynamic magnetic resonance imaging (MRI) and simultaneous speech recordings were performed while participants were producing repetitions of two rate-controlled nonsense syllables including /zanaza/ and /zunuzu/. Acoustic features of nasalization represented as the peak amplitude and the bandwidth of the first resonant frequency (F1) were derived from speech at the rate of 30 sets per second. Physiologic information was based on velar and tongue positional changes measured from the dynamic MRI data, which were acquired at a rate of 21.4 images per second and resampled with a corresponding rate of 30 images per second. Each acoustic feature of nasalization was regressed on gender, vowel context, and velar and tongue positional variables. RESULTS Acoustic features of nasalization represented by F1 peak amplitude and bandwidth changes were significantly influenced by the vowel context surrounding the nasal consonant, velar elevated position, and tongue height at the tip. CONCLUSIONS Fast MRI combined with acoustic analysis was successfully applied to the investigation of acoustic-physiologic relationships of the velopharyngeal mechanism with the type of speech samples employed in the present study. Future applications are feasible to examine how anatomic and physiologic deviations of the velopharyngeal mechanism would be acoustically manifested in individuals with velopharyngeal incompetence.
39
Stone M, Liu X, Chen H, Prince JL. A preliminary application of principal components and cluster analysis to internal tongue deformation patterns. Comput Methods Biomech Biomed Engin 2010; 13:493-503. [PMID: 20635265 DOI: 10.1080/10255842.2010.484809] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Complex patterns of muscle contractions create gross tongue motion during speech. It is of scientific and medical importance to better understand speech motor strategies and variations due to language or disorders. Dense patterns of tongue motion can be imaged using tagged magnetic resonance imaging, but characterisation of motion strategies is difficult using visualisation alone. This paper explores the use of principal component analysis for dimensionality reduction and cluster analysis for tongue motion categorisation. Velocity fields were acquired and analysed from midsagittal tongue slices during motion from /i/ to /u/ for eight datasets containing multiple languages and a glossectomy patient. The analyses were carried out on the tongue-only and tongue-plus-floor of the mouth regions. The results showed that both analyses were sensitive to region size and that cluster analysis was harder to interpret. Both analyses grouped the Japanese speaker with the glossectomy patient, which, although explicable on biologically plausible grounds, highlights the limitations of extensive data reduction.
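The analysis pipeline (PCA for dimensionality reduction, then clustering of the reduced scores) can be sketched on synthetic data; the arrays below are invented stand-ins for the flattened tagged-MRI velocity fields:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: each row is one flattened midsagittal velocity field.
# Two artificial "strategy" groups, distinguished by the sign of a ramp.
group1 = rng.normal(0.0, 0.1, size=(4, 50)) + np.linspace(0, 1, 50)
group2 = rng.normal(0.0, 0.1, size=(4, 50)) - np.linspace(0, 1, 50)
X = np.vstack([group1, group2])

# PCA via SVD of the mean-centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                      # principal-component scores
explained = s**2 / np.sum(s**2)    # variance ratio per component

# Minimal k-means (k = 2) on the first two PC scores.
Z = scores[:, :2]
centres = Z[[0, -1]].copy()        # initialise from opposite ends
for _ in range(20):
    labels = np.argmin(np.linalg.norm(Z[:, None] - centres[None], axis=2),
                       axis=1)
    centres = np.array([Z[labels == k].mean(axis=0) for k in range(2)])
```

With such cleanly separated toy groups the first component carries most of the variance and the clusters recover the groups; the paper's caution is precisely that with real, heavily reduced data the resulting groupings can be much harder to interpret.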
Affiliation(s)
- Maureen Stone
- Department of Neural and Pain Sciences, Department of Orthodontics, University of Maryland Dental School, Baltimore, MD 21201, USA.
40
Inohara K, Sumita YI, Ohbayashi N, Ino S, Kurabayashi T, Ifukube T, Taniguchi H. Standardization of Thresholding for Binary Conversion of Vocal Tract Modeling in Computed Tomography. J Voice 2010; 24:503-9. [DOI: 10.1016/j.jvoice.2008.10.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2008] [Accepted: 10/31/2008] [Indexed: 10/20/2022]
41
Abstract
This perceptual study describes changes in how listeners perceive VCV elements within successive truncations taken from an iambic phrase containing /l/ (e.g. a leaf, or a load) spoken by four male speakers of General American English. Evidence of the respective roles of dorsal gestural affiliation between /l/ and the reduced vowel (V(1)CV(2)) and gestural separation from a tautosyllabic high front vowel (V(2)) was demonstrated. Coproduction of dark-l with a preceding reduced vowel was evident in early reports of back vowels or diphthongs, particularly when the carrier word contained a front vowel, and was noted more in darker-l than lighter-l speakers. The pairing of /l/ with a tautosyllabic front vowel reduced earlier identification of /l/, whereas its pairing with a back vowel enhanced early identification. The role of perceived contrast in identification of /l/ was reflected in changes in listeners' perception of the reduced vowel across successive truncations. Clinical implications are addressed.
42
Abstract
Emerging neurophysiologic evidence indicates that motor systems are activated during the perception of speech, but whether this activity reflects basic processes underlying speech perception remains a matter of considerable debate. Our contribution to this debate is to report direct behavioral evidence that specific articulatory commands are activated automatically and involuntarily during speech perception. We used electropalatography to measure whether motor information activated from spoken distractors would yield specific distortions on the articulation of printed target syllables. Participants produced target syllables beginning with /k/ or /s/ while listening to the same syllables or to incongruent rhyming syllables beginning with /t/. Tongue-palate contact for target productions was measured during the articulatory closure of /k/ and during the frication of /s/. Results revealed "traces" of the incongruent distractors on target productions, with the incongruent /t/-initial distractors inducing greater alveolar contact in the articulation of /k/ and /s/ than the congruent distractors. Two further experiments established that (i) the nature of this interference effect is dependent specifically on the articulatory properties of the spoken distractors; and (ii) this interference effect is unique to spoken distractors and does not arise when distractors are presented in printed form. Results are discussed in terms of a broader emerging framework concerning the relationship between perception and action, whereby the perception of action entails activation of the motor system.
43
Abstract
In speech-production research, three-dimensional (3D) MRI of the upper airway has provided insights into vocal tract shaping and data for its modeling. Small movements of articulators can lead to large changes in the produced sound; therefore, improving the resolution of these data sets, within the constraints of a sustained speech sound (6-12 s), is an important area for investigation. The purpose of the study is to provide a first application of compressed sensing (CS) to high-resolution 3D upper airway MRI using spatial finite difference as the sparsifying transform, and to experimentally determine the benefit of applying constraints on image phase. Estimates of image phase are incorporated into the CS reconstruction to improve the sparsity of the finite difference of the solution. In a retrospective subsampling experiment with no sound production, 5x and 4x were the highest acceleration factors that produced acceptable image quality when using a phase constraint and when not using a phase constraint, respectively. The prospective use of a 5x undersampled acquisition and phase-constrained CS reconstruction enabled 3D vocal tract MRI during sustained sound production of English consonants /s/, /ʃ/, /l/, and /r/ with 1.5 × 1.5 × 2.0 mm³ spatial resolution and 7 s of scan time.
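Why spatial finite difference serves as a sparsifying transform can be illustrated on a toy 1-D profile. This shows only the transform-sparsity and soft-thresholding ingredients that CS solvers build on, not the full phase-constrained reconstruction from undersampled k-space:

```python
import numpy as np

rng = np.random.default_rng(1)

# A piecewise-constant "airway profile": dense in the pixel basis,
# but its spatial finite difference has only a handful of nonzeros.
x = np.concatenate([np.full(20, 0.0), np.full(25, 2.0), np.full(19, 0.5)])
d = np.diff(x)
sparsity = np.count_nonzero(d)          # just the 2 jumps, out of 63 coeffs

# Soft-thresholding the finite differences (the proximal step inside many
# CS solvers) zeroes small noise-induced differences while keeping edges.
noisy = x + rng.normal(0, 0.05, x.size)
dn = np.diff(noisy)                     # dense: every entry is nonzero
thresh = 0.3
dn_shrunk = np.sign(dn) * np.maximum(np.abs(dn) - thresh, 0.0)

# Re-integrating the shrunk differences gives a piecewise-constant estimate
# (illustrative only: thresholding also slightly shrinks the true jumps).
denoised = noisy[0] + np.concatenate([[0.0], np.cumsum(dn_shrunk)])
```

The image itself is dense, but its finite difference is extremely sparse, which is exactly the property the reconstruction exploits.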
Affiliation(s)
- Yoon-Chul Kim
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089-2564, USA.
44
Wismueller A, Behrends J, Hoole P, Leinsinger GL, Reiser MF, Westesson PL. Human vocal tract analysis by in vivo 3D MRI during phonation: a complete system for imaging, quantitative modeling, and speech synthesis. Med Image Comput Comput Assist Interv 2008; 11:306-12. [PMID: 18982619 DOI: 10.1007/978-3-540-85990-1_37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register]
Abstract
We present a complete system for image-based 3D vocal tract analysis ranging from MR image acquisition during phonation, semi-automatic image processing, quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparison between recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, age 22-34y, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, ST 4mm, 23 slices, acq. time 21s). The volunteers performed a prolonged (> or = 21s) emission of sounds of the German phonemic inventory. Simultaneous audio tape recording was obtained to control correct utterance. Scans were made in axial, coronal, and sagittal planes each. Computer-aided quantitative 3D evaluation included (i) automated registration of the phoneme-specific data acquired in different slice orientations, (ii) semi-automated segmentation of oropharyngeal structures, (iii) computation of a curvilinear vocal tract midline in 3D by nonlinear PCA, (iv) computation of cross-sectional areas of the vocal tract perpendicular to this midline. For the vowels /a/,/e/,/i/,/o/,/ø/,/u/,/y/, the extracted area functions were used to synthesize phoneme sounds based on an articulatory-acoustic model. For quantitative analysis, recorded and synthesized phonemes were compared, where area functions extracted from 2D midsagittal slices were used as a reference. All vowels could be identified correctly based on the synthesized phoneme sounds. The comparison between synthesized and recorded vowel phonemes revealed that the quality of phoneme sound synthesis was improved for phonemes /a/, /o/, and /y/, if 3D instead of 2D data were used, as measured by the average relative frequency shift between recorded and synthesized vowel formants (p < 0.05, one-sided Wilcoxon rank sum test). In summary, the combination of fast MRI followed by subsequent 3D segmentation and analysis is a novel approach to examine human phonation in vivo. 
It unveils functional anatomical findings that may be essential for realistic modelling of the human vocal tract during speech production.
45
Zhou X, Espy-Wilson CY, Boyce S, Tiede M, Holland C, Choe A. A magnetic resonance imaging-based articulatory and acoustic study of "retroflex" and "bunched" American English /r/. J Acoust Soc Am 2008; 123:4466-81. [PMID: 18537397 PMCID: PMC2680662 DOI: 10.1121/1.2902168] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Speakers of rhotic dialects of North American English show a range of different tongue configurations for /r/. These variants produce acoustic profiles that are indistinguishable for the first three formants [Delattre, P., and Freeman, D. C., (1968). "A dialect study of American English r's by x-ray motion picture," Linguistics 44, 28-69; Westbury, J. R. et al. (1998), "Differences among speakers in lingual articulation for American English /r/," Speech Commun. 26, 203-206]. It is puzzling why this should be so, given the very different vocal tract configurations involved. In this paper, two subjects whose productions of "retroflex" /r/ and "bunched" /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5 are contrasted. Using finite element analysis and area functions based on magnetic resonance images of the vocal tract for sustained productions, the results of computer vocal tract models are compared to actual speech recordings. In particular, formant-cavity affiliations are explored using formant sensitivity functions and vocal tract simple-tube models. The difference in F4/F5 patterns between the subjects is confirmed for several additional subjects with retroflex and bunched vocal tract configurations. The results suggest that the F4/F5 differences between the variants can be largely explained by differences in whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator.
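The half- versus quarter-wavelength account admits a quick back-of-the-envelope check: both tube types space consecutive resonances by c/2L, but the quarter-wavelength series is offset downward by c/4L, which shifts where the higher resonances (the F4/F5 region) fall. The 10 cm cavity length below is a hypothetical value for illustration, not a measurement from the study:

```python
C = 35000.0  # speed of sound in warm, moist air, cm/s

def quarter_wave_resonances(length_cm, n=3):
    """Resonances of a tube effectively closed at one end: f = (2k-1)c/4L."""
    return [(2 * k - 1) * C / (4.0 * length_cm) for k in range(1, n + 1)]

def half_wave_resonances(length_cm, n=3):
    """Resonances of a tube with like terminations at both ends: f = kc/2L."""
    return [k * C / (2.0 * length_cm) for k in range(1, n + 1)]

# Hypothetical 10 cm back cavity behind the palatal constriction.
L = 10.0
q = quarter_wave_resonances(L)   # 875, 2625, 4375 Hz
h = half_wave_resonances(L)      # 1750, 3500, 5250 Hz
```

The two series interleave: same spacing, different offset, so whether the long back cavity behaves as one type or the other changes the F4/F5 pattern without requiring different F1-F3.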
Affiliation(s)
- Xinhui Zhou
- Speech Communication Laboratory, Institute of Systems Research, and Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA.
46
Story BH. Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. J Acoust Soc Am 2008; 123:327-35. [PMID: 18177162 PMCID: PMC2377017 DOI: 10.1121/1.2805683] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]
Abstract
A new set of area functions for vowels has been obtained with magnetic resonance imaging from the same speaker as that previously reported in 1996 [Story et al., J. Acoust. Soc. Am. 100, 537-554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on magnetic resonance images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intraspeaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
47
Abstract
This paper presents a four-subject study that examines the relative influence of syllable position and stress, together with vowel context on the colouring of the dark-l characteristic of speakers of General American English. Most investigators report lighter /l/ tokens in syllable onsets and darker tokens in coda positions. The present study demonstrates that when dark-l serves as an onset in iambic intervocalic context with tautosyllabic high front vowels, it is fully dark as a result of domain-initial strengthening. By contrast, when dark-l is abutted across a word boundary to word-final or word-initial consonants, or when it is contained in a foot-internal context (preboundary intervocalic rime with trochaic stress) its dorsal gesture is constrained, resulting in less dark tokens. In the case of dark-l, articulatory undershoot must be understood not only in terms of the alveolar gesture, but also the dorsal gesture.
Affiliation(s)
- Judith Oxley
- Department of Communicative Disorders, University of Louisiana at Lafayette, LA 70504, USA.
48
Pruthi T, Espy-Wilson CY, Story BH. Simulation and analysis of nasalized vowels based on magnetic resonance imaging data. J Acoust Soc Am 2007; 121:3858-73. [PMID: 17552733 DOI: 10.1121/1.2722220] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
In this study, vocal tract area functions for one American English speaker, recorded using magnetic resonance imaging, were used to simulate and analyze the acoustics of vowel nasalization. Computer vocal tract models and susceptance plots were used to study the three most important sources of acoustic variability involved in the production of nasalized vowels: velar coupling area, asymmetry of the nasal passages, and the sinus cavities. Analysis of the susceptance plots of the pharyngeal and oral cavities, -(B(p)+B(o)), and the nasal cavity, B(n), helped in understanding the movement of poles and zeros with varying coupling areas. Simulations using two nasal passages clearly showed the introduction of extra pole-zero pairs due to the asymmetry between the passages. Simulations with the inclusion of maxillary and sphenoidal sinuses showed that each sinus can potentially introduce one pole-zero pair in the spectrum. Further, the right maxillary sinus introduced a pole-zero pair at the lowest frequency. The effective frequencies of these poles and zeros due to the sinuses in the sum of the oral and nasal cavity outputs change with a change in the configuration of the oral cavity, which may happen due to a change in the coupling area or in the vowel being articulated.
Affiliation(s)
- Tarun Pruthi
- Speech Communication Laboratory, Institute of Systems Research and Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA.
49
Takemoto H, Honda K, Masaki S, Shimada Y, Fujimoto I. Measurement of temporal changes in vocal tract area function from 3D cine-MRI data. J Acoust Soc Am 2006; 119:1037-49. [PMID: 16521766 DOI: 10.1121/1.2151823] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/07/2023]
Abstract
A 3D cine-MRI technique was developed based on a synchronized sampling method [Masaki et al., J. Acoust. Soc. Jpn. E 20, 375-379 (1999)] to measure the temporal changes in the vocal tract area function during a short utterance /aiueo/ in Japanese. A time series of head-neck volumes was obtained after 640 repetitions of the utterance produced by a male speaker, from which area functions were extracted frame-by-frame. A region-based analysis showed that the volumes of the front and back cavities tend to change reciprocally and that the areas near the larynx and posterior edge of the hard palate were almost constant throughout the utterance. The lower four formants were calculated from all the area functions and compared with those of natural speech sounds. The mean absolute percent error between calculated and measured formants among all the frames was 4.5%. The comparison of vocal tract shapes for the five vowels with those from the static MRI method suggested a problem of MRI observation of the vocal tract: data from static MRI tend to result in a deviation from natural vocal tract geometry because of the gravity effect.
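Computing formants from an extracted area function, as done here for the lower four formants, can be sketched with a lossless concatenated-tube (chain-matrix) model. This is a generic textbook formulation, not the study's own code; the uniform 17.5 cm tract serves only as a sanity check (expected resonances near 500, 1500, 2500 Hz):

```python
import numpy as np

C = 35000.0          # speed of sound, cm/s
RHO = 0.00114        # air density, g/cm^3

def chain_matrix(areas, seg_len, f):
    """Lossless transmission-line (chain) matrix of concatenated tube
    sections, ordered from glottis to lips."""
    k = 2 * np.pi * f / C
    M = np.eye(2, dtype=complex)
    for A in areas:
        Z = RHO * C / A                      # characteristic impedance
        T = np.array([[np.cos(k * seg_len), 1j * Z * np.sin(k * seg_len)],
                      [1j * np.sin(k * seg_len) / Z, np.cos(k * seg_len)]])
        M = M @ T
    return M

def formants(areas, seg_len, fmax=4000.0, df=2.0):
    """Peaks of the volume-velocity transfer |U_lips/U_glottis| assuming an
    ideal (zero-impedance) radiation load, i.e. |1 / M[1, 1]|."""
    freqs = np.arange(df, fmax, df)
    gain = np.array([1.0 / abs(chain_matrix(areas, seg_len, f)[1, 1])
                     for f in freqs])
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if gain[i] > gain[i - 1] and gain[i] > gain[i + 1]]

# Sanity check: a uniform 17.5 cm, 3 cm^2 tube -> ~500, 1500, 2500 Hz.
uniform = formants([3.0] * 35, 0.5)
```

Replacing the uniform areas with a measured frame-by-frame area function is what yields formant trajectories to compare against natural speech, as the paper does.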
Affiliation(s)
- Hironori Takemoto
- ATR Human Information Science Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288 Japan
50
Abstract
A model of the vocal-tract area function is described that consists of four tiers. The first tier is a vowel substrate defined by a system of spatial eigenmodes and a neutral area function determined from MRI-based vocal-tract data. The input parameters to the first tier are coefficient values that, when multiplied by the appropriate eigenmode and added to the neutral area function, construct a desired vowel. The second tier consists of a consonant shaping function defined along the length of the vocal tract that can be used to modify the vowel substrate such that a constriction is formed. Input parameters consist of the location, area, and range of the constriction. Location and area roughly correspond to the standard phonetic specifications of place and degree of constriction, whereas the range defines the amount of vocal-tract length over which the constriction will influence the tract shape. The third tier allows length modifications for articulatory maneuvers such as lip rounding/spreading and larynx lowering/raising. Finally, the fourth tier provides control of the level of acoustic coupling of the vocal tract to the nasal tract. All parameters can be specified either as static or time varying, which allows for multiple levels of coarticulation or coproduction.
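The first two tiers can be sketched in a few lines. Everything numeric below (the neutral shape, the cosine "eigenmodes", the constriction parameters) is an invented stand-in for the MRI-derived quantities in the model:

```python
import numpy as np

N = 44                                # number of tract sections
x = np.linspace(0, 1, N)              # normalised glottis-to-lips axis

# Hypothetical substitutes for the MRI-derived quantities: a neutral
# (schwa-like) area function and two orthogonal spatial eigenmodes.
neutral = 3.0 + 0.5 * np.sin(np.pi * x)
phi1 = np.sqrt(2 / N) * np.cos(np.pi * x)       # front/back tilt
phi2 = np.sqrt(2 / N) * np.cos(2 * np.pi * x)   # mid-tract bulge

def vowel_area(q1, q2):
    """Tier 1: vowel substrate = neutral shape + weighted eigenmodes,
    floored at a small positive area."""
    return np.maximum(neutral + q1 * phi1 + q2 * phi2, 0.05)

def add_constriction(area, loc, target_area, rng_frac):
    """Tier 2: superimpose a consonant constriction (cosine-shaped blend
    centred at `loc`, extending `rng_frac` of the tract on either side)."""
    w = np.clip(1 - np.abs(x - loc) / rng_frac, 0, 1)
    blend = 0.5 * (1 - np.cos(np.pi * w))        # smooth 0-to-1 weighting
    return (1 - blend) * area + blend * target_area

ah = vowel_area(4.0, -1.0)                       # an /a/-like substrate
dah = add_constriction(ah, loc=0.85, target_area=0.1, rng_frac=0.15)
```

Making `q1`, `q2`, and the constriction parameters functions of time would give the time-varying control the model describes, with the range parameter governing how far the constriction's influence (and hence coarticulation) spreads along the tract.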
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.