1
Bögels S, Levinson SC. Ultrasound measurements of interactive turn-taking in question-answer sequences: Articulatory preparation is delayed but not tied to the response. PLoS One 2023; 18:e0276470. [PMID: 37405982 DOI: 10.1371/journal.pone.0276470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Accepted: 06/16/2023] [Indexed: 07/07/2023]
Abstract
We know that speech planning in conversational turn-taking can happen in overlap with the previous turn and research suggests that it starts as early as possible, that is, as soon as the gist of the previous turn becomes clear. The present study aimed to investigate whether planning proceeds all the way up to the last stage of articulatory preparation (i.e., putting the articulators in place for the first phoneme of the response) and what the timing of this process is. Participants answered pre-recorded quiz questions (being under the illusion that they were asked live), while their tongue movements were measured using ultrasound. Planning could start early for some quiz questions (i.e., midway during the question), but late for others (i.e., only at the end of the question). The results showed no evidence for a difference between tongue movements in these two types of questions for at least two seconds after planning could start in early-planning questions, suggesting that speech planning in overlap with the current turn proceeds more slowly than in the clear. On the other hand, when time-locking to speech onset, tongue movements differed between the two conditions from up to two seconds before this point. This suggests that articulatory preparation can occur in advance and is not fully tied to the overt response itself.
Affiliation(s)
- Sara Bögels
- Department of Communication and Cognition, Tilburg University, Tilburg, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, The Netherlands
2
Belz M, Rasskazova O, Krivokapić J, Mooshammer C. Interaction between Phrasal Structure and Vowel Tenseness in German: An Acoustic and Articulatory Study. LANGUAGE AND SPEECH 2023; 66:3-34. [PMID: 35021902 PMCID: PMC9975821 DOI: 10.1177/00238309211064857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Phrase-final lengthening affects the segments preceding a prosodic boundary. This prosodic variation is generally assumed to be independent of the phonemic identity. We refer to this as the 'uniform lengthening hypothesis' (ULH). However, in German, lax vowels do not undergo lengthening for word stress or shortening for increased speech rate, indicating that temporal properties might interact with phonemic identity. We test the ULH by comparing the effect of the boundary on acoustic and kinematic measures for tense and lax vowels and several coda consonants. We further examine if the boundary effect decreases with distance from the boundary. Ten native speakers of German were recorded by means of electromagnetic articulography (EMA) while reading sentences that contained six minimal pairs varying in vowel tenseness and boundary type. In line with the ULH, the results show that the acoustic durations of lax vowels are lengthened phrase-finally, similarly to tense vowels. We find that acoustic lengthening is stronger the closer the segments are to the boundary. Articulatory parameters of the closing movements toward the post-vocalic consonants are affected by both phrasal position and identity of the preceding vowel. The results are discussed with regard to the interaction between prosodic structure and vowel tenseness.
Affiliation(s)
- Malte Belz
- Institut für deutsche Sprache und Linguistik, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany
3
Villarreal D, Clark L. Intraspeaker Priming across the New Zealand English Short Front Vowel Shift. LANGUAGE AND SPEECH 2022; 65:713-739. [PMID: 34743645 PMCID: PMC9326802 DOI: 10.1177/00238309211053033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
A growing body of research in psycholinguistics, corpus linguistics, and sociolinguistics shows that we have a strong tendency to repeat linguistic material that we have recently produced, seen, or heard. The present paper investigates whether priming effects manifest in continuous phonetic variation the way it has been reported in phonological, morphological, and syntactic variation. We analyzed nearly 60,000 tokens of vowels involved in the New Zealand English short front vowel shift (SFVS), a change in progress in which trap/dress move in the opposite direction to kit, from a topic-controlled corpus of monologues (166 speakers), to test for effects that are characteristic of priming phenomena: repetition, decay, and lexical boost. Our analysis found evidence for all three effects. Tokens that were relatively high and front tended to be followed by tokens that were also high and front; the repetition effect weakened with greater time between the prime and target; and the repetition effect was stronger if the prime and target belonged to (different tokens of) the same word. Contrary to our expectations, however, the cross-vowel effects suggest that the repetition effect responded not to the direction of vowel changes within the SFVS, but rather the peripherality of the tokens. We also found an interaction between priming behavior and gender, with stronger repetition effects among men than women. While these findings both indicate that priming manifests in continuous phonetic variation and provide further evidence that priming is among the factors providing structure to intraspeaker variation, they also challenge unitary accounts of priming phenomena.
Affiliation(s)
- Dan Villarreal
- University of Pittsburgh, Department of Linguistics, 2816 Cathedral of Learning, Pittsburgh, PA 15260
4
Castellucci GA, Kovach CK, Howard MA, Greenlee JDW, Long MA. A speech planning network for interactive language use. Nature 2022; 602:117-122. [PMID: 34987226 PMCID: PMC9990513 DOI: 10.1038/s41586-021-04270-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/29/2020] [Accepted: 11/19/2021] [Indexed: 11/09/2022]
Abstract
During conversation, people take turns speaking by rapidly responding to their partners while simultaneously avoiding interruption. Such interactions display a remarkable degree of coordination, as gaps between turns are typically about 200 milliseconds, approximately the duration of an eyeblink. These latencies are considerably shorter than those observed in simple word-production tasks, which indicates that speakers often plan their responses while listening to their partners. Although a distributed network of brain regions has been implicated in speech planning, the neural dynamics underlying the specific preparatory processes that enable rapid turn-taking are poorly understood. Here we use intracranial electrocorticography to precisely measure neural activity as participants perform interactive tasks, and we observe a functionally and anatomically distinct class of planning-related cortical dynamics. We localize these responses to a frontotemporal circuit centred on the language-critical caudal inferior frontal cortex (Broca's region) and the caudal middle frontal gyrus, a region not normally implicated in speech planning. Using a series of motor tasks, we then show that this planning network is more active when preparing speech as opposed to non-linguistic actions. Finally, we delineate planning-related circuitry during natural conversation that is nearly identical to the network mapped with our interactive tasks, and we find this circuit to be most active before participant speech during unconstrained turn-taking. Therefore, we have identified a speech planning network that is central to natural language generation during social interaction.
Affiliation(s)
- Gregg A Castellucci
- NYU Neuroscience Institute and Department of Otolaryngology, New York University Langone Medical Center, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
- Matthew A Howard
- Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
- Michael A Long
- NYU Neuroscience Institute and Department of Otolaryngology, New York University Langone Medical Center, New York, NY, USA
- Center for Neural Science, New York University, New York, NY, USA
5
Krivokapić J, Styler W, Byrd D. The role of speech planning in the articulation of pauses. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:402. [PMID: 35104998 DOI: 10.1121/10.0009279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Accepted: 12/20/2021] [Indexed: 06/14/2023]
Abstract
Extensive research has found that the duration of a pause is influenced by the length of an upcoming utterance, suggesting that speakers plan the upcoming utterance during this time. Research has more recently begun to examine articulation during pauses. A specific configuration of the vocal tract during acoustic pauses, termed pause posture (PP), has been identified in Greek and American English. However, the cognitive function giving rise to PPs is not well understood. The present study examines whether PPs are related to speech planning processes, such that they contribute additional planning time for an upcoming utterance. In an articulatory magnetometer study, the hypothesis is tested that an increase in upcoming utterance length leads to more frequent PP occurrence and that PPs are longer in pauses that precede longer phrases. The results indicate that PPs are associated with planning time for longer utterances but that they are associated with a relatively fixed scope of planning for upcoming speech. To further examine the relationship between articulation and speech planning, an additional hypothesis examines whether the first part of the pause predominantly serves to mark prosodic boundaries while the second part serves speech planning purposes. This hypothesis is not supported by the results.
Affiliation(s)
- Jelena Krivokapić
- Department of Linguistics, University of Michigan, Lorch Hall, 611 Tappan Ave, Ann Arbor, Michigan 48109, USA
- Will Styler
- Department of Linguistics, University of California San Diego, 9500 Gilman Drive #0108, La Jolla, California 92093, USA
- Dani Byrd
- Department of Linguistics, University of Southern California, 3601 Watt Way GFS 301, Los Angeles, California 90089, USA
6
Wiltshire CEE, Chiew M, Chesters J, Healy MP, Watkins KE. Speech Movement Variability in People Who Stutter: A Vocal Tract Magnetic Resonance Imaging Study. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:2438-2452. [PMID: 34157239 PMCID: PMC8323486 DOI: 10.1044/2021_jslhr-20-00507] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/27/2020] [Revised: 01/29/2021] [Accepted: 03/01/2021] [Indexed: 06/01/2023]
Abstract
Purpose: People who stutter (PWS) have more unstable speech motor systems than people who are typically fluent (PWTF). Here, we used real-time magnetic resonance imaging (MRI) of the vocal tract to assess variability and duration of movements of different articulators in PWS and PWTF during fluent speech production. Method: The vocal tracts of 28 adults with moderate to severe stuttering and 20 PWTF were scanned using MRI while repeating simple and complex pseudowords. Midsagittal images of the vocal tract from lips to larynx were reconstructed at 33.3 frames per second. For each participant, we measured the variability and duration of movements across multiple repetitions of the pseudowords in three selected articulators: the lips, tongue body, and velum. Results: PWS showed significantly greater speech movement variability than PWTF during fluent repetitions of pseudowords. The group difference was most evident for measurements of lip aperture using these stimuli, as reported previously, but here, we report that movements of the tongue body and velum were also affected during the same utterances. Variability was not affected by phonological complexity. Speech movement variability was unrelated to stuttering severity within the PWS group. PWS also showed longer speech movement durations relative to PWTF for fluent repetitions of multisyllabic pseudowords, and this group difference was even more evident as complexity increased. Conclusions: Using real-time MRI of the vocal tract, we found that PWS produced more variable movements than PWTF even during fluent productions of simple pseudowords. PWS also took longer to produce multisyllabic words relative to PWTF, particularly when words were more complex. This indicates general, trait-level differences in the control of the articulators between PWS and PWTF. Supplemental Material: https://doi.org/10.23641/asha.14782092.
Affiliation(s)
- Charlotte E. E. Wiltshire
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
- Mark Chiew
- Wellcome Centre for Integrative Neuroimaging, Nuffield Department of Clinical Neurosciences, University of Oxford, United Kingdom
- Jennifer Chesters
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
- Máiréad P. Healy
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
- Kate E. Watkins
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
7
A deep neural network based correction scheme for improved air-tissue boundary prediction in real-time magnetic resonance imaging video. COMPUT SPEECH LANG 2021. [DOI: 10.1016/j.csl.2020.101160] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
8
Yuan J, Cai X, Bian Y, Ye Z, Church K. Pauses for Detection of Alzheimer’s Disease. FRONTIERS IN COMPUTER SCIENCE 2021. [DOI: 10.3389/fcomp.2020.624488] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
Pauses, disfluencies and language problems in Alzheimer’s disease can be naturally modeled by fine-tuning Transformer-based pre-trained language models such as BERT and ERNIE. Using this method with pause-encoded transcripts, we achieved 89.6% accuracy on the test set of the ADReSS (Alzheimer’s Dementia Recognition through Spontaneous Speech) Challenge. The best accuracy was obtained with ERNIE, plus an encoding of pauses. Robustness is a challenge for large models and small training sets. Ensemble over many runs of BERT/ERNIE fine-tuning reduced variance and improved accuracy. We found that 'um' was used much less frequently in Alzheimer’s speech, compared to 'uh'. We discussed this interesting finding from linguistic and cognitive perspectives.
9
Krivokapić J, Styler W, Parrell B. Pause Postures: The relationship between articulation and cognitive processes during pauses. JOURNAL OF PHONETICS 2020; 79:100953. [PMID: 32218635 PMCID: PMC7098615 DOI: 10.1016/j.wocn.2019.100953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Studies examining articulatory characteristics of pauses have identified language-specific postures of the vocal tract in inter-utterance pauses and different articulatory patterns in grammatical and non-grammatical pauses. Pause postures (specific articulatory movements that occur during pauses at strong prosodic boundaries) have been identified for Greek and German. However, the cognitive function of these articulations has not been examined so far. We start addressing this question by investigating the effect of 1) utterance type and 2) planning on pause posture occurrence and properties in American English. We first examine whether pause postures exist in American English. In an electromagnetic articulometry study, seven participants produced sentences varying in linguistic structure (stress, boundary, sentence type). To determine the presence of pause postures, as well as to lay the groundwork for their future automatic annotation and detection, a Support Vector Machine Classifier was built to identify pause postures. Results show that pause postures exist for all speakers in this study but that the frequency of occurrence is speaker dependent. Across participants, we find that there is a stable relationship between the pause posture and other events (boundary tones and vowels) at prosodic boundaries, parallel to previous work in Greek. We find that the occurrence of pause postures is not systematically related to utterance type. Lastly, pause postures increase in frequency and duration as utterance length increases, suggesting that pause postures are at least partially related to speech planning processes.
Affiliation(s)
- Jelena Krivokapić
- University of Michigan Department of Linguistics, 421 Lorch Hall, 611 Tappan Street, Ann Arbor, MI 48109-1220
- Haskins Laboratories, 300 George St 9th Fl, New Haven, CT 06511-6624
- Will Styler
- University of California, San Diego Department of Linguistics, 9500 Gilman Drive #0108, La Jolla, CA 92093-0108
- Benjamin Parrell
- University of Wisconsin-Madison Department of Communication Sciences and Disorders, Goodnight Hall, 1975 Willow Drive, Madison, WI 53706
10
Shaw JA, Chen WR. Spatially Conditioned Speech Timing: Evidence and Implications. Front Psychol 2019; 10:2726. [PMID: 31866911 PMCID: PMC6906199 DOI: 10.3389/fpsyg.2019.02726] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/30/2019] [Accepted: 11/18/2019] [Indexed: 11/13/2022]
Abstract
Patterns of relative timing between consonants and vowels appear to be conditioned in part by phonological structure, such as syllables, a finding captured naturally by the two-level feedforward model of Articulatory Phonology (AP). In AP, phonological form - gestures and the coordination relations between them - receive an invariant description at the inter-gestural level. The inter-articulator level actuates gestures, receiving activation from the inter-gestural level and resolving competing demands on articulators. Within this architecture, the inter-gestural level is blind to the location of articulators in space. A key prediction is that intergestural timing is stable across variation in the spatial position of articulators. We tested this prediction by conducting an Electromagnetic Articulography (EMA) study of Mandarin speakers producing CV monosyllables, consisting of labial consonants and back vowels in isolation. Across observed variation in the spatial position of the tongue body before each syllable, we investigated whether inter-gestural timing between the lips, for the consonant, and the tongue body, for the vowel, remained stable, as is predicted by feedforward control, or whether timing varied with the spatial position of the tongue at the onset of movement. Results indicated a correlation between the initial position of the tongue gesture for the vowel and C-V timing, indicating that inter-gestural timing is sensitive to the position of the articulators, possibly relying on somatosensory feedback. Implications of these results and possible accounts within the Articulatory Phonology framework are discussed.
Affiliation(s)
- Jason A. Shaw
- Department of Linguistics, Yale University, New Haven, CT, United States
11
Heyne M, Derrick D, Al-Tamimi J. Native Language Influence on Brass Instrument Performance: An Application of Generalized Additive Mixed Models (GAMMs) to Midsagittal Ultrasound Images of the Tongue. Front Psychol 2019; 10:2597. [PMID: 31827453 PMCID: PMC6890863 DOI: 10.3389/fpsyg.2019.02597] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/03/2019] [Accepted: 11/01/2019] [Indexed: 02/06/2023]
Abstract
This paper presents the findings of an ultrasound study of 10 New Zealand English and 10 Tongan-speaking trombone players, to determine whether there is an influence of native language speech production on trombone performance. Trombone players' midsagittal tongue shapes were recorded while reading wordlists and during sustained note productions, and tongue surface contours traced. After normalizing to account for differences in vocal tract shape and ultrasound transducer orientation, we used generalized additive mixed models (GAMMs) to estimate average tongue surface shapes used by the players from the two language groups when producing notes at different pitches and intensities, and during the production of the monophthongs in their native languages. The average midsagittal tongue contours predicted by our models show a statistically robust difference at the back of the tongue distinguishing the two groups, where the New Zealand English players display an overall more retracted tongue position; however, tongue shape during playing does not directly map onto vowel tongue shapes as prescribed by the pedagogical literature. While the New Zealand English-speaking participants employed a playing tongue shape approximating schwa and the vowel used in the word 'lot,' the Tongan participants used a tongue shape loosely patterning with the back vowels /o/ and /u/. We argue that these findings represent evidence for native language influence on brass instrument performance; however, this influence seems to be secondary to more basic constraints of brass playing related to airflow requirements and acoustical considerations, with the vocal tract configurations observed across both groups satisfying these conditions in different ways. 
Our findings furthermore provide evidence for the functional independence of various sections of the tongue and indicate that speech production, itself an acquired motor skill, can influence another skilled behavior via motor memory of vocal tract gestures forming the basis of local optimization processes to arrive at a suitable tongue shape for sustained note production.
Affiliation(s)
- Matthias Heyne
- Speech Laboratory, Department of Speech, Language & Hearing Sciences, College of Health & Rehabilitation Sciences: Sargent College, Boston University, Boston, MA, United States
- New Zealand Institute of Language Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
- Donald Derrick
- New Zealand Institute of Language Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
- Jalal Al-Tamimi
- Speech and Language Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
12
Ramanarayanan V, Tilsen S, Proctor M, Töger J, Goldstein L, Nayak KS, Narayanan S. Analysis of speech production real-time MRI. COMPUT SPEECH LANG 2018. [DOI: 10.1016/j.csl.2018.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 12/27/2022]
13
Abstract
Speech motor actions are performed quickly, while simultaneously maintaining a high degree of accuracy. Are speed and accuracy in conflict during speech production? Speed-accuracy tradeoffs have been shown in many domains of human motor action, but have not been directly examined in the domain of speech production. The present work seeks evidence for Fitts' law, a rigorous formulation of this fundamental tradeoff, in speech articulation kinematics by analyzing USC-TIMIT, a real-time magnetic resonance imaging data set of speech production. A theoretical framework for considering Fitts' law with respect to models of speech motor control is elucidated. Methodological challenges in seeking relationships consistent with Fitts' law are addressed, including the operational definitions and measurement of key variables in real-time MRI data. Results suggest the presence of speed-accuracy tradeoffs for certain types of speech production actions, with wide variability across syllable position, and substantial variability also across subjects. Coda consonant targets immediately following the syllabic nucleus show the strongest evidence of this tradeoff, with correlations as high as 0.72 between speed and accuracy. A discussion is provided concerning the potentially limited applicability of Fitts' law in the context of speech production, as well as the theoretical context for interpreting the results.
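For orientation, the speed-accuracy tradeoff named in this abstract is usually written in the following textbook form. This is the classic formulation of Fitts' law, not necessarily the operationalization used in the study; the symbols MT, D, W, a, and b are the conventional ones and are assumptions here, since none of them appears in the abstract.

```latex
% Classic Fitts' law: movement time grows linearly with the index of difficulty.
% MT = movement time, D = distance to the target, W = target width,
% a, b = empirically fitted constants.
MT = a + b \,\log_2\!\left(\frac{2D}{W}\right)
```

The logarithmic term is the index of difficulty: farther or narrower targets raise it, and the law predicts proportionally longer movement times.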
14
Wieling M, Tiede M. Quantitative identification of dialect-specific articulatory settings. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:389. [PMID: 28764427 DOI: 10.1121/1.4990951] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
The purpose of this study was to quantitatively contrast the articulatory settings of two Dutch dialects. Tongue movement data during speech were collected on site at two high schools (34 speakers) in the Netherlands using a portable electromagnetic articulography device. Comparing the tongue positions during pauses in speech between the two groups revealed a clear difference in the articulatory settings, with significantly more frontal tongue positions for the speakers from Ubbergen in the Southeast of the Netherlands compared to those from Ter Apel in the North of the Netherlands. These results provide quantitative evidence for differences in articulatory settings at the dialect level.
Affiliation(s)
- Martijn Wieling
- University of Groningen, Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
- Mark Tiede
- Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA
15
Töger J, Sorensen T, Somandepalli K, Toutios A, Lingala SG, Narayanan S, Nayak K. Test-retest repeatability of human speech biomarkers from static and real-time dynamic magnetic resonance imaging. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:3323. [PMID: 28599561 PMCID: PMC5436977 DOI: 10.1121/1.4983081] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 05/06/2023]
Abstract
Static anatomical and real-time dynamic magnetic resonance imaging (RT-MRI) of the upper airway is a valuable method for studying speech production in research and clinical settings. The test-retest repeatability of quantitative imaging biomarkers is an important parameter, since it limits the effect sizes and intragroup differences that can be studied. Therefore, this study aims to present a framework for determining the test-retest repeatability of quantitative speech biomarkers from static MRI and RT-MRI, and apply the framework to healthy volunteers. Subjects (n = 8, 4 females, 4 males) are imaged in two scans on the same day, including static images and dynamic RT-MRI of speech tasks. The inter-study agreement is quantified using intraclass correlation coefficient (ICC) and mean within-subject standard deviation (σe). Inter-study agreement is strong to very strong for static measures (ICC: min/median/max 0.71/0.89/0.98, σe: 0.90/2.20/6.72 mm), poor to strong for dynamic RT-MRI measures of articulator motion range (ICC: 0.26/0.75/0.90, σe: 1.6/2.5/3.6 mm), and poor to very strong for velocities (ICC: 0.21/0.56/0.93, σe: 2.2/4.4/16.7 cm/s). In conclusion, this study characterizes repeatability of static and dynamic MRI-derived speech biomarkers using state-of-the-art imaging. The introduced framework can be used to guide future development of speech biomarkers. Test-retest MRI data are provided free for research use.
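As a rough illustration of the two agreement statistics named in this abstract, the sketch below computes an intraclass correlation coefficient and a within-subject standard deviation from repeated scans. The abstract does not say which ICC form the authors used, so the one-way random-effects ICC(1,1) here is an assumption, as is taking σe to be the square root of the within-subject mean square. The function name `retest_agreement` and the example measurements are hypothetical.

```python
import math


def retest_agreement(scores):
    """Return (icc, sigma_e) for test-retest data.

    scores: one list per subject, holding one measurement per scan.
    Assumption: one-way random-effects ICC(1,1); the study may differ.
    """
    n = len(scores)                      # number of subjects
    k = len(scores[0]) if n else 0       # number of scans per subject
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same >= 2 scans")

    subject_means = [sum(row) / k for row in scores]
    grand_mean = sum(subject_means) / n

    # Between-subject and within-subject mean squares (one-way ANOVA).
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all measurements are identical; ICC is undefined")

    icc = (ms_between - ms_within) / denominator
    sigma_e = math.sqrt(ms_within)       # within-subject standard deviation
    return icc, sigma_e


if __name__ == "__main__":
    # Made-up articulator motion ranges in mm: 4 subjects, 2 scans each.
    example = [[12.1, 12.9], [15.4, 14.8], [9.7, 10.5], [18.2, 17.1]]
    icc, sigma_e = retest_agreement(example)
    print(f"ICC = {icc:.2f}, sigma_e = {sigma_e:.2f} mm")
```

A high ICC with a small σe means that differences between subjects dominate scan-to-scan noise, which is the sense in which repeatability limits the effect sizes that can be studied.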
Affiliation(s)
- Johannes Töger
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Tanner Sorensen
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Krishna Somandepalli
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Asterios Toutios
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Sajan Goud Lingala
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Krishna Nayak
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
16
Krivokapić J, Tiede MK, Tyrone ME. A Kinematic Study of Prosodic Structure in Articulatory and Manual Gestures: Results from a Novel Method of Data Collection. LABORATORY PHONOLOGY 2017; 8:3. [PMID: 28626493 PMCID: PMC5472837 DOI: 10.5334/labphon.75] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 06/09/2023]
Abstract
The primary goal of this work is to examine prosodic structure as expressed concurrently through articulatory and manual gestures. Specifically, we investigated the effects of phrase-level prominence (Experiment 1) and of prosodic boundaries (Experiments 2 and 3) on the kinematic properties of oral constriction and manual gestures. The hypothesis guiding this work is that prosodic structure will be similarly expressed in both modalities. To test this, we have developed a novel method of data collection that simultaneously records speech audio, vocal tract gestures (using electromagnetic articulometry) and manual gestures (using motion capture). This method allows us, for the first time, to investigate kinematic properties of body movement and vocal tract gestures simultaneously, which in turn allows us to examine the relationship between speech and body gestures with great precision. A second goal of the paper is thus to establish the validity of this method. Results from two speakers show that manual and oral gestures lengthen under prominence and at prosodic boundaries, indicating that the effects of prosodic structure extend beyond the vocal tract to include body movement.
|
17
|
Freitas AC, Wylezinska M, Birch MJ, Petersen SE, Miquel ME. Comparison of Cartesian and Non-Cartesian Real-Time MRI Sequences at 1.5T to Assess Velar Motion and Velopharyngeal Closure during Speech. PLoS One 2016; 11:e0153322. [PMID: 27073905 PMCID: PMC4830548 DOI: 10.1371/journal.pone.0153322] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/10/2015] [Accepted: 03/28/2016] [Indexed: 11/19/2022] Open
Abstract
Dynamic imaging of the vocal tract using real-time MRI has been an active and growing area of research, having demonstrated great potential to become routinely performed in the clinical evaluation of speech and swallowing disorders. Although many technical advances have been made with regard to acquisition and reconstruction methodologies, there is still no consensus in best practice protocols. This study aims to compare Cartesian and non-Cartesian real-time MRI sequences, regarding image quality and temporal resolution trade-off, for dynamic speech imaging. Five subjects were imaged at 1.5T, while performing normal phonation, in order to assess velar motion and velopharyngeal closure. Data was acquired using both Cartesian and non-Cartesian (spiral and radial) real-time sequences at five different spatial-temporal resolution sets, between 10 fps (1.7×1.7×10 mm³) and 25 fps (1.5×1.5×10 mm³). Only standard scanning resources provided by the MRI scanner manufacturer were used to ensure easy applicability to clinical evaluation and reproducibility. Data sets were evaluated by comparing measurements of the velar structure, dynamic contrast-to-noise ratio and image quality visual scoring. Results showed that for all proposed sequences, FLASH spiral acquisitions provided higher contrast-to-noise ratio, up to a 170.34% increase at 20 fps, than equivalent bSSFP Cartesian acquisitions for the same spatial-temporal resolution. At higher frame rates (22 and 25 fps), spiral protocols were optimal and provided higher CNR and visual scoring than equivalent radial protocols. Comparison of dynamic imaging at 10 and 22 fps for radial and spiral acquisitions revealed no significant difference in CNR performance, thus indicating that temporal resolution can be doubled without compromising spatial resolution (1.9×1.9 mm²) or CNR.
In summary, this study suggests that the use of FLASH spiral protocols should be preferred over bSSFP Cartesian for the dynamic imaging of velopharyngeal closure, as it allows for an improvement in CNR and overall image quality without compromising spatial-temporal resolution.
Affiliation(s)
- Andreia C. Freitas
- NIHR Cardiovascular Biomedical Research Unit at Barts, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Clinical Physics, Barts Health NHS Trust, London, United Kingdom
- Marzena Wylezinska
- NIHR Cardiovascular Biomedical Research Unit at Barts, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Malcolm J. Birch
- Clinical Physics, Barts Health NHS Trust, London, United Kingdom
- Steffen E. Petersen
- NIHR Cardiovascular Biomedical Research Unit at Barts, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Marc E. Miquel
- NIHR Cardiovascular Biomedical Research Unit at Barts, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Clinical Physics, Barts Health NHS Trust, London, United Kingdom
|
18
|
Katsika A. The role of prominence in determining the scope of boundary-related lengthening in Greek. Journal of Phonetics 2016; 55:149-181. [PMID: 27773955 PMCID: PMC5072286 DOI: 10.1016/j.wocn.2015.12.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/01/2023]
Abstract
This study aims at examining and accounting for the scope of the temporal effect of phrase boundaries. Previous research has indicated that there is an interaction between boundary-related lengthening and prominence such that the former extends towards the nearby prominent syllable. However, it is unclear whether this interaction is due to lexical stress and/or phrasal prominence (marked by pitch accent) and how far towards the prominent syllable the effect extends. Here, we use an electromagnetic articulography (EMA) study of Greek to examine the scope of boundary-related lengthening as a function of lexical stress and pitch accent separately. Boundaries are elicited by means of a variety of syntactic constructions. The results show an effect of lexical stress. Phrase-final lengthening affects the articulatory gestures of the phrase-final syllable that are immediately adjacent to the boundary in words with final stress, but is initiated earlier within phrase-final words with non-final stress. Similarly, the articulatory configurations during inter-phrasal pauses reach their point of achievement later in words with final stress than in words with non-final stress. These effects of stress hold regardless of whether the phrase-final word is accented or de-accented. Phrase-initial lengthening, on the other hand, is consistently detected on the phrase-initial constriction, independently of where the stress is within the preceding, phrase-final, word. These results indicate that the lexical aspect of prominence plays a role in determining the scope of boundary-related lengthening in Greek. Based on these results, a gestural account of prosodic boundaries in Greek is proposed in which lexical and phrasal prosody interact in a systematic and coordinated fashion. The cross-linguistic dimensions of this account and its implications for prosodic structure are discussed.
Affiliation(s)
- Argyro Katsika
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, Tel.: +1 203 865 6163, ext 269
|
19
|
|
20
|
Anticipatory Posturing of the Vocal Tract Reveals Dissociation of Speech Movement Plans from Linguistic Units. PLoS One 2016; 11:e0146813. [PMID: 26760511 PMCID: PMC4711920 DOI: 10.1371/journal.pone.0146813] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/09/2015] [Accepted: 12/22/2015] [Indexed: 11/19/2022] Open
Abstract
Models of speech production typically assume that control over the timing of speech movements is governed by the selection of higher-level linguistic units, such as segments or syllables. This study used real-time magnetic resonance imaging of the vocal tract to investigate the anticipatory movements speakers make prior to producing a vocal response. Two factors were varied: preparation (whether or not speakers had foreknowledge of the target response) and pre-response constraint (whether or not speakers were required to maintain a specific vocal tract posture prior to the response). In prepared responses, many speakers were observed to produce pre-response anticipatory movements with a variety of articulators, showing that speech movements can be readily dissociated from higher-level linguistic units. Substantial variation was observed across speakers with regard to the articulators used for anticipatory posturing and the contexts in which anticipatory movements occurred. The findings of this study have important consequences for models of speech production and for our understanding of the normal range of variation in anticipatory speech behaviors.
|
21
|
Krivokapić J. Gestural coordination at prosodic boundaries and its role for prosodic structure and speech planning processes. Philos Trans R Soc Lond B Biol Sci 2015; 369:20130397. [PMID: 25385775 DOI: 10.1098/rstb.2013.0397] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022] Open
Abstract
Prosodic structure is a grammatical component that serves multiple functions in the production, comprehension and acquisition of language. Prosodic boundaries are critical for the understanding of the nature of the prosodic structure of language, and important progress has been made in the past decades in illuminating their properties. We first review recent prosodic boundary research from the point of view of gestural coordination. We then go on to tie in this work to questions of speech planning and manual and head movement. We conclude with an outline of a new direction of research which is needed for a full understanding of prosodic boundaries and their role in the speech production process.
Affiliation(s)
- Jelena Krivokapić
- Department of Linguistics, University of Michigan, 440 Lorch Hall, 611 Tappan St., Ann Arbor, MI 48109-1220, USA; Haskins Laboratories, 300 George Street No. 900, New Haven, CT 06511, USA
|
22
|
Evaluating velopharyngeal closure with real-time MRI. Pediatr Radiol 2015; 45:941-2. [PMID: 25399057 DOI: 10.1007/s00247-014-3230-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/03/2014] [Accepted: 11/02/2014] [Indexed: 10/24/2022]
|
23
|
Lammert A, Goldstein L, Ramanarayanan V, Narayanan S. Gestural Control in the English Past-Tense Suffix: An Articulatory Study Using Real-Time MRI. Phonetica 2015; 71:229-248. [PMID: 25997724 DOI: 10.1159/000371820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/08/2014] [Accepted: 12/31/2014] [Indexed: 06/04/2023]
Abstract
The English past tense allomorph following a coronal stop (e.g., /bɑndəd/) includes a vocoid that has traditionally been transcribed as a schwa or as a barred i. Previous evidence has suggested that this entity does not involve a specific articulatory gesture of any kind. Rather, its presence may simply result from temporal coordination of the two temporally adjacent coronal gestures, while the interval between those two gestures remains voiced and is acoustically reminiscent of a schwa. The acoustic and articulatory characteristics of this vocoid are reexamined in this work using real-time MRI with synchronized audio which affords complete midsagittal views of the vocal tract. A novel statistical analysis is developed to address the issue of articulatory targetlessness based on previous models that predict articulatory action from segmental context. Results reinforce the idea that this vocoid is different, both acoustically and articulatorily, than lexical schwa, but its targetless nature is not supported. Data suggest that an articulatory target does exist, especially in the pharynx where it is revealed by the new data acquisition methodology. Moreover, substantial articulatory differences are observed between subjects, which highlights both the difficulty in characterizing this entity previously, and the need for further study with additional subjects.
Affiliation(s)
- Adam Lammert
- Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, Calif., USA
|
24
|
Are articulatory settings mechanically advantageous for speech motor control? PLoS One 2014; 9:e104168. [PMID: 25133544 PMCID: PMC4136795 DOI: 10.1371/journal.pone.0104168] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/20/2014] [Accepted: 07/10/2014] [Indexed: 12/02/2022] Open
Abstract
We address the hypothesis that postures adopted during grammatical pauses in speech production are more “mechanically advantageous” than absolute rest positions for facilitating efficient postural motor control of vocal tract articulators. We quantify vocal tract posture corresponding to inter-speech pauses, absolute rest intervals as well as vowel and consonant intervals using automated analysis of video captured with real-time magnetic resonance imaging during production of read and spontaneous speech by 5 healthy speakers of American English. We then use locally-weighted linear regression to estimate the articulatory forward map from low-level articulator variables to high-level task/goal variables for these postures. We quantify the overall magnitude of the first derivative of the forward map as a measure of mechanical advantage. We find that postures assumed during grammatical pauses in speech as well as speech-ready postures are significantly more mechanically advantageous than postures assumed during absolute rest. Further, these postures represent empirical extremes of mechanical advantage, between which lie the postures assumed during various vowels and consonants. Relative mechanical advantage of different postures might be an important physical constraint influencing planning and control of speech production.
|
25
|
Scott AD, Wylezinska M, Birch MJ, Miquel ME. Speech MRI: morphology and function. Phys Med 2014; 30:604-18. [PMID: 24880679 DOI: 10.1016/j.ejmp.2014.05.001] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 03/03/2014] [Revised: 04/24/2014] [Accepted: 05/01/2014] [Indexed: 11/27/2022] Open
Abstract
Magnetic Resonance Imaging (MRI) plays an increasing role in the study of speech. This article reviews the MRI literature of anatomical imaging, imaging for acoustic modelling and dynamic imaging. It describes existing imaging techniques attempting to meet the challenges of imaging the upper airway during speech and examines the remaining hurdles and future research directions.
Affiliation(s)
- Andrew D Scott
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; NIHR Cardiovascular Biomedical Research Unit, The Royal Brompton Hospital, Sydney Street, London SW3 6NP, United Kingdom
- Marzena Wylezinska
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom
- Malcolm J Birch
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom
- Marc E Miquel
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom
|