1
Aruffo C. Reading Scripted Dialogue: Pretending to Take Turns. Discourse Processes 2020. [DOI: 10.1080/0163853x.2019.1651588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
2
Levinson SC, Torreira F. Timing in turn-taking and its implications for processing models of language. Front Psychol 2015; 6:731. [PMID: 26124727 PMCID: PMC4464110 DOI: 10.3389/fpsyg.2015.00731] [Citation(s) in RCA: 130] [Impact Index Per Article: 14.4] [Received: 01/28/2015] [Accepted: 05/16/2015] [Indexed: 12/03/2022]
Abstract
The core niche for language use is in verbal interaction, involving the rapid exchange of turns at talking. This paper reviews the extensive literature about this system, adding new statistical analyses of behavioral data where they have been missing, demonstrating that turn-taking has the systematic properties originally noted by Sacks et al. (1974; hereafter SSJ). This system poses some significant puzzles for current theories of language processing: the gaps between turns are short (of the order of 200 ms), but the latencies involved in language production are much longer (over 600 ms). This seems to imply that participants in conversation must predict (or 'project' as SSJ have it) the end of the current speaker's turn in order to prepare their response in advance. This in turn implies some overlap between production and comprehension despite their use of common processing resources. Collecting together what is known behaviorally and experimentally about the system, the space for systematic explanations of language processing for conversation can be significantly narrowed, and we sketch some first model of the mental processes involved for the participant preparing to speak next.
Affiliation(s)
- Stephen C. Levinson
  - Language and Cognition Department, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Francisco Torreira
  - Language and Cognition Department, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
3
Patel S, Nishimura C, Lodhavia A, Korzyukov O, Parkinson A, Robin DA, Larson CR. Understanding the mechanisms underlying voluntary responses to pitch-shifted auditory feedback. The Journal of the Acoustical Society of America 2014; 135:3036-3044. [PMID: 24815283 PMCID: PMC4032396 DOI: 10.1121/1.4870490] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/01/2013] [Revised: 01/11/2014] [Accepted: 03/25/2014] [Indexed: 06/03/2023]
Abstract
Previous research has shown that vocal errors can be simulated using a pitch perturbation technique. Two types of responses are observed when subjects are asked to ignore changes in pitch during a steady vowel production, a compensatory response countering the direction of the perceived change in pitch and a following response in the same direction as the pitch perturbation. The present study investigated the nature of these responses by asking subjects to volitionally change their voice fundamental frequency either in the opposite direction ("opposing" group) or the same direction ("following" group) as the pitch shifts (±100 cents, 1000 ms) presented during the speaker's production of an /a/ vowel. Results showed that voluntary responses that followed the stimulus directions had significantly shorter latencies (150 ms) than opposing responses (360 ms). In addition, prior to the slower voluntary opposing responses, there were short latency involuntary responses that followed the stimulus direction. These following responses may involve mechanisms of imitation or vocal shadowing of acoustical stimuli when subjects are predisposed to respond to a change in frequency of a sound. The slower opposing responses may represent a control strategy that requires monitoring and correcting for errors between the feedback signal and the intended vocal goal.
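Note: the ±100 cent pitch shifts mentioned in the abstract above can be related to frequencies in hertz. The following is a minimal sketch, assuming the standard definition of the cent (1200 cents per octave). The function name `shift_by_cents` and the 200 Hz baseline are illustrative assumptions and are not taken from the study.

```python
def shift_by_cents(f0_hz: float, cents: float) -> float:
    """Return the frequency obtained by shifting f0_hz by `cents`.

    Uses the standard definition: 1200 cents correspond to one octave,
    so a shift of c cents multiplies the frequency by 2 ** (c / 1200).
    """
    return f0_hz * 2.0 ** (cents / 1200.0)


# Hypothetical baseline fundamental frequency (an assumption, not study data).
baseline_hz = 200.0

shifted_up_hz = shift_by_cents(baseline_hz, 100.0)     # about 211.9 Hz
shifted_down_hz = shift_by_cents(baseline_hz, -100.0)  # about 188.8 Hz

print(f"+100 cents: {shifted_up_hz:.1f} Hz")
print(f"-100 cents: {shifted_down_hz:.1f} Hz")
```

Under these assumptions, a 100-cent shift corresponds to one equal-tempered semitone, which is roughly a 6% change in frequency in either direction.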
Affiliation(s)
- Sona Patel
  - Northwestern University, Evanston, Illinois 60208
- Amy Parkinson
  - University of Texas Health Sciences Center, San Antonio, Texas 78229
- Donald A Robin
  - University of Texas Health Sciences Center, San Antonio, Texas 78229
4
Aubanel V, Cooke M. Strategies adopted by talkers faced with fluctuating and competing-speech maskers. The Journal of the Acoustical Society of America 2013; 134:2884-2894. [PMID: 24116425 DOI: 10.1121/1.4818757] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/02/2023]
Abstract
Studying how interlocutors exchange information efficiently during conversations in less-than-ideal acoustic conditions promises to both further the understanding of links between perception and production and inform the design of human-computer dialogue systems. The current study explored how interlocutors' speech changes in the presence of fluctuating noise. Pairs of talkers were recorded while solving puzzles cooperatively in quiet and with modulated-noise or competing speech maskers whose silent intervals were manipulated to produce either temporally sparse or dense maskers. Talkers responded to masked conditions by both increasing the amount of speech produced and locally changing their speech activity patterns, resulting in a net reduction in the proportion of speech in temporal overlap with the maskers, with larger relative reductions for sparse maskers. An analysis of talker activity in the vicinity of masker onset and offset events showed a significant reduction in onsets following masker onsets, and a similar increase in onsets following masker offsets. These findings demonstrate that talkers are sensitive to masking noise and respond to its fluctuations by adopting a "wait-and-talk" strategy.
Affiliation(s)
- Vincent Aubanel
  - Language and Speech Laboratory, Universidad del País Vasco, Paseo de la Universidad 5, 01006 Vitoria, Spain
5
Kunc L, Míkovec Z, Slavík P. Avatar and Dialog Turn-Yielding Phenomena. International Journal of Technology and Human Interaction 2013. [DOI: 10.4018/jthi.2013040105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Abstract
Turn-taking and turn-yielding phenomena in dialogs receive increasing attention nowadays. A growing number of spoken dialog systems inspire application designers to humanize people’s interaction experience with computers. The knowledge of psychology in discourse structure could be helpful in this effort. In this paper the authors explore effectiveness of selected visual and vocal turn-yielding cues in dialog systems using synthesized speech and an avatar. The aim of this work is to detect the role of visual and vocal cues on dialog turn-change judgment using a conversational agent. The authors compare and study the cues in two experiments. Findings of those experiments suggest that the selected visual turn-yielding cues are more effective than the vocal cues in increasing correct judgment of dialog turn-change. Vocal cues in the experiment show quite poor results and the conclusion discusses possible explanations of that.
Affiliation(s)
- Ladislav Kunc
  - Department of Computer Graphics and Interaction, Czech Technical University in Prague, Prague, Czech Republic
- Zdenek Míkovec
  - Department of Computer Graphics and Interaction, Czech Technical University in Prague, Prague, Czech Republic
- Pavel Slavík
  - Department of Computer Graphics and Interaction, Czech Technical University in Prague, Prague, Czech Republic
6
Laganaro M, Valente A, Perret C. Time course of word production in fast and slow speakers: A high density ERP topographic study. Neuroimage 2012; 59:3881-8. [DOI: 10.1016/j.neuroimage.2011.10.082] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Received: 06/15/2011] [Revised: 09/28/2011] [Accepted: 10/20/2011] [Indexed: 11/28/2022]
7
Heldner M. Detection thresholds for gaps, overlaps, and no-gap-no-overlaps. The Journal of the Acoustical Society of America 2011; 130:508-513. [PMID: 21786916 DOI: 10.1121/1.3598457] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Detection thresholds for gaps and overlaps, that is acoustic and perceived silences and stretches of overlapping speech in speaker changes, were determined. Subliminal gaps and overlaps were categorized as no-gap-no-overlaps. The established gap and overlap detection thresholds both corresponded to the duration of a long vowel, or about 120 ms. These detection thresholds are valuable for mapping the perceptual speaker change categories gaps, overlaps, and no-gap-no-overlaps into the acoustic domain. Furthermore, the detection thresholds allow generation and understanding of gaps, overlaps, and no-gap-no-overlaps in human-like spoken dialogue systems.
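Note: the mapping from acoustic timing to the three perceptual categories described in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming a single symmetric threshold of about 120 ms as reported in the abstract. The function name, the sign convention (positive offset = silence, negative offset = overlapping speech), and the handling of values exactly at the threshold are assumptions made for the example, not details from the paper.

```python
# Approximate detection threshold from the abstract ("about 120 ms").
THRESHOLD_MS = 120.0


def classify_speaker_change(offset_ms: float,
                            threshold_ms: float = THRESHOLD_MS) -> str:
    """Classify a speaker change by its acoustic offset.

    offset_ms is the next speaker's onset time minus the previous
    speaker's end time (assumed convention): positive values are
    silences, negative values are stretches of overlapping speech.
    Offsets smaller in magnitude than the threshold are treated as
    subliminal, i.e. perceived as neither gap nor overlap.
    """
    if offset_ms >= threshold_ms:
        return "gap"
    if offset_ms <= -threshold_ms:
        return "overlap"
    return "no-gap-no-overlap"


# Hypothetical offsets in milliseconds (illustrative values only).
for offset in (250.0, 60.0, -40.0, -300.0):
    print(offset, classify_speaker_change(offset))
```

Keeping the threshold as a parameter makes it easy to substitute separate gap and overlap thresholds if a given data set calls for them.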
Affiliation(s)
- Mattias Heldner
  - Department of Speech, Music and Hearing, KTH, Lindstedtsvägen 24, SE-100 44 Stockholm, Sweden.
8
Sapir S, Baker KK, Larson CR, Ramig LO. Short-latency changes in voice F0 and neck surface EMG induced by mechanical perturbations of the larynx during sustained vowel phonation. Journal of Speech, Language, and Hearing Research: JSLHR 2000; 43:268-76. [PMID: 10668668 DOI: 10.1044/jslhr.4301.268] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 05/16/2023]
Abstract
Nineteen healthy young adult males with normal voice and speech attempted to sustain the vowel /u/ at a constant pitch (target: 180 Hz) and a constant and comfortable loudness level while receiving a sudden mechanical perturbation to the larynx (thyroid prominence) via a servo-controlled probe. The probe moved toward or away from the larynx in a ramp-and-hold fashion (3.3-mm displacement, 0.7 N force, 20-ms rise time, 250-ms duration) as the subjects attempted to maintain a constant probe-larynx pressure. Eighty stimuli were applied in each direction, one stimulus per phonation. Pairs of surface electromyography (EMG) electrodes were attached to the skin of the anterior neck over laryngeal, infralaryngeal, and supralaryngeal areas. The rectified EMG signals, the voltage analog of the voice fundamental frequency (VAF0), and the voltage analog of the probe displacement were digitized and signal-averaged relative to the onset of the stimulus. Sudden perturbation of the larynx induced an instantaneous decrease or increase in VAF0, depending on the direction of the probe's movement, and a short-latency increase in the EMG (30-35 ms) and VAF0 (55-65 ms). We argue that the instantaneous VAF0 change was related to a mechanical effect, and the short-latency VAF0 and EMG changes to reflexogenic effects, the latter most likely associated with both intrinsic and extrinsic laryngeal sensorimotor mechanisms. Further physiological studies are needed to elucidate the sources of the VAF0 and EMG responses. Once elucidated, the present method may provide a powerful noninvasive tool for studying laryngeal neurophysiology. The theoretical and clinical implications of the present findings are addressed.
Affiliation(s)
- S Sapir
  - The Wilbur James Gould Voice Research Center, The Denver Center for the Performing Arts, CO 80204, USA.
9
Jiang J, Lin E, Sheynin B, Hanson DG. Voice target time in Parkinson's disease: A preliminary report. Otolaryngol Head Neck Surg 1999; 121:87-91. [PMID: 10388885 DOI: 10.1016/s0194-5998(99)70131-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/24/2022]
Abstract
A target-matching paradigm was developed to assess the vocal equivalents of reaction and movement time in Parkinson's disease. Six patients with Parkinson's disease and 6 age- and gender-matched control subjects were asked to enunciate /pa/ to reach a target frequency and intensity level in response to a light stimulus. The stimulus and acoustic responses were simultaneously recorded. Measures included laryngeal reaction time, time between stimulus and phonation onset; frequency voice target time, time from phonation onset to target level of frequency; and amplitude voice target time, time from phonation onset to target level of intensity. The 2 subject groups were significantly differentiated by laryngeal reaction time (t = 299.67, df = 10, P = 0.005) and frequency voice target time (t = 148, df = 10, P = 0.014). These data suggest voice target time is a viable tool for assessing the effects of neurologic disorders on voice execution in Parkinson's disease.
Affiliation(s)
- J Jiang
  - Department of Otolaryngology Head and Neck Surgery, Northwestern University School of Medicine, Chicago, Illinois, USA
10
Abstract
Active and passive characteristics of the canine cricothyroid muscle were investigated through a series of experiments conducted in vitro and compared with their counterparts in the thyroarytenoid muscle. Samples from separate portions of canine cricothyroid muscle, namely, the pars recta and pars obliqua, were dissected from dog larynges excised a few minutes before death and kept in Krebs-Ringer solution at a temperature of 37 °C ± 1 °C and a pH of 7.4 ± 0.05. Active tetanic stress was obtained in isometric and isotonic conditions by applying field stimulation to the muscle samples through a pair of parallel-plate platinum electrodes and using a train of square pulses of 0.1-ms duration and 85-V amplitude. Force and elongation of the samples were obtained electronically with a dual-servo system (ergometer). The results indicate that the dynamic response of the canine cricothyroid muscle is almost twice as slow as that of the thyroarytenoid muscle. The average 50% tetanic contraction times for pars recta and pars obliqua were 84 ms and 109 ms, respectively, in comparison to 50 ms for thyroarytenoid. The examination of force-velocity response of this muscle indicates a maximum shortening velocity of 2 to 3 times its length per second, which is about half of the thyroarytenoid shortening speed. The passive properties of the pars recta and pars obliqua portions are similar to those of thyroarytenoid muscle.
Affiliation(s)
- F Alipour
  - Department of Speech Pathology and Audiology, National Center for Voice and Speech, The University of Iowa, Iowa City 52242-1012, USA.
11
van Lieshout PH, Hulstijn W, Peters HF. Speech production in people who stutter: testing the motor plan assembly hypothesis. Journal of Speech and Hearing Research 1996; 39:76-92. [PMID: 8820700 DOI: 10.1044/jshr.3901.76] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.9] [Indexed: 05/22/2023]
Abstract
The main purpose of the present study was to test the hypothesis that persons who stutter, when compared to persons who do not stutter, are less able to assemble abstract motor plans for short verbal responses. Subjects were adult males who stutter and age- and sex-matched control speakers, who were tested on naming pictures and words, using a choice-reaction time paradigm for both tasks. Words varied in the number of syllables (1, 2, and 3 syllables) and, for the bisyllabic words, also in the number of consonants (one or more) at the onset of the second syllable. Measurements consisted of speech reaction times, word durations, and measures of relative timing of specific motor events in the respiratory, phonatory, and articulatory subsystems. Results indicated that, in spite of longer speech reaction times for persons who stutter in comparison to control speakers, there was no interaction with word size, a finding that does not lend support to the abovementioned hypothesis. Word durations were found to be longer for persons who stutter, and, in addition, there was an interaction of group with word size. Both findings were associated with longer delays for persons who stutter in the onset of upper lip integrated electromyographic (IEMG) activity and thoracic compression, and a group effect on the order of upper lip and lower lip IEMG onset. Findings are taken to suggest the possibility that persons who stutter may use different motor control strategies to compensate for a reduced verbal motor skill, and although the nature of this reduced skill is unknown, it is speculated that it relates to the processes involved in the integration of sensory-motor information.
Affiliation(s)
- P H van Lieshout
  - Nijmegen Institute of Cognition and Information, The Netherlands.
12
Sapir S, Li L, Ragin AB, Dod JM. Changes in auditory-vocal reaction times within and across experimental sessions: preliminary observations. Journal of Speech and Hearing Research 1993; 36:466-471. [PMID: 8331904 DOI: 10.1044/jshr.3603.466] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 05/22/2023]
Abstract
Changes in auditory-vocal reaction times (AVRTs) within and across experimental sessions were studied in 13 healthy university students, all females. Subjects were required to listen to a series of synthesized vowels and utter each of the vowels as soon as they heard it. The vowels were /i/, /u/, /a/, /o/, and /ae/, each presented 14 times and all presented in random order and at irregular intervals (2.5-4.5 sec). The stimuli and the instructions were prerecorded and presented to the subjects binaurally at a comfortable intensity level via headphones in an IAC booth. Each subject performed the experimental task twice, a week apart. The stimuli and the vocal responses were tape recorded and later digitized and computer analyzed. Serial analysis of successive AVRTs revealed significant intra- and intersession decreases in AVRTs in the majority of the subjects. Increases in AVRTs were also seen, but much less frequently. The implications of these findings are discussed.
Affiliation(s)
- S Sapir
  - Northwestern University, Evanston, IL
13