1. Liu W, Wang T, Huang X. The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation? J Acoust Soc Am 2023; 154:1903-1920. [PMID: 37756574] [DOI: 10.1121/10.0021077]
Abstract
The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.
Affiliation(s)
- Wenli Liu
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Tianyu Wang
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Xianjun Huang
- School of Psychology, Capital Normal University, 105 North West 3rd Ring Road, Beijing 100048, China
2.
Abstract
The McGurk effect is an illusion in which visible articulations alter the perception of auditory speech (e.g., video 'da' dubbed with audio 'ba' may be heard as 'da'). To test the timing of the multisensory processes that underlie the McGurk effect, Ostrand et al. (Cognition, 151, 96-107, 2016) used incongruent stimuli, such as auditory 'bait' + visual 'date', as primes in a lexical decision task. These authors reported that the auditory word, but not the perceived (visual) word, induced semantic priming, suggesting that the auditory signal alone can provide the input for lexical access before multisensory integration is complete. Here, we conceptually replicate the design of Ostrand et al. (2016), using different stimuli chosen to optimize the success of the McGurk illusion. In contrast to the results of Ostrand et al. (2016), we find that the perceived (i.e., visual) word of the incongruent stimulus usually induced semantic priming. We further find that the strength of this priming corresponded to the magnitude of the McGurk effect for each word combination. These findings suggest, in contrast to those of Ostrand et al. (2016), that lexical access makes use of the integrated multisensory information that the listener perceives. They further suggest that which unimodal signal of a multisensory stimulus is used in lexical access depends on how that stimulus is perceived.
Collapse
Affiliation(s)
- Josh Dorsi
- Department of Psychology, University of California, Riverside, 900 University Ave, Riverside, CA, 92521, USA.
- Penn State University, College of Medicine, State College, PA, USA.
- Lawrence D Rosenblum
- Department of Psychology, University of California, Riverside, 900 University Ave, Riverside, CA, 92521, USA
3. Stilp C. Acoustic context effects in speech perception. Wiley Interdiscip Rev Cogn Sci 2019; 11:e1517. [PMID: 31453667] [DOI: 10.1002/wcs.1517]
Abstract
The extreme acoustic variability of speech is well established, which makes the proficiency of human speech perception all the more impressive. Speech perception, like perception in any modality, is relative to context, and this provides a means to normalize the acoustic variability in the speech signal. Acoustic context effects in speech perception have been widely documented, but a clear understanding of how these effects relate to each other across stimuli, timescales, and acoustic domains is lacking. Here we review the influences that spectral context, temporal context, and spectrotemporal context have on speech perception. Studies are organized in terms of whether the context precedes the target (forward effects) or follows it (backward effects), and whether the context is adjacent to the target (proximal) or temporally removed from it (distal). Special cases where proximal and distal contexts have competing influences on perception are also considered. Across studies, a common theme emerges: acoustic differences between contexts and targets are perceptually magnified, producing contrast effects that facilitate perception of target sounds and words. This indicates enhanced sensitivity to changes in the acoustic environment, which maximizes the amount of potential information that can be transmitted to the perceiver. This article is categorized under: Linguistics > Language in Mind and Brain Psychology > Perception and Psychophysics.
Affiliation(s)
- Christian Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky
4. Ward RM, Kelty-Stephen DG. Bringing the Nonlinearity of the Movement System to Gestural Theories of Language Use: Multifractal Structure of Spoken English Supports the Compensation for Coarticulation in Human Speech Perception. Front Physiol 2018; 9:1152. [PMID: 30233386] [PMCID: PMC6129613] [DOI: 10.3389/fphys.2018.01152]
Abstract
Coarticulation is the tendency for speech vocalization and articulation, even at the phonemic level, to change with context, and compensation for coarticulation (CfC) reflects the striking human ability to perceive phonemic stability despite this variability. A current controversy centers on whether CfC depends on contrast between formants of a speech-signal spectrogram (specifically, contrast between offset formants concluding context stimuli and onset formants opening the target sound) or on speech-sound variability specific to the coordinative movement of speech articulators (e.g., vocal folds, postural muscles, lips, tongues). This manuscript aims to encode that coordinative-movement context in terms of speech-signal multifractal structure and to determine whether speech's multifractal structure might explain the crucial gestural support for any proposed spectral contrast. We asked human participants to categorize individual target stimuli drawn from an 11-step [ga]-to-[da] continuum as either the phoneme "GA" or "DA." Three groups each heard a specific type of context stimulus preceding the target stimuli: real-speech [al] or [aɹ], sine-wave tones at the third-formant offset frequency of [al] or [aɹ], or simulated-speech contexts [al] or [aɹ]. Here, simulating speech contexts involved randomizing the sequence of relatively homogeneous pitch periods within the vowel sound [a] of each [al] and [aɹ]. Crucially, the simulated-speech contexts had the same offset formants and extremely similar vowel formants as the real-speech contexts and, to additional naïve participants, sounded identical to them. However, randomization distorted the original speech-context multifractality, and effects of spectral contrast following speech appeared only after regression modeling of trial-by-trial "GA" judgments controlled for context-stimulus multifractality. Furthermore, simulated-speech contexts elicited faster responses (as tone contexts do) and weakened known biases in CfC, suggesting that spectral contrast depends on the nonlinear interactions across multiple scales that articulatory gestures express through the speech signal. Traditional mouse-tracking behaviors, measured as participants moved their computer-mouse cursor to register their "GA"-or-"DA" decisions with mouse clicks, suggest that listening to speech leads the movement system to resonate with the multifractality of context stimuli. We interpret these results as revealing a new multifractal terrain on which to build a better understanding of the role movement systems play in shaping how speech perception makes use of acoustic information.
5. Icht M, Ben-David BM. Sibilant production in Hebrew-speaking adults: Apical versus laminal. Clin Linguist Phon 2017; 32:193-212. [PMID: 28727493] [DOI: 10.1080/02699206.2017.1335780]
Abstract
The Hebrew IPA charts describe the sibilants /s, z/ as 'alveolar fricatives', where the place of articulation on the palate is the alveolar ridge. The point of constriction on the tongue is not defined: apical (tip) or laminal (blade). Usually, speech and language pathologists (SLPs) use the apical placement in Hebrew articulation therapy. Some researchers and SLPs have suggested that acceptable /s, z/ could also be produced with the laminal placement (i.e. the tip of the tongue approximating the lower incisors). The present study focused on the clinical level, attempting to determine the prevalence of these alternative points of constriction on the tongue for /s/ and /z/ in three different samples of Hebrew-speaking young adults (total n = 242) with typical articulation. Around 60% of the participants reported using the laminal position, regardless of several speaker-related variables (e.g. tongue-thrust swallowing, gender). Laminal production was more common in /s/ (than /z/), in the coda (than onset) position of the sibilant, in mono- (than di-) syllabic words, and with non-alveolar (than alveolar) adjacent consonants. Experiment 3 revealed no acoustic differences between apical and laminal productions of /s/ and of /z/. From a clinical perspective, we wish to raise SLPs' awareness of the prevalence of the two placements when treating Hebrew speakers, noting that tongue placements were highly correlated across sibilants. Finally, we recommend adopting a client-centred practice, where tongue placement is matched to the client. We further recommend selecting targets for intervention based on our findings and distinguishing between different prosodic positions in treatment.
Affiliation(s)
- Michal Icht
- Communication Disorders Department, Ariel University, Ariel, Israel
- Boaz M Ben-David
- Communication, Aging and Neuropsychology Lab (CANlab), Baruch Ivcher School of Psychology, Interdisciplinary Center (IDC) Herzliya, Herzliya, Israel
- Department of Speech-Language Pathology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Toronto Rehabilitation Institute, University Health Network, Toronto, Ontario, Canada
7. Viswanathan N, Magnuson JS, Fowler CA. Information for coarticulation: Static signal properties or formant dynamics? J Exp Psychol Hum Percept Perform 2014; 40:1228-36. [PMID: 24730744] [DOI: 10.1037/a0036214]
Abstract
Perception of a speech segment changes depending on properties of surrounding segments in a phenomenon called compensation for coarticulation (Mann, 1980). The nature of information that drives these perceptual changes is a matter of debate. One account attributes perceptual shifts to low-level auditory system contrast effects based on static portions of the signal (e.g., third formant [F3] center or average frequency; Lotto & Kluender, 1998). An alternative account is that listeners' perceptual shifts result from listeners attuning to the acoustic effects of gestural overlap and that this information for coarticulation is necessarily dynamic (Fowler, 2006). In a pair of experiments, we used sinewave speech precursors to investigate the nature of information for compensation for coarticulation. In Experiment 1, as expected by both accounts, we found that sinewave speech precursors produce shifts in following segments. In Experiment 2, we investigated whether effects in Experiment 1 were driven by static F3 offsets of sinewave speech precursors, or by dynamic relationships among their formants. We temporally reversed F1 and F2 in sinewave precursors, preserving static F3 offset and average F1, F2 and F3 frequencies, but disrupting dynamic formant relationships. Despite having identical F3s, selectively reversed precursors produced effects that were significantly smaller and restricted to only a small portion of the continuum. We conclude that dynamic formant relations rather than static properties of the precursor provide information for compensation for coarticulation.
8. Winn MB, Rhone AE, Chatterjee M, Idsardi WJ. The use of auditory and visual context in speech perception by listeners with normal hearing and listeners with cochlear implants. Front Psychol 2013; 4:824. [PMID: 24204359] [PMCID: PMC3817459] [DOI: 10.3389/fpsyg.2013.00824]
Abstract
There is a wide range of acoustic and visual variability across different talkers and different speaking contexts. Listeners with normal hearing (NH) accommodate that variability in ways that facilitate efficient perception, but it is not known whether listeners with cochlear implants (CIs) can do the same. In this study, listeners with NH and listeners with CIs were tested for accommodation to auditory and visual phonetic contexts created by gender-driven speech differences as well as vowel coarticulation and lip rounding in both consonants and vowels. Accommodation was measured as the shifting of perceptual boundaries between /s/ and /∫/ sounds in various contexts, as modeled by mixed-effects logistic regression. Owing to the spectral contrasts thought to underlie these context effects, CI listeners were predicted to perform poorly, but showed considerable success. Listeners with CIs not only showed sensitivity to auditory cues to gender, they were also able to use visual cues to gender (i.e., faces) as a supplement or proxy for information in the acoustic domain, in a pattern that was not observed for listeners with NH. Spectrally-degraded stimuli heard by listeners with NH generally did not elicit strong context effects, underscoring the limitations of noise vocoders and/or the importance of experience with electric hearing. Visual cues for consonant lip rounding and vowel lip rounding were perceived in a manner consistent with coarticulation and were generally used more heavily by listeners with CIs. Results suggest that listeners with CIs are able to accommodate various sources of acoustic variability either by attending to appropriate acoustic cues or by inferring them via the visual signal.
Collapse
Affiliation(s)
- Matthew B Winn
- Waisman Center & Department of Surgery, University of Wisconsin-Madison, Madison, WI, USA
9.
Abstract
Inner speech is one of the most common, but least investigated, mental activities humans perform. It is an internal copy of one's external voice and so is similar to a well-established component of motor control: corollary discharge. Corollary discharge is a prediction of the sound of one's voice generated by the motor system. This prediction is normally used to filter self-caused sounds from perception, which segregates them from externally caused sounds and prevents the sensory confusion that would otherwise result. The similarity between inner speech and corollary discharge motivates the theory, tested here, that corollary discharge provides the sensory content of inner speech. The results reported here show that inner speech attenuates the impact of external sounds. This attenuation was measured using a context effect (an influence of contextual speech sounds on the perception of subsequent speech sounds), which weakens in the presence of speech imagery that matches the context sound. Results from a control experiment demonstrated this weakening in external speech as well. Such sensory attenuation is a hallmark of corollary discharge.
Affiliation(s)
- Mark Scott
- Department of Linguistics, University of British Columbia
10. Viswanathan N, Magnuson JS, Fowler CA. Similar response patterns do not imply identical origins: an energetic masking account of nonspeech effects in compensation for coarticulation. J Exp Psychol Hum Percept Perform 2012; 39:1181-92. [PMID: 23148469] [DOI: 10.1037/a0030735]
Abstract
Nonspeech materials are widely used to identify basic mechanisms underlying speech perception. For instance, they have been used to examine the origin of compensation for coarticulation, the observation that listeners' categorization of phonetic segments depends on neighboring segments (Mann, 1980). Specifically, nonspeech precursors matched to critical formant frequencies of speech precursors have been shown to produce similar categorization shifts as speech contexts. This observation has been interpreted to mean that spectrally contrastive frequency relations between neighboring segments underlie the categorization shifts observed after speech, as well as nonspeech precursors (Lotto & Kluender, 1998). From the gestural perspective, however, categorization shifts in speech contexts occur because of listeners' sensitivity to acoustic information for coarticulatory gestural overlap in production; in nonspeech contexts, this occurs because of energetic masking of acoustic information for gestures. In 2 experiments, we distinguish the energetic masking and spectral contrast accounts. In Experiment 1, we investigated the effects of varying precursor tone frequency on speech categorization. Consistent only with the masking account, tonal effects were greater for frequencies close enough to those in the target syllables for masking to occur. In Experiment 2, we filtered the target stimuli to simulate effects of masking and obtained behavioral outcomes that closely resemble those with nonspeech tones. We conclude that masking provides the more plausible account of nonspeech context effects. More generally, we suggest that similar results from the use of speech and nonspeech materials do not automatically imply identical origins and that the use of nonspeech in speech studies entails careful examination of the nature of information in the nonspeech materials.
Affiliation(s)
- Navin Viswanathan
- Department of Psychology, State University of New York, New Paltz, NY 12561-2440, USA.
11. Kingston J, Kawahara S, Mash D, Chambless D. Auditory contrast versus compensation for coarticulation: data from Japanese and English listeners. Lang Speech 2011; 54:499-525. [PMID: 22338789] [DOI: 10.1177/0023830911404959]
Abstract
English listeners categorize more of a [k-t] continuum as "t" after [ʃ] than after [s] (Mann & Repp, 1981). This bias could be due to compensation for coarticulation (Mann & Repp, 1981) or auditory contrast between the fricatives and the stops (Lotto & Kluender, 1998). In Japanese, surface [ɕk, ɕt, sk, st] clusters arise via palatalization and vowel devoicing from /sik, sit, suk, sut/, and acoustic vestiges of the devoiced vowels remain in the fricative. On the one hand, compensation for coarticulation with the devoiced vowel would cancel out compensation for coarticulation with the fricative, and listeners would not show any response bias. On the other hand, if the stop contrasts spectrally with the fricative, listeners should respond "t" more often after [ɕi̥] than after [su̥]. Experiment 1 establishes that [k] and [t] coarticulate with preceding voiced [i, u], voiceless [i̥, u̥], and [ɕ, s]. Experiment 2 shows that both Japanese and English listeners respond "t" more often after [ɕi̥] than [su̥], as predicted by auditory contrast. English listeners' "t" responses also varied after voiced vowels, but those of Japanese listeners did not. Experiment 3 shows that this difference reflects differences in their phonetic experience.
Affiliation(s)
- John Kingston
- Linguistics Department, University of Massachusetts, 150 Hicks Way, 226 South College, Amherst, MA 01003-9274, USA.
12. Mitterer H, Csépe V, Honbolygo F, Blomert L. The Recognition of Phonologically Assimilated Words Does Not Depend on Specific Language Experience. Cogn Sci 2010; 30:451-79. [DOI: 10.1207/s15516709cog0000_57]
13. Huang J, Holt LL. General perceptual contributions to lexical tone normalization. J Acoust Soc Am 2009; 125:3983-94. [PMID: 19507980] [PMCID: PMC2806435] [DOI: 10.1121/1.3125342]
Abstract
Within tone languages that use pitch variations to contrast meaning, large variability exists in the pitches produced by different speakers. Context-dependent perception may help to resolve this perceptual challenge. However, whether speakers rely on context in contour tone perception is unclear; previous studies have produced inconsistent results. The present study aimed to provide an unambiguous test of the effect of context on contour lexical tone perception and to explore its underlying mechanisms. In three experiments, Mandarin listeners' perception of Mandarin first and second (high-level and mid-rising) tones was investigated with preceding speech and non-speech contexts. Results indicate that the mean fundamental frequency (f0) of a preceding sentence affects perception of contour lexical tones and the effect is contrastive. Following a sentence with a higher-frequency mean f0, the following syllable is more likely to be perceived as a lower frequency lexical tone and vice versa. Moreover, non-speech precursors modeling the mean spectrum of f0 also elicit this effect, suggesting general perceptual processing rather than articulatory-based or speaker-identity-driven mechanisms.
Affiliation(s)
- Jingyuan Huang
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
14. Viswanathan N, Fowler CA, Magnuson JS. A critical examination of the spectral contrast account of compensation for coarticulation. Psychon Bull Rev 2009; 16:74-9. [PMID: 19145013] [DOI: 10.3758/PBR.16.1.74]
Abstract
Vocal tract gestures for adjacent phones overlap temporally, rendering the acoustic speech signal highly context dependent. For example, following a segment with an anterior place of articulation, a posterior segment's place of articulation is pulled frontward, and listeners' category boundaries shift appropriately. Some theories assume that listeners perceptually attune or compensate for coarticulatory context. An alternative is that shifts result from spectral contrast. Indeed, shifts occur when speech precursors are replaced by pure tones, frequency matched to the formant offset at the assumed locus of contrast (Lotto & Kluender, 1998). However, tone analogues differ from natural formants in several ways, raising the possibility that conditions for contrast may not exist in natural speech. When we matched tones to natural formant intensities and trajectories, boundary shifts diminished. When we presented only the critical spectral region of natural speech tokens, no compensation was observed. These results suggest that conditions for spectral contrast do not exist in typical speech.
15.
Abstract
Adjacent speech, and even nonspeech, contexts influence phonetic categorization. Four experiments investigated how preceding sequences of sine-wave tones influence phonetic categorization. This experimental paradigm provides a means of investigating the statistical regularities of acoustic events that influence online speech categorization and, reciprocally, reveals regularities of the sound environment tracked by auditory processing. The tones comprising the sequences were drawn from distributions sampling different acoustic frequencies. Results indicate that whereas the mean of the distributions predicts contrastive shifts in speech categorization, variability of the distributions has little effect. Moreover, speech categorization is influenced by the global mean of the tone sequence, without significant influence of local statistical regularities within the tone sequence. Further arguing that the effect is strongly related to the average spectrum of the sequence, notched noise spectral complements of the tone sequences produce a complementary effect on speech categorization. Lastly, these effects are modulated by the number of tones in the acoustic history and the overall duration of the sequence, but not by the density with which the distribution defining the sequence is sampled. Results are discussed in light of stimulus-specific adaptation to statistical regularity in the acoustic input and a speculative link to talker normalization is postulated.
Affiliation(s)
- Lori L Holt
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA.
16.
Abstract
We investigated how spoken words are recognized when they have been altered by phonological assimilation. Previous research has shown that there is a process of perceptual compensation for phonological assimilations. Three recently formulated proposals regarding the mechanisms for compensation for assimilation make different predictions with regard to the level at which compensation is supposed to occur as well as regarding the role of specific language experience. In the present study, Hungarian words and nonwords, in which a viable and an unviable liquid assimilation was applied, were presented to Hungarian and Dutch listeners in an identification task and a discrimination task. Results indicate that viably changed forms are difficult to distinguish from canonical forms independent of experience with the assimilation rule applied in the utterances. This reveals that auditory processing contributes to perceptual compensation for assimilation, while language experience has only a minor role to play when identification is required.
17.
Abstract
This study examined whether compensation for coarticulation in fricative-vowel syllables is phonologically mediated or a consequence of auditory processes. Smits (2001a) had shown that compensation occurs for anticipatory lip rounding in a fricative caused by a following rounded vowel in Dutch. In a first experiment, the possibility that compensation is due to general auditory processing was investigated using nonspeech sounds. These did not cause context effects akin to compensation for coarticulation, although nonspeech sounds influenced speech sound identification in an integrative fashion. In a second experiment, a possible phonological basis for compensation for coarticulation was assessed by using audiovisual speech. Visual displays, which induced the perception of a rounded vowel, also influenced compensation for anticipatory lip rounding in the fricative. These results indicate that compensation for anticipatory lip rounding in fricative-vowel syllables is phonologically mediated. This result is discussed in the light of other compensation-for-coarticulation findings and general theories of speech perception.
Affiliation(s)
- Holger Mitterer
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
18.
Abstract
The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization the influence of nonspeech contexts may undermine, or even reverse, the expected effect of adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well-predicted by spectral characteristics of the context stimuli.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
19
Abstract
Reports of sex differences in language processing are inconsistent and are thought to vary by task type and difficulty. In two experiments, we investigated a sex difference in visual influence on heard speech (the McGurk effect). First, incongruent consonant-vowel stimuli were presented where the visual portion of the signal was brief (100 msec) or full (temporally equivalent to the auditory). Second, to determine whether men and women differed in their ability to extract visual speech information from these brief stimuli, the same stimuli were presented to new participants with an additional visual-only (lipread) condition. In both experiments, women showed a significantly greater visual influence on heard speech than did men for the brief visual stimuli. No sex differences for the full stimuli or in the ability to lipread were found. These findings indicate that the more challenging brief visual stimuli elicit sex differences in the processing of audiovisual speech.
Affiliation(s)
- Julia R Irwin
- Haskins Laboratories, New Haven, Connecticut 06511, USA.
20
Abstract
This article reports three experiments designed to explore the basis for speech perceivers' apparent compensations for coarticulation. In the first experiment, the stimuli were members of three /da/-to-/ga/ continua hybridized from natural speech. The monosyllables had originally been produced in disyllables /ada/ and /aga/ to make Continuum 1, /alda/ and /alga/ (Continuum 2), and /arda/ and /arga/ (Continuum 3). Members of the second and third continua were influenced by carryover coarticulation from the preceding /l/ or /r/ context. Listeners showed compensation for this carryover coarticulation in the absence of the precursor /al/ or /ar/ syllables. This rules out an account in which compensation for coarticulation reflects a spectral contrast effect exerted by a precursor syllable, as previously has been proposed by Lotto, Holt, and colleagues (e.g., Lotto, Kluender, & Holt, 1997; Lotto & Kluender, 1998). The second experiment showed an enhancing effect of the endpoint monosyllables in Experiment 1 on identifications of preceding natural hybrids along an /al/-to-/ar/ continuum. That is, coarticulatory /l/ and /r/ information in /da/ and /ga/ syllables led to increased judgments of /l/ and /r/, respectively, in the precursor /al/-to-/ar/ continuum members. This was opposite to the effect, in Experiment 3, of /da/ and /ga/ syllables on preceding tones synthesized to range in frequency from approximately the ending F3 of /ar/ to the ending F3 of /al/. The enhancing, not contrastive, effect in Experiment 2, juxtaposed to the contrastive effect in Experiment 3, further disconfirms the spectral contrast account of compensation for coarticulation. A review of the literature buttresses that conclusion and provides strong support for an account that invokes listeners' attention to information in speech for the occurrence of gestural overlap.
Affiliation(s)
- Carol A Fowler
- Haskins Laboratories, New Haven, Connecticut 06511, USA.
21
Abstract
On the basis of a review of the literature and three new experiments, Fowler (2006) concludes that a contrast account for phonetic context effects is not tenable and is inferior to a gestural account. We believe that this conclusion is premature and that it is based on a restricted set of assumptions about a general perceptual account. Here, we briefly address the criticisms of Fowler (2006), with the intent of clarifying what a general auditory and learning approach to speech perception entails.
Affiliation(s)
- Andrew J Lotto
- Center for Perceptual Systems, University of Texas, Austin 78712, USA.
22
Abstract
Nonspeech stimuli influence phonetic categorization, but the effects observed so far have been limited to precursors' influence on perception of following speech. However, both preceding and following speech affect phonetic categorization. This asymmetry raises questions about whether general auditory processes play a role in context-dependent speech perception. This study tested whether the asymmetry stems from methodological issues or genuine mechanistic limitations. To determine whether and how backward effects of nonspeech context on speech may occur, one experiment examined perception of CVC words with [ga]-[da] series onsets followed by one of two possible embedded tones and one of two possible final consonants. When the tone was separated from the target onset by 100 ms, contrastive effects of tone frequency similar to those of previous studies were observed; however, when the tone was moved closer to the target segment, assimilative effects were observed. In another experiment, contrastive effects of a following tone were observed in both CVC words and CV nonwords, although the size of the effects depended on syllable structure. Results are discussed with respect to contrastive mechanisms that are not speech-specific but operate at a relatively high level, taking into account spectrotemporal patterns occurring over extended periods before and after target events.
Affiliation(s)
- Travis Wade
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
23
Abstract
Fowler, Brown, and Mann (2000) have reported a visually moderated phonetic context effect in which a video disambiguates an acoustically ambiguous precursor syllable, which, in turn, influences perception of a subsequent syllable. In the present experiments, we explored this finding and the claims that stem from it. Experiment 1 failed to replicate Fowler et al. with novel materials modeled after the original study, but Experiment 2 successfully replicated the effect, using Fowler et al.'s stimulus materials. This discrepancy was investigated in Experiments 3 and 4, which demonstrate that variation in visual information concurrent with the test syllable is sufficient to account for the original results. Fowler et al.'s visually moderated phonetic context effect appears to have been a demonstration of audiovisual interaction between concurrent stimuli, and not an effect whereby preceding visual information elicits changes in the perception of subsequent speech sounds.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
24
Ramirez J, Mann V. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy. J Acoust Soc Am 2005; 118:1122-33. [PMID: 16158666 DOI: 10.1121/1.1940509] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
Affiliation(s)
- Joshua Ramirez
- Cognitive Science, University of California, Irvine, 3151 Social Science Plaza, Irvine, California 92697, USA
25
Abstract
Speech perception is an ecologically important example of the highly context-dependent nature of perception; adjacent speech, and even nonspeech, sounds influence how listeners categorize speech. Some theories emphasize linguistic or articulation-based processes in speech-elicited context effects and peripheral (cochlear) auditory perceptual interactions in non-speech-elicited context effects. The present studies challenge this division. Results of three experiments indicate that acoustic histories composed of sine-wave tones drawn from spectral distributions with different mean frequencies robustly affect speech categorization. These context effects were observed even when the acoustic context temporally adjacent to the speech stimulus was held constant and when more than a second of silence or multiple intervening sounds separated the nonlinguistic acoustic context and speech targets. These experiments indicate that speech categorization is sensitive to statistical distributions of spectral information, even if the distributions are composed of nonlinguistic elements. Acoustic context need be neither linguistic nor local to influence speech perception.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA.
26
Abstract
This chapter focuses on one of the first steps in comprehending spoken language: How do listeners extract the most fundamental linguistic elements-consonants and vowels, or the distinctive features which compose them-from the acoustic signal? We begin by describing three major theoretical perspectives on the perception of speech. Then we review several lines of research that are relevant to distinguishing these perspectives. The research topics surveyed include categorical perception, phonetic context effects, learning of speech and related nonspeech categories, and the relation between speech perception and production. Finally, we describe challenges facing each of the major theoretical perspectives on speech perception.
Affiliation(s)
- Randy L Diehl
- Department of Psychology and Center for Perceptual Systems, University of Texas, Austin, Texas 78712-0187, USA.
27
Abstract
Phoneme identification with audiovisually discrepant stimuli is influenced by information in the visual signal (the McGurk effect). Additionally, lexical status affects identification of auditorily presented phonemes. The present study tested for lexical influences on the McGurk effect. Participants identified phonemes in audiovisually discrepant stimuli in which lexical status of the auditory component and of a visually influenced percept was independently varied. Visually influenced (McGurk) responses were more frequent when they formed a word and when the auditory signal was a nonword (Experiment 1). Lexical effects were larger for slow than for fast responses (Experiment 2), as with auditory speech, and were replicated with stimuli matched on physical properties (Experiment 3). These results are consistent with models in which lexical processing of speech is modality independent.
Affiliation(s)
- Lawrence Brancazio
- Department of Psychology, Southern Connecticut State University, New Haven, CT, USA.
28
29
Abstract
Previous studies using speech and nonspeech analogs have shown that auditory mechanisms which serve to enhance spectral contrast contribute to perception of coarticulated speech for which spectral properties assimilate over time. In order to better understand the nature of contrastive auditory processes, a series of CV syllables varying acoustically in F2-onset frequency and perceptually from /ba/ to /da/ was identified following a variety of spectra, including three-peak renditions of [e] and [o], one-peak simulations of only F2, and spectral complements of these spectra for which peaks are replaced with troughs. Results for three- versus one-peak (or trough) precursor spectra were practically indistinguishable, suggesting that effects were spectrally local and not dependent upon perception of precursors as speech. Complementary (trough) spectra had complementary effects on perception of following stops; however, these effects were particularly dependent upon the interval between precursor and CV onsets. Results from these studies cannot be explained by simple masking or adaptation of suppression. Instead, they provide evidence for the existence of processes that selectively enhance contrast between onset spectra of neighboring sounds, and these processes are relevant for perception of connected speech.
30
Abstract
The pronunciation of the same word may vary considerably as a consequence of its context. The Dutch word tuin (English, garden) may be pronounced tuim if followed by bank (English, bench), but not if followed by stoel (English, chair). In a series of four experiments, we examined how Dutch listeners cope with this context sensitivity in their native language. A first word identification experiment showed that the perception of a word-final nasal depends on the subsequent context. Viable assimilations, but not unviable assimilations, were often confused perceptually with canonical word forms in a word identification task. Two control experiments ruled out the possibility that this effect was caused by perceptual masking or was influenced by lexical top-down effects. A passive-listening study in which electrophysiological measurements were used showed that only unviable, but not viable, phonological changes elicited a significant mismatch negativity. The results indicate that phonological assimilations are dealt with by an early prelexical mechanism.
31
Abstract
Previous work has demonstrated that the graded internal structure of phonetic categories is sensitive to a variety of contextual factors. One such factor is place of articulation: The best exemplars of voiceless stop consonants along auditory bilabial and velar voice onset time (VOT) continua occur over different ranges of VOTs (Volaitis & Miller, 1992). In the present study, we exploited the McGurk effect to examine whether visual information for place of articulation also shifts the best exemplar range for voiceless consonants, following Green and Kuhl's (1989) demonstration of effects of visual place of articulation on the location of voicing boundaries. In Experiment 1, we established that /p/ and /t/ have different best exemplar ranges along auditory bilabial and alveolar VOT continua. We then found, in Experiment 2, a similar shift in the best-exemplar range for /t/ relative to that for /p/ when there was a change in visual place of articulation, with auditory place of articulation held constant. These findings indicate that the perceptual mechanisms that determine internal phonetic category structure are sensitive to visual, as well as to auditory, information.
32
33
34
Green KP, Norrix LW. Perception of /r/ and /l/ in a stop cluster: Evidence of cross-modal context effects. J Exp Psychol Hum Percept Perform 2001; 27:166-77. [DOI: 10.1037/0096-1523.27.1.166] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]