1. Zhu Y, Li C, Hendry C, Glass J, Canseco-Gonzalez E, Pitts MA, Dykstra AR. Isolating Neural Signatures of Conscious Speech Perception with a No-Report Sine-Wave Speech Paradigm. J Neurosci 2024; 44:e0145232023. PMID: 38191569; PMCID: PMC10883607; DOI: 10.1523/jneurosci.0145-23.2023.
Abstract
Identifying neural correlates of conscious perception is a fundamental endeavor of cognitive neuroscience. Most studies so far have focused on visual awareness along with trial-by-trial reports of task-relevant stimuli, which can confound neural measures of perceptual awareness with postperceptual processing. Here, we used a three-phase sine-wave speech paradigm that dissociated conscious speech perception from task relevance while recording EEG in humans of both sexes. Compared with tokens perceived as noise, physically identical sine-wave speech tokens that were perceived as speech elicited a left-lateralized, near-vertex negativity, which we interpret as a phonological version of a perceptual awareness negativity. This response appeared between 200 and 300 ms after token onset and was not present for frequency-flipped control tokens that were never perceived as speech. In contrast, the P3b elicited by task-irrelevant tokens did not significantly differ when the tokens were perceived as speech versus noise and was only enhanced for tokens that were both perceived as speech and relevant to the task. Our results extend the findings from previous studies on visual awareness and speech perception and suggest that correlates of conscious perception, across types of conscious content, are most likely to be found in midlatency negative-going brain responses in content-specific sensory areas.
Affiliation(s)
- Yunkai Zhu
- Department of Biomedical Engineering, University of Miami, Coral Gables, Florida 33143
- Charlotte Li
- Department of Psychology, Reed College, Portland, Oregon 97202
- Camille Hendry
- Department of Psychology, Reed College, Portland, Oregon 97202
- James Glass
- Department of Psychology, Reed College, Portland, Oregon 97202
- Michael A Pitts
- Department of Psychology, Reed College, Portland, Oregon 97202
- Andrew R Dykstra
- Department of Biomedical Engineering, University of Miami, Coral Gables, Florida 33143
2. Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. PMID: 38032934; PMCID: PMC10710032; DOI: 10.1073/pnas.2309166120.
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
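The temporal response function estimation underlying this kind of analysis amounts to time-lagged regularized regression from a stimulus feature (e.g., the speech envelope) onto a neural recording. A minimal sketch follows; this is not the authors' code, and the toy signal, lag count, and ridge parameter are illustrative assumptions:

```python
import numpy as np

def lag_matrix(stim, n_lags):
    """Time-lagged design matrix: column k holds stim delayed by k samples."""
    n = len(stim)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:n - k]
    return X

def fit_trf(stim, resp, n_lags, lam=1.0):
    """Ridge-regression TRF: w = (X'X + lam*I)^-1 X'y."""
    X = lag_matrix(stim, n_lags)
    XtX = X.T @ X + lam * np.eye(n_lags)
    return np.linalg.solve(XtX, X.T @ resp)

def predict(stim, w):
    """Predicted neural response from the estimated TRF."""
    return lag_matrix(stim, len(w)) @ w

# Toy check: a response generated by a known kernel is recovered.
rng = np.random.default_rng(0)
stim = rng.standard_normal(2000)
kernel = np.array([0.0, 0.5, 1.0, 0.5, 0.0])      # true TRF, peak at lag 2
resp = np.convolve(stim, kernel)[:2000] + 0.1 * rng.standard_normal(2000)
w = fit_trf(stim, resp, n_lags=8, lam=1e-3)
r = np.corrcoef(predict(stim, w), resp)[0, 1]      # prediction accuracy
```

In practice, packages such as the mTRF-Toolbox or Eelbrain add cross-validated regularization and multivariate (multi-feature) designs on top of this core estimator.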
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742
- Department of Biology, University of Maryland, College Park, MD 20742
- Institute for Systems Research, University of Maryland, College Park, MD 20742
3. Mai G, Wang WSY. Distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing. Hum Brain Mapp 2023; 44:6149-6172. PMID: 37818940; PMCID: PMC10619373; DOI: 10.1002/hbm.26503.
Abstract
The brain tracks and encodes multi-level speech features during spoken language processing. It is evident that this speech tracking is dominant at low frequencies (<8 Hz) including delta and theta bands. Recent research has demonstrated distinctions between delta- and theta-band tracking but has not elucidated how they differentially encode speech across linguistic levels. Here, we hypothesised that delta-band tracking encodes prediction errors (enhanced processing of unexpected features) while theta-band tracking encodes neural sharpening (enhanced processing of expected features) when people perceive speech with different linguistic contents. EEG responses were recorded when normal-hearing participants attended to continuous auditory stimuli that contained different phonological/morphological and semantic contents: (1) real-words, (2) pseudo-words and (3) time-reversed speech. We employed multivariate temporal response functions to measure EEG reconstruction accuracies in response to acoustic (spectrogram), phonetic and phonemic features with the partialling procedure that singles out unique contributions of individual features. We found higher delta-band accuracies for pseudo-words than real-words and time-reversed speech, especially during encoding of phonetic features. Notably, individual time-lag analyses showed that significantly higher accuracies for pseudo-words than real-words started at early processing stages for phonetic encoding (<100 ms post-feature) and later stages for acoustic and phonemic encoding (>200 and 400 ms post-feature, respectively). Theta-band accuracies, on the other hand, were higher when stimuli had richer linguistic content (real-words > pseudo-words > time-reversed speech). Such effects also started at early stages (<100 ms post-feature) during encoding of all individual features or when all features were combined. 
We argue that these results indicate that delta-band tracking may play a role in predictive coding, leading to greater tracking of pseudo-words due to the presence of unexpected/unpredicted semantic information, while theta-band tracking encodes sharpened signals caused by more expected phonological/morphological and semantic contents. The early presence of these effects reflects rapid computations of sharpening and prediction errors. Moreover, by measuring changes in EEG alpha power, we did not find evidence that the observed effects can be solely explained by attentional demands or listening effort. Finally, we used directed information analyses to illustrate feedforward and feedback information transfers between prediction errors and sharpening across linguistic levels, showcasing how our results fit the hierarchical predictive coding framework. Together, we suggest distinct roles for delta- and theta-band neural tracking in sharpening and predictive coding of multi-level speech features during spoken language processing.
Affiliation(s)
- Guangting Mai
- Hearing Theme, National Institute for Health Research Nottingham Biomedical Research Centre, Nottingham, UK
- Academic Unit of Mental Health and Clinical Neurosciences, School of Medicine, The University of Nottingham, Nottingham, UK
- Division of Psychology and Language Sciences, Faculty of Brain Sciences, University College London, London, UK
- William S-Y Wang
- Department of Chinese and Bilingual Studies, Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Language Engineering Laboratory, The Chinese University of Hong Kong, Hong Kong, China
4. Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. bioRxiv [Preprint] 2023:2023.05.18.541269. PMID: 37292644; PMCID: PMC10245672; DOI: 10.1101/2023.05.18.541269.
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
- Department of Biology, University of Maryland, College Park, MD 20742, USA
- Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
5. Jiang J, Johnson JCS, Requena-Komuro MC, Benhamou E, Sivasathiaseelan H, Chokesuwattanaskul A, Nelson A, Nortley R, Weil RS, Volkmer A, Marshall CR, Bamiou DE, Warren JD, Hardy CJD. Comprehension of acoustically degraded speech in Alzheimer's disease and primary progressive aphasia. Brain 2023; 146:4065-4076. PMID: 37184986; PMCID: PMC10545509; DOI: 10.1093/brain/awad163.
Abstract
Successful communication in daily life depends on accurate decoding of speech signals that are acoustically degraded by challenging listening conditions. This process presents the brain with a demanding computational task that is vulnerable to neurodegenerative pathologies. However, despite recent intense interest in the link between hearing impairment and dementia, comprehension of acoustically degraded speech in these diseases has been little studied. Here we addressed this issue in a cohort of 19 patients with typical Alzheimer's disease and 30 patients representing the three canonical syndromes of primary progressive aphasia (non-fluent/agrammatic variant primary progressive aphasia; semantic variant primary progressive aphasia; logopenic variant primary progressive aphasia), compared to 25 healthy age-matched controls. As a paradigm for the acoustically degraded speech signals of daily life, we used noise-vocoding: synthetic division of the speech signal into frequency channels constituted from amplitude-modulated white noise, such that fewer channels convey less spectrotemporal detail thereby reducing intelligibility. We investigated the impact of noise-vocoding on recognition of spoken three-digit numbers and used psychometric modelling to ascertain the threshold number of noise-vocoding channels required for 50% intelligibility by each participant. Associations of noise-vocoded speech intelligibility threshold with general demographic, clinical and neuropsychological characteristics and regional grey matter volume (defined by voxel-based morphometry of patients' brain images) were also assessed. Mean noise-vocoded speech intelligibility threshold was significantly higher in all patient groups than in healthy controls, and significantly higher in Alzheimer's disease and logopenic variant primary progressive aphasia than in semantic variant primary progressive aphasia (all P < 0.05). In a receiver operating characteristic analysis, vocoded intelligibility threshold discriminated patients with Alzheimer's disease, non-fluent variant, and logopenic variant primary progressive aphasia from healthy controls very well. Further, this central hearing measure correlated with overall disease severity but not with peripheral hearing or clear speech perception. Neuroanatomically, after correcting for multiple voxel-wise comparisons in predefined regions of interest, impaired noise-vocoded speech comprehension across syndromes was significantly associated (P < 0.05) with atrophy of left planum temporale, angular gyrus and anterior cingulate gyrus: a cortical network that has previously been widely implicated in processing degraded speech signals. Our findings suggest that the comprehension of acoustically altered speech captures an auditory brain process relevant to daily hearing and communication in major dementia syndromes, with novel diagnostic and therapeutic implications.
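The noise-vocoding manipulation described here (each frequency band's fine structure replaced by its envelope imposed on white noise) can be sketched roughly as follows. This is not the study's stimulus pipeline: the FFT-based band filtering, band edges, and stand-in test tone are all illustrative assumptions.

```python
import numpy as np

def band_mask(n, fs, lo, hi):
    """Boolean mask selecting FFT bins between lo and hi Hz (both sidebands)."""
    freqs = np.abs(np.fft.fftfreq(n, 1.0 / fs))
    return (freqs >= lo) & (freqs < hi)

def envelope(x):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.abs(np.fft.ifft(X * h))

def noise_vocode(signal, fs, edges, rng):
    """Sum over bands of (band envelope) x (band-limited white-noise carrier).

    Real vocoders typically refilter each modulated band; omitted for brevity.
    """
    n = len(signal)
    out = np.zeros(n)
    noise = rng.standard_normal(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = band_mask(n, fs, lo, hi)
        band = np.real(np.fft.ifft(np.fft.fft(signal) * mask))
        carrier = np.real(np.fft.ifft(np.fft.fft(noise) * mask))
        out += envelope(band) * carrier
    return out

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)   # stand-in for a speech recording
edges = [100, 500, 1500, 4000]       # 3 bands; edges are illustrative
voc = noise_vocode(tone, fs, edges, np.random.default_rng(1))
```

Fewer bands (shorter `edges`) preserve less spectrotemporal detail, which is the intelligibility manipulation the study exploits.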
Affiliation(s)
- Jessica Jiang
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Jeremy C S Johnson
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Maï-Carmen Requena-Komuro
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Kidney Cancer Program, UT Southwestern Medical Centre, Dallas, TX 75390, USA
- Elia Benhamou
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Harri Sivasathiaseelan
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Anthipa Chokesuwattanaskul
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Division of Neurology, Department of Internal Medicine, King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok 10330, Thailand
- Annabel Nelson
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Ross Nortley
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Wexham Park Hospital, Frimley Health NHS Foundation Trust, Slough SL2 4HL, UK
- Rimona S Weil
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Anna Volkmer
- Division of Psychology and Language Sciences, University College London, London WC1H 0AP, UK
- Charles R Marshall
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Doris-Eva Bamiou
- UCL Ear Institute and UCL/UCLH Biomedical Research Centre, National Institute of Health Research, University College London, London WC1X 8EE, UK
- Jason D Warren
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
- Chris J D Hardy
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London WC1N 3AR, UK
6. Bernstein LE, Auer ET, Eberhardt SP. Modality-Specific Perceptual Learning of Vocoded Auditory versus Lipread Speech: Different Effects of Prior Information. Brain Sci 2023; 13:1008. PMID: 37508940; PMCID: PMC10377548; DOI: 10.3390/brainsci13071008.
Abstract
Traditionally, speech perception training paradigms have not adequately taken into account the possibility that there may be modality-specific requirements for perceptual learning with auditory-only (AO) versus visual-only (VO) speech stimuli. The study reported here investigated the hypothesis that there are modality-specific differences in how prior information is used by normal-hearing participants during vocoded versus VO speech training. Two different experiments, one with vocoded AO speech (Experiment 1) and one with VO, lipread, speech (Experiment 2), investigated the effects of giving different types of prior information to trainees on each trial during training. The training was for four ~20 min sessions, during which participants learned to label novel visual images using novel spoken words. Participants were assigned to different types of prior information during training: Word Group trainees saw a printed version of each training word (e.g., "tethon"), and Consonant Group trainees saw only its consonants (e.g., "t_th_n"). Additional groups received no prior information (i.e., Experiment 1, AO Group; Experiment 2, VO Group) or a spoken version of the stimulus in a different modality from the training stimuli (Experiment 1, Lipread Group; Experiment 2, Vocoder Group). That is, in each experiment, there was a group that received prior information in the modality of the training stimuli from the other experiment. In both experiments, the Word Groups had difficulty retaining the novel words they attempted to learn during training. However, when the training stimuli were vocoded, the Word Group improved their phoneme identification. When the training stimuli were visual speech, the Consonant Group improved their phoneme identification and their open-set sentence lipreading. The results are considered in light of theoretical accounts of perceptual learning in relationship to perceptual modality.
Affiliation(s)
- Lynne E Bernstein
- Speech, Language, and Hearing Sciences Department, George Washington University, Washington, DC 20052, USA
- Edward T Auer
- Speech, Language, and Hearing Sciences Department, George Washington University, Washington, DC 20052, USA
- Silvio P Eberhardt
- Speech, Language, and Hearing Sciences Department, George Washington University, Washington, DC 20052, USA
7. Cope TE, Sohoglu E, Peterson KA, Jones PS, Rua C, Passamonti L, Sedley W, Post B, Coebergh J, Butler CR, Garrard P, Abdel-Aziz K, Husain M, Griffiths TD, Patterson K, Davis MH, Rowe JB. Temporal lobe perceptual predictions for speech are instantiated in motor cortex and reconciled by inferior frontal cortex. Cell Rep 2023; 42:112422. PMID: 37099422; DOI: 10.1016/j.celrep.2023.112422.
Abstract
Humans use predictions to improve speech perception, especially in noisy environments. Here we use 7-T functional MRI (fMRI) to decode brain representations of written phonological predictions and degraded speech signals in healthy humans and people with selective frontal neurodegeneration (non-fluent variant primary progressive aphasia [nfvPPA]). Multivariate analyses of item-specific patterns of neural activation indicate dissimilar representations of verified and violated predictions in left inferior frontal gyrus, suggestive of processing by distinct neural populations. In contrast, precentral gyrus represents a combination of phonological information and weighted prediction error. In the presence of intact temporal cortex, frontal neurodegeneration results in inflexible predictions. This manifests neurally as a failure to suppress incorrect predictions in anterior superior temporal gyrus and reduced stability of phonological representations in precentral gyrus. We propose a tripartite speech perception network in which inferior frontal gyrus supports prediction reconciliation in echoic memory, and precentral gyrus invokes a motor model to instantiate and refine perceptual predictions for speech.
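The multivariate comparison of item-specific activation patterns described above ("dissimilar representations of verified and violated predictions") is commonly computed as a correlation-distance matrix between pattern vectors. A minimal sketch, with purely synthetic "verified" and "violated" patterns standing in for the study's item-specific fMRI data:

```python
import numpy as np

def correlation_rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between
    activation patterns (rows = items/conditions, cols = voxels)."""
    return 1.0 - np.corrcoef(patterns)

# Synthetic illustration: a pattern similar to a reference vs. an unrelated one.
rng = np.random.default_rng(3)
base = rng.standard_normal(50)                    # reference pattern (50 "voxels")
verified = base + 0.1 * rng.standard_normal(50)   # near-copy: low dissimilarity
violated = rng.standard_normal(50)                # independent: high dissimilarity
rdm = correlation_rdm(np.vstack([base, verified, violated]))
```

Dissimilar responses across conditions, as in the paper's inferior frontal result, would appear as large off-diagonal entries between those conditions' patterns.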
Affiliation(s)
- Thomas E Cope
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Cambridge University Hospitals NHS Trust, Cambridge CB2 0QQ, UK
- Ediz Sohoglu
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; School of Psychology, University of Sussex, Brighton BN1 9RH, UK
- Katie A Peterson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Department of Radiology, University of Cambridge, Cambridge CB2 0QQ, UK
- P Simon Jones
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- Catarina Rua
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- Luca Passamonti
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- William Sedley
- Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Brechtje Post
- Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge, Cambridge CB3 9DA, UK
- Jan Coebergh
- Ashford and St Peter's Hospital, Ashford TW15 3AA, UK; St George's Hospital, London SW17 0QT, UK
- Christopher R Butler
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK; Faculty of Medicine, Department of Brain Sciences, Imperial College London, London W12 0NN, UK
- Peter Garrard
- St George's Hospital, London SW17 0QT, UK; Molecular and Clinical Sciences Research Institute, St. George's, University of London, London SW17 0RE, UK
- Khaled Abdel-Aziz
- Ashford and St Peter's Hospital, Ashford TW15 3AA, UK; St George's Hospital, London SW17 0QT, UK
- Masud Husain
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK
- Timothy D Griffiths
- Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Karalyn Patterson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- Matthew H Davis
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- James B Rowe
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Cambridge University Hospitals NHS Trust, Cambridge CB2 0QQ, UK
8. Rimmele JM, Sun Y, Michalareas G, Ghitza O, Poeppel D. Dynamics of Functional Networks for Syllable and Word-Level Processing. Neurobiol Lang (Camb) 2023; 4:120-144. PMID: 37229144; PMCID: PMC10205074; DOI: 10.1162/nol_a_00089.
Abstract
Speech comprehension requires the ability to temporally segment the acoustic input for higher-level linguistic analysis. Oscillation-based approaches suggest that low-frequency auditory cortex oscillations track syllable-sized acoustic information and therefore emphasize the relevance of syllabic-level acoustic processing for speech segmentation. How syllabic processing interacts with higher levels of speech processing, beyond segmentation, including the anatomical and neurophysiological characteristics of the networks involved, is debated. In two MEG experiments, we investigate lexical and sublexical word-level processing and the interactions with (acoustic) syllable processing using a frequency-tagging paradigm. Participants listened to disyllabic words presented at a rate of 4 syllables/s. Lexical content (native language), sublexical syllable-to-syllable transitions (foreign language), or mere syllabic information (pseudo-words) were presented. Two conjectures were evaluated: (i) syllable-to-syllable transitions contribute to word-level processing; and (ii) processing of words activates brain areas that interact with acoustic syllable processing. We show that syllable-to-syllable transition information, compared to mere syllable information, activated a bilateral superior and middle temporal and inferior frontal network. Lexical content resulted, additionally, in increased neural activity. Evidence for an interaction of word- and acoustic syllable-level processing was inconclusive. Decreases in syllable tracking (cerebro-acoustic coherence) in auditory cortex and increases in cross-frequency coupling between right superior and middle temporal and frontal areas were found when lexical content was present compared to all other conditions, but not when conditions were compared separately. The data provide experimental insight into how subtle and sensitive a cue syllable-to-syllable transition information is for word-level processing.
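Cerebro-acoustic coherence of the kind measured above can be sketched as the magnitude of the trial-averaged, phase-normalized cross-spectrum between the stimulus envelope and a neural time series, evaluated at the tagged syllable rate. The 4 Hz toy signals below are illustrative, not the study's data:

```python
import numpy as np

def spectral_coherence(x_trials, y_trials, fs):
    """Magnitude of the trial-averaged normalized cross-spectrum.

    x_trials, y_trials: (n_trials, n_samples) arrays, e.g. the stimulus
    envelope and one MEG sensor. Returns (freqs, coherence in [0, 1]).
    """
    X = np.fft.rfft(x_trials, axis=1)
    Y = np.fft.rfft(y_trials, axis=1)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # keep phase information only
    coh = np.abs(cross.mean(axis=0))        # phase consistency across trials
    freqs = np.fft.rfftfreq(x_trials.shape[1], 1.0 / fs)
    return freqs, coh

# Toy check: a shared 4 Hz "syllable rate" yields high coherence at 4 Hz
# but not at an unrelated frequency.
fs, dur, n_trials = 100, 4, 20
t = np.arange(fs * dur) / fs
rng = np.random.default_rng(2)
stim = np.array([np.cos(2 * np.pi * 4 * t) for _ in range(n_trials)])
resp = 0.5 * stim + rng.standard_normal((n_trials, len(t)))
freqs, coh = spectral_coherence(stim, resp, fs)
f4 = np.argmin(np.abs(freqs - 4.0))   # tagged rate
f7 = np.argmin(np.abs(freqs - 7.0))   # control frequency
```

Reduced coherence at the syllable rate when lexical content is present, as reported above, would appear as a lower value at `f4` for that condition.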
Affiliation(s)
- Johanna M. Rimmele
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Max Planck NYU Center for Language, Music and Emotion, Frankfurt am Main, Germany; New York, NY, USA
- Yue Sun
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Georgios Michalareas
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Oded Ghitza
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- College of Biomedical Engineering & Hearing Research Center, Boston University, Boston, MA, USA
- David Poeppel
- Departments of Neuroscience and Cognitive Neuropsychology, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Department of Psychology and Center for Neural Science, New York University, New York, NY, USA
- Max Planck NYU Center for Language, Music and Emotion, Frankfurt am Main, Germany; New York, NY, USA
- Ernst Strüngmann Institute for Neuroscience, Frankfurt am Main, Germany
9. Reuter T, Mazzei C, Lew-Williams C, Emberson L. Infants' lexical comprehension and lexical anticipation abilities are closely linked in early language development. Infancy 2023; 28:532-549. PMID: 36808682; DOI: 10.1111/infa.12534.
Abstract
Theories across cognitive domains propose that anticipating upcoming sensory input supports information processing. In line with this view, prior findings indicate that adults and children anticipate upcoming words during real-time language processing, via such processes as prediction and priming. However, it is unclear if anticipatory processes are strictly an outcome of prior language development or are more entwined with language learning and development. We operationalized this theoretical question as whether developmental emergence of comprehension of lexical items occurs before or concurrently with the anticipation of these lexical items. To this end, we tested infants of ages 12, 15, 18, and 24 months (N = 67) on their abilities to comprehend and anticipate familiar nouns. In an eye-tracking task, infants viewed pairs of images and heard sentences with either informative words (e.g., eat) that allowed them to anticipate an upcoming noun (e.g., cookie), or uninformative words (e.g., see). Findings indicated that infants' comprehension and anticipation abilities are closely linked over developmental time and within individuals. Importantly, we do not find evidence for lexical comprehension in the absence of lexical anticipation. Thus, anticipatory processes are present early in infants' second year, suggesting they are a part of language development rather than solely an outcome of it.
Affiliation(s)
- Tracy Reuter
- Department of Psychology, Princeton University, Princeton, New Jersey, USA
- Carolyn Mazzei
- Department of Psychology, Princeton University, Princeton, New Jersey, USA; Faculty of Education, Cambridge University, Cambridge, UK
- Casey Lew-Williams
- Department of Psychology, Princeton University, Princeton, New Jersey, USA
- Lauren Emberson
- Department of Psychology, Princeton University, Princeton, New Jersey, USA; Psychology Department, University of British Columbia, Vancouver, British Columbia, Canada
10
Benetti S, Ferrari A, Pavani F. Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscience. Front Hum Neurosci 2023; 17:1108354. [PMID: 36816496 PMCID: PMC9932987 DOI: 10.3389/fnhum.2023.1108354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2022] [Accepted: 01/11/2023] [Indexed: 02/05/2023] Open
Abstract
In face-to-face communication, humans are faced with multiple layers of discontinuous multimodal signals, such as head, face, hand gestures, speech and non-speech sounds, which need to be interpreted as coherent and unified communicative actions. This implies a fundamental computational challenge: optimally binding only signals belonging to the same communicative action while segregating signals that are not connected by the communicative content. How do we achieve such an extraordinary feat, reliably, and efficiently? To address this question, we need to further move the study of human communication beyond speech-centred perspectives and promote a multimodal approach combined with interdisciplinary cooperation. Accordingly, we seek to reconcile two explanatory frameworks recently proposed in psycholinguistics and sensory neuroscience into a neurocognitive model of multimodal face-to-face communication. First, we introduce a psycholinguistic framework that characterises face-to-face communication at three parallel processing levels: multiplex signals, multimodal gestalts and multilevel predictions. Second, we consider the recent proposal of a lateral neural visual pathway specifically dedicated to the dynamic aspects of social perception and reconceive it from a multimodal perspective ("lateral processing pathway"). Third, we reconcile the two frameworks into a neurocognitive model that proposes how multiplex signals, multimodal gestalts, and multilevel predictions may be implemented along the lateral processing pathway. Finally, we advocate a multimodal and multidisciplinary research approach, combining state-of-the-art imaging techniques, computational modelling and artificial intelligence for future empirical testing of our model.
Affiliation(s)
- Stefania Benetti
- Centre for Mind/Brain Sciences, University of Trento, Trento, Italy; Interuniversity Research Centre “Cognition, Language, and Deafness”, CIRCLeS, Catania, Italy
- Ambra Ferrari
- Max Planck Institute for Psycholinguistics, Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, Netherlands
- Francesco Pavani
- Centre for Mind/Brain Sciences, University of Trento, Trento, Italy; Interuniversity Research Centre “Cognition, Language, and Deafness”, CIRCLeS, Catania, Italy
11
Zoefel B, Gilbert RA, Davis MH. Intelligibility improves perception of timing changes in speech. PLoS One 2023; 18:e0279024. [PMID: 36634109 PMCID: PMC9836318 DOI: 10.1371/journal.pone.0279024] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Accepted: 11/28/2022] [Indexed: 01/13/2023] Open
Abstract
Auditory rhythms are ubiquitous in music, speech, and other everyday sounds. Yet, it is unclear how perceived rhythms arise from the repeating structure of sounds. For speech, it is unclear whether rhythm is solely derived from acoustic properties (e.g., rapid amplitude changes), or if it is also influenced by the linguistic units (syllables, words, etc.) that listeners extract from intelligible speech. Here, we present three experiments in which participants were asked to detect an irregularity in rhythmically spoken speech sequences. In each experiment, we reduce the number of possible stimulus properties that differ between intelligible and unintelligible speech sounds and show that these acoustically-matched intelligibility conditions nonetheless lead to differences in rhythm perception. In Experiment 1, we replicate a previous study showing that rhythm perception is improved for intelligible (16-channel vocoded) as compared to unintelligible (1-channel vocoded) speech, despite near-identical broadband amplitude modulations. In Experiment 2, we use spectrally-rotated 16-channel speech to show the effect of intelligibility cannot be explained by differences in spectral complexity. In Experiment 3, we compare rhythm perception for sine-wave speech signals when they are heard as non-speech (for naïve listeners), and subsequent to training, when identical sounds are perceived as speech. In all cases, detection of rhythmic regularity is enhanced when participants perceive the stimulus as speech compared to when they do not. Together, these findings demonstrate that intelligibility enhances the perception of timing changes in speech, which is hence linked to processes that extract abstract linguistic units from sound.
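The channel-vocoding manipulation contrasted here (1-channel vs. 16-channel noise-vocoded speech) can be sketched with a minimal noise vocoder: filter the signal into bands, extract each band envelope, and use it to modulate band-limited noise. This is an illustrative reconstruction, not the authors' stimulus code; the log-spaced band edges, filter order, and frequency range are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_channels, f_lo=100.0, f_hi=4000.0):
    """Replace each band's fine structure with noise, keeping the band envelope."""
    # Log-spaced band edges between f_lo and f_hi (f_hi must stay below fs/2)
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        env = np.abs(hilbert(band))                       # band envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        out += env * carrier                              # envelope-modulated noise band
    return out / (np.max(np.abs(out)) + 1e-12)            # peak-normalize
```

With 16 channels the summed envelopes preserve enough spectro-temporal detail for intelligibility; with 1 channel only the broadband amplitude modulation survives.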
Affiliation(s)
- Benedikt Zoefel
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Centre National de la Recherche Scientifique (CNRS), Centre de Recherche Cerveau et Cognition (CerCo), Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
- Rebecca A. Gilbert
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Matthew H. Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
12
Wang H, Chen R, Yan Y, McGettigan C, Rosen S, Adank P. Perceptual Learning of Noise-Vocoded Speech Under Divided Attention. Trends Hear 2023; 27:23312165231192297. [PMID: 37547940 PMCID: PMC10408355 DOI: 10.1177/23312165231192297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 08/08/2023] Open
Abstract
Speech perception performance for degraded speech can improve with practice or exposure. Such perceptual learning is thought to be reliant on attention, and theoretical accounts like the predictive coding framework suggest a key role for attention in supporting learning. However, it is unclear whether speech perceptual learning requires undivided attention. We evaluated the role of divided attention in speech perceptual learning in two online experiments (N = 336). Experiment 1 tested the reliance of perceptual learning on undivided attention. Participants completed a speech recognition task where they repeated forty noise-vocoded sentences in a between-group design. Participants performed the speech task alone or concurrently with a domain-general visual task (dual task) at one of three difficulty levels. We observed perceptual learning under divided attention for all four groups, moderated by dual-task difficulty. Listeners in easy and intermediate visual conditions improved as much as the single-task group. Those who completed the most challenging visual task showed faster learning and achieved similar ending performance compared to the single-task group. Experiment 2 tested whether learning relies on domain-specific or domain-general processes. Participants completed a single speech task or performed this task together with a dual task aiming to recruit domain-specific (lexical or phonological) or domain-general (visual) processes. All secondary task conditions produced patterns and amounts of learning comparable to the single speech task. Our results demonstrate that the impact of divided attention on perceptual learning is not strictly dependent on domain-general or domain-specific processes, and that speech perceptual learning persists under divided attention.
Affiliation(s)
- Han Wang
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Rongru Chen
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Yu Yan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Stuart Rosen
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Patti Adank
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
13
Murai SA, Riquimaroux H. Long-term changes in cortical representation through perceptual learning of spectrally degraded speech. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2023; 209:163-172. [PMID: 36464716 DOI: 10.1007/s00359-022-01593-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 12/07/2022]
Abstract
Listeners can adapt to acoustically degraded speech with perceptual training. Such long-term learning underlies the rehabilitation of patients with hearing aids or cochlear implants. Perceptual learning of acoustically degraded speech has been associated with the frontotemporal cortices. However, neural processes during and after long-term perceptual learning remain unclear. Here we conducted perceptual training of noise-vocoded speech sounds (NVSS), which are spectrally degraded signals, and measured cortical activity across seven days of training and at follow-up testing (approximately 1 year later) to investigate changes in neural activation patterns using functional magnetic resonance imaging. We demonstrated that young adult participants (n = 5) improved their performance across the seven experimental days, and the gains were maintained after 10 months or more. Representational similarity analysis showed that the neural activation patterns of NVSS relative to clear speech in the left posterior superior temporal sulcus (pSTS) differed significantly across the seven training days, accompanied by neural changes in frontal cortices. In addition, distinct activation patterns to NVSS in the frontotemporal cortices were also observed 10-13 months after the training. We therefore propose that perceptual training can induce plastic changes and long-term effects on neural representations of the trained degraded speech in the frontotemporal cortices. These behavioral improvements and neural changes induced by perceptual learning of degraded speech provide insights into the cortical mechanisms underlying adaptive processes in difficult listening situations and the long-term rehabilitation of auditory disorders.
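Representational similarity analysis of the kind used here compares condition-by-condition dissimilarity structure rather than raw activation maps. A minimal sketch (the correlation-distance metric, Spearman comparison, and simulated pattern matrices are illustrative assumptions, not the study's pipeline):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Conditions x voxels activation matrix -> condensed correlation-distance RDM."""
    return pdist(patterns, metric="correlation")

def compare_rdms(rdm_a, rdm_b):
    """Second-order similarity between two RDMs (Spearman rank correlation)."""
    rho, _ = spearmanr(rdm_a, rdm_b)
    return rho
```

Comparing RDMs from, say, day 1 and day 7 quantifies whether the representational geometry of the trained stimuli changed over training.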
Affiliation(s)
- Shota A Murai
- Faculty of Life and Medical Sciences, Doshisha University, 1-3 Miyakodani, Tatara, Kyotanabe, Kyoto 610-0321, Japan; International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Hiroshi Riquimaroux
- Faculty of Life and Medical Sciences, Doshisha University, 1-3 Miyakodani, Tatara, Kyotanabe, Kyoto 610-0321, Japan
14
MacGregor LJ, Gilbert RA, Balewski Z, Mitchell DJ, Erzinçlioğlu SW, Rodd JM, Duncan J, Fedorenko E, Davis MH. Causal Contributions of the Domain-General (Multiple Demand) and the Language-Selective Brain Networks to Perceptual and Semantic Challenges in Speech Comprehension. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2022; 3:665-698. [PMID: 36742011 PMCID: PMC9893226 DOI: 10.1162/nol_a_00081] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Accepted: 09/07/2022] [Indexed: 06/18/2023]
Abstract
Listening to spoken language engages domain-general multiple demand (MD; frontoparietal) regions of the human brain, in addition to domain-selective (frontotemporal) language regions, particularly when comprehension is challenging. However, there is limited evidence that the MD network makes a functional contribution to core aspects of understanding language. In a behavioural study of volunteers (n = 19) with chronic brain lesions, but without aphasia, we assessed the causal role of these networks in perceiving, comprehending, and adapting to spoken sentences made more challenging by acoustic-degradation or lexico-semantic ambiguity. We measured perception of and adaptation to acoustically degraded (noise-vocoded) sentences with a word report task before and after training. Participants with greater damage to MD but not language regions required more vocoder channels to achieve 50% word report, indicating impaired perception. Perception improved following training, reflecting adaptation to acoustic degradation, but adaptation was unrelated to lesion location or extent. Comprehension of spoken sentences with semantically ambiguous words was measured with a sentence coherence judgement task. Accuracy was high and unaffected by lesion location or extent. Adaptation to semantic ambiguity was measured in a subsequent word association task, which showed that availability of lower-frequency meanings of ambiguous words increased following their comprehension (word-meaning priming). Word-meaning priming was reduced for participants with greater damage to language but not MD regions. Language and MD networks make dissociable contributions to challenging speech comprehension: Using recent experience to update word meaning preferences depends on language-selective regions, whereas the domain-general MD network plays a causal role in reporting words from degraded speech.
Affiliation(s)
- Lucy J. MacGregor
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Rebecca A. Gilbert
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Zuzanna Balewski
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA
- Daniel J. Mitchell
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Jennifer M. Rodd
- Psychology and Language Sciences, University College London, London, UK
- John Duncan
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA
- Matthew H. Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
15
Lanzilotti C, Andéol G, Micheyl C, Scannella S. Cocktail party training induces increased speech intelligibility and decreased cortical activity in bilateral inferior frontal gyri. A functional near-infrared study. PLoS One 2022; 17:e0277801. [PMID: 36454948 PMCID: PMC9714910 DOI: 10.1371/journal.pone.0277801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Accepted: 11/03/2022] [Indexed: 12/03/2022] Open
Abstract
The human brain networks responsible for selectively listening to a voice amid other talkers remain to be clarified. The present study aimed to investigate relationships between cortical activity and performance in a speech-in-speech task, before (Experiment I) and after training-induced improvements (Experiment II). In Experiment I, 74 participants performed a speech-in-speech task while their cortical activity was measured using a functional near-infrared spectroscopy (fNIRS) device. One target talker and one masker talker were simultaneously presented at three different target-to-masker ratios (TMRs): adverse, intermediate and favorable. Behavioral results show that performance increased monotonically with TMR in some participants, whereas for others it did not decline, or even improved, in the adverse-TMR condition. On the neural level, an extensive brain network including frontal (left prefrontal cortex, right dorsolateral prefrontal cortex and bilateral inferior frontal gyri) and temporal (bilateral auditory cortex) regions was more strongly recruited by the intermediate condition than by the other two. Additionally, bilateral frontal gyri and left auditory cortex activities were found to be positively correlated with behavioral performance in the adverse-TMR condition. In Experiment II, 27 participants, whose performance was the poorest in the adverse-TMR condition of Experiment I, were trained to improve performance in that condition. Results show significant performance improvements along with decreased activity in bilateral inferior frontal gyri, the right dorsolateral prefrontal cortex, the left inferior parietal cortex and the right auditory cortex in the adverse-TMR condition after training. Arguably, lower neural activity reflects higher efficiency in processing masker inhibition after speech-in-speech training. As speech-in-noise tasks also engage frontal and temporal regions, we suggest that, regardless of the type of masker (speech or noise), task complexity prompts the engagement of a similar brain network. Furthermore, the initial cognitive recruitment is reduced following training, leading to an economy of cognitive resources.
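Setting a target-to-masker ratio amounts to scaling the masker relative to the target at mixing time. A minimal sketch, assuming an RMS-based level definition (the study does not specify its level computation):

```python
import numpy as np

def mix_at_tmr(target, masker, tmr_db):
    """Scale the masker so the target-to-masker RMS ratio equals tmr_db (in dB), then mix."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    # Gain that places the masker tmr_db below (or above) the target level
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20.0))
    return target + gain * masker
```

Negative TMRs (e.g., -6 dB) give the adverse condition where the masker is louder than the target.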
Affiliation(s)
- Cosima Lanzilotti
- Département Neuroscience et Sciences Cognitives, Institut de Recherche Biomédicale des Armées, Brétigny sur Orge, France
- ISAE-SUPAERO, Université de Toulouse, Toulouse, France
- Thales SIX GTS France, Gennevilliers, France
- Guillaume Andéol
- Département Neuroscience et Sciences Cognitives, Institut de Recherche Biomédicale des Armées, Brétigny sur Orge, France
16
Schwarz J, Li KK, Sim JH, Zhang Y, Buchanan-Worster E, Post B, Gibson JL, McDougall K. Semantic Cues Modulate Children’s and Adults’ Processing of Audio-Visual Face Mask Speech. Front Psychol 2022; 13:879156. [PMID: 35928422 PMCID: PMC9343587 DOI: 10.3389/fpsyg.2022.879156] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2022] [Accepted: 05/25/2022] [Indexed: 12/03/2022] Open
Abstract
During the COVID-19 pandemic, questions have been raised about the impact of face masks on communication in classroom settings. However, it is unclear to what extent visual obstruction of the speaker’s mouth or changes to the acoustic signal lead to speech processing difficulties, and whether these effects can be mitigated by semantic predictability, i.e., the availability of contextual information. The present study investigated the acoustic and visual effects of face masks on speech intelligibility and processing speed under varying semantic predictability. Twenty-six children (aged 8-12) and twenty-six adults performed an internet-based cued shadowing task, in which they had to repeat aloud the last word of sentences presented in audio-visual format. The results showed that children and adults made more mistakes and responded more slowly when listening to face mask speech compared to speech produced without a face mask. Adults were only significantly affected by face mask speech when both the acoustic and the visual signal were degraded. While acoustic mask effects were similar for children, removal of visual speech cues through the face mask affected children to a lesser degree. However, high semantic predictability reduced audio-visual mask effects, leading to full compensation of the acoustically degraded mask speech in the adult group. Even though children did not fully compensate for face mask speech with high semantic predictability, overall, they still profited from semantic cues in all conditions. Therefore, in classroom settings, strategies that increase contextual information such as building on students’ prior knowledge, using keywords, and providing visual aids, are likely to help overcome any adverse face mask effects.
Affiliation(s)
- Julia Schwarz
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Katrina Kechun Li
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Jasper Hong Sim
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Yixin Zhang
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Elizabeth Buchanan-Worster
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Brechtje Post
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Kirsty McDougall
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
17
Mankel K, Shrestha U, Tipirneni-Sajja A, Bidelman GM. Functional Plasticity Coupled With Structural Predispositions in Auditory Cortex Shape Successful Music Category Learning. Front Neurosci 2022; 16:897239. [PMID: 35837119 PMCID: PMC9274125 DOI: 10.3389/fnins.2022.897239] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Categorizing sounds into meaningful groups helps listeners more efficiently process the auditory scene and is a foundational skill for speech perception and language development. Yet, how auditory categories develop in the brain through learning, particularly for non-speech sounds (e.g., music), is not well understood. Here, we asked musically naïve listeners to complete a brief (∼20 min) training session where they learned to identify sounds from a musical interval continuum (minor-major 3rds). We used multichannel EEG to track behaviorally relevant neuroplastic changes in the auditory event-related potentials (ERPs) pre- to post-training. To rule out mere exposure-induced changes, neural effects were evaluated against a control group of 14 non-musicians who did not undergo training. We also compared individual categorization performance with structural volumetrics of bilateral Heschl's gyrus (HG) from MRI to evaluate neuroanatomical substrates of learning. Behavioral performance revealed steeper (i.e., more categorical) identification functions in the posttest that correlated with better training accuracy. At the neural level, improvement in learners' behavioral identification was characterized by smaller P2 amplitudes at posttest, particularly over right hemisphere. Critically, learning-related changes in the ERPs were not observed in control listeners, ruling out mere exposure effects. Learners also showed smaller and thinner HG bilaterally, indicating superior categorization was associated with structural differences in primary auditory brain regions. Collectively, our data suggest successful auditory categorical learning of music sounds is characterized by short-term functional changes (i.e., greater post-training efficiency) in sensory coding processes superimposed on preexisting structural differences in bilateral auditory cortex.
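The "steeper identification function" measure of categorical learning can be sketched by fitting a logistic to the proportion of one category response across the stimulus continuum; a larger slope parameter indicates a more categorical (step-like) function. The two-parameter logistic and starting values below are illustrative assumptions, not the authors' exact fitting procedure:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Two-parameter logistic: x0 = category boundary, k = slope (steepness)."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def identification_slope(steps, p_responses):
    """Fit a logistic to identification proportions; return (boundary x0, slope k)."""
    (x0, k), _ = curve_fit(logistic, steps, p_responses,
                           p0=[np.mean(steps), 1.0], maxfev=5000)
    return x0, k
```

Comparing k pre- vs. post-training gives a per-listener index of how categorical identification became.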
Affiliation(s)
- Kelsey Mankel
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
- Utsav Shrestha
- Department of Biomedical Engineering, University of Memphis, Memphis, TN, United States
- Gavin M. Bidelman
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, United States
18
Hauswald A, Keitel A, Chen Y, Rösch S, Weisz N. Degradation levels of continuous speech affect neural speech tracking and alpha power differently. Eur J Neurosci 2022; 55:3288-3302. [PMID: 32687616 PMCID: PMC9540197 DOI: 10.1111/ejn.14912] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Revised: 07/12/2020] [Accepted: 07/13/2020] [Indexed: 11/26/2022]
Abstract
Making sense of a poor auditory signal can pose a challenge. Previous attempts to quantify speech intelligibility in neural terms have usually focused on one of two measures, namely low-frequency speech-brain synchronization or alpha power modulations. However, reports have been mixed concerning the modulation of these measures, an issue aggravated by the fact that they have normally been studied separately. We present two MEG studies analyzing both measures. In study 1, participants listened to unimodal auditory speech with three different levels of degradation (original, 7-channel and 3-channel vocoding). Intelligibility declined with declining clarity, but speech was still intelligible to some extent even for the lowest clarity level (3-channel vocoding). Low-frequency (1-7 Hz) speech tracking suggested a U-shaped relationship with strongest effects for the medium-degraded speech (7-channel) in bilateral auditory and left frontal regions. To follow up on this finding, we implemented three additional vocoding levels (5-channel, 2-channel and 1-channel) in a second MEG study. Using this wider range of degradation, speech-brain synchronization showed a similar pattern to study 1, but further revealed that when speech becomes unintelligible, synchronization declines again. The relationship differed for alpha power, which decreased continuously across vocoding levels, reaching a floor effect at 5-channel vocoding. Models predicting subjective intelligibility from both measures combined outperformed models based on either measure alone. Our findings underline that speech tracking and alpha power are modulated differently by the degree of degradation of continuous speech but together contribute to subjective speech understanding.
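Low-frequency speech-brain synchronization of the kind measured here is commonly quantified as coherence between the speech envelope and the neural signal within a delta/theta band. A toy sketch on simulated data (the 1-7 Hz band, 2 s windows, and simulated signals are assumptions, not the study's MEG pipeline):

```python
import numpy as np
from scipy.signal import coherence

def speech_tracking(envelope, neural, fs, band=(1.0, 7.0)):
    """Mean envelope-neural coherence within a low-frequency band."""
    f, cxy = coherence(envelope, neural, fs=fs, nperseg=int(fs * 2))
    in_band = (f >= band[0]) & (f <= band[1])
    return float(cxy[in_band].mean())
```

A neural signal that follows the envelope yields high band-limited coherence; an unrelated signal stays near the chance floor set by the number of averaged windows.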
Affiliation(s)
- Anne Hauswald
- Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, Dundee, UK
- Centre for Cognitive Neuroimaging, University of Glasgow, Glasgow, UK
- Ya-Ping Chen
- Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, Salzburg, Austria
- Nathan Weisz
- Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
19
Distracting Linguistic Information Impairs Neural Tracking of Attended Speech. CURRENT RESEARCH IN NEUROBIOLOGY 2022; 3:100043. [DOI: 10.1016/j.crneur.2022.100043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 04/27/2022] [Accepted: 05/24/2022] [Indexed: 11/20/2022] Open
20
Cooke M, Scharenborg O, Meyer BT. The time course of adaptation to distorted speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:2636. [PMID: 35461479 DOI: 10.1121/10.0010235] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/15/2022] [Accepted: 03/25/2022] [Indexed: 06/14/2023]
Abstract
When confronted with unfamiliar or novel forms of speech, listeners' word recognition performance is known to improve with exposure, but data are lacking on the fine-grained time course of adaptation. The current study aims to fill this gap by investigating the time course of adaptation to several different types of distorted speech. Keyword scores as a function of sentence position in a block of 30 sentences were measured in response to eight forms of distorted speech. Listeners recognised twice as many words in the final sentence compared to the initial sentence with around half of the gain appearing in the first three sentences, followed by gradual gains over the rest of the block. Rapid adaptation was apparent for most of the eight distortion types tested with differences mainly in the gradual phase. Adaptation to sine-wave speech improved if listeners had heard other types of distortion prior to exposure, but no similar facilitation occurred for the other types of distortion. Rapid adaptation is unlikely to be due to procedural learning since listeners had been familiarised with the task and sentence format through exposure to undistorted speech. The mechanisms that underlie rapid adaptation are currently unclear.
Affiliation(s)
- Martin Cooke
- Ikerbasque (Basque Science Foundation), Bilbao, Spain
- Bernd T Meyer
- Communication Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, Germany
21
Corcoran AW, Perera R, Koroma M, Kouider S, Hohwy J, Andrillon T. Expectations boost the reconstruction of auditory features from electrophysiological responses to noisy speech. Cereb Cortex 2022; 33:691-708. [PMID: 35253871 PMCID: PMC9890472 DOI: 10.1093/cercor/bhac094] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Revised: 02/11/2022] [Accepted: 02/12/2022] [Indexed: 02/04/2023] Open
Abstract
Online speech processing imposes significant computational demands on the listening brain, the underlying mechanisms of which remain poorly understood. Here, we exploit the perceptual "pop-out" phenomenon (i.e. the dramatic improvement of speech intelligibility after receiving information about speech content) to investigate the neurophysiological effects of prior expectations on degraded speech comprehension. We recorded electroencephalography (EEG) and pupillometry from 21 adults while they rated the clarity of noise-vocoded and sine-wave synthesized sentences. Pop-out was reliably elicited following visual presentation of the corresponding written sentence, but not following incongruent or neutral text. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that improvements in perceptual clarity were mediated via top-down signals that enhanced the quality of cortical speech representations. Spectral analysis further revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta-band power, alpha-band power, and pupil diameter were all increased following the provision of any written sentence information, irrespective of content. Together, these findings reveal distinctive profiles of neurophysiological activity that differentiate the content-specific processes associated with degraded speech comprehension from the context-specific processes invoked under adverse listening conditions.
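Envelope reconstruction from EEG is typically a backward (decoding) model: a regularized linear map from multichannel EEG to the stimulus envelope, scored as the correlation between reconstructed and actual envelopes on held-out data. A toy sketch with simulated signals (single half-split, fixed ridge parameter, and no time-lag expansion are simplifying assumptions relative to standard decoding pipelines):

```python
import numpy as np

def reconstruction_accuracy(eeg, envelope, lam=100.0):
    """Fit a ridge backward model on the first half; return Pearson r on the second half."""
    half = len(envelope) // 2
    Xtr, Xte = eeg[:half], eeg[half:]
    ytr, yte = envelope[:half], envelope[half:]
    # Ridge solution: w = (X'X + lam*I)^(-1) X'y
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(eeg.shape[1]), Xtr.T @ ytr)
    return np.corrcoef(Xte @ w, yte)[0, 1]
```

Higher held-out r for the pop-out condition would indicate, as in the study, that top-down information improved the quality of cortical envelope representations.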
Affiliation(s)
- Andrew W Corcoran
- Corresponding author: Room E672, 20 Chancellors Walk, Clayton, VIC 3800, Australia.
- Ricardo Perera
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia
- Matthieu Koroma
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d’Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Sid Kouider
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d’Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Jakob Hohwy
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia
- Thomas Andrillon
- Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia; Paris Brain Institute, Sorbonne Université, Inserm-CNRS, Paris 75013, France
22
Hidalgo C, Mohamed I, Zielinski C, Schön D. The effect of speech degradation on the ability to track and predict turn structure in conversation. Cortex 2022; 151:105-115. [DOI: 10.1016/j.cortex.2022.01.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2021] [Revised: 11/15/2021] [Accepted: 01/20/2022] [Indexed: 11/03/2022]
23
Alderson-Day B, Moffatt J, Lima CF, Krishnan S, Fernyhough C, Scott SK, Denton S, Leong IYT, Oncel AD, Wu YL, Gurbuz Z, Evans S. Susceptibility to auditory hallucinations is associated with spontaneous but not directed modulation of top-down expectations for speech. Neurosci Conscious 2022; 2022:niac002. [PMID: 35145758 PMCID: PMC8824703 DOI: 10.1093/nc/niac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2021] [Accepted: 01/13/2022] [Indexed: 11/29/2022] Open
Abstract
Auditory verbal hallucinations (AVHs), or hearing voices, occur in clinical and non-clinical populations, but their mechanisms remain unclear. Predictive processing models of psychosis have proposed that hallucinations arise from an over-weighting of prior expectations in perception. It is unknown, however, whether this reflects (i) a sensitivity to explicit modulation of prior knowledge or (ii) a pre-existing tendency to spontaneously use such knowledge in ambiguous contexts. Four experiments were conducted to examine this question in healthy participants listening to ambiguous speech stimuli. In Experiments 1a (n = 60) and 1b (n = 60), participants discriminated intelligible and unintelligible sine-wave speech before and after exposure to the original language templates (i.e. a modulation of expectation). No relationship was observed between top-down modulation and two common measures of hallucination-proneness. Experiment 2 (n = 99) confirmed this pattern with a different stimulus, sine-vocoded speech (SVS), that was designed to minimize ceiling effects in discrimination and more closely model previous top-down effects reported in psychosis. In Experiment 3 (n = 134), participants were exposed to SVS without prior knowledge that it contained speech (i.e. naïve listening). AVH-proneness significantly predicted both pre-exposure identification of speech and successful recall for words hidden in SVS, indicating that participants could actually decode the hidden signal spontaneously. Altogether, these findings support a pre-existing tendency to spontaneously draw upon prior knowledge in healthy people prone to AVH, rather than a sensitivity to temporary modulations of expectation. We propose a model of clinical and non-clinical hallucinations, across auditory and visual modalities, with testable predictions for future research.
Affiliation(s)
- Jamie Moffatt
- Department of Psychology, Durham University, Durham, UK
- Department of Psychology, University of Sussex, Brighton, UK
- César F Lima
- Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal
- Saloni Krishnan
- Department of Psychology, Royal Holloway, University of London, London, UK
- Sophie K Scott
- Institute of Cognitive Neuroscience, University College London, London, UK
- Sophie Denton
- Department of Psychology, Durham University, Durham, UK
- Alena D Oncel
- Department of Psychology, Durham University, Durham, UK
- Yu-Lin Wu
- Department of Psychology, Durham University, Durham, UK
- Zehra Gurbuz
- Department of Psychology, Durham University, Durham, UK
- Samuel Evans
- Department of Psychology, University of Westminster, London, UK
24
Moberly AC, Lewis JH, Vasil KJ, Ray C, Tamati TN. Bottom-Up Signal Quality Impacts the Role of Top-Down Cognitive-Linguistic Processing During Speech Recognition by Adults with Cochlear Implants. Otol Neurotol 2021; 42:S33-S41. [PMID: 34766942 PMCID: PMC8597903 DOI: 10.1097/mao.0000000000003377] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
Hypotheses: Significant variability persists in speech recognition outcomes in adults with cochlear implants (CIs). Sensory ("bottom-up") and cognitive-linguistic ("top-down") processes help explain this variability. However, the interactions of these bottom-up and top-down factors remain unclear. One hypothesis was tested: top-down processes would contribute differentially to speech recognition, depending on the fidelity of bottom-up input. Background: Bottom-up spectro-temporal processing, assessed using a Spectral-Temporally Modulated Ripple Test (SMRT), is associated with CI speech recognition outcomes. Similarly, top-down cognitive-linguistic skills relate to outcomes, including working memory capacity, inhibition-concentration, speed of lexical access, and nonverbal reasoning. Methods: Fifty-one adult CI users were tested for word and sentence recognition, along with performance on the SMRT and a battery of cognitive-linguistic tests. The group was divided into "low-," "intermediate-," and "high-SMRT" groups, based on SMRT scores. Separate correlation analyses were performed for each subgroup between a composite score of cognitive-linguistic processing and speech recognition. Results: Associations of top-down composite scores with speech recognition were not significant for the low-SMRT group. In contrast, these associations were significant and of medium effect size (Spearman's rho = 0.44-0.46) for two sentence types for the intermediate-SMRT group. For the high-SMRT group, top-down scores were associated with both word and sentence recognition, with medium to large effect sizes (Spearman's rho = 0.45-0.58). Conclusions: Top-down processes contribute differentially to speech recognition in CI users based on the quality of bottom-up input. Findings have clinical implications for individualized treatment approaches relying on bottom-up device programming or top-down rehabilitation approaches.
Affiliation(s)
- Aaron C Moberly
- Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Jessica H Lewis
- Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Kara J Vasil
- Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Christin Ray
- Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Terrin N Tamati
- Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Department of Otorhinolaryngology - Head and Neck Surgery, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
25
Wang YC, Sohoglu E, Gilbert RA, Henson RN, Davis MH. Predictive Neural Computations Support Spoken Word Recognition: Evidence from MEG and Competitor Priming. J Neurosci 2021; 41:6919-6932. [PMID: 34210777 PMCID: PMC8360690 DOI: 10.1523/jneurosci.1685-20.2021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2020] [Revised: 05/22/2021] [Accepted: 05/25/2021] [Indexed: 11/24/2022] Open
Abstract
Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g., TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g., hygiene and hijack share /haidʒ/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified. In contrast, predictive-selection accounts (e.g., Predictive-Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words, such as hygiene and hijack, will increase prediction error and hence neural activity only at later time points when different segments are predicted. We collected MEG data from male and female listeners to test these two Bayesian mechanisms and used a competitor priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighboring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the STG showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haidʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords.
These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.SIGNIFICANCE STATEMENT Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words (i.e., Bayesian perceptual inference). This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayes perceptual inference. Most established theories propose direct competition between lexical units such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g., Predictive-Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
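The Bayesian inference contrasted here can be illustrated with a toy segment-by-segment update over a two-word lexicon (the helper function and all numbers are ours, purely for illustration, not the study's model):

```python
# Posterior over candidate words given priors and segment likelihoods.
def posterior(priors, likelihoods):
    unnorm = {w: priors[w] * likelihoods[w] for w in priors}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

priors = {"hygiene": 0.5, "hijack": 0.5}

# The shared onset /haidʒ/ fits both candidates equally...
p = posterior(priors, {"hygiene": 1.0, "hijack": 1.0})
# ...so uncertainty (and hence prediction error) persists until a
# later segment uniquely matches one word.
p = posterior(p, {"hygiene": 1.0, "hijack": 0.0})
```

Competitor priming then corresponds to manipulating the priors: a recently heard neighbor (hijack) raises its prior, so more disambiguating evidence arrives before the posterior settles on the target word.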
Affiliation(s)
- Yingcan Carol Wang
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, BN1 9RH, United Kingdom
- Rebecca A Gilbert
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Richard N Henson
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
26
Klimovich-Gray A, Barrena A, Agirre E, Molinaro N. One Way or Another: Cortical Language Areas Flexibly Adapt Processing Strategies to Perceptual And Contextual Properties of Speech. Cereb Cortex 2021; 31:4092-4103. [PMID: 33825884 DOI: 10.1093/cercor/bhab071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 02/24/2021] [Accepted: 02/25/2021] [Indexed: 11/13/2022] Open
Abstract
Cortical circuits rely on the temporal regularities of speech to optimize signal parsing for sound-to-meaning mapping. Bottom-up speech analysis is accelerated by top-down predictions about upcoming words. In everyday communication, however, listeners are regularly presented with challenging input: fluctuations of speech rate or semantic content. In this study, we asked how reducing speech temporal regularity affects its processing: parsing, phonological analysis, and the ability to generate context-based predictions. To ensure that spoken sentences were natural and approximated the semantic constraints of spontaneous speech, we built a neural network to select stimuli from large corpora. We analyzed brain activity recorded with magnetoencephalography during sentence listening using evoked responses, speech-to-brain synchronization, and representational similarity analysis. For normal speech, theta-band (6.5-8 Hz) speech-to-brain synchronization was increased and the left fronto-temporal areas generated stronger contextual predictions. The reverse was true for temporally irregular speech: weaker theta synchronization and reduced top-down effects. Interestingly, delta-band (0.5 Hz) speech tracking was greater when contextual/semantic predictions were lower or when speech was temporally jittered. We conclude that speech temporal regularity is relevant for (theta) syllabic tracking and robust semantic predictions, while the joint support of temporal and contextual predictability reduces word- and phrase-level cortical tracking (delta).
Affiliation(s)
- Ander Barrena
- Computer Science Faculty, University of the Basque Country, Donostia, 20018, San Sebastian, Spain
- Eneko Agirre
- Computer Science Faculty, University of the Basque Country, Donostia, 20018, San Sebastian, Spain
- Nicola Molinaro
- BCBL, Basque Center on Cognition, Brain and Language, Donostia, 20009, San Sebastian, Spain; Ikerbasque, Basque Foundation for Science, 48009, Bilbao, Spain
27
Kocagoncu E, Klimovich-Gray A, Hughes LE, Rowe JB. Evidence and implications of abnormal predictive coding in dementia. Brain 2021; 144:3311-3321. [PMID: 34240109 PMCID: PMC8677549 DOI: 10.1093/brain/awab254] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2020] [Revised: 03/15/2021] [Accepted: 06/17/2021] [Indexed: 11/14/2022] Open
Abstract
The diversity of cognitive deficits and neuropathological processes associated with dementias has encouraged divergence in pathophysiological explanations of disease. Here, we review an alternative framework that emphasizes convergent critical features of cognitive pathophysiology. Rather than the loss of ‘memory centres’ or ‘language centres’, or singular neurotransmitter systems, cognitive deficits are interpreted in terms of aberrant predictive coding in hierarchical neural networks. This builds on advances in normative accounts of brain function, specifically the Bayesian integration of beliefs and sensory evidence in which hierarchical predictions and prediction errors underlie memory, perception, speech and behaviour. We describe how analogous impairments in predictive coding in parallel neurocognitive systems can generate diverse clinical phenomena, including the characteristics of dementias. The review presents evidence from behavioural and neurophysiological studies of perception, language, memory and decision-making. The reformulation of cognitive deficits in terms of predictive coding has several advantages. It brings diverse clinical phenomena into a common framework; it aligns cognitive and movement disorders; and it makes specific predictions on cognitive physiology that support translational and experimental medicine studies. The insights into complex human cognitive disorders from the predictive coding framework may therefore also inform future therapeutic strategies.
Affiliation(s)
- Ece Kocagoncu
- Cambridge Centre for Frontotemporal Dementia, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- Laura E Hughes
- Cambridge Centre for Frontotemporal Dementia, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- James B Rowe
- Cambridge Centre for Frontotemporal Dementia, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
28
Beach SD, Ozernov-Palchik O, May SC, Centanni TM, Gabrieli JDE, Pantazis D. Neural Decoding Reveals Concurrent Phonemic and Subphonemic Representations of Speech Across Tasks. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2021; 2:254-279. [PMID: 34396148 PMCID: PMC8360503 DOI: 10.1162/nol_a_00034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/28/2020] [Accepted: 02/21/2021] [Indexed: 06/13/2023]
Abstract
Robust and efficient speech perception relies on the interpretation of acoustically variable phoneme realizations, yet prior neuroimaging studies are inconclusive regarding the degree to which subphonemic detail is maintained over time as categorical representations arise. It is also unknown whether this depends on the demands of the listening task. We addressed these questions by using neural decoding to quantify the (dis)similarity of brain response patterns evoked during two different tasks. We recorded magnetoencephalography (MEG) as adult participants heard isolated, randomized tokens from a /ba/-/da/ speech continuum. In the passive task, their attention was diverted. In the active task, they categorized each token as ba or da. We found that linear classifiers successfully decoded ba vs. da perception from the MEG data. Data from the left hemisphere were sufficient to decode the percept early in the trial, while the right hemisphere was necessary but not sufficient for decoding at later time points. We also decoded stimulus representations and found that they were maintained longer in the active task than in the passive task; however, these representations did not pattern more like discrete phonemes when an active categorical response was required. Instead, in both tasks, early phonemic patterns gave way to a representation of stimulus ambiguity that coincided in time with reliable percept decoding. Our results suggest that the categorization process does not require the loss of subphonemic detail, and that the neural representation of isolated speech sounds includes concurrent phonemic and subphonemic information.
Affiliation(s)
- Sara D. Beach
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
- Ola Ozernov-Palchik
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sidney C. May
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, USA
- Tracy M. Centanni
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychology, Texas Christian University, Fort Worth, TX, USA
- John D. E. Gabrieli
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Dimitrios Pantazis
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
29
Darriba Á, Van Ommen S, Hsu YF, Waszak F. Visual Predictions Operate on Different Timescales. J Cogn Neurosci 2021; 33:984-1002. [PMID: 34428794 DOI: 10.1162/jocn_a_01711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Humans live in a volatile environment, subject to changes occurring at different timescales. The ability to adjust internal predictions accordingly is critical for perception and action. We studied this ability with two EEG experiments in which participants were presented with sequences of four Gabor patches, simulating a rotation, and instructed to respond to the last stimulus (target) to indicate whether or not it continued the direction of the first three stimuli. Each experiment included a short-term learning phase in which the probabilities of these two options were very different (p = .2 vs. p = .8, Rules A and B, respectively), followed by a neutral test phase in which both probabilities were equal. In addition, in one of the experiments, prior to the short-term phase, participants performed a much longer long-term learning phase where the relative probabilities of the rules predicting targets were opposite to those of the short-term phase. Analyses of the RTs and P3 amplitudes showed that, in the neutral test phase, participants initially predicted targets according to the probabilities learned in the short-term phase. However, whereas participants not pre-exposed to the long-term learning phase gradually adjusted their predictions to the neutral probabilities, for those who performed the long-term phase, the short-term associations were spontaneously replaced by those learned in that phase. This indicates that the long-term associations remained intact whereas the short-term associations were learned, transiently used, and abandoned when the context changed. The spontaneous recovery suggests independent storage and control of long-term and short-term associations.
Affiliation(s)
- Florian Waszak
- Université de Paris, CNRS, France; Fondation Ophtalmologique Rothschild, Paris, France
30
Jiang J, Benhamou E, Waters S, Johnson JCS, Volkmer A, Weil RS, Marshall CR, Warren JD, Hardy CJD. Processing of Degraded Speech in Brain Disorders. Brain Sci 2021; 11:394. [PMID: 33804653 PMCID: PMC8003678 DOI: 10.3390/brainsci11030394] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Revised: 03/15/2021] [Accepted: 03/18/2021] [Indexed: 11/30/2022] Open
Abstract
The speech we hear every day is typically "degraded" by competing sounds and the idiosyncratic vocal characteristics of individual speakers. While the comprehension of "degraded" speech is normally automatic, it depends on dynamic and adaptive processing across distributed neural networks. This presents the brain with an immense computational challenge, making degraded speech processing vulnerable to a range of brain disorders. Therefore, it is likely to be a sensitive marker of neural circuit dysfunction and an index of retained neural plasticity. Considering experimental methods for studying degraded speech and factors that affect its processing in healthy individuals, we review the evidence for altered degraded speech processing in major neurodegenerative diseases, traumatic brain injury, and stroke. We develop a predictive coding framework for understanding deficits of degraded speech processing in these disorders, focussing on the "language-led dementias", the primary progressive aphasias. We conclude by considering prospects for using degraded speech as a probe of language network pathophysiology, a diagnostic tool, and a target for therapeutic intervention.
Affiliation(s)
- Jessica Jiang
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Elia Benhamou
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Sheena Waters
- Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London EC1M 6BQ, UK
- Jeremy C. S. Johnson
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Anna Volkmer
- Division of Psychology and Language Sciences, University College London, London WC1H 0AP, UK
- Rimona S. Weil
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Charles R. Marshall
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London EC1M 6BQ, UK
- Jason D. Warren
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chris J. D. Hardy
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
31
Tabas A, von Kriegstein K. Adjudicating Between Local and Global Architectures of Predictive Processing in the Subcortical Auditory Pathway. Front Neural Circuits 2021; 15:644743. [PMID: 33776657 PMCID: PMC7994860 DOI: 10.3389/fncir.2021.644743] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Accepted: 02/16/2021] [Indexed: 11/13/2022] Open
Abstract
Predictive processing, a leading theoretical framework for sensory processing, suggests that the brain constantly generates predictions about the sensory world and that perception emerges from the comparison between these predictions and the actual sensory input. This requires two distinct neural elements: generative units, which encode the model of the sensory world; and prediction error units, which compare these predictions against the sensory input. Although predictive processing is generally portrayed as a theory of cerebral cortex function, animal and human studies over the last decade have robustly shown the ubiquitous presence of prediction error responses in several nuclei of the auditory, somatosensory, and visual subcortical pathways. In the auditory modality, prediction error is typically elicited using so-called oddball paradigms, where sequences of repeated pure tones with the same pitch are substituted, at unpredictable intervals, by a tone of deviant frequency. Repeated sounds become predictable promptly and elicit decreasing prediction error; deviant tones break these predictions and elicit large prediction errors. The simplicity of the rules inducing predictability makes oddball paradigms agnostic about the origin of the predictions. Here, we introduce two possible models of the organizational topology of the predictive processing auditory network: (1) the global view, which assumes that predictions on the sensory input are generated at high-order levels of the cerebral cortex and transmitted in a cascade of generative models to the subcortical sensory pathways; and (2) the local view, which assumes that independent local models, computed using local information, are used to perform predictions at each processing stage. In the global view, information encoding is optimized globally but biases sensory representations along the entire brain according to the subjective views of the observer.
The local view results in diminished coding efficiency, but in return guarantees a robust encoding of the features of the sensory input at each processing stage. Although most experimental results to date are ambiguous in this respect, recent evidence favors the global model.
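The oddball logic described above can be sketched with a minimal leaky-expectation model (a hypothetical toy of ours, not either of the architectures under discussion): repeated standards build up an expectation and shrink the prediction error, while a deviant resets it.

```python
# Toy prediction-error trace for an oddball sequence.
def oddball_errors(seq, lr=0.5):
    expect = {}   # learned expectation per tone identity
    errors = []
    for tone in seq:
        p = expect.get(tone, 0.0)
        errors.append(1.0 - p)  # large when the tone is unexpected
        # decay all expectations, then strengthen the heard tone
        expect = {t: v * (1 - lr) for t, v in expect.items()}
        expect[tone] = p + lr * (1.0 - p)
    return errors

errs = oddball_errors(["std"] * 5 + ["dev"] + ["std"] * 3)
# errs shrinks over the repeated standards and spikes at the deviant
```

Because the rule driving predictability is this simple, the trace looks the same whether the expectation is maintained locally at each stage or inherited from a higher-level generative model, which is exactly why oddball paradigms cannot by themselves adjudicate between the two topologies.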
Affiliation(s)
- Alejandro Tabas
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
32
Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behav Res Methods 2021; 53:1945-1953. [PMID: 33694079 PMCID: PMC8516752 DOI: 10.3758/s13428-021-01542-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/11/2021] [Indexed: 11/08/2022]
Abstract
Many studies of speech perception assess the intelligibility of spoken sentence stimuli by means of transcription tasks ('type out what you hear'). The intelligibility of a given stimulus is then often expressed in terms of percentage of words correctly reported from the target sentence. Yet scoring the participants' raw responses for words correctly identified from the target sentence is a time-consuming task, and hence resource-intensive. Moreover, there is no consensus among speech scientists about what specific protocol to use for the human scoring, limiting the reliability of human scores. The present paper evaluates various forms of fuzzy string matching between participants' responses and target sentences, as automated metrics of listener transcript accuracy. We demonstrate that one particular metric, the token sort ratio, is a consistent, highly efficient, and accurate metric for automated assessment of listener transcripts, as evidenced by high correlations with human-generated scores (best correlation: r = 0.940) and a strong relationship to acoustic markers of speech intelligibility. Thus, fuzzy string matching provides a practical tool for assessment of listener transcript accuracy in large-scale speech intelligibility studies. See https://tokensortratio.netlify.app for an online implementation.
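The token sort ratio can be approximated with the standard library alone (a sketch of ours: dedicated fuzzy-matching libraries compute the normalized similarity from Levenshtein distance rather than difflib, so scores may differ slightly):

```python
from difflib import SequenceMatcher

def token_sort_ratio(response: str, target: str) -> float:
    """Sort the words of each string, rejoin, and score the
    normalized similarity of the results on a 0-100 scale."""
    a = " ".join(sorted(response.lower().split()))
    b = " ".join(sorted(target.lower().split()))
    return 100.0 * SequenceMatcher(None, a, b).ratio()

# Word-order differences no longer hurt the score:
score = token_sort_ratio("the dog chased cat the", "the cat chased the dog")
```

Sorting the tokens before comparison is what makes the metric forgiving of transcripts in which listeners report the right words in the wrong order, while still penalizing missing or misheard words.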
33
van Bree S, Sohoglu E, Davis MH, Zoefel B. Sustained neural rhythms reveal endogenous oscillations supporting speech perception. PLoS Biol 2021; 19:e3001142. [PMID: 33635855 PMCID: PMC7946281 DOI: 10.1371/journal.pbio.3001142] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2020] [Revised: 03/10/2021] [Accepted: 02/08/2021] [Indexed: 12/23/2022] Open
Abstract
Rhythmic sensory or electrical stimulation will produce rhythmic brain responses. These rhythmic responses are often interpreted as endogenous neural oscillations aligned (or "entrained") to the stimulus rhythm. However, stimulus-aligned brain responses can also be explained as a sequence of evoked responses, which only appear regular due to the rhythmicity of the stimulus, without necessarily involving underlying neural oscillations. To distinguish evoked responses from true oscillatory activity, we tested whether rhythmic stimulation produces oscillatory responses which continue after the end of the stimulus. Such sustained effects provide evidence for true involvement of neural oscillations. In Experiment 1, we found that rhythmic intelligible, but not unintelligible speech produces oscillatory responses in magnetoencephalography (MEG) which outlast the stimulus at parietal sensors. In Experiment 2, we found that transcranial alternating current stimulation (tACS) leads to rhythmic fluctuations in speech perception outcomes after the end of electrical stimulation. We further report that the phase relation between electroencephalography (EEG) responses and rhythmic intelligible speech can predict the tACS phase that leads to most accurate speech perception. Together, we provide fundamental results for several lines of research-including neural entrainment and tACS-and reveal endogenous neural oscillations as a key underlying principle for speech perception.
Affiliation(s)
- Sander van Bree
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Centre for Cognitive Neuroimaging, University of Glasgow, Glasgow, United Kingdom
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Ediz Sohoglu
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- School of Psychology, University of Sussex, Brighton, United Kingdom
- Matthew H. Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Benedikt Zoefel
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Centre de Recherche Cerveau et Cognition, CNRS, Toulouse, France
- Université Toulouse III Paul Sabatier, Toulouse, France
|
34
|
Kajiura M, Jeong H, Kawata NYS, Yu S, Kinoshita T, Kawashima R, Sugiura M. Brain activity predicts future learning success in intensive second language listening training. Brain Lang 2021; 212:104839. [PMID: 33271393 DOI: 10.1016/j.bandl.2020.104839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/30/2019] [Revised: 06/03/2020] [Accepted: 07/14/2020] [Indexed: 06/12/2023]
Abstract
This study explores the neural mechanisms underlying how prior knowledge gained from pre-listening transcript reading helps listeners comprehend fast-rate speech in a second language (L2), and how this applies to L2 learning. Top-down predictive processing based on prior knowledge may play an important role in L2 speech comprehension and in improving listening skill. By manipulating the pre-listening transcript condition (transcript reading [TR] vs. no transcript reading [NTR]) and language (first language [L1] vs. L2), we measured brain activity in L2 learners who performed fast-rate listening comprehension tasks during functional magnetic resonance imaging. We then examined whether TR_L2-specific brain activity could predict individual learning success after intensive listening training. The left angular and superior temporal gyri were key areas responsible for integrating prior knowledge with sensory input. Activity in these areas correlated significantly with gain scores on subsequent training, indicating that brain activity related to prior knowledge-sensory input integration predicts future learning success.
Affiliation(s)
- Mayumi Kajiura
- Division of Foreign Language Education, Aichi Shukutoku University, Nagoya, Japan
- Hyeonjeong Jeong
- Graduate School of International Cultural Studies, Tohoku University, Sendai, Japan; Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Natasha Y S Kawata
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Shaoyun Yu
- Graduate School of Humanities, Nagoya University, Nagoya, Japan
- Toru Kinoshita
- Graduate School of Humanities, Nagoya University, Nagoya, Japan
- Ryuta Kawashima
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
- Motoaki Sugiura
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan; International Research Institute for Disaster Science, Tohoku University, Sendai, Japan
|
35
|
Tabas A, Mihai G, Kiebel S, Trampel R, von Kriegstein K. Abstract rules drive adaptation in the subcortical sensory pathway. eLife 2020; 9:64501. [PMID: 33289479 PMCID: PMC7785290 DOI: 10.7554/elife.64501] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Accepted: 12/03/2020] [Indexed: 01/19/2023] Open
Abstract
The subcortical sensory pathways are the fundamental channels for mapping the outside world to our minds. Sensory pathways efficiently transmit information by adapting neural responses to the local statistics of the sensory input. The long-standing mechanistic explanation for this adaptive behaviour is that neural activity decreases with increasing regularities in the local statistics of the stimuli. An alternative account is that neural coding is directly driven by expectations of the sensory input. Here, we used abstract rules to manipulate expectations independently of local stimulus statistics. The ultra-high-field functional-MRI data show that abstract expectations can drive the response amplitude to tones in the human auditory pathway. These results provide the first unambiguous evidence of abstract processing in a subcortical sensory pathway. They indicate that the neural representation of the outside world is altered by our prior beliefs even at initial points of the processing hierarchy.
Affiliation(s)
- Alejandro Tabas
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Research Group Neural Mechanism of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Glad Mihai
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Research Group Neural Mechanism of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Stefan Kiebel
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, Dresden, Germany
- Robert Trampel
- Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Research Group Neural Mechanism of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
|
36
|
Sohoglu E, Davis MH. Rapid computations of spectrotemporal prediction error support perception of degraded speech. eLife 2020; 9:e58077. [PMID: 33147138 PMCID: PMC7641582 DOI: 10.7554/elife.58077] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2020] [Accepted: 10/19/2020] [Indexed: 12/15/2022] Open
Abstract
Human speech perception can be described as Bayesian perceptual inference but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
Affiliation(s)
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, United Kingdom
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
|
37
|
Lupyan G, Abdel Rahman R, Boroditsky L, Clark A. Effects of Language on Visual Perception. Trends Cogn Sci 2020; 24:930-944. [PMID: 33012687 DOI: 10.1016/j.tics.2020.08.005] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2020] [Revised: 08/22/2020] [Accepted: 08/25/2020] [Indexed: 11/24/2022]
Abstract
Does language change what we perceive? Does speaking different languages cause us to perceive things differently? We review the behavioral and electrophysiological evidence for the influence of language on perception, with an emphasis on the visual modality. Effects of language on perception can be observed both in higher-level processes such as recognition and in lower-level processes such as discrimination and detection. A consistent finding is that language causes us to perceive in a more categorical way. Rather than being fringe or exotic, as they are sometimes portrayed, we discuss how effects of language on perception naturally arise from the interactive and predictive nature of perception.
Affiliation(s)
- Gary Lupyan
- University of Wisconsin-Madison, Madison, WI, USA
- Andy Clark
- University of Sussex, Brighton, UK; Macquarie University, Sydney, Australia
|
38
|
Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. [PMID: 32727820 PMCID: PMC7470920 DOI: 10.1523/jneurosci.0279-20.2020] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Revised: 06/01/2020] [Accepted: 06/02/2020] [Indexed: 12/22/2022] Open
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed for the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks. SIGNIFICANCE STATEMENT: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
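The jittered-onset deconvolution idea can be illustrated with a toy least-squares simulation (a hedged sketch under simplifying assumptions; the function name, kernel lengths, and onset times here are hypothetical, and the authors' actual iEEG analysis is more elaborate):

```python
import numpy as np

def deconvolve(signal, onsets_a, onsets_v, klen):
    """Jointly estimate one response kernel per modality by least squares.
    Each design-matrix column marks one post-onset lag of one modality;
    varied audio-visual asynchronies make the columns separable."""
    n = len(signal)
    X = np.zeros((n, 2 * klen))
    for base, onsets in ((0, onsets_a), (klen, onsets_v)):
        for t in onsets:
            for k in range(klen):
                if t + k < n:
                    X[t + k, base + k] += 1.0
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return beta[:klen], beta[klen:]

# Simulate overlapping "auditory" and "visual" responses with jittered onsets
n, klen = 60, 3
true_a = np.array([1.0, 2.0, 1.0])   # phasic auditory-like kernel
true_v = np.array([0.5, 1.0, 0.5])   # weaker sustained visual-like kernel
onsets_a = [0, 10, 25, 40]
onsets_v = [1, 12, 26, 43]           # varied onset asynchrony per trial
signal = np.zeros(n)
for t in onsets_a:
    signal[t:t + klen] += true_a
for t in onsets_v:
    signal[t:t + klen] += true_v

est_a, est_v = deconvolve(signal, onsets_a, onsets_v, klen)
# In this noiseless toy case both kernels are recovered exactly
```

The key point is that even though every simulated trial mixes both responses, the jitter decorrelates the two sets of regressors, so the unisensory kernels can be estimated independently.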
|
39
|
Lenc T, Keller PE, Varlet M, Nozaradan S. Neural and Behavioral Evidence for Frequency-Selective Context Effects in Rhythm Processing in Humans. Cereb Cortex Commun 2020; 1:tgaa037. [PMID: 34296106 PMCID: PMC8152888 DOI: 10.1093/texcom/tgaa037] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 01/17/2023] Open
Abstract
When listening to music, people often perceive and move along with a periodic meter. However, the dynamics of mapping between meter perception and the acoustic cues to meter periodicities in the sensory input remain largely unknown. To capture these dynamics, we recorded the electroencephalography while nonmusician and musician participants listened to nonrepeating rhythmic sequences, where acoustic cues to meter frequencies either gradually decreased (from regular to degraded) or increased (from degraded to regular). The results revealed greater neural activity selectively elicited at meter frequencies when the sequence gradually changed from regular to degraded compared with the opposite. Importantly, this effect was unlikely to arise from overall gain, or low-level auditory processing, as revealed by physiological modeling. Moreover, the context effect was more pronounced in nonmusicians, who also demonstrated facilitated sensory-motor synchronization with the meter for sequences that started as regular. In contrast, musicians showed weaker effects of recent context in their neural responses and robust ability to move along with the meter irrespective of stimulus degradation. Together, our results demonstrate that brain activity elicited by rhythm does not only reflect passive tracking of stimulus features, but represents continuous integration of sensory input with recent context.
Affiliation(s)
- Tomas Lenc
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Peter E Keller
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Manuel Varlet
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- School of Psychology, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Sylvie Nozaradan
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Institute of Neuroscience (IONS), Université Catholique de Louvain (UCL), Brussels 1200, Belgium
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal QC H3C 3J7, Canada
|
40
|
Rotman T, Lavie L, Banai K. Rapid Perceptual Learning: A Potential Source of Individual Differences in Speech Perception Under Adverse Conditions? Trends Hear 2020; 24:2331216520930541. [PMID: 32552477 PMCID: PMC7303778 DOI: 10.1177/2331216520930541] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Challenging listening situations (e.g., when speech is rapid or noisy) result in substantial individual differences in speech perception. We propose that rapid auditory perceptual learning is one of the factors contributing to those individual differences. To explore this proposal, we assessed rapid perceptual learning of time-compressed speech in young adults with normal hearing and in older adults with age-related hearing loss. We also assessed the contribution of this learning as well as that of hearing and cognition (vocabulary, working memory, and selective attention) to the recognition of natural-fast speech (NFS; both groups) and speech in noise (younger adults). In young adults, rapid learning and vocabulary were significant predictors of NFS and speech in noise recognition. In older adults, hearing thresholds, vocabulary, and rapid learning were significant predictors of NFS recognition. In both groups, models that included learning fitted the speech data better than models that did not include learning. Therefore, under adverse conditions, rapid learning may be one of the skills listeners could employ to support speech recognition.
Affiliation(s)
- Tali Rotman
- Department of Communication Sciences and Disorders, University of Haifa
- Limor Lavie
- Department of Communication Sciences and Disorders, University of Haifa
- Karen Banai
- Department of Communication Sciences and Disorders, University of Haifa
|
41
|
Abstract
Listeners exposed to accented speech must adjust how they map between acoustic features and lexical representations such as phonetic categories. A robust form of this adaptive perceptual learning is learning to perceive synthetic speech where the connections between acoustic features and phonetic categories must be updated. Both implicit learning through mere exposure and explicit learning through directed feedback have previously been shown to produce this type of adaptive learning. The present study crosses implicit exposure and explicit feedback with the presence or absence of a written identification task. We show that simple exposure produces some learning, but explicit feedback produces substantially stronger learning, whereas requiring written identification did not measurably affect learning. These results suggest that explicit feedback guides learning of new mappings between acoustic patterns and known phonetic categories. We discuss mechanisms that may support learning via implicit exposure.
|
42
|
Casaponsa A, Sohoglu E, Moore DR, Füllgrabe C, Molloy K, Amitay S. Does training with amplitude modulated tones affect tone-vocoded speech perception? PLoS One 2019; 14:e0226288. [PMID: 31881550 PMCID: PMC6934405 DOI: 10.1371/journal.pone.0226288] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2019] [Accepted: 11/22/2019] [Indexed: 11/17/2022] Open
Abstract
Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally-degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials; frequency range: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
Affiliation(s)
- Aina Casaponsa
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Department of Linguistics and English Language, Lancaster University, Lancaster, England, United Kingdom
- Ediz Sohoglu
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- David R. Moore
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Christian Füllgrabe
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Katharine Molloy
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Sygal Amitay
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
|
43
|
Jenson D, Thornton D, Harkrider AW, Saltuklaroglu T. Influences of cognitive load on sensorimotor contributions to working memory: An EEG investigation of mu rhythm activity during speech discrimination. Neurobiol Learn Mem 2019; 166:107098. [DOI: 10.1016/j.nlm.2019.107098] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 09/11/2019] [Accepted: 10/09/2019] [Indexed: 11/16/2022]
|
44
|
Guediche S, Zhu Y, Minicucci D, Blumstein SE. Written sentence context effects on acoustic-phonetic perception: fMRI reveals cross-modal semantic-perceptual interactions. Brain Lang 2019; 199:104698. [PMID: 31586792 DOI: 10.1016/j.bandl.2019.104698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/13/2018] [Revised: 09/15/2019] [Accepted: 09/18/2019] [Indexed: 06/10/2023]
Abstract
This study examines cross-modality effects of a semantically-biased written sentence context on the perception of an acoustically-ambiguous word target, identifying neural areas sensitive to interactions between sentential bias and phonetic ambiguity. Of interest is whether the locus or nature of the interactions resembles those previously demonstrated for auditory-only effects. fMRI results show significant interaction effects in right mid-middle temporal gyrus (RmMTG) and bilateral anterior superior temporal gyri (aSTG), regions along the ventral language comprehension stream that map sound onto meaning. These regions are more anterior than those previously identified for auditory-only effects; however, the same cross-over interaction pattern emerged, implying similar underlying computations at play. The findings suggest that the mechanisms that integrate information across modality and across sentence and phonetic levels of processing recruit amodal areas where reading and spoken lexical and semantic access converge. Taken together, the results support interactive accounts of speech and language processing.
Affiliation(s)
- Sara Guediche
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, United States; BCBL - Basque Center on Cognition, Brain and Language, Donostia-San Sebastian, Spain
- Yuli Zhu
- Neuroscience Department, Brown University, United States
- Domenic Minicucci
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, United States
- Sheila E Blumstein
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, United States; Brown Institute for Brain Science, Brown University, United States
|
45
|
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261 PMCID: PMC6687434 DOI: 10.7554/elife.48116] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2019] [Accepted: 07/17/2019] [Indexed: 12/30/2022] Open
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b) these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
|
46
|
Hierarchical contributions of linguistic knowledge to talker identification: Phonological versus lexical familiarity. Atten Percept Psychophys 2019; 81:1088-1107. [PMID: 31218598 DOI: 10.3758/s13414-019-01778-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Listeners identify talkers more accurately when listening to their native language compared to an unfamiliar, foreign language. This language-familiarity effect in talker identification has been shown to arise from familiarity with both the sound patterns (phonetics and phonology) and the linguistic content (words) of one's native language. However, it has been unknown whether these two sources of information contribute independently to talker identification abilities, particularly whether hearing familiar words can facilitate talker identification in the absence of familiar phonetics. To isolate the contribution of lexical familiarity, we conducted three experiments that tested listeners' ability to identify talkers saying familiar words, but with unfamiliar phonetics. In two experiments, listeners identified talkers from recordings of their native language (English), an unfamiliar foreign language (Mandarin Chinese), or "hybrid" speech stimuli (sentences spoken in Mandarin, but which can be convincingly coerced to sound like English when presented with subtitles that prime plausible English-language lexical interpretations based on the Mandarin phonetics). In a third experiment, we explored natural variation in lexical-phonetic congruence as listeners identified talkers with varying degrees of a Mandarin accent. Priming listeners to hear English speech did not improve their ability to identify talkers speaking Mandarin, even after additional training, and talker identification accuracy decreased as talkers' phonetics became increasingly dissimilar to American English. Together, these experiments indicate that unfamiliar sound patterns preclude talker identification benefits otherwise afforded by familiar words. These results suggest that linguistic representations contribute hierarchically to talker identification; the facilitatory effect of familiar words requires the availability of familiar phonological forms.
|
47
|
Babel M, McAuliffe M, Norton C, Senior B, Vaughn C. The Goldilocks Zone of Perceptual Learning. Phonetica 2019; 76:179-200. [PMID: 31112962 DOI: 10.1159/000494929] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/03/2017] [Accepted: 10/29/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND/AIMS: Lexically guided perceptual learning in speech is the updating of linguistic categories based on novel input disambiguated by the structure provided in a recognized lexical item. We test the range of variation that allows for perceptual learning by presenting listeners with items that vary from subtle within-category variation to fully remapped cross-category variation.
METHODS: Experiment 1 uses a lexically guided perceptual learning paradigm with words containing noncanonical /s/ realizations from s/ʃ continua that correspond to "typical," "ambiguous," "atypical," and "remapped" steps. Perceptual learning is tested in an s/ʃ categorization task. Experiment 2 addresses listener sensitivity to variation in the exposure items using AX discrimination tasks.
RESULTS: Listeners in experiment 1 showed perceptual learning with the maximally ambiguous tokens. Performance of listeners in experiment 2 suggests that tokens which showed the most perceptual learning were not perceptually salient on their own.
CONCLUSION: These results demonstrate that perceptual learning is enhanced with maximally ambiguous stimuli. Excessively atypical pronunciations show attenuated perceptual learning, while typical pronunciations show no evidence for perceptual learning. AX discrimination illustrates that the maximally ambiguous stimuli are not perceptually unique. Together, these results suggest that perceptual learning relies on an interplay between confidence in phonetic and lexical predictions and category typicality.
Affiliation(s)
- Molly Babel
- University of British Columbia, Vancouver, British Columbia, Canada
- Carolyn Norton
- University of British Columbia, Vancouver, British Columbia, Canada
- Brianne Senior
- University of British Columbia, Vancouver, British Columbia, Canada
|
48
|
Vaughn CR. Expectations about the source of a speaker's accent affect accent adaptation. J Acoust Soc Am 2019; 145:3218. [PMID: 31153344 DOI: 10.1121/1.5108831] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/28/2018] [Accepted: 04/30/2019] [Indexed: 06/09/2023]
Abstract
When encountering speakers whose accents differ from the listener's own, listeners initially show a processing cost, but that cost can be attenuated after short-term exposure. The extent to which processing foreign accents (L2 accents) and within-language accents (L1 accents) is similar is still an open question. This study considers whether listeners' expectations about the source of a speaker's accent (whether the speaker is purported to be an L1 or an L2 speaker) affect intelligibility. Prior work has indirectly manipulated expectations about a speaker's accent through photographs, but the present study primes listeners with a description of the speaker's accent itself. In experiment 1, native English listeners transcribed Spanish-accented English sentences in noise under three different conditions (speaker's accent: monolingual L1 Latinx English, L1-Spanish/L2-English, no information given). Results indicate that, by the end of the experiment, listeners given some information about the accent outperformed listeners given no information, and listeners told the speaker was L1-accented outperformed listeners told to expect L2-accented speech. Findings are interpreted in terms of listeners' expectations about task difficulty, and a follow-up experiment (experiment 2) found that priming listeners to expect that their ability to understand L2-accented speech can improve does in fact improve intelligibility.
Collapse
Affiliation(s)
- Charlotte R Vaughn
- Department of Linguistics, University of Oregon, 1290 University of Oregon, Eugene, Oregon 97403-1290, USA
|
49
|
Khoshkhoo S, Leonard MK, Mesgarani N, Chang EF. Neural correlates of sine-wave speech intelligibility in human frontal and temporal cortex. BRAIN AND LANGUAGE 2018; 187:83-91. [PMID: 29397190 PMCID: PMC6067983 DOI: 10.1016/j.bandl.2018.01.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/21/2017] [Revised: 12/06/2017] [Accepted: 01/20/2018] [Indexed: 05/09/2023]
Abstract
Auditory speech comprehension is the result of neural computations that occur in a broad network that includes the temporal lobe auditory cortex and the left inferior frontal cortex. It remains unclear how representations in this network differentially contribute to speech comprehension. Here, we recorded high-density direct cortical activity during a sine-wave speech (SWS) listening task to examine detailed neural speech representations when the exact same acoustic input is comprehended versus not comprehended. Listeners heard SWS sentences (pre-exposure), followed by clear versions of the same sentences, which revealed the content of the sounds (exposure), and then the same SWS sentences again (post-exposure). Across all three task phases, high-gamma neural activity in the superior temporal gyrus was similar, distinguishing different words based on bottom-up acoustic features. In contrast, frontal regions showed a more pronounced and sudden increase in activity only when the input was comprehended, which corresponded with stronger representational separability among spatiotemporal activity patterns evoked by different words. We observed this effect only in participants who were not able to comprehend the stimuli during the pre-exposure phase, indicating a relationship between frontal high-gamma activity and speech understanding. Together, these results demonstrate that both frontal and temporal cortical networks are involved in spoken language understanding, and that under certain listening conditions, frontal regions are involved in discriminating speech sounds.
Affiliation(s)
- Sattar Khoshkhoo
- School of Medicine, University of California, San Francisco, 505 Parnassus Ave., San Francisco, CA 94143, United States
- Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 505 Parnassus Ave., San Francisco, CA 94143, United States; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States; Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, Mudd Building, Room 1339, 500 W 120th St., New York, NY 10027, United States
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 505 Parnassus Ave., San Francisco, CA 94143, United States; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States; Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States.
|
50
|
Balancing Prediction and Sensory Input in Speech Comprehension: The Spatiotemporal Dynamics of Word Recognition in Context. J Neurosci 2018; 39:519-527. [PMID: 30459221 DOI: 10.1523/jneurosci.3573-17.2018] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Revised: 10/17/2018] [Accepted: 10/18/2018] [Indexed: 11/21/2022] Open
Abstract
Spoken word recognition in context is remarkably fast and accurate, with recognition times of ∼200 ms, typically well before the end of the word. The neurocomputational mechanisms underlying these contextual effects are still poorly understood. This study combines source-localized electroencephalographic and magnetoencephalographic (EMEG) measures of real-time brain activity with multivariate representational similarity analysis to determine directly the timing and computational content of the processes evoked as spoken words are heard in context, and to evaluate the respective roles of bottom-up and predictive processing mechanisms in the integration of sensory and contextual constraints. Male and female human participants heard simple (modifier-noun) English phrases that varied in the degree of semantic constraint that the modifier (W1) exerted on the noun (W2), as in pairs, such as "yellow banana." We used gating tasks to generate estimates of the probabilistic predictions generated by these constraints as well as measures of their interaction with the bottom-up perceptual input for W2. Representation similarity analysis models of these measures were tested against electroencephalographic and magnetoencephalographic brain data across a bilateral fronto-temporo-parietal language network. Consistent with probabilistic predictive processing accounts, we found early activation of semantic constraints in frontal cortex (LBA45) as W1 was heard. The effects of these constraints (at 100 ms after W2 onset in left middle temporal gyrus and at 140 ms in left Heschl's gyrus) were only detectable, however, after the initial phonemes of W2 had been heard. 
Within an overall predictive processing framework, bottom-up sensory inputs are still required to achieve early and robust spoken word recognition in context.
SIGNIFICANCE STATEMENT Human listeners recognize spoken words in natural speech contexts with remarkable speed and accuracy, often identifying a word well before all of it has been heard. In this study, we investigate the brain systems that support this important capacity, using neuroimaging techniques that can track real-time brain activity during speech comprehension. This makes it possible to locate the brain areas that generate predictions about upcoming words and to show how these expectations are integrated with the evidence provided by the speech being heard. We use the timing and localization of these effects to provide the most specific account to date of how the brain achieves an optimal balance between prediction and sensory input in the interpretation of spoken language.
|