1. Interrupted mosaic speech revisited: Gain and loss in intelligibility by stretching. The Journal of the Acoustical Society of America 2024; 155:1767-1779. PMID: 38441439; DOI: 10.1121/10.0025132.
Abstract
Our previous investigation of the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Ueda, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970-980]. In that study, however, gap durations and temporal resolution were confounded. In the current investigation, we therefore measured the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% when it was interrupted with 20 and 80 ms gaps, respectively. Intelligibility improved, however, to 92% and 54% (gains of 14% and 21% for the 20 and 80 ms gaps, respectively) when the mosaic segments were stretched to fill the silent gaps (n = 21). By contrast, intelligibility fell to a minimum of 9% (a 7% loss) when stimuli interrupted with 160 ms gaps were stretched. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement produced by stretching, but not for the loss. A probability summation model accounted for the "U"-shaped intelligibility curves and for both the gain and the loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli.
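The probability-summation idea invoked in the abstract above can be illustrated with a toy calculation. This is the generic at-least-one-unit-succeeds rule with made-up probabilities, a sketch only, not the authors' full sub-unit/supra-unit model:

```python
def prob_summation(p_unit, n_units):
    """Generic probability summation: the stimulus counts as recognized
    if at least one of n independent perceptual units succeeds, so the
    overall probability grows with the number of units sampled."""
    return 1.0 - (1.0 - p_unit) ** n_units

# Illustrative only: with a fixed per-unit probability, stimuli spanning
# more perceptual units yield higher overall intelligibility.
few = prob_summation(0.3, 2)
many = prob_summation(0.3, 6)
```

Changing the assumed perceptual-unit length changes how many units a sentence spans, which is one way such a model can produce intelligibility gains or losses when stimuli are stretched.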

2. Sentence recognition with modulation-filtered speech segments for younger and older adults: Effects of hearing impairment and cognition. The Journal of the Acoustical Society of America 2023; 154:3328-3343. PMID: 37983296; PMCID: PMC10663055; DOI: 10.1121/10.0022445.
Abstract
This study investigated word recognition for sentences temporally filtered within and across acoustic-phonetic segments providing primarily vocalic or consonantal cues. Amplitude modulation was filtered at syllabic (0-8 Hz) or slow phonemic (8-16 Hz) rates. Sentence-level modulation properties were also varied by amplifying or attenuating segments. Participants were older adults with normal or impaired hearing. Their speech recognition was compared to that of groups of younger normal-hearing adults who heard speech unmodified or spectrally shaped, with and without threshold-matching noise that reduced audibility to hearing-impaired levels. Participants also completed cognitive and speech recognition measures. Overall, the results confirm the primary contribution of syllabic speech modulations to recognition and demonstrate the importance of these modulations across vowel and consonant segments. Group differences demonstrated a hearing loss-related impairment in processing modulation-filtered speech, particularly at 8-16 Hz. This impairment could not be fully explained by age or poorer audibility. Principal components analysis identified a single factor score that summarized speech recognition across modulation-filtered conditions; analysis of individual differences explained 81% of the variance in this summary factor among the older adults with hearing loss. These results suggest that a combination of cognitive abilities and speech glimpsing abilities contributes to speech recognition in this group.
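The syllabic- versus slow-phonemic-rate filtering described above amounts to band-limiting an amplitude envelope. A minimal sketch follows; the FFT-mask approach, sampling rate, and toy envelope are illustrative assumptions, not the study's actual filter design:

```python
import numpy as np

def modulation_filter(envelope, fs, lo, hi):
    """Keep only modulation energy between lo and hi Hz in an amplitude
    envelope, via zero-phase FFT masking."""
    spec = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(envelope))

# Toy envelope with syllabic (4 Hz) and slow-phonemic (12 Hz) content:
fs = 100
t = np.arange(0, 2, 1.0 / fs)
env = np.sin(2 * np.pi * 4 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
syllabic = modulation_filter(env, fs, 0.0, 8.0)   # retains the 4 Hz part
phonemic = modulation_filter(env, fs, 8.0, 16.0)  # retains the 12 Hz part
```

In practice this would be applied to the envelope of each band of a filterbank before resynthesis; a real study would use causal or linear-phase filters rather than an ideal FFT mask.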

3. Checkerboard and interrupted speech: Intelligibility contrasts related to factor-analysis-based frequency bands. The Journal of the Acoustical Society of America 2023; 154:2010-2020. PMID: 37782122; DOI: 10.1121/10.0021165.
Abstract
It has been shown that the intelligibility of checkerboard speech stimuli, in which speech signals are periodically interrupted in both time and frequency, varies drastically with the combination of the number of frequency bands (2-20) and segment duration (20-320 ms). However, the effects on intelligibility of the number of frequency bands between 4 and 20, and of the frequency division parameters, have been largely unknown. Here, we show that speech intelligibility was lowest for four-band checkerboard speech stimuli, except at the 320-ms segment duration, followed by temporally interrupted speech stimuli and then eight-band checkerboard speech stimuli (N = 19 and 20). At the same time, U-shaped intelligibility curves were observed for four-band and possibly eight-band checkerboard speech stimuli. Furthermore, different frequency division parameters resulted in small but significant intelligibility differences at the 160- and 320-ms segment durations for four-band checkerboard speech stimuli. These results suggest that the factor-analysis-based four frequency bands, representing groups of critical bands whose speech power fluctuations correlate with each other, work as speech cue channels essential for speech perception. Moreover, a probability summation model for perceptual units, consisting of a sub-unit process and a supra-unit process that receives the outputs of the speech cue channels, may account for the U-shaped intelligibility curves.

4. A time-causal and time-recursive scale-covariant scale-space representation of temporal signals and past time. Biological Cybernetics 2023; 117:21-59. PMID: 36689001; PMCID: PMC10160219; DOI: 10.1007/s00422-022-00953-6.
Abstract
This article presents an overview of a theory for performing temporal smoothing on temporal signals in such a way that: (i) temporally smoothed signals at coarser temporal scales are guaranteed to constitute simplifications of corresponding temporally smoothed signals at any finer temporal scale (including the original signal) and (ii) the temporal smoothing process is both time-causal and time-recursive, in the sense that it does not require access to future information and can be performed with no other temporal memory buffer of the past than the resulting smoothed temporal scale-space representations themselves. For specific subsets of parameter settings for the classes of linear and shift-invariant temporal smoothing operators that obey this property, it is shown how temporal scale covariance can be additionally obtained, guaranteeing that if the temporal input signal is rescaled by a uniform temporal scaling factor, then also the resulting temporal scale-space representations of the rescaled temporal signal will constitute mere rescalings of the temporal scale-space representations of the original input signal, complemented by a shift along the temporal scale dimension. The resulting time-causal limit kernel that obeys this property constitutes a canonical temporal kernel for processing temporal signals in real-time scenarios when the regular Gaussian kernel cannot be used, because of its non-causal access to information from the future, and we cannot additionally require the temporal smoothing process to comprise a complementary memory of the past beyond the information contained in the temporal smoothing process itself, which in this way also serves as a multi-scale temporal memory of the past. We describe how the time-causal limit kernel relates to previously used temporal models, such as Koenderink's scale-time kernels and the ex-Gaussian kernel. 
We also give an overview of how the time-causal limit kernel can be used for modelling the temporal processing in models of spatio-temporal and spectro-temporal receptive fields, and how it more generally has high potential for modelling neural temporal response functions in a purely time-causal and time-recursive way that can also handle phenomena at multiple temporal scales in a theoretically well-founded manner. We detail how this theory can be efficiently implemented for discrete data, in terms of a set of recursive filters coupled in cascade. Hence, the theory is generally applicable both for (i) modelling continuous temporal phenomena over multiple temporal scales and (ii) digital processing of measured temporal signals in real time. We conclude by stating implications of the theory for modelling temporal phenomena in biological, perceptual, neural and memory processes by mathematical models, as well as implications regarding the philosophy of time and perceptual agents. Specifically, we propose that for A-type theories of time, as well as for perceptual agents, the notion of a non-infinitesimal inner temporal scale of the temporal receptive fields has to be included in representations of the present, where the inherent nonzero temporal delay of such time-causal receptive fields implies a need for incorporating predictions from the actual time-delayed present in the layers of a perceptual hierarchy, to make it possible for a representation of the perceptual present to constitute a representation of the environment with timing properties closer to the actual present.
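The discrete implementation mentioned above, recursive filters coupled in cascade, can be sketched in a few lines. The update rule is the standard first-order recursive smoother; the particular time constants below are illustrative choices, not the paper's specific parameterization of the time-causal limit kernel:

```python
import numpy as np

def cascade_smooth(signal, mu_values):
    """Time-causal, time-recursive smoothing: a cascade of first-order
    recursive filters f[t] = f[t-1] + (x[t] - f[t-1]) / (1 + mu).
    Each stage needs only its previous output sample (zero initial
    state), so no buffer of past inputs is required."""
    out = np.asarray(signal, dtype=float).copy()
    for mu in mu_values:
        prev = 0.0
        for t in range(len(out)):
            prev = prev + (out[t] - prev) / (1.0 + mu)
            out[t] = prev
    return out

# Coarser temporal scale = more (and slower) cascade stages; a step
# input is progressively smoothed while causality is preserved.
x = np.zeros(50)
x[10:] = 1.0
fine = cascade_smooth(x, [1.0, 1.0])
coarse = cascade_smooth(x, [1.0, 1.0, 2.0, 4.0])
```

Note how the coarse-scale output is a slowed-down, smoothed version of the fine-scale output, yet neither responds before the step arrives, which is the time-causality property the theory is built around.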

5. Auditory grouping is necessary to understand interrupted mosaic speech stimuli. The Journal of the Acoustical Society of America 2022; 152:970. PMID: 36050149; PMCID: PMC9553289; DOI: 10.1121/10.0013425.
Abstract
The intelligibility of interrupted speech stimuli is known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility of mosaic speech, in which the original speech was segmented in frequency and time and noise-vocoded with the average power in each unit, was largely reduced by periodic interruption. At the same time, intelligibility could be recovered by promoting auditory grouping of the interrupted segments, stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was sufficient (≥4) and the original segment duration was equal to or less than 40 ms. The interruption was devastating for mosaic speech stimuli, very likely because the deprivation of periodicity and temporal fine structure with mosaicking prevented successful auditory grouping of the interrupted segments.
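The two manipulations described above, mosaicking (averaging power within each time-frequency unit) and periodic interruption, can be sketched on a power spectrogram. The frame-level representation and unit sizes here are illustrative assumptions; actual stimuli are resynthesized as noise-vocoded audio:

```python
import numpy as np

def mosaic(power, band_size, seg_size):
    """Coarsen a time-frequency power matrix (bands x frames) by
    averaging power within each band_size x seg_size unit, as in
    mosaic speech: periodicity and temporal fine structure are
    discarded; only the unit-average power survives."""
    out = np.empty_like(power, dtype=float)
    for b in range(0, power.shape[0], band_size):
        for t in range(0, power.shape[1], seg_size):
            block = power[b:b + band_size, t:t + seg_size]
            out[b:b + band_size, t:t + seg_size] = block.mean()
    return out

def interrupt(power, seg_size, gap_size):
    """Periodic interruption: silence gap_size frames after every
    seg_size frames."""
    out = power.copy()
    t = seg_size
    while t < out.shape[1]:
        out[:, t:t + gap_size] = 0.0
        t += seg_size + gap_size
    return out

rng = np.random.default_rng(0)
power = rng.random((8, 16))               # toy 8-band x 16-frame power matrix
m = mosaic(power, band_size=2, seg_size=4)
mi = interrupt(m, seg_size=4, gap_size=2)  # interrupted mosaic speech
```

Stretching segments to fill the gaps, as in the abstract, would then correspond to replacing each silenced span with a time-dilated copy of the preceding segment.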

6. Perception of interrupted speech and text: Listener and modality factors. JASA Express Letters 2022; 2:064402. PMID: 36154160; PMCID: PMC9909681; DOI: 10.1121/10.0011571.
Abstract
Interrupted speech and text are used to measure processes of linguistic closure that are important for recognition under adverse backgrounds. The present study compared recognition of speech and text that had been periodically interrupted with matched amounts of silence or white space, respectively. Recognition thresholds were obtained for younger and older adults with normal or simulated/impaired hearing and correlated with recognition of speech-in-babble. Results demonstrate domain-general, age-related processes in linguistic closure affecting high context sentences and domain-specific, hearing-related processes in speech recognition affecting low context sentences. Text recognition captures domain-general linguistic processes in speech recognition susceptible to age-related effects.

7. Evidence for top-down meter perception in infancy as shown by primed neural responses to an ambiguous rhythm. Eur J Neurosci 2022; 55:2003-2023. PMID: 35445451; DOI: 10.1111/ejn.15671.
Abstract
From auditory rhythm patterns, listeners extract the underlying steady beat and perceptually group beats to form meters. While previous studies show that infants discriminate different auditory meters, it remains unknown whether they can maintain (imagine) a metrical interpretation of an ambiguous rhythm through top-down processes. We investigated this via electroencephalographic mismatch responses. We primed 6-month-old infants (N = 24) to hear a 6-beat ambiguous rhythm either in duple meter (n = 13) or in triple meter (n = 11) through loudness accents on every second or every third beat, respectively. Periods of priming were inserted before sequences of the ambiguous unaccented rhythm. To elicit mismatch responses, occasional pitch deviants occurred on either beat 4 (strong beat in triple meter; weak in duple) or beat 5 (strong in duple; weak in triple) of the unaccented trials. At frontal left sites, we found a significant interaction between beat and priming group in the predicted direction. Post hoc analyses showed that mismatch response amplitudes were significantly larger for beat 5 in the duple-primed than in the triple-primed group (p = .047) and were non-significantly larger for beat 4 in the triple-primed than in the duple-primed group. Further, amplitudes were generally larger in infants with musically experienced parents. At frontal right sites, mismatch responses were generally larger for the duple than for the triple group, which may reflect a processing advantage for duple meter. These results indicate that infants can impose a top-down, internally generated meter on ambiguous auditory rhythms, an ability that would aid early language and music learning.

8. Cortical tracking of sung speech in adults vs infants: A developmental analysis. Front Neurosci 2022; 16:842447. PMID: 35495026; PMCID: PMC9039340; DOI: 10.3389/fnins.2022.842447.
Abstract
Here we replicate a neural tracking paradigm, previously published with infants (aged 4 to 11 months), with adult participants, in order to explore potential developmental similarities and differences in entrainment. Adults listened and watched passively as nursery rhymes were sung or chanted in infant-directed speech. Whole-head EEG (128 channels) was recorded, and cortical tracking of the sung speech in the delta (0.5–4 Hz), theta (4–8 Hz) and alpha (8–12 Hz) frequency bands was computed using linear decoders (multivariate temporal response function models, mTRFs). Phase-amplitude coupling (PAC) was also computed to assess whether delta and theta phases temporally organize higher-frequency amplitudes in adults in the same pattern as found in the infant brain. Like the infants tested previously, the adults showed significant cortical tracking of the sung speech in both delta and theta bands. However, the frequencies associated with peaks in stimulus-induced spectral power differed between the two populations. PAC also differed: it was stronger for theta-driven than delta-driven coupling in adults but was equal for delta- and theta-driven coupling in infants. Adults also showed a stimulus-induced increase in low alpha power that was absent in infants. This may suggest adult recruitment of other cognitive processes, possibly related to comprehension or attention. The comparative data suggest that while infant and adult brains utilize essentially the same cortical mechanisms to track linguistic input, the operation of and interplay between these mechanisms may change with age and language experience.

9. Large group differences in binaural sensitivity are represented in preattentive responses from auditory cortex. J Neurophysiol 2022; 127:660-672. PMID: 35108112; PMCID: PMC8896993; DOI: 10.1152/jn.00360.2021.
Abstract
Correlated sounds presented to the two ears are perceived as compact and centrally lateralized, whereas decorrelation between the ears leads to intracranial image widening. Though most listeners have fine resolution for perceptual changes in interaural correlation (IAC), some investigators have reported large variability in IAC thresholds, and some normal-hearing listeners even exhibit seemingly debilitating IAC thresholds. It is unknown whether this variability across individuals, and these outlier cases, are a product of task difficulty, poor training, or a neural deficit in the binaural auditory system. The purpose of this study was first to identify listeners with normal and abnormal IAC resolution, second to evaluate the neural responses elicited by IAC changes, and third to use a well-established model of binaural processing to determine a potential explanation for the observed individual variability. Nineteen subjects were enrolled in the study, eight of whom were identified as poor performers in the IAC-threshold task. Global scalp responses (N1 and P2 amplitudes of an auditory change complex) in the individuals with poor IAC behavioral thresholds were significantly smaller than in listeners with better IAC resolution. Source-localized evoked responses confirmed this group effect in multiple subdivisions of the auditory cortex, including Heschl's gyrus, the planum temporale, and the temporal sulcus. In combination with binaural modeling results, this study provides objective electrophysiological evidence of a binaural processing deficit, linked to internal noise, that corresponds to very poor IAC thresholds in listeners who otherwise have normal audiometric profiles and lack spatial hearing complaints. NEW & NOTEWORTHY: Group differences in the perception of interaural correlation (IAC) were observed in human adults with normal audiometric sensitivity. These differences were reflected in cortical evoked activity measured via electroencephalography (EEG). For some participants, weak representation of the binaural cue in preattentive N1-P2 cortical responses may be indicative of a processing deficit; such a deficit may be related to a poorly understood condition known as hidden hearing loss.
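The stimulus variable at the heart of the study above, interaural correlation, is simple to compute, and decorrelated stimuli are commonly generated by mixing in independent noise. A minimal sketch; the mixing rule assumes independent, equal-variance signals and is one standard construction, not necessarily the one used in this study:

```python
import numpy as np

def iac(left, right):
    """Interaural correlation: normalized zero-lag correlation between
    the left- and right-ear waveforms (1 = identical, ~0 = decorrelated)."""
    l = left - left.mean()
    r = right - right.mean()
    return float(np.dot(l, r) / np.sqrt(np.dot(l, l) * np.dot(r, r)))

def mix_to_iac(signal, noise, rho):
    """Construct a second-ear waveform with target correlation rho to
    `signal`, assuming `noise` is independent with equal variance."""
    return rho * signal + np.sqrt(1.0 - rho**2) * noise

rng = np.random.default_rng(1)
left = rng.standard_normal(20000)
noise = rng.standard_normal(20000)
right = mix_to_iac(left, noise, 0.5)  # partially decorrelated right ear
```

An IAC threshold task then asks listeners to detect a change from rho = 1 toward smaller rho values.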

10. The common limitations in auditory temporal processing for Mandarin Chinese and Japanese. Sci Rep 2022; 12:3002. PMID: 35194098; PMCID: PMC8863933; DOI: 10.1038/s41598-022-06925-x.
Abstract
The present investigation focused on how temporal degradation affects intelligibility in two types of languages: a tonal language (Mandarin Chinese) and a non-tonal language (Japanese). The temporal resolution of common daily-life sentences spoken by native speakers was systematically degraded with mosaicking (mosaicising), in which the power of the original speech in each of the regularly spaced time-frequency units was averaged and the temporal fine structure was removed. The results showed very similar patterns of variation in intelligibility for the two languages over a wide range of temporal resolutions, implying that temporal degradation crucially affected speech cues other than tonal cues in degraded speech without temporal fine structure. Specifically, the intelligibility of both languages remained at ceiling up to about a 40-ms segment duration; performance then gradually declined with increasing segment duration and reached a floor at segment durations of about 150 ms or longer. The same 40-ms limit on ceiling performance appeared for another method of degradation, local time-reversal, implying that a common temporal processing mechanism underlies these limitations. The general tendency fits a dual time-window model of speech processing, in which a short (~20-30 ms) and a long (~200 ms) time window run in parallel.

11. Atypical processing in neural source analysis of speech envelope modulations in adolescents with dyslexia. Eur J Neurosci 2021; 54:7839-7859. PMID: 34730259; DOI: 10.1111/ejn.15515.
Abstract
Different studies have suggested that language and developmental disorders such as dyslexia are associated with a disturbance of auditory entrainment and of the functional hemispheric asymmetries during speech processing. These disorders typically result from an issue in the phonological component of language that causes problems in representing and manipulating the phonological structure of words at the syllable and/or phoneme level. We used auditory steady-state responses (ASSRs) in EEG recordings to investigate the brain activation and hemispheric asymmetry of theta, alpha, beta and low-gamma range oscillations in typical readers and readers with dyslexia. The aim was to analyse whether the group differences found in previous electrode-level studies were caused by a different source activation pattern or, conversely, reflected an effect present at the active brain sources. We could not find differences in the locations of the main active brain sources. However, we observed differences in the extracted waveforms: the group average of the first DSS component of the signal-to-noise ratios of the ASSR at source level was higher than the group averages at the electrode level. These differences included lower alpha synchronisation in adolescents with dyslexia and the possibility of compensatory mechanisms in the theta, beta and low-gamma frequency bands. The main auditory brain sources were located in cortical regions around the auditory cortex. Thus, according to our findings, the differences observed in auditory EEG experiments would have their origin in the intrinsic oscillatory mechanisms of the cortical brain sources related to speech perception.

12. The extended present: an informational context for perception. Acta Psychol (Amst) 2021; 220:103403. PMID: 34454251; DOI: 10.1016/j.actpsy.2021.103403.
Abstract
Several previous authors have proposed a kind of specious or subjective present moment that covers a few seconds of recent information. This article proposes a new hypothesis about the subjective present, renamed the extended present, defined not in terms of time covered but as a thematically connected information structure held in working memory and in transiently accessible form in long-term memory. The three key features of the extended present are that information in it is thematically connected, both internally and to current attended perceptual input, it is organised in a hierarchical structure, and all information in it is marked with temporal information, specifically ordinal and duration information. Temporal boundaries to the information structure are determined by hierarchical structure processing and by limits on processing and storage capacity. Supporting evidence for the importance of hierarchical structure analysis is found in the domains of music perception, speech and language processing, perception and production of goal-directed action, and exact arithmetical calculation. Temporal information marking is also discussed and a possible mechanism for representing ordinal and duration information on the time scale of the extended present is proposed. It is hypothesised that the extended present functions primarily as an informational context for making sense of current perceptual input, and as an enabler for perception and generation of complex structures and operations in language, action, music, exact calculation, and other domains.

13. The role of prenatal experience and basic auditory mechanisms in the development of language. Minnesota Symposia on Child Psychology 2021. DOI: 10.1002/9781119684527.ch4.

14. Checkerboard speech vs interrupted speech: Effects of spectrotemporal segmentation on intelligibility. JASA Express Letters 2021; 1:075204. PMID: 36154646; DOI: 10.1121/10.0005600.
Abstract
The intelligibility of interrupted speech (interrupted over time) and checkerboard speech (interrupted over time-by-frequency), both of which retained half of the original speech, was examined. The intelligibility of interrupted speech stimuli decreased as segment duration increased. Twenty-band checkerboard speech stimuli yielded nearly 100% intelligibility irrespective of segment duration, whereas, with 2 and 4 frequency bands, a trough of 35%-40% appeared at the 160-ms segment duration. Mosaic speech stimuli (power averaged over each time-frequency unit) yielded generally poor intelligibility (≤10%). The results revealed the limitations of the underlying auditory organization for speech cues scattered in a time-frequency domain.
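The checkerboard pattern described above can be expressed as a boolean time-frequency mask that keeps alternating cells so that exactly half of the time-frequency area survives. A sketch with illustrative grid parameters:

```python
import numpy as np

def checkerboard_mask(n_bands, n_frames, band_step, seg_frames, keep_even=True):
    """Boolean time-frequency mask over an n_bands x n_frames grid that
    keeps alternating cells of size band_step x seg_frames, so half of
    the time-frequency area is retained (checkerboard speech). Plain
    temporal interruption is the special case band_step = n_bands."""
    b = np.arange(n_bands) // band_step
    t = np.arange(n_frames) // seg_frames
    parity = (b[:, None] + t[None, :]) % 2
    return parity == (0 if keep_even else 1)

mask = checkerboard_mask(4, 8, band_step=2, seg_frames=2)
complement = checkerboard_mask(4, 8, band_step=2, seg_frames=2, keep_even=False)
```

Multiplying a band-filtered signal (or its spectrogram) by such a mask produces the checkerboard stimulus; the complementary mask gives the other half of the speech.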

15. Phonemic restoration of interrupted locally time-reversed speech: Effects of segment duration and noise levels. Atten Percept Psychophys 2021; 83:1928-1934. PMID: 33851359; PMCID: PMC8213671; DOI: 10.3758/s13414-021-02292-3.
Abstract
The intelligibility of temporally degraded speech was investigated with locally time-reversed speech (LTR) and its interrupted version (ILTR). Control stimuli comprising interrupted speech (I) were also included. Speech stimuli consisted of 200 Japanese meaningful sentences. In the interrupted stimuli, speech segments alternated with either silent gaps or pink noise bursts. The noise bursts had a level of -10, 0, or +10 dB relative to the speech level. Segment duration varied from 20 to 160 ms for ILTR sentences but was fixed at 160 ms for I sentences. At segment durations between 40 and 80 ms, severe reductions in intelligibility were observed for ILTR sentences compared with LTR sentences. A substantial improvement in intelligibility (30-33%) was observed when 40-ms silent gaps in ILTR were replaced with 0- and +10-dB noise; noise at a level of -10 dB had no effect on intelligibility. These findings show that the combined effects of interruptions and temporal reversal of speech segments on intelligibility are greater than the sum of each individual effect. The results also support the idea that illusory continuity induced by high-level noise bursts improves the intelligibility of ILTR and I sentences.
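The two degradations combined above, local time reversal and periodic interruption, are both segment-wise operations on the waveform. A minimal sketch; the segment length in samples (e.g., seg_len = fs * duration) and the silence-only gap filling are illustrative simplifications:

```python
def locally_time_reverse(samples, seg_len):
    """Locally time-reversed (LTR) speech: reverse the samples inside
    each consecutive segment while keeping the segment order intact."""
    out = []
    for i in range(0, len(samples), seg_len):
        out.extend(reversed(samples[i:i + seg_len]))
    return out

def interrupt_with_gaps(samples, seg_len, gap_value=0):
    """ILTR-style interruption: replace every other segment with a
    constant (silence). Noise-filled gaps, as in the phonemic
    restoration conditions, would insert noise samples instead."""
    out = list(samples)
    for i in range(seg_len, len(out), 2 * seg_len):
        out[i:i + seg_len] = [gap_value] * len(out[i:i + seg_len])
    return out
```

Applying `interrupt_with_gaps(locally_time_reverse(x, n), n)` yields an ILTR-like stimulus in which both the segment-internal temporal order and the temporal continuity are disrupted.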

16. Stimulus-evoked phase-locked activity along the human auditory pathway strongly varies across individuals. Sci Rep 2021; 11:143. PMID: 33420231; PMCID: PMC7794304; DOI: 10.1038/s41598-020-80229-w.
Abstract
Phase-locking to the temporal envelope of speech is associated with envelope processing and speech perception. The phase-locked activity of the auditory pathway across modulation frequencies is generally assessed at the group level and shows a decrease in response magnitude with increasing modulation frequency, with the exception of increased activity around 40 Hz and between 80 and 100 Hz. Furthermore, little is known about the phase-locked response patterns to modulation frequencies ≤ 20 Hz, the modulations predominantly present in the speech envelope. In the present study we assess the temporal modulation transfer function of the phase-locked activity of the auditory pathway (TMTF_ASSR), from 0.5 to 100 Hz, at high resolution and by means of auditory steady-state responses. Although the group-averaged TMTF_ASSR corresponds well with those reported in the literature, the individual TMTF_ASSR shows remarkable intersubject variability. This intersubject variability is especially present for ASSRs that originate from the cortex and are evoked with modulation frequencies ≤ 20 Hz. Moreover, we found that these cortical phase-locked activity patterns are robust over time. These results show the importance of the individual TMTF_ASSR when assessing phase-locked activity to envelope fluctuations, which can potentially be used as a marker for auditory processing.

17. The relation between neurofunctional and neurostructural determinants of phonological processing in pre-readers. Dev Cogn Neurosci 2020; 46:100874. PMID: 33130464; PMCID: PMC7606842; DOI: 10.1016/j.dcn.2020.100874.
Abstract
Phonological processing skills are known as the most robust cognitive predictor of reading ability. Therefore, the neural determinants of phonological processing have been extensively investigated by means of either neurofunctional or neurostructural techniques. However, to fully understand how the brain represents and processes phonological information, there is a need for studies that combine both methods. The present study applies such a multimodal approach with the aim of investigating the pre-reading relation between neural measures of auditory temporal processing, white matter properties of the reading network, and phonological processing skills. We administered auditory steady-state responses, diffusion-weighted MRI scans and phonological awareness tasks in 59 pre-readers. Our results demonstrate that a stronger rightward lateralization of syllable-rate (4 Hz) processing coheres with higher fractional anisotropy in the left fronto-temporoparietal arcuate fasciculus. Each of these neural features in turn relates to better phonological processing skills. As such, the current study provides novel evidence for a pre-reading relation between functional measures of syllable-rate processing, the structural organization of the arcuate fasciculus, and cognitive precursors of reading development. Moreover, our findings demonstrate the value of combining different neural techniques to gain insight into the underlying neural systems for reading (dis)ability.
|
18
|
Intelligibility of chimeric locally time-reversed speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:EL523. [PMID: 32611166 DOI: 10.1121/10.0001414] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2020] [Accepted: 05/27/2020] [Indexed: 06/11/2023]
Abstract
The intelligibility of chimeric locally time-reversed speech was investigated. Both (1) the boundary frequency between the temporally degraded band and the non-degraded band and (2) the segment duration were varied. Japanese mora accuracy decreased if the width of the degraded band or the segment duration increased. Nevertheless, the chimeric stimuli were more intelligible than the locally time-reversed controls. The results imply that the auditory system can use both temporally degraded speech information and undamaged speech information over different frequency regions in the processing of the speech signal, provided that the amplitude envelope in the frequency range of 840-1600 Hz is preserved.
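Local time reversal, as used in this and several entries below, is simple to state: the waveform is cut into fixed-duration segments and each segment is played backwards. A minimal numpy sketch, with illustrative function and parameter names that are not taken from the paper:

```python
import numpy as np

def locally_time_reverse(signal, fs, segment_ms):
    """Reverse the waveform within consecutive fixed-duration segments.

    A generic sketch of local time reversal; `segment_ms` is the
    segment duration in milliseconds, `fs` the sampling rate in Hz.
    """
    seg_len = max(1, int(round(fs * segment_ms / 1000.0)))
    out = signal.copy()
    for start in range(0, len(signal), seg_len):
        # Reverse this segment in place; the final segment may be shorter.
        out[start:start + seg_len] = signal[start:start + seg_len][::-1]
    return out
```

Applying the transform twice restores the original waveform, which makes the manipulation easy to sanity-check.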
|
19
|
Asymmetric sampling in human auditory cortex reveals spectral processing hierarchy. PLoS Biol 2020; 18:e3000207. [PMID: 32119667 PMCID: PMC7067489 DOI: 10.1371/journal.pbio.3000207] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2019] [Revised: 03/12/2020] [Accepted: 02/13/2020] [Indexed: 11/18/2022] Open
Abstract
Speech perception is mediated by both left and right auditory cortices but with differential sensitivity to specific acoustic information contained in the speech signal. A detailed description of this functional asymmetry is missing, and the underlying models are widely debated. We analyzed cortical responses from 96 epilepsy patients with electrode implantation in left or right primary, secondary, and/or association auditory cortex (AAC). We presented short acoustic transients to noninvasively estimate the dynamical properties of multiple functional regions along the auditory cortical hierarchy. We show remarkably similar bimodal spectral response profiles in left and right primary and secondary regions, with evoked activity composed of dynamics in the theta (around 4–8 Hz) and beta–gamma (around 15–40 Hz) ranges. Beyond these first cortical levels of auditory processing, a hemispheric asymmetry emerged, with delta and beta band (3/15 Hz) responsivity prevailing in the right hemisphere and theta and gamma band (6/40 Hz) activity prevailing in the left. This asymmetry is also present during syllable presentation, but the evoked responses in AAC are more heterogeneous, with the co-occurrence of alpha (around 10 Hz) and gamma (>25 Hz) activity bilaterally. These intracranial data provide a more fine-grained and nuanced characterization of cortical auditory processing in the two hemispheres, shedding light on the neural dynamics that potentially shape auditory and speech processing at different levels of the cortical hierarchy. Capitalizing on intracranial data from 96 epileptic patients, this study precisely estimates the processing timescales along the cortical auditory hierarchy and reveals that an asymmetric sampling emerges in associative areas.
|
20
|
Irrelevant speech effects with locally time-reversed speech: Native vs non-native language. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:3686. [PMID: 31255145 DOI: 10.1121/1.5112774] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/16/2018] [Accepted: 06/03/2019] [Indexed: 06/09/2023]
Abstract
Irrelevant speech is known to interfere with short-term memory of visually presented items. Here, this irrelevant speech effect was studied with a factorial combination of three variables: the participants' native language, the language the irrelevant speech was derived from, and the playback direction of the irrelevant speech. We used locally time-reversed speech as well to disentangle the contributions of local and global integrity. German and Japanese speech was presented to German (n = 79) and Japanese (n = 81) participants while they performed a serial-recall task. In both groups, any kind of irrelevant speech impaired recall accuracy as compared to a pink-noise control condition. When the participants' native language was presented, normal speech and locally time-reversed speech with short segment durations, both preserving intelligibility, were the most disruptive. Locally time-reversed speech with longer segment durations and normal or locally time-reversed speech played entirely backward, all lacking intelligibility, were less disruptive. When the unfamiliar, incomprehensible signal was presented as irrelevant speech, no significant difference was found between locally time-reversed speech and its globally inverted version, suggesting that the effect of global inversion depends on the familiarity of the language.
|
21
|
The effects of periodic interruptions on cortical entrainment to speech. Neuropsychologia 2018; 121:58-68. [DOI: 10.1016/j.neuropsychologia.2018.10.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Revised: 09/19/2018] [Accepted: 10/24/2018] [Indexed: 11/21/2022]
|
22
|
Development of perception and perceptual learning for multi-timescale filtered speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:667. [PMID: 30180675 DOI: 10.1121/1.5049369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/11/2017] [Accepted: 07/18/2018] [Indexed: 06/08/2023]
Abstract
The perception of temporally changing auditory signals has a gradual developmental trajectory. Speech is a time-varying signal, and slow changes in speech (filtered at 0-4 Hz) are preferentially processed by the right hemisphere, while the left extracts faster changes (filtered at 22-40 Hz). This work examined the ability of 8- to 19-year-olds to both perceive and learn to perceive filtered speech presented diotically for each filter type (low vs high) and dichotically for preferred or non-preferred laterality. Across conditions, performance improved with increasing age, indicating that the ability to perceive filtered speech continues to develop into adolescence. Across age, performance was best when both bands were presented dichotically, but with no benefit for presentation to the preferred hemisphere. Listeners thus integrated slow and fast transitions between the two ears, benefitting from more signal information, but not in a hemisphere-specific manner. After accounting for potential ceiling effects, learning was greatest when both bands were presented dichotically. These results do not support the idea that cochlear implants could be improved by providing differentially filtered information to each ear. Listeners who started with poorer performance learned more, a factor which could contribute to the positive cochlear implant outcomes typically seen in younger children.
|
23
|
Neural envelope encoding predicts speech perception performance for normal-hearing and hearing-impaired adults. Hear Res 2018; 370:189-200. [PMID: 30131201 DOI: 10.1016/j.heares.2018.07.012] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/06/2017] [Revised: 07/19/2018] [Accepted: 07/25/2018] [Indexed: 10/28/2022]
Abstract
Peripheral hearing impairment cannot fully account for speech perception difficulties that emerge with advancing age. As the fluctuating speech envelope bears crucial information for speech perception, changes in temporal envelope processing are thought to contribute to degraded speech perception. Previous research has demonstrated changes in neural encoding of envelope modulations throughout the adult lifespan, either due to age or due to hearing impairment. To date, however, it remains unclear whether such age- and hearing-related neural changes are associated with impaired speech perception. In the present study, we investigated the potential relationship between perception of speech in different types of masking sounds and neural envelope encoding for a normal-hearing and hearing-impaired adult population including young (20-30 years), middle-aged (50-60 years), and older (70-80 years) people. Our analyses show that enhanced neural envelope encoding in the cortex and in the brainstem, respectively, is related to worse speech perception for normal-hearing and for hearing-impaired adults. This neural-behavioral correlation is found for the three age groups and appears to be independent of the type of masking noise, i.e., background noise or competing speech. These findings provide promising directions for future research aiming to develop advanced rehabilitation strategies for speech perception difficulties that emerge throughout adult life.
|
24
|
Temporal Resolution Needed for Auditory Communication: Measurement With Mosaic Speech. Front Hum Neurosci 2018; 12:149. [PMID: 29740295 PMCID: PMC5928238 DOI: 10.3389/fnhum.2018.00149] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2017] [Accepted: 04/04/2018] [Indexed: 11/19/2022] Open
Abstract
Temporal resolution needed for Japanese speech communication was measured. A new experimental paradigm that can reflect the spectro-temporal resolution necessary for healthy listeners to perceive speech is introduced. As a first step, we report listeners' intelligibility scores of Japanese speech with a systematically degraded temporal resolution, so-called “mosaic speech”: speech mosaicized in the coordinates of time and frequency. The results of two experiments show that mosaic speech cut into short static segments was almost perfectly intelligible with a temporal resolution of 40 ms or finer. Intelligibility dropped for a temporal resolution of 80 ms, but was still around the 50%-correct level. The data are in line with previous results showing that speech signals separated into short temporal segments of <100 ms can be remarkably robust in terms of linguistic-content perception against drastic manipulations within each segment, such as partial signal omission or temporal reversal. The human perceptual system can thus extract meaning from unexpectedly rough temporal information in speech. The process resembles that of the visual system stringing together static movie frames of ~40 ms into vivid motion.
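The mosaicization described here amounts to coarsening a time-frequency representation: within each frequency-band × time-segment cell, fine structure is replaced by a single average value. A simplified sketch operating on a power spectrogram (assumed here to be a freq × time array; the actual stimuli were resynthesized from band-filtered waveforms, and all names are illustrative):

```python
import numpy as np

def mosaicize(spec, f_step, t_step):
    """Replace each f_step x t_step tile of a (freq x time) power
    spectrogram with its mean, coarsening spectrotemporal resolution.

    A simplified stand-in for mosaic-speech processing; edge tiles
    may be smaller when the shape is not an exact multiple.
    """
    out = np.empty_like(spec, dtype=float)
    n_f, n_t = spec.shape
    for i in range(0, n_f, f_step):
        for j in range(0, n_t, t_step):
            tile = spec[i:i + f_step, j:j + t_step]
            out[i:i + f_step, j:j + t_step] = tile.mean()
    return out
```

Because each tile keeps its mean power, the overall energy distribution is preserved while fine spectrotemporal detail is discarded.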
|
25
|
The role of phase synchronisation between low frequency amplitude modulations in child phonology and morphology speech tasks. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1366. [PMID: 29604710 PMCID: PMC6485374 DOI: 10.1121/1.5026239] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Recent models of the neural encoding of speech suggest a core role for amplitude modulation (AM) structure, particularly regarding AM phase alignment. Accordingly, speech tasks that measure linguistic development in children may exhibit systematic properties regarding AM structure. Here, the acoustic structure of spoken items in child phonological and morphological tasks, phoneme deletion and plural elicitation, was investigated. The phase synchronisation index (PSI), reflecting the degree of phase alignment between pairs of AMs, was computed for 3 AM bands (delta, theta, beta/low gamma; 0.9-2.5 Hz, 2.5-12 Hz, 12-40 Hz, respectively), for five spectral bands covering 100-7250 Hz. For phoneme deletion, data from 94 child participants with and without dyslexia was used to relate AM structure to behavioural performance. Results revealed that a significant change in magnitude of the phase synchronisation index (ΔPSI) of slower AMs (delta-theta) systematically accompanied both phoneme deletion and plural elicitation. Further, children with dyslexia made more linguistic errors as the delta-theta ΔPSI increased. Accordingly, ΔPSI between slower temporal modulations in the speech signal systematically distinguished test items from accurate responses and predicted task performance. This may suggest that sensitivity to slower AM information in speech is a core aspect of phonological and morphological development.
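The phase synchronisation index used in this study is the standard circular statistic |mean(exp(iΔφ))| over the instantaneous phase difference of two amplitude-modulation signals. A numpy-only sketch, substituting an FFT-based analytic signal for scipy.signal.hilbert (names are illustrative):

```python
import numpy as np

def phase_sync_index(am1, am2):
    """Phase synchronisation index between two amplitude-modulation
    signals: |mean(exp(i*(phi1 - phi2)))|, bounded in [0, 1].
    """
    def phase(x):
        # FFT-based analytic signal: zero negative frequencies,
        # double positive ones, keep DC (and Nyquist for even n).
        n = len(x)
        spec = np.fft.fft(x)
        h = np.zeros(n)
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
        if n % 2 == 0:
            h[n // 2] = 1.0
        return np.angle(np.fft.ifft(spec * h))
    return np.abs(np.mean(np.exp(1j * (phase(am1) - phase(am2)))))
```

Two envelopes with a constant phase lag give an index near 1; envelopes drifting at different rates give an index near 0.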
|
26
|
Neural Entrainment to Speech Modulates Speech Intelligibility. Curr Biol 2017; 28:161-169.e5. [PMID: 29290557 DOI: 10.1016/j.cub.2017.11.033] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Revised: 10/26/2017] [Accepted: 11/15/2017] [Indexed: 01/02/2023]
Abstract
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and acoustic speech signal, listening task, and speech intelligibility have been observed repeatedly. However, a methodological bottleneck has so far prevented clarifying whether speech-brain entrainment contributes functionally to (i.e., causes) speech intelligibility or is merely an epiphenomenon of it. To address this long-standing issue, we experimentally manipulated speech-brain entrainment without concomitant acoustic and task-related variations, using a brain stimulation approach that enables modulating listeners' neural activity with transcranial currents carrying speech-envelope information. Results from two experiments involving a cocktail-party-like scenario and a listening situation devoid of aural speech-amplitude envelope input reveal consistent effects on listeners' speech-recognition performance, demonstrating a causal role of speech-brain entrainment in speech intelligibility. Our findings imply that speech-brain entrainment is critical for auditory speech comprehension and suggest that transcranial stimulation with speech-envelope-shaped currents can be utilized to modulate speech comprehension in impaired listening conditions.
|
27
|
The Role of Slow Speech Amplitude Envelope for Speech Processing and Reading Development. Front Psychol 2017; 8:1497. [PMID: 28912746 PMCID: PMC5583220 DOI: 10.3389/fpsyg.2017.01497] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2017] [Accepted: 08/18/2017] [Indexed: 11/13/2022] Open
Abstract
This study examined the putative link between the entrainment to the slow rhythmic structure of speech, speech intelligibility and reading by means of a behavioral paradigm. Two groups of 20 children (Grades 2 and 5) were asked to recall a pseudoword embedded in sentences presented either in quiet or noisy listening conditions. Half of the sentences were primed with their syllabic and prosodic amplitude envelope to determine whether a boost in auditory entrainment to these speech features enhanced pseudoword intelligibility. Priming improved pseudoword recall performance only for the older children both in a quiet and a noisy listening environment, and such benefit from the prime correlated with reading skills and pseudoword recall. Our results support the role of syllabic and prosodic tracking of speech in reading development.
|
28
|
Abstract
The temporal modulation structure of adult-directed speech (ADS) is thought to be encoded by neuronal oscillations in the auditory cortex that fluctuate at different temporal rates. Oscillatory activity is thought to phase-align to amplitude modulations in speech at corresponding rates, thereby supporting parsing of the signal into linguistically relevant units. The temporal modulation structure of infant-directed speech (IDS) is unexplored. Here we compare the amplitude modulation (AM) structure of IDS recorded from mothers speaking, over three occasions, to their 7-, 9-, and 11-month-old infants, and the same mothers speaking ADS. Analysis of the modulation spectrum in each case revealed that modulation energy in the theta band was significantly greater in ADS than in IDS, whereas in the delta band, modulation energy was significantly greater for IDS than ADS. Furthermore, phase alignment between delta- and theta-band AMs was stronger in IDS compared to ADS. This remained the case when IDS and ADS were rate-normalized to control for differences in speech rate. These data indicate stronger rhythmic synchronization and acoustic temporal regularity in IDS compared to ADS, structural acoustic differences that may be important for early language learning.
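A modulation spectrum of the kind compared above (delta- vs theta-band AM energy) can be approximated by rectifying the waveform to obtain an amplitude envelope, then summing FFT power within each modulation band. A crude single-band sketch with hypothetical names; the study itself used a filterbank-based analysis across spectral channels:

```python
import numpy as np

def modulation_band_energy(signal, fs, band):
    """Sum envelope-spectrum power inside a modulation band (Hz).

    Rectification is a rough envelope estimate; the DC component is
    removed so band energy reflects fluctuations only.
    """
    env = np.abs(signal)
    env = env - env.mean()
    power = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    lo, hi = band
    return power[(freqs >= lo) & (freqs < hi)].sum()
```

For a carrier amplitude-modulated at 2 Hz, virtually all envelope energy falls in the delta band (here taken as 0.5-4 Hz) rather than the theta band (4-8 Hz).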
|
29
|
On the relevance of auditory-based Gabor features for deep learning in robust speech recognition. COMPUT SPEECH LANG 2017. [DOI: 10.1016/j.csl.2017.02.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
30
|
Abstract
The extent to which the sleeping brain processes sensory information remains unclear. This is particularly true for continuous and complex stimuli such as speech, in which information is organized into hierarchically embedded structures. Recently, novel metrics for assessing the neural representation of continuous speech have been developed using noninvasive brain recordings that have thus far only been tested during wakefulness. Here we investigated, for the first time, the sleeping brain's capacity to process continuous speech at different hierarchical levels using a newly developed Concurrent Hierarchical Tracking (CHT) approach that allows monitoring the neural representation and processing-depth of continuous speech online. Speech sequences were compiled with syllables, words, phrases, and sentences occurring at fixed time intervals such that different linguistic levels correspond to distinct frequencies. This enabled us to distinguish their neural signatures in brain activity. We compared the neural tracking of intelligible versus unintelligible (scrambled and foreign) speech across states of wakefulness and sleep using high-density EEG in humans. We found that neural tracking of stimulus acoustics was comparable across wakefulness and sleep and similar across all conditions regardless of speech intelligibility. In contrast, neural tracking of higher-order linguistic constructs (words, phrases, and sentences) was only observed for intelligible speech during wakefulness and could not be detected at all during nonrapid eye movement or rapid eye movement sleep. 
These results suggest that, whereas low-level auditory processing is relatively preserved during sleep, higher-level hierarchical linguistic parsing is severely disrupted, thereby revealing the capacity and limits of language processing during sleep.

SIGNIFICANCE STATEMENT Despite the persistence of some sensory processing during sleep, it is unclear whether high-level cognitive processes such as speech parsing are also preserved. We used a novel approach for studying the depth of speech processing across wakefulness and sleep while tracking neuronal activity with EEG. We found that responses to the auditory sound stream remained intact; however, the sleeping brain did not show signs of hierarchical parsing of the continuous stream of syllables into words, phrases, and sentences. The results suggest that sleep imposes a functional barrier between basic sensory processing and high-level cognitive processing. This paradigm also holds promise for studying residual cognitive abilities in a wide array of unresponsive states.
|
31
|
Intelligibility of locally time-reversed speech: A multilingual comparison. Sci Rep 2017; 7:1782. [PMID: 28496124 PMCID: PMC5431844 DOI: 10.1038/s41598-017-01831-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2016] [Accepted: 04/05/2017] [Indexed: 11/08/2022] Open
Abstract
A set of experiments was performed to make a cross-language comparison of the intelligibility of locally time-reversed speech, employing a total of 117 native listeners of English, German, Japanese, and Mandarin Chinese. The experiments made it possible to examine whether languages of three timing types (stress-, syllable-, and mora-timed) exhibit different trends in intelligibility, depending on the duration of the segments that were temporally reversed. The results showed a strikingly similar trend across languages, especially when the time axis of segment duration was normalised with respect to the deviation of a talker's speech rate from the average in each language. This similarity is somewhat surprising given the systematic differences in vocalic proportions characterising the languages studied, which had been shown in previous research and were largely replicated with the present speech material. These findings suggest that a universal temporal window shorter than 20-40 ms plays a crucial role in perceiving locally time-reversed speech by working as a buffer in which temporal reorganisation can take place with regard to lexical and semantic processing.
|
32
|
Abstract
Natural sounds contain information on multiple timescales, so the auditory system must analyze and integrate acoustic information on those different scales to extract behaviorally relevant information. However, this multi-scale process in the auditory system is not widely investigated in the literature, and existing models of temporal integration are mainly built upon detection or recognition tasks on a single timescale. Here we use a paradigm requiring processing on relatively 'local' and 'global' scales and provide evidence suggesting that the auditory system extracts fine-detail acoustic information using short temporal windows and uses long temporal windows to abstract global acoustic patterns. Behavioral task performance that requires processing fine-detail information does not improve with longer stimulus length, contrary to predictions of previous temporal integration models such as the multiple-looks and the spectro-temporal excitation pattern model. Moreover, the perceptual construction of putatively 'unitary' auditory events requires more than hundreds of milliseconds. These findings support the hypothesis of a dual-scale processing likely implemented in the auditory cortex.
|
33
|
Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairments. Front Psychol 2016; 7:791. [PMID: 27303348 PMCID: PMC4885376 DOI: 10.3389/fpsyg.2016.00791] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2015] [Accepted: 05/11/2016] [Indexed: 12/03/2022] Open
Abstract
Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple-choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.
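Modulation filtering of the kind applied to the nursery rhymes keeps only envelope fluctuations within a given rate band. A brick-wall FFT sketch of band-passing an envelope, illustrative only: the actual stimuli were filtered per spectral channel and resynthesized, and the function name is hypothetical.

```python
import numpy as np

def bandpass_envelope(env, fs, lo, hi):
    """Keep only envelope fluctuations between lo and hi Hz by
    zeroing all other bins of the envelope's one-sided FFT.

    A brick-wall filter is a crude stand-in for the smoother
    modulation filters used in speech research.
    """
    spec = np.fft.rfft(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(env))
```

Low-pass filtering at <4 Hz then corresponds to `bandpass_envelope(env, fs, 0.5, 4.0)`: slow syllabic and prosodic fluctuations survive, faster ones are removed.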
|
34
|
Abstract
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception system enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
|