1
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. [PMID: 36822385] [DOI: 10.1016/j.neubiorev.2023.105111] [Received: 09/16/2022] [Revised: 12/04/2022] [Accepted: 02/19/2023]
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between the speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of the speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% of the variance of the speech envelope. These results indicate that local features in the speech envelope, instead of the modulation spectrum, are a more reliable acoustic correlate of syllables.
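The contrast between the two candidate correlates can be illustrated with a toy computation (a minimal sketch; the helper `modulation_spectrum` and all parameter values are illustrative, not taken from the paper): a crude amplitude envelope is obtained by rectifying and lowpass-smoothing the waveform, and the Fourier spectrum of that envelope is the modulation spectrum, whose peak tracks a syllable-like modulation rate.

```python
import numpy as np

def modulation_spectrum(signal, fs, win=0.02):
    """Crude amplitude envelope (rectify + moving-average lowpass) and its spectrum."""
    n = max(1, int(win * fs))
    env = np.convolve(np.abs(signal), np.ones(n) / n, mode="same")
    env = env - env.mean()                      # remove DC before the FFT
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    return freqs, spec

# Synthetic "speech": a noise carrier amplitude-modulated at a 5 Hz, syllable-like rate
fs = 8000
t = np.arange(0, 4.0, 1.0 / fs)
rng = np.random.default_rng(0)
x = (1.0 + 0.9 * np.sin(2 * np.pi * 5.0 * t)) * rng.standard_normal(t.size)

freqs, spec = modulation_spectrum(x, fs)
band = (freqs >= 1.0) & (freqs <= 32.0)         # typical speech modulation range
peak = freqs[band][np.argmax(spec[band])]       # peak near the 5 Hz modulation rate
print(round(peak, 2))
```

On long recordings this spectral peak is stable, but on short windows it fluctuates, which is consistent with the paper's point that pooling over minutes is needed for the modulation spectrum to track syllable rate.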
2
Becker R, Hervais-Adelman A. Individual theta-band cortical entrainment to speech in quiet predicts word-in-noise comprehension. Cereb Cortex Commun 2023; 4:tgad001. [PMID: 36726796] [PMCID: PMC9883620] [DOI: 10.1093/texcom/tgad001] [Received: 02/14/2022] [Revised: 12/17/2022] [Accepted: 12/18/2022]
Abstract
Speech elicits brain activity time-locked to its amplitude envelope. The resulting speech-brain synchrony (SBS) is thought to be crucial to speech parsing and comprehension. It has been shown that higher speech-brain coherence is associated with increased speech intelligibility. However, studies depending on the experimental manipulation of speech stimuli do not allow conclusions about the causality of the observed tracking. Here, we investigate whether individual differences in the intrinsic propensity to track the speech envelope when listening to speech in quiet are predictive of individual differences in speech recognition in noise, in an independent task. We evaluated the cerebral tracking of speech in source-localized magnetoencephalography, at timescales corresponding to phrases, words, syllables and phonemes. We found that individual differences in syllabic tracking in the right superior temporal gyrus and in the left middle temporal gyrus (MTG) were positively associated with recognition accuracy in an independent words-in-noise task. Furthermore, directed connectivity analysis showed that this relationship is partially mediated by top-down connectivity from premotor cortex (associated with speech processing and active sensing in the auditory domain) to left MTG. Thus, the extent of SBS, even during clear speech, reflects an active mechanism of the speech processing system that may confer resilience to noise.
Affiliation(s)
- Robert Becker
- Corresponding author: Neurolinguistics, Department of Psychology, University of Zurich (UZH), Zurich, Switzerland.
- Alexis Hervais-Adelman
- Neurolinguistics, Department of Psychology, University of Zurich, Zurich 8050, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, Zurich 8057, Switzerland
3
Nusseck M, Immerz A, Richter B, Traser L. Vocal behavior of teachers reading with raised voice in a noisy environment. Int J Environ Res Public Health 2022; 19:8929. [PMID: 35897294] [PMCID: PMC9331438] [DOI: 10.3390/ijerph19158929] [Received: 06/23/2022] [Revised: 07/20/2022] [Accepted: 07/21/2022]
Abstract
(1) Objective: Teaching is a particularly voice-demanding occupation. Voice training provided during teachers' education is often insufficient, and thus teachers are at risk of developing voice disorders. Vocal demands during teaching are characterized not only by speaking for long durations but also by speaking in noisy environments. This provokes the so-called Lombard effect, an involuntary increase in voice intensity, pitch and phonation time observed in laboratory studies. However, this effect has not been thoroughly investigated in realistic teaching scenarios. (2) Methods: This study therefore examined how 13 experienced, but vocally untrained, teachers behaved when reading in a noisy compared with a quiet environment. The quiet and noisy conditions were provided by a live audience either listening quietly or making noise by talking to each other. Using a portable voice accumulator, the fundamental frequency, the sound pressure levels of the voice and the noise, and the phonation time were recorded in both conditions. (3) Results: The results showed that the teachers mainly responded according to the Lombard effect. In addition, analysis of phonation time revealed that they failed to increase inhalation time and appeared to lose articulation through the shortening of voiceless consonants in the noisy condition. (4) Conclusions: The teachers demonstrated vocally demanding behavior when speaking in the noisy condition, which can lead to vocal fatigue and cause dysphonia. The findings underline the necessity for specific voice training in teachers' education, and the content of such training is discussed in light of the results.
4
Ten Oever S, Martin AE. An oscillating computational model can track pseudo-rhythmic speech by using linguistic predictions. eLife 2021; 10:e68066. [PMID: 34338196] [PMCID: PMC8328513] [DOI: 10.7554/eLife.68066] [Received: 03/03/2021] [Accepted: 07/16/2021]
Abstract
Neuronal oscillations putatively track speech in order to optimize sensory processing. However, it is unclear how isochronous brain oscillations can track pseudo-rhythmic speech input. Here we propose that oscillations can track pseudo-rhythmic speech when considering that speech timing depends on content-based predictions flowing from internal language models. We show that the temporal dynamics of speech depend on the predictability of words in a sentence. A computational model including oscillations, feedback, and inhibition is able to track pseudo-rhythmic speech input. As the model processes speech, it generates temporal phase codes, which are a candidate mechanism for carrying information forward in time. The model is optimally sensitive to natural temporal speech dynamics and can explain empirical data on temporal speech illusions. Our results suggest that speech tracking does not have to rely only on the acoustics but could also exploit ongoing interactions between oscillations and constraints flowing from internal language models.
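The core tracking problem can be illustrated with a heavily simplified toy (a generic phase oscillator with partial phase reset; this is an illustration of the general idea, not the authors' model, and all names and parameters are invented): a rigid isochronous predictor accumulates error on pseudo-rhythmic input, while an oscillator that lets each observed onset partially reset its phase keeps its predictions aligned.

```python
import numpy as np

def track_onsets(onsets, base_period, gain):
    """Predict each next onset from the previous prediction plus the natural
    period; each observed onset nudges the prediction by `gain` (0 = rigid
    isochrony, 1 = full phase reset). Returns the mean prediction error."""
    pred, errs = onsets[0] + base_period, []
    for onset in onsets[1:]:
        errs.append(abs(pred - onset))
        pred = gain * onset + (1 - gain) * pred + base_period  # partial reset
    return float(np.mean(errs))

period = 0.2  # ~5 Hz syllable-like rate
rng = np.random.default_rng(3)
# Pseudo-rhythmic onsets: jitter plus a slow drift in rate
intervals = period + 0.02 * rng.standard_normal(50) + 0.001 * np.arange(50)
times = np.cumsum(intervals)

rigid = track_onsets(times, period, gain=0.0)
adaptive = track_onsets(times, period, gain=0.8)
print(adaptive < rigid)  # the resettable oscillator tracks drifting input better
```

In the paper's fuller model the reset is not purely stimulus-driven: linguistic predictability shifts expected timing as well, which is what lets the oscillator anticipate pseudo-rhythmic speech rather than merely react to it.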
Affiliation(s)
- Sanne Ten Oever
- Language and Computation in Neural Systems group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands; Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands; Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
- Andrea E Martin
- Language and Computation in Neural Systems group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands; Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
5
Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behav Res Methods 2021; 53:1945-1953. [PMID: 33694079] [PMCID: PMC8516752] [DOI: 10.3758/s13428-021-01542-4] [Accepted: 01/11/2021]
Abstract
Many studies of speech perception assess the intelligibility of spoken sentence stimuli by means of transcription tasks ('type out what you hear'). The intelligibility of a given stimulus is then often expressed in terms of percentage of words correctly reported from the target sentence. Yet scoring the participants' raw responses for words correctly identified from the target sentence is a time-consuming task, and hence resource-intensive. Moreover, there is no consensus among speech scientists about what specific protocol to use for the human scoring, limiting the reliability of human scores. The present paper evaluates various forms of fuzzy string matching between participants' responses and target sentences, as automated metrics of listener transcript accuracy. We demonstrate that one particular metric, the token sort ratio, is a consistent, highly efficient, and accurate metric for automated assessment of listener transcripts, as evidenced by high correlations with human-generated scores (best correlation: r = 0.940) and a strong relationship to acoustic markers of speech intelligibility. Thus, fuzzy string matching provides a practical tool for assessment of listener transcript accuracy in large-scale speech intelligibility studies. See https://tokensortratio.netlify.app for an online implementation.
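The token sort ratio can be sketched with the Python standard library (the function below is a simplified reimplementation for illustration, not the implementation evaluated in the paper): both strings are lowercased, stripped of punctuation, and their tokens sorted before a similarity ratio is computed, so word-order differences in the transcript are not penalized while spelling deviations still lower the score.

```python
import re
from difflib import SequenceMatcher

def token_sort_ratio(response, target):
    """Similarity (0-100) between a listener transcript and the target sentence,
    computed on lowercased, punctuation-stripped, alphabetically sorted tokens."""
    def normalize(s):
        return " ".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))
    a, b = normalize(response), normalize(target)
    return 100.0 * SequenceMatcher(None, a, b).ratio()

target = "the boy ran quickly to school"
print(round(token_sort_ratio("The boy ran quickly to school.", target)))  # 100
print(round(token_sort_ratio("quickly the boy ran to school", target)))   # 100: order-insensitive
print(token_sort_ratio("the dog sat", target) < 80)                       # unrelated response scores low
```

Scores like these can then be correlated with human word-correct counts, which is how the paper validates the metric.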
6
Villegas J, Perkins J, Wilson I. Effects of task and language nativeness on the Lombard effect and on its onset and offset timing. J Acoust Soc Am 2021; 149:1855. [PMID: 33765802] [DOI: 10.1121/10.0003772] [Received: 11/17/2020] [Accepted: 02/24/2021]
Abstract
This study focuses on the differences in speech sound pressure levels (here, called speech loudness) of Lombard speech (i.e., speech produced in the presence of an energetic masker) associated with different tasks and language nativeness. Vocalizations were produced by native speakers of Japanese with normal hearing and limited English proficiency while performing four tasks: dialog, a competitive game (both communicative), soliloquy, and text passage reading (noncommunicative). Relative to the native language (L1), larger loudness increments were observed in the game and text reading when performed in the second language (L2). Communicative tasks yielded louder vocalizations and larger increments of speech loudness than did noncommunicative tasks regardless of the spoken language. The period in which speakers increased their loudness after the onset of the masker was about fourfold longer than the time in which they decreased their loudness after the offset of the masker. Results suggest that when relying on acoustic signals, speakers use similar vocalization strategies in L1 and L2, and these depend on the complexity of the task, the need for accurate pronunciation, and the presence of a listener. Results also suggest that speakers use different strategies depending on the onset or offset of an energetic masker.
Affiliation(s)
- Julián Villegas
- Computer Arts Laboratory, University of Aizu, Aizu-Wakamatsu, Fukushima, 965-8580, Japan
- Jeremy Perkins
- CLR Phonetics Laboratory, University of Aizu, Aizu-Wakamatsu, Fukushima, 965-8580, Japan
- Ian Wilson
- CLR Phonetics Laboratory, University of Aizu, Aizu-Wakamatsu, Fukushima, 965-8580, Japan
7
Lu M, Zhang G, Luo J. Echolocating bats exhibit differential amplitude compensation for noise interference at a sub-call level. J Exp Biol 2020; 223:jeb225284. [PMID: 32843365] [DOI: 10.1242/jeb.225284] [Received: 03/17/2020] [Accepted: 08/15/2020]
Abstract
Flexible vocal production control enables sound communication in both favorable and unfavorable conditions. The Lombard effect, which describes a rise in call amplitude with increasing ambient noise, is a strategy widely exploited by vertebrates to cope with interfering noise. In humans, the Lombard effect influences lexical stress through differential amplitude modulation at a sub-call, syllable level, which so far has not been documented in animals. Here, we bridge this knowledge gap with two species of Hipposideros bats, which produce echolocation calls consisting of two functionally well-defined units: the constant-frequency (CF) and frequency-modulated (FM) components. We show that ambient noise induced a strong, but differential, Lombard effect in the CF and FM components of the echolocation calls. We further report that the differential amplitude compensation occurred only in the spectrally overlapping noise conditions, suggesting a functional role in masking release. Lastly, we show that both species of bats exhibited a robust Lombard effect in the spectrally non-overlapping noise conditions, which contrasts sharply with the existing evidence. Our data highlight echolocating bats as a potential mammalian model for understanding vocal production control.
Affiliation(s)
- Manman Lu
- School of Life Sciences and Hubei Key Lab of Genetic Regulation & Integrative Biology, Central China Normal University, Wuhan 430079, China
- Guimin Zhang
- School of Life Sciences and Hubei Key Lab of Genetic Regulation & Integrative Biology, Central China Normal University, Wuhan 430079, China
- Jinhong Luo
- School of Life Sciences and Hubei Key Lab of Genetic Regulation & Integrative Biology, Central China Normal University, Wuhan 430079, China
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, Changchun 130117, China
8
Thoret E, Varnet L, Boubenec Y, Férriere R, Le Tourneau FM, Krause B, Lorenzi C. Characterizing amplitude and frequency modulation cues in natural soundscapes: A pilot study on four habitats of a biosphere reserve. J Acoust Soc Am 2020; 147:3260. [PMID: 32486802] [DOI: 10.1121/10.0001174] [Received: 10/03/2019] [Accepted: 04/13/2020]
Abstract
Natural soundscapes correspond to the acoustical patterns produced by biological and geophysical sound sources at different spatial and temporal scales for a given habitat. This pilot study aims to characterize the temporal-modulation information available to humans when perceiving variations in soundscapes within and across natural habitats. This is addressed by processing soundscapes from a previous study [Krause, Gage, and Joo. (2011). Landscape Ecol. 26, 1247] via models of human auditory processing extracting modulation at the output of cochlear filters. The soundscapes represent combinations of elevation, animal, and vegetation diversity in four habitats of the biosphere reserve in the Sequoia National Park (Sierra Nevada, USA). Bayesian statistical analysis and support vector machine classifiers indicate that: (i) amplitude-modulation (AM) and frequency-modulation (FM) spectra distinguish the soundscapes associated with each habitat; and (ii) for each habitat, diurnal and seasonal variations are associated with salient changes in AM and FM cues at rates between about 1 and 100 Hz in the low (<0.5 kHz) and high (>1-3 kHz) audio-frequency range. Support vector machine classifications further indicate that soundscape variations can be classified accurately based on these perceptually inspired representations.
Affiliation(s)
- Etienne Thoret
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm, 75005 Paris, France
- Léo Varnet
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm, 75005 Paris, France
- Yves Boubenec
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm, 75005 Paris, France
- Régis Férriere
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Université Paris Sciences et Lettres, CNRS, INSERM, 75005 Paris, France
- François-Michel Le Tourneau
- International Center for Interdisciplinary Global Environmental Studies (iGLOBES), UMI 3157 CNRS, École normale supérieure, Université Paris Sciences et Lettres, University of Arizona, Tucson, Arizona 85721, USA
- Bernie Krause
- Wild Sanctuary, P.O. Box 536, Glen Ellen, California 95442, USA
- Christian Lorenzi
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm, 75005 Paris, France
9
Bosker HR, Cooke M. Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech. J Acoust Soc Am 2020; 147:721. [PMID: 32113258] [DOI: 10.1121/10.0000646] [Received: 07/18/2019] [Accepted: 01/10/2020]
Abstract
Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech produce more pronounced amplitude modulations in noise vs in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute towards intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception with reference to neural oscillators phase-locking to the amplitude modulations in speech, guiding the processing of speech.
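The transplantation manipulation can be sketched as follows (a minimal illustration under stated assumptions: the envelope is approximated by rectify-and-smooth rather than the paper's exact method, and `transplant_modulations` is an invented helper name): dividing the plain signal by its own envelope approximately flattens its amplitude modulations, and multiplying by the donor envelope imposes the Lombard-style modulations while keeping the plain signal's fine structure.

```python
import numpy as np

def envelope(x, fs, win=0.02):
    """Amplitude envelope via rectification and a moving-average lowpass."""
    n = max(1, int(win * fs))
    return np.convolve(np.abs(x), np.ones(n) / n, mode="same")

def transplant_modulations(plain, donor, fs, eps=1e-8):
    """Impose the donor's amplitude envelope onto the plain signal."""
    return plain / (envelope(plain, fs) + eps) * envelope(donor, fs)

fs = 8000
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(1)
# Stand-ins for plain vs Lombard speech: shallow vs deep 4 Hz amplitude modulation
plain = (1.0 + 0.2 * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(t.size)
lombard = (1.0 + 0.9 * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(t.size)

hybrid = transplant_modulations(plain, lombard, fs)
# The hybrid's envelope now tracks the Lombard envelope closely
corr = np.corrcoef(envelope(hybrid, fs), envelope(lombard, fs))[0, 1]
print(corr > 0.9)
```

In the study, stimuli constructed along these lines let the authors isolate the contribution of the enhanced amplitude modulations from Lombard speech's other acoustic changes.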
Affiliation(s)
- Hans Rutger Bosker
- Psychology of Language department, Max Planck Institute for Psycholinguistics, Wundtlaan 1, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Martin Cooke
- Language and Speech Laboratory, Universidad del País Vasco, calle Justo Vélez de Elorriaga 1, Vitoria, 01006, Spain
10
Rollins MK, Leishman TW, Whiting JK, Hunter EJ, Eggett DL. Effects of added absorption on the vocal exertions of talkers in a reverberant room. J Acoust Soc Am 2019; 145:775. [PMID: 30823814] [PMCID: PMC6372363] [DOI: 10.1121/1.5089891] [Received: 05/08/2018] [Revised: 01/13/2019] [Accepted: 01/21/2019]
Abstract
Occupational speech users such as schoolteachers develop voice disorders at higher rates than the general population. Previous research has suggested that room acoustics may influence these trends. The research reported in this paper utilized varying acoustical conditions in a reverberant room to assess the effects on vocal parameters of healthy talkers. Thirty-two participants were recorded while completing a battery of speech tasks under eight room conditions. Vocal parameters were derived from the recordings and the statistically significant effects of room acoustics were verified using mixed-model analysis of variance tests. Changes in reverberation time (T20), early decay time (EDT), clarity index (C50), speech transmission index (STI), and room gain (GRG) all showed highly correlated effects on certain vocal parameters, including speaking level standard deviation, speaking rate, and the acoustic vocal quality index. As T20, EDT, and GRG increased, and as C50 and STI decreased, vocal parameters showed tendencies toward dysphonic phonation. Empirically derived equations are proposed that describe the relationships between select room-acoustic parameters and vocal parameters. This study provides an increased understanding of the impact of room acoustics on voice production, which could assist acousticians in improving room designs to help mitigate unhealthy vocal exertion and, by extension, voice problems.
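Of the room-acoustic parameters above, the clarity index C50 has a particularly simple definition: the ratio, in dB, of early (first 50 ms) to late energy in the room impulse response. A minimal sketch of that standard computation (the synthetic impulse response and its decay constant are invented for illustration; this is not the measurement procedure of the study):

```python
import numpy as np

def clarity_c50(ir, fs):
    """Clarity index C50 (dB): early (<50 ms) vs late energy of an impulse response."""
    k = int(0.050 * fs)
    early = np.sum(ir[:k] ** 2)
    late = np.sum(ir[k:] ** 2)
    return 10.0 * np.log10(early / late)

fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(2)
# Synthetic room impulse response: exponentially decaying noise (roughly T60 = 0.5 s)
ir = np.exp(-6.9 * t / 0.5) * rng.standard_normal(t.size)
print(round(clarity_c50(ir, fs), 1))
```

Shorter reverberation (faster decay) shifts energy into the early window and raises C50, matching the study's pattern of less dysphonic voice behavior as C50 and STI increase.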
Affiliation(s)
- Michael K Rollins
- Acoustics Research Group, Department of Physics and Astronomy, Brigham Young University, N283 Eyring Science Center, Provo, Utah 84602, USA
- Timothy W Leishman
- Acoustics Research Group, Department of Physics and Astronomy, Brigham Young University, N283 Eyring Science Center, Provo, Utah 84602, USA
- Jennifer K Whiting
- Acoustics Research Group, Department of Physics and Astronomy, Brigham Young University, N283 Eyring Science Center, Provo, Utah 84602, USA
- Eric J Hunter
- Department of Communicative Sciences and Disorders, Michigan State University, 113 Oyer Speech and Hearing Building, East Lansing, Michigan 48824, USA
- Dennis L Eggett
- Department of Statistics, Brigham Young University, 223 Talmage Math Computer Building, Provo, Utah 84602, USA
11
Garnier M, Ménard L, Alexandre B. Hyper-articulation in Lombard speech: An active communicative strategy to enhance visible speech cues? J Acoust Soc Am 2018; 144:1059. [PMID: 30180713] [DOI: 10.1121/1.5051321] [Received: 08/22/2017] [Accepted: 08/02/2018]
Abstract
This study investigates the hypothesis that speakers make active use of the visual modality in production to improve their speech intelligibility in noisy conditions. Six native speakers of Canadian French produced speech in quiet conditions and in 85 dB of babble noise, in three situations: interacting face-to-face with the experimenter (AV), using the auditory modality only (AO), or reading aloud (NI, no interaction). The audio signal was recorded with the three-dimensional movements of their lips and tongue, using electromagnetic articulography. All the speakers reacted similarly to the presence vs absence of communicative interaction, showing significant speech modifications with noise exposure in both interactive and non-interactive conditions, not only for parameters directly related to voice intensity or for lip movements (very visible) but also for tongue movements (less visible); greater adaptation was observed in interactive conditions, though. However, speakers reacted differently to the availability or unavailability of visual information: only four speakers enhanced their visible articulatory movements more in the AV condition. These results support the idea that the Lombard effect is at least partly a listener-oriented adaptation. However, to clarify their speech in noisy conditions, only some speakers appear to make active use of the visual modality.
Affiliation(s)
- Maëva Garnier
- Centre National de la Recherche Scientifique, Laboratoire Grenoble Images Parole Signal Automatique, 11 rue des Mathématiques, Grenoble Campus, Boîte Postale 46, F-38402 Saint Martin d'Hères Cedex, France
- Lucie Ménard
- Département de Linguistique, Laboratoire de Phonétique, Center for Research on Brain, Language, and Music, Université du Québec à Montréal, 320, Ste-Catherine Est, Montréal, Quebec H2X 1L7, Canada
- Boris Alexandre
- Centre National de la Recherche Scientifique, Laboratoire Grenoble Images Parole Signal Automatique, 11 rue des Mathématiques, Grenoble Campus, Boîte Postale 46, F-38402 Saint Martin d'Hères Cedex, France