1. Wasiuk PA, Calandruccio L, Oleson JJ, Buss E. Predicting speech-in-speech recognition: Short-term audibility and spatial separation. J Acoust Soc Am 2023;154:1827-1837. PMID: 37728286. DOI: 10.1121/10.0021069.
Abstract
Quantifying the factors that predict variability in speech-in-speech recognition represents a fundamental challenge in auditory science. Stimulus factors associated with energetic and informational masking (IM) modulate variability in speech-in-speech recognition, but energetic effects can be difficult to estimate in spectro-temporally dynamic speech maskers. The current experiment characterized the effects of short-term audibility and differences in target and masker location (or perceived location) on the horizontal plane for sentence recognition in two-talker speech. Thirty young adults with normal hearing (NH) participated. Speech reception thresholds and keyword recognition at a fixed signal-to-noise ratio (SNR) were measured in each spatial condition. Short-term audibility for each keyword was quantified using a glimpsing model. Results revealed that speech-in-speech recognition depended on the proportion of audible glimpses available in the target + masker keyword stimulus in each spatial condition, even across stimuli presented at a fixed global SNR. Short-term audibility requirements were greater for colocated than spatially separated speech-in-speech recognition, and keyword recognition improved more rapidly as a function of increases in target audibility with spatial separation. Results indicate that spatial cues enhance glimpsing efficiency in competing speech for young adults with NH and provide a quantitative framework for estimating IM for speech-in-speech recognition in different spatial configurations.
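The glimpsing analysis described above can be approximated computationally: decompose the separate target and masker signals into time-frequency cells, compute a local SNR in each cell, and count the proportion of cells in which the target exceeds the masker by a criterion amount. The sketch below is a minimal illustration of that idea, assuming a plain STFT front end and a 0 dB local criterion; the published model's auditory filterbank, window sizes, and criterion value are not reproduced here.

```python
import numpy as np
from scipy.signal import stft

def glimpse_proportion(target, masker, fs, criterion_db=0.0,
                       win_s=0.020, hop_s=0.010):
    """Proportion of time-frequency cells where the target is 'glimpsed',
    i.e., its local level exceeds the masker level by criterion_db.

    A simplified stand-in for a glimpsing model: STFT magnitudes replace
    an auditory filterbank, and all cells are weighted equally.
    """
    nperseg = int(round(win_s * fs))
    noverlap = nperseg - int(round(hop_s * fs))
    _, _, T = stft(target, fs=fs, nperseg=nperseg, noverlap=noverlap)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg, noverlap=noverlap)
    eps = 1e-12
    local_snr_db = 10.0 * np.log10((np.abs(T) ** 2 + eps) /
                                   (np.abs(M) ** 2 + eps))
    return float(np.mean(local_snr_db > criterion_db))

# Example: even at a fixed global SNR, individual keyword+masker mixtures
# can differ substantially in the proportion of audible glimpses.
if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    target = rng.standard_normal(fs)   # placeholder for a keyword waveform
    masker = rng.standard_normal(fs)   # placeholder for two-talker speech
    print(f"glimpse proportion: {glimpse_proportion(target, masker, fs):.3f}")
```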
Affiliation(s)
- Peter A Wasiuk
- Department of Communication Disorders, 493 Fitch Street, Southern Connecticut State University, New Haven, Connecticut 06515, USA
- Lauren Calandruccio
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Jacob J Oleson
- Department of Biostatistics, 145 North Riverside Drive N300, College of Public Health, University of Iowa, Iowa City, Iowa 52242, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, 170 Manning Drive, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
2. Flaherty MM, Buss E, Libert K. Effects of Target and Masker Fundamental Frequency Contour Depth on School-Age Children's Speech Recognition in a Two-Talker Masker. J Speech Lang Hear Res 2023;66:400-414. PMID: 36580582. DOI: 10.1044/2022_jslhr-22-00207.
Abstract
PURPOSE Maturation of the ability to recognize target speech in the presence of a two-talker speech masker extends into early adolescence. This study evaluated whether children benefit from differences in fundamental frequency (fo) contour depth between the target and masker speech, a cue that has been shown to improve recognition in adults. METHOD Speech stimuli were recorded from talkers using three speaking styles, with fo contour depths that were Flat, Normal, or Exaggerated. Targets were open-set, declarative sentences produced by a female talker, and maskers were two streams of concatenated sentences produced by a second female talker. Listeners were children (ages 5-17 years) and adults (ages 18-24 years) with normal hearing. Each listener was tested in one of the three masker styles paired with all three target styles. Speech recognition thresholds (SRTs) corresponding to 50% correct were estimated by fitting psychometric functions to adaptive track data. RESULTS For adults, performance did not differ significantly across conditions with matched speaking styles. A mismatch benefit was observed when combining Flat targets with the Exaggerated masker and Exaggerated targets with the Flat masker, and for both Flat and Exaggerated targets paired with the Normal masker. For children, there was a significant effect of age in all conditions. Flat targets in the Flat masker were associated with lower SRTs than the other two matched conditions, and a mismatch benefit was observed for young children only when the target fo contour was less variable than the masker fo contour. CONCLUSIONS Whereas child-directed speech often has exaggerated pitch contours, young children were better able to recognize speech with less variable fo. Age effects were observed in the benefit of mismatched speaking styles for some conditions, which could be related to differences in baseline SRTs rather than differences in segregation abilities.
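Estimating an SRT at 50% correct from adaptive-track data, as described above, amounts to fitting a psychometric function to the trial-by-trial SNRs and responses. The sketch below illustrates one such fit using a logistic function and maximum likelihood; the slope parameterization, the absence of a lapse-rate term, and the example data are assumptions for illustration, not the published analysis.

```python
import numpy as np
from scipy.optimize import minimize

def logistic(snr, srt, slope):
    """Probability correct as a logistic function of SNR (dB)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - srt)))

def fit_srt(snrs, correct):
    """Maximum-likelihood fit of SRT (SNR at 50% correct) and slope
    to binary trial data collected with an adaptive track."""
    snrs = np.asarray(snrs, float)
    correct = np.asarray(correct, float)

    def nll(params):
        srt, slope = params
        p = np.clip(logistic(snrs, srt, slope), 1e-6, 1 - 1e-6)
        return -np.sum(correct * np.log(p) + (1 - correct) * np.log(1 - p))

    res = minimize(nll, x0=[np.mean(snrs), 0.5], method="Nelder-Mead")
    return res.x  # (srt_db, slope)

# Hypothetical adaptive-track data: SNRs in dB and correct (1) / incorrect (0)
snrs = [-2, -4, -6, -8, -6, -8, -10, -8, -6, -8, -10, -12, -10, -8]
correct = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
srt, slope = fit_srt(snrs, correct)
print(f"estimated SRT: {srt:.1f} dB SNR, slope: {slope:.2f}")
```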
Affiliation(s)
- Mary M Flaherty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, The University of North Carolina at Chapel Hill
- Kelsey Libert
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
3. Prud'homme L, Lavandier M, Best V. Investigating the role of harmonic cancellation in speech-on-speech masking. Hear Res 2022;426:108562. PMID: 35768309. PMCID: PMC9722527. DOI: 10.1016/j.heares.2022.108562.
Abstract
This study investigated the role of harmonic cancellation in the intelligibility of speech in "cocktail party" situations. While there is evidence that harmonic cancellation plays a role in the segregation of simple harmonic sounds based on fundamental frequency (F0), its utility for mixtures of speech containing non-stationary F0s and unvoiced segments is unclear. Here we focused on the energetic masking of speech targets caused by competing speech maskers. Speech reception thresholds were measured using seven maskers: speech-shaped noise, monotonized and intonated harmonic complexes, monotonized speech, noise-vocoded speech, reversed speech and natural speech. These maskers enabled an estimate of how the masking potential of speech is influenced by harmonic structure, amplitude modulation and variations in F0 over time. Measured speech reception thresholds were compared to the predictions of two computational models, with and without a harmonic cancellation component. Overall, the results suggest a minor role of harmonic cancellation in reducing energetic masking in speech mixtures.
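Harmonic cancellation is commonly modeled as a comb filter that nulls energy at the masker's F0 and its harmonics, so that a model with a cancellation component can remove periodic masker energy before computing the residual target-to-masker ratio. The sketch below shows only the basic delay-and-subtract cancellation filter for a known, fixed masker F0; the models compared in the paper, and their handling of time-varying F0 and unvoiced segments, are considerably more elaborate.

```python
import numpy as np

def cancel_harmonics(x, fs, f0):
    """Delay-and-subtract comb filter: y[n] = x[n] - x[n - T], with
    T = fs / f0 samples. Attenuates components at f0 and its harmonics;
    cancellation is only exact for a stationary f0."""
    T = int(round(fs / f0))
    y = np.copy(x)
    y[T:] = x[T:] - x[:-T]
    return y

# Example: a harmonic complex at 100 Hz is strongly attenuated, whereas a
# tone between harmonics is not cancelled (a delay-and-subtract comb can
# even boost it by up to 6 dB).
fs = 16000
t = np.arange(fs) / fs
harmonic = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 6))
tone = np.sin(2 * np.pi * 150 * t)

for name, sig in [("harmonic complex (100 Hz)", harmonic), ("150 Hz tone", tone)]:
    out = cancel_harmonics(sig, fs, f0=100.0)
    change_db = 10 * np.log10(np.mean(out**2) / np.mean(sig**2))
    print(f"{name}: level change after cancellation = {change_db:.1f} dB")
```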
Affiliation(s)
- Luna Prud'homme
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Mathieu Lavandier
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
4. Wasiuk PA, Buss E, Oleson JJ, Calandruccio L. Predicting speech-in-speech recognition: Short-term audibility, talker sex, and listener factors. J Acoust Soc Am 2022;152:3010. PMID: 36456289. DOI: 10.1121/10.0015228.
Abstract
Speech-in-speech recognition can be challenging, and listeners vary considerably in their ability to accomplish this complex auditory-cognitive task. Variability in performance can be related to intrinsic listener factors as well as stimulus factors associated with energetic and informational masking. The current experiments characterized the effects of short-term audibility of the target, differences in target and masker talker sex, and intrinsic listener variables on sentence recognition in two-talker speech and speech-shaped noise. Participants were young adults with normal hearing. Each condition included the adaptive measurement of speech reception thresholds, followed by testing at a fixed signal-to-noise ratio (SNR). Short-term audibility for each keyword was quantified using a computational glimpsing model for target+masker mixtures. Scores on a psychophysical task of auditory stream segregation predicted speech recognition, with stronger effects for speech-in-speech than speech-in-noise. Both speech-in-speech and speech-in-noise recognition depended on the proportion of audible glimpses available in the target+masker mixture, even across stimuli presented at the same global SNR. Short-term audibility requirements varied systematically across stimuli, providing an estimate of the greater informational masking for speech-in-speech than speech-in-noise recognition and quantifying informational masking for matched and mismatched talker sex.
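One way to read the final result above is that, at a matched level of short-term audibility, performance is poorer in a two-talker masker than in noise, and the extra glimpse proportion needed to reach a criterion score indexes informational masking. The sketch below illustrates that comparison with logistic fits of keyword accuracy against glimpse proportion for two masker types; the simulated data, the fitting details, and the 50%-correct criterion are illustrative assumptions rather than the paper's analysis.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logistic(x, y):
    """ML fit of P(correct) = 1 / (1 + exp(-b * (x - x50))) to binary data;
    x50 is the predictor value at 50% correct."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)

    def nll(p):
        x50, b = p
        pr = np.clip(1.0 / (1.0 + np.exp(-b * (x - x50))), 1e-6, 1 - 1e-6)
        return -np.sum(y * np.log(pr) + (1 - y) * np.log(1 - pr))

    return minimize(nll, x0=[np.mean(x), 10.0], method="Nelder-Mead").x

# Hypothetical per-keyword data: glimpse proportion and correct/incorrect,
# assuming speech-in-noise needs fewer glimpses than speech-in-speech.
rng = np.random.default_rng(1)
glimpse = rng.uniform(0.1, 0.8, size=400)
correct_noise = rng.random(400) < 1 / (1 + np.exp(-12 * (glimpse - 0.30)))
correct_speech = rng.random(400) < 1 / (1 + np.exp(-12 * (glimpse - 0.45)))

x50_noise, _ = fit_logistic(glimpse, correct_noise)
x50_speech, _ = fit_logistic(glimpse, correct_speech)
print(f"glimpse proportion at 50% correct: noise {x50_noise:.2f}, "
      f"two-talker {x50_speech:.2f}")
print(f"informational-masking proxy (difference): {x50_speech - x50_noise:.2f}")
```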
Affiliation(s)
- Peter A Wasiuk
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, 170 Manning Drive, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Jacob J Oleson
- Department of Biostatistics, 145 North Riverside Drive, University of Iowa, Iowa City, Iowa 52242, USA
- Lauren Calandruccio
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
5. Lavandier M, Mason CR, Baltzell LS, Best V. Individual differences in speech intelligibility at a cocktail party: A modeling perspective. J Acoust Soc Am 2021;150:1076. PMID: 34470293. PMCID: PMC8561716. DOI: 10.1121/10.0005851.
Abstract
This study aimed at predicting individual differences in speech reception thresholds (SRTs) in the presence of symmetrically placed competing talkers for young listeners with sensorineural hearing loss. An existing binaural model incorporating the individual audiogram was revised to handle severe hearing losses by (a) taking as input the target speech level at SRT in a given condition and (b) introducing a floor in the model to limit extreme negative better-ear signal-to-noise ratios. The floor value was first set using SRTs measured with stationary and modulated noises. The model was then used to account for individual variations in SRTs found in two previously published data sets that used speech maskers. The model accounted well for the variation in SRTs across listeners with hearing loss, based solely on differences in audibility. When considering listeners with normal hearing, the model could predict the best SRTs, but not the poorer SRTs, suggesting that other factors limit performance when audibility (as measured with the audiogram) is not compromised.
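The better-ear component of such binaural models takes, in each frequency band, the larger of the left- and right-ear target-to-masker ratios, and the revision described above clips that value at a floor so that a few extremely negative bands cannot dominate the prediction. The sketch below illustrates only that step; the band weighting, the floor value of -20 dB, and the omission of a binaural-unmasking term are assumptions for illustration, not the published model.

```python
import numpy as np

def better_ear_snr(target_db, masker_db, floor_db=-20.0, weights=None):
    """Frequency-band better-ear SNR with a floor.

    target_db, masker_db: arrays of shape (n_bands, 2) giving per-band
    levels in dB at the (left, right) ears. Returns a weighted average
    across bands of max(left SNR, right SNR), clipped at floor_db.
    """
    snr = np.asarray(target_db, float) - np.asarray(masker_db, float)
    be = snr.max(axis=1)                    # better ear in each band
    be = np.maximum(be, floor_db)           # floor on extreme negative SNRs
    if weights is None:
        weights = np.ones(be.shape) / be.size
    return float(np.sum(weights * be))

# Hypothetical 4-band example with a severe deficit in one band: without
# the floor, the third band (-40 dB) would dominate the average.
target = np.array([[60, 55], [58, 52], [40, 35], [65, 62]], float)
masker = np.array([[62, 64], [60, 66], [80, 85], [63, 70]], float)
print(f"better-ear SNR with floor: {better_ear_snr(target, masker):.1f} dB")
```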
Affiliation(s)
- Mathieu Lavandier
- Univ. Lyon, ENTPE, Laboratoire de Tribologie et Dynamique des Systèmes UMR 5513, Rue Maurice Audin, F-69518 Vaulx-en-Velin Cedex, France
- Christine R Mason
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Lucas S Baltzell
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
6. Liu JS, Liu YW, Yu YF, Galvin JJ, Fu QJ, Tao DD. Segregation of competing speech in adults and children with normal hearing and in children with cochlear implants. J Acoust Soc Am 2021;150:339. PMID: 34340485. DOI: 10.1121/10.0005597.
Abstract
Children with normal hearing (CNH) have greater difficulty segregating competing speech than do adults with normal hearing (ANH). Children with cochlear implants (CCI) have greater difficulty segregating competing speech than do CNH. In the present study, speech reception thresholds (SRTs) in competing speech were measured in Mandarin-speaking ANH, CNH, and CCI listeners. Target sentences were produced by a male Mandarin-speaking talker. Maskers were time-forward or time-reversed sentences produced by a native Mandarin-speaking male talker (different from the target talker), a native Mandarin-speaking female talker, or a non-native English-speaking male talker. The SRTs were lowest (best) for the ANH group, followed by the CNH and CCI groups. The masking release (MR) was comparable between the ANH and CNH groups, but much poorer in the CCI group. The temporal properties differed between the native and non-native maskers and between forward and reversed speech. The temporal properties of the maskers were significantly associated with the SRTs for the CCI and CNH groups but not for the ANH group. Whereas the temporal properties of the maskers were significantly associated with the MR for all three groups, the association was stronger for the CCI and CNH groups than for the ANH group.
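The temporal properties of a masker referred to above can be summarized with envelope statistics, for example the fraction of time the masker's slow amplitude envelope dips well below its average level, which indexes the glimpsing opportunities the masker offers. The sketch below computes one such statistic from a Hilbert envelope; the specific measures, cutoff frequency, and dip criterion used in the study are not reproduced here and are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def dip_proportion(x, fs, cutoff_hz=16.0, dip_db=-10.0):
    """Fraction of samples where the slow temporal envelope falls more than
    |dip_db| below its RMS level: a rough index of masker 'dips'."""
    env = np.abs(hilbert(x))                    # instantaneous envelope
    b, a = butter(4, cutoff_hz / (fs / 2))      # keep only slow modulations
    env = np.maximum(filtfilt(b, a, env), 1e-12)
    rms = np.sqrt(np.mean(env ** 2))
    rel_db = 20.0 * np.log10(env / rms)
    return float(np.mean(rel_db < dip_db))

# Example: an amplitude-modulated noise has more deep envelope dips than a
# steady noise, and so offers more glimpsing opportunities.
fs = 16000
rng = np.random.default_rng(2)
noise = rng.standard_normal(4 * fs)
t = np.arange(noise.size) / fs
modulated = noise * (1.0 + np.sin(2 * np.pi * 4 * t))   # 4 Hz modulation
print(f"steady noise dip proportion:    {dip_proportion(noise, fs):.2f}")
print(f"modulated noise dip proportion: {dip_proportion(modulated, fs):.2f}")
```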
Affiliation(s)
- Ji-Sheng Liu
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
- Yang-Wenyi Liu
- Department of Otology and Skull Base Surgery, Eye Ear Nose and Throat Hospital, Fudan University, Shanghai 200031, China
- Ya-Feng Yu
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
- John J Galvin
- House Ear Institute, Los Angeles, California 90057, USA
- Qian-Jie Fu
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California 90095, USA
- Duo-Duo Tao
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
7. Versfeld NJ, Lie S, Kramer SE, Zekveld AA. Informational masking with speech-on-speech intelligibility: Pupil response and time-course of learning. J Acoust Soc Am 2021;149:2353. PMID: 33940918. DOI: 10.1121/10.0003952.
Abstract
Previous research has shown a learning effect on speech perception in nonstationary maskers. The present study addressed the time-course of this learning effect and the role of informational masking. To that end, speech reception thresholds (SRTs) were measured for speech in either a stationary noise masker, an interrupted noise masker, or a single-talker masker. The utterance of the single talker was either time-forward (intelligible) or time-reversed (unintelligible), and the sample of the utterance was either frozen (the same utterance at each presentation) or random (a different utterance from the same talker at each presentation). Simultaneously, the pupil dilation response was measured to assess differences in listening effort between conditions and to track changes in listening effort over time within each condition. The results showed a learning effect for all conditions except the stationary noise condition; that is, the SRT improved over time while the pupil response remained constant. There were no significant differences in pupil responses between conditions despite large differences in the SRT. Time reversal of the frozen speech affected neither the SRT nor the pupil responses.
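Pupil-dilation analyses of the kind described typically baseline-correct each trial's pupil trace against a short pre-stimulus window and then summarize it with a peak (and sometimes mean) dilation value. The sketch below shows that basic step; the window lengths, sampling rate, and the use of peak dilation are assumptions for illustration rather than the study's exact procedure.

```python
import numpy as np

def pupil_metrics(trace, fs, baseline_s=1.0):
    """Baseline-correct a single-trial pupil trace (arbitrary units, sampled
    at fs Hz) against the mean of the first baseline_s seconds, then return
    the peak and mean dilation over the remainder of the trial."""
    trace = np.asarray(trace, float)
    n_base = int(round(baseline_s * fs))
    baseline = np.mean(trace[:n_base])
    corrected = trace[n_base:] - baseline
    return {"peak_dilation": float(np.max(corrected)),
            "mean_dilation": float(np.mean(corrected))}

# Hypothetical trial: 1 s baseline followed by 3 s of listening, modeled as a
# slow dilation ramp plus measurement noise.
fs = 60                                    # a typical eye-tracker sample rate
rng = np.random.default_rng(3)
t = np.arange(4 * fs) / fs
trace = 4.0 + 0.3 * np.clip(t - 1.0, 0, None) + 0.02 * rng.standard_normal(t.size)
print(pupil_metrics(trace, fs))
```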
Affiliation(s)
- Niek J Versfeld
- Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Sisi Lie
- Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Sophia E Kramer
- Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Adriana A Zekveld
- Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands