1
Sheth J, Collina JS, Piasini E, Kording KP, Cohen YE, Geffen MN. The interplay of uncertainty, relevance and learning influences auditory categorization. Sci Rep 2025; 15:3348. PMID: 39870756; PMCID: PMC11772889; DOI: 10.1038/s41598-025-86856-5.
Abstract
Auditory perception requires categorizing sound sequences, such as speech or music, into classes, such as syllables or notes. Auditory categorization depends not only on the acoustic waveform but also on variability and uncertainty in how the listener perceives the sound, including sensory and stimulus uncertainty, the listener's estimate of how relevant a particular sound is to the task, and their ability to learn the past statistics of the acoustic environment. Whereas these factors have been studied in isolation, whether and how they interact to shape categorization remains unknown. Here, we measured human participants' performance on a multi-tone categorization task and modeled each participant's behavior using a Bayesian framework. Task-relevant tones contributed more to category choice than task-irrelevant tones, confirming that participants combined information about sensory features with task relevance. Conversely, participants' poor estimates of which tones were task-relevant, or high sensory uncertainty, adversely impacted category choice. Learning the statistics of sound categories over both short and long timescales also affected decisions, biasing them toward the overrepresented category; the magnitude of this effect correlated inversely with participants' relevance estimates. Our results demonstrate that individual participants idiosyncratically weigh sensory uncertainty, task relevance, and stimulus statistics over both short and long timescales, providing a novel understanding of, and a computational framework for, how sensory decisions are made under several simultaneous behavioral demands.
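To make the ingredients of such a model concrete, here is a minimal Python sketch of a Bayesian categorizer for one trial of a multi-tone task. The Gaussian category distributions, the uniform distractor model, and all parameter values are illustrative assumptions, not the fitted model from the paper.

```python
import numpy as np
from scipy.stats import norm, uniform

def p_choose_high(tones, mu_low=440.0, mu_high=880.0, sigma=60.0,
                  p_rel=0.8, prior_high=0.5, f_min=200.0, f_max=1200.0):
    """Posterior probability of the 'high' category for one trial.

    Each tone enters through a mixture: with probability p_rel it is
    task-relevant (drawn from the category distribution, blurred by
    sensory noise sigma); otherwise it is an irrelevant distractor,
    modeled here as uniform over the tone range and therefore equally
    likely under both categories. prior_high encodes category
    statistics learned over previous trials.
    """
    bg = uniform.pdf(tones, loc=f_min, scale=f_max - f_min)
    lik_high = p_rel * norm.pdf(tones, mu_high, sigma) + (1 - p_rel) * bg
    lik_low = p_rel * norm.pdf(tones, mu_low, sigma) + (1 - p_rel) * bg
    # Tones are conditionally independent given the category, so
    # log-likelihoods sum; the prior then shifts the log odds.
    log_odds = (np.log(lik_high).sum() - np.log(lik_low).sum()
                + np.log(prior_high) - np.log(1.0 - prior_high))
    return 1.0 / (1.0 + np.exp(-log_odds))

# Example trial: two tones near the high-category mean plus one outlier.
print(p_choose_high(np.array([850.0, 900.0, 430.0])))
```

Raising p_rel makes every tone count toward the category decision, while a prior_high far from 0.5 biases choices toward the overrepresented category, mirroring the learning effect described in the abstract.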
Affiliation(s)
- Janaki Sheth
- Department of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA, USA
- Jared S Collina
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, PA, USA
- Eugenio Piasini
- Department of Neuroscience, International School for Advanced Studies, Trieste, Italy
- Konrad P Kording
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Yale E Cohen
- Department of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Maria N Geffen
- Department of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neuroscience, University of Pennsylvania, Philadelphia, PA, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
2
Lutfi RA, Zandona M, Lee J. Simultaneous relative cue reliance in speech-on-speech masking. J Acoust Soc Am 2023; 154:2530-2538. PMID: 37870932; PMCID: PMC10708949; DOI: 10.1121/10.0021874.
Abstract
Modern hearing research has identified listeners' ability to segregate simultaneous speech streams as relying on three major voice cues: fundamental frequency, level, and location. Few studies have evaluated reliance on these cues when they are presented simultaneously, as occurs in nature, and fewer still have considered listeners' relative reliance on them, a comparison complicated by the cues' different units of measure. In the present study, trial-by-trial analyses were used to isolate each listener's simultaneous reliance on the three voice cues, with the behavior of an ideal observer [Green and Swets (1966). (Wiley, New York), pp. 151-178] serving as a comparison standard for evaluating relative reliance. On each trial, listeners heard a pair of randomly selected, simultaneous recordings of naturally spoken sentences. One of the recordings was always from the same talker, a distracter; the other, with equal probability, was from one of two target talkers differing in the three voice cues. The listener's task was to identify the target talker. Among 33 clinically normal-hearing adults, only one relied predominantly on voice level; the rest were split between voice fundamental frequency and/or location. The results are discussed with regard to their implications for the common practice of using target-distracter level as a dependent measure of speech-on-speech masking.
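One standard way to estimate such trial-by-trial relative reliance, sketched below under simplifying assumptions, is to regress a listener's binary responses on z-scored per-trial cue differences; the normalized magnitudes of the fitted weights are then unit-free and comparable across cues. This illustrates the general approach, not the paper's exact analysis; the simulated listener and weights are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials = 400
# Per-trial target-minus-distracter differences in the three voice
# cues, z-scored so the fitted weights are unit-free.
X = rng.standard_normal((n_trials, 3))      # columns: F0, level, location
w_true = np.array([1.0, 0.2, 0.8])          # a listener leaning on F0/location
p_resp = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = rng.random(n_trials) < p_resp           # simulated talker identifications

w_hat = LogisticRegression().fit(X, y).coef_.ravel()
reliance = np.abs(w_hat) / np.abs(w_hat).sum()   # relative reliance, sums to 1
print(dict(zip(["F0", "level", "location"], reliance.round(2))))
```

An ideal observer pushed through the same regression provides the comparison standard: its weights reflect the cues' actual reliability, so a listener's deviations from them quantify over- or under-reliance on each cue.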
Affiliation(s)
- R A Lutfi
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- M Zandona
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- J Lee
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
3
Lutfi RA, Rodriguez B, Lee J. The Listener Effect in Multitalker Speech Segregation and Talker Identification. Trends Hear 2021; 25:23312165211051886. PMID: 34693853; PMCID: PMC8544763; DOI: 10.1177/23312165211051886.
Abstract
Over six decades ago, Cherry (1953) drew attention to what he called the "cocktail-party problem": the challenge of segregating the speech of one talker from others speaking at the same time. The problem has been actively researched ever since, yet one observation has eluded explanation all this time: the wide variation in the performance of individual listeners. That variation was replicated here for four major experimental factors known to impact performance: differences in task (talker segregation vs. identification), differences in the voice features of talkers (pitch vs. location), differences in the voice similarity and uncertainty of talkers (informational masking), and the presence or absence of linguistic cues. The effect of these factors on the segregation of naturally spoken sentences and synthesized vowels was largely eliminated in psychometric functions relating the performance of individual listeners to that of an ideal observer, d′ideal. The effect of the listener remained as differences in the slopes of these functions (a fixed effect), with little within-listener variability in the slope estimates (a random effect). The results make a case for considering the listener a factor in multitalker segregation and identification equal in status to any major experimental variable.
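A minimal sketch of the kind of ideal-observer normalization described here, using hypothetical per-condition data: compute each listener's d′ across conditions, take d′ideal for the same conditions from the known stimulus statistics, and summarize the listener by the slope of a zero-intercept fit of one against the other. The specific linking function in the paper may differ; this only shows the slope-per-listener idea.

```python
import numpy as np
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    """Empirical d' from hit and false-alarm rates."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical data: ideal-observer d' for five conditions (computable
# from the stimulus statistics) and one listener's measured rates.
d_ideal = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
d_listener = np.array([d_prime(h, f) for h, f in
                       [(0.57, 0.45), (0.65, 0.42), (0.72, 0.38),
                        (0.78, 0.35), (0.86, 0.28)]])

# Zero-intercept least-squares slope: one number per listener, the
# 'fixed effect' that distinguishes listeners from one another.
slope = (d_ideal @ d_listener) / (d_ideal @ d_ideal)
print(f"listener slope relative to ideal: {slope:.2f}")
```

Plotting d_listener against d_ideal for several listeners would show the pattern the abstract describes: roughly linear functions that differ mainly in slope across listeners, with stable slopes within a listener.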
Affiliation(s)
- Robert A. Lutfi
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Briana Rodriguez
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Jungmee Lee
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida