1
Kazazis S, Depalle P, McAdams S. Ordinal scaling of temporal audio descriptors and perceptual significance of attack temporal centroid in timbre spaces. The Journal of the Acoustical Society of America 2021; 150:3461. [PMID: 34852574 DOI: 10.1121/10.0006788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Accepted: 10/05/2021] [Indexed: 06/13/2023]
Abstract
Temporal audio features play an important role in timbre perception and sound identification. An experiment was conducted to test whether listeners are able to rank order synthesized stimuli over a wide range of feature values restricted within the range of instrument sounds. The following audio descriptors were tested: attack and decay time, temporal centroid with fixed attack and decay time, and inharmonicity. The results indicate that these descriptors are susceptible to ordinal scaling. The spectral envelope played an important role when ordering stimuli with various inharmonicity levels, whereas the shape of the amplitude envelope was an important parameter when ordering stimuli with different attack and decay times. Linear amplitude envelopes made the ordering of attack times easier and caused the least amount of confusion among listeners, whereas exponential envelopes were more effective when ordering decay times. Although there were many confusions in ordering short attack and decay times, listeners performed well in ordering temporal centroids even at very short attack and decay times. A meta-analysis of six timbre spaces was therefore conducted to test the explanatory power of attack time versus the attack temporal centroid along a perceptual dimension. The results indicate that attack temporal centroid has greater overall explanatory power than attack time itself.
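The contrast between attack time and attack temporal centroid can be illustrated with a minimal sketch. It assumes the temporal centroid is the envelope-weighted mean time, which may differ from the exact definition used in the paper; the sample rate, attack length, and envelope shapes are hypothetical values chosen only for illustration.

```python
def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time in seconds (assumed definition)."""
    total = sum(envelope)
    if total == 0:
        raise ValueError("envelope is all zeros")
    return sum(i * e for i, e in enumerate(envelope)) / (total * sample_rate)


def power_attack(num_samples, exponent):
    """Attack ramp from 0 to 1; exponent 1.0 gives a linear ramp."""
    return [(i / (num_samples - 1)) ** exponent for i in range(num_samples)]


SAMPLE_RATE = 1000   # hypothetical envelope sample rate in Hz
ATTACK_SAMPLES = 101  # both attacks reach their peak after 100 ms

linear = power_attack(ATTACK_SAMPLES, 1.0)
curved = power_attack(ATTACK_SAMPLES, 3.0)  # slower start, steeper finish

tc_linear = temporal_centroid(linear, SAMPLE_RATE)
tc_curved = temporal_centroid(curved, SAMPLE_RATE)
print(f"linear attack centroid: {tc_linear:.4f} s")
print(f"curved attack centroid: {tc_curved:.4f} s")
```

Both envelopes share the same attack time, yet their centroids differ, which is one way a centroid-based descriptor can separate sounds that attack time alone cannot.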
Affiliation(s)
- Savvas Kazazis
- Schulich School of Music, McGill University, 555 Sherbrooke Street West, Montreal, Quebec H3A 1E3, Canada
- Philippe Depalle
- Schulich School of Music, McGill University, 555 Sherbrooke Street West, Montreal, Quebec H3A 1E3, Canada
- Stephen McAdams
- Schulich School of Music, McGill University, 555 Sherbrooke Street West, Montreal, Quebec H3A 1E3, Canada
2
Schutz M, Stefanucci JK, Baum SH, Roth A. Name that Percussive Tune: Associative Memory and Amplitude Envelope. Q J Exp Psychol (Hove) 2017; 70:1323-1343. [DOI: 10.1080/17470218.2016.1182562] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/21/2022]
Abstract
A series of experiments demonstrated novel effects of amplitude envelope on associative memory, with tones exhibiting naturally decaying amplitude envelopes (e.g., those made by two wine glasses clinking) better associated with target objects than amplitude-invariant tones. In Experiment 1 participants learned associations between household objects and 4-note tone sequences constructed of spectrally matched pure tones with either “flat” or “percussive” amplitude envelopes. Those hearing percussive tones correctly recalled significantly more sequence–object associations. Experiment 2 demonstrated that participants hearing percussive tones learned the associations more quickly. Experiment 3 used “reverse percussive” tones (percussive tones played backwards) to test whether differences in overall energy might account for this effect, finding they did not lead to the same level of performance as percussive tones. Experiment 4 varied the envelope at encoding and retrieval to determine which stage of the task was most affected by the envelope manipulation. Participants hearing percussive tones at both encoding and retrieval performed significantly better than the other three groups (i.e., flat at encoding/percussive at retrieval, etc.). We conclude that amplitude envelope plays an important role in learning and memory, a finding with relevance to psychological research on audition and associative memory, as well as practical relevance for improving human–computer interface design.
Affiliation(s)
- Michael Schutz
- School of the Arts, McMaster University, Hamilton, ON, Canada
- Sarah H. Baum
- Department of Psychology, The College of William & Mary, Williamsburg, VA, USA
- Amber Roth
- Department of Psychology, The College of William & Mary, Williamsburg, VA, USA
3
Tabas A, Siebert A, Supek S, Pressnitzer D, Balaguer-Ballester E, Rupp A. Insights on the Neuromagnetic Representation of Temporal Asymmetry in Human Auditory Cortex. PLoS One 2016; 11:e0153947. [PMID: 27096960 PMCID: PMC4838253 DOI: 10.1371/journal.pone.0153947] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/24/2015] [Accepted: 04/06/2016] [Indexed: 11/26/2022]
Abstract
Communication sounds are typically asymmetric in time and human listeners are highly sensitive to this short-term temporal asymmetry. Nevertheless, causal neurophysiological correlates of auditory perceptual asymmetry remain largely elusive to our current analyses and models. Auditory modelling and animal electrophysiological recordings suggest that perceptual asymmetry results from the presence of multiple time scales of temporal integration, central to the auditory periphery. To test this hypothesis we recorded auditory evoked fields (AEF) elicited by asymmetric sounds in humans. We found a strong correlation between perceived tonal salience of ramped and damped sinusoids and the AEFs, as quantified by the amplitude of the N100m dynamics. The N100m amplitude increased with stimulus half-life time, showing a maximum difference between the ramped and damped stimulus for a modulation half-life time of 4 ms which is greatly reduced at 0.5 ms and 32 ms. This behaviour of the N100m closely parallels psychophysical data in a manner that: i) longer half-life times are associated with a stronger tonal percept, and ii) perceptual differences between damped and ramped are maximal at 4 ms half-life time. Interestingly, differences in evoked fields were significantly stronger in the right hemisphere, indicating some degree of hemispheric specialisation. Furthermore, the N100m magnitude was successfully explained by a pitch perception model using multiple scales of temporal integration of auditory nerve activity patterns. This striking correlation between AEFs, perception, and model predictions suggests that the physiological mechanisms involved in the processing of pitch evoked by temporal asymmetric sounds are reflected in the N100m.
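The ramped and damped stimuli described above can be sketched as follows. This is a minimal illustration, assuming a single exponentially decaying envelope over the whole segment; the carrier frequency, segment duration, and sample rate are hypothetical, and only the three half-lives (0.5, 4, and 32 ms) come from the abstract.

```python
import math


def damped_envelope(half_life_s, duration_s, sample_rate):
    """Exponential decay that halves every half_life_s seconds."""
    n = int(round(duration_s * sample_rate))
    return [0.5 ** ((i / sample_rate) / half_life_s) for i in range(n)]


def damped_sinusoid(carrier_hz, half_life_s, duration_s, sample_rate):
    """Sinusoidal carrier multiplied by a decaying exponential envelope."""
    env = damped_envelope(half_life_s, duration_s, sample_rate)
    return [e * math.sin(2.0 * math.pi * carrier_hz * i / sample_rate)
            for i, e in enumerate(env)]


def ramped_sinusoid(carrier_hz, half_life_s, duration_s, sample_rate):
    """Time-reversed copy of the damped sinusoid."""
    return damped_sinusoid(carrier_hz, half_life_s, duration_s, sample_rate)[::-1]


SAMPLE_RATE = 16000   # hypothetical
CARRIER_HZ = 1000.0   # hypothetical
DURATION_S = 0.05     # hypothetical segment length

half_lives_s = (0.0005, 0.004, 0.032)  # 0.5, 4 and 32 ms, as in the abstract
damped = {h: damped_sinusoid(CARRIER_HZ, h, DURATION_S, SAMPLE_RATE) for h in half_lives_s}
ramped = {h: ramped_sinusoid(CARRIER_HZ, h, DURATION_S, SAMPLE_RATE) for h in half_lives_s}
```

Each ramped and damped pair has the same long-term energy, so any perceptual or neural difference between them has to come from temporal order rather than overall level.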
Affiliation(s)
- Alejandro Tabas
- Faculty of Science and Technology, Bournemouth University, Bournemouth, England, United Kingdom
- Anita Siebert
- Institute of Pharmacology and Toxicology, University of Zurich, Zürich, Zürich, Switzerland
- Selma Supek
- Department of Physics, Faculty of Science, University of Zagreb, Zagreb, Croatia
- Daniel Pressnitzer
- Département d’Études Cognitives, École Normale Supérieure, Paris, France
- Emili Balaguer-Ballester
- Faculty of Science and Technology, Bournemouth University, Bournemouth, England, United Kingdom
- The Bernstein Center for Computational Neuroscience Heidelberg-Mannheim, Mannheim, Baden-Württemberg, Germany
- André Rupp
- Department of Neurology, Heidelberg University, Heidelberg, Baden-Württemberg, Germany
4
Rupp A, Spachmann A, Dettlaff A, Patterson RD. Cortical activity associated with the perception of temporal asymmetry in ramped and damped noises. Advances in Experimental Medicine and Biology 2013; 787:427-33. [PMID: 23716249 DOI: 10.1007/978-1-4614-1590-9_47] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]
Abstract
Human listeners are very sensitive to the asymmetry of time-reversed pairs of ramped and damped sounds. When the carrier is noise, the hiss component of the perception is stronger in ramped sounds and the drumming component is stronger in damped sounds (Akeroyd and Patterson 1995). In the current study, a paired comparison technique was used to establish the relative "hissiness" of these noises, and the ratings were correlated with (a) components of the auditory evoked field (AEF) produced by these noises and (b) the magnitude of a hissiness feature derived from a model of the internal auditory images produced by these noises (Irino and Patterson 1998). An earlier AEF report indicated that the peak magnitude of the transient N100m response mirrors the perceived salience of the tonal perception (Rupp et al. 2005). The AEFs of 14 subjects were recorded in response to damped/ramped noises with half-lives between 1 and 64 ms and repetition rates between 12.5 and 100 ms. Spatio-temporal source analysis was used to fit the P50m, the P200m, and the sustained field (SF). These noise stimuli did not produce a reliable N100m. The hissiness feature from the auditory model was extracted from a time-averaged sequence of summary auditory images as in Patterson and Irino (1998). The results show that the perceptual measure of hissiness is highly correlated with the hissiness feature from the summary auditory image, and both are highly correlated with the magnitude of the transient P200m. There is a significant but weaker correlation with the SF and a nonsignificant correlation with the P50m. The results suggest that regularity in the carrier effects branching at an early stage of auditory processing with tonal and noisy sounds following separate spatio-temporal routes through the system.
Affiliation(s)
- André Rupp
- Department of Neurology, University of Heidelberg, Heidelberg, Germany.
5
Abstract
Looming visual stimuli (log-increasing in proximal size over time) and auditory stimuli (of increasing sound intensity over time) have been shown to be perceived as longer than receding visual and auditory stimuli (i.e., looming stimuli reversed in time). Here, we investigated whether such asymmetry in subjective duration also occurs for audiovisual looming and receding stimuli, as well as for stationary stimuli (i.e., stimuli that do not change in size and/or intensity over time). Our results showed a great temporal asymmetry in audition but a null asymmetry in vision. In contrast, the asymmetry in audiovision was moderate, suggesting that multisensory percepts arise from the integration of unimodal percepts in a maximum-likelihood fashion.
6
Jin F, Krishnan SS, Sattar F. Adventitious sounds identification and extraction using temporal-spectral dominance-based features. IEEE Trans Biomed Eng 2011; 58:3078-87. [PMID: 21712152 DOI: 10.1109/tbme.2011.2160721] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
Abstract
Respiratory sound (RS) signals carry significant information about the underlying functioning of the pulmonary system by the presence of adventitious sounds (ASs). Although many studies have addressed the problem of pathological RS classification, only a limited number of scientific works have focused on the analysis of the evolution of symptom-related signal components in the joint time-frequency (TF) plane. This paper proposes a new signal identification and extraction method for various ASs based on instantaneous frequency (IF) analysis. The presented TF decomposition method produces a noise-resistant, high-definition TF representation of RS signals compared to conventional linear TF analysis methods, yet preserves low computational complexity compared to quadratic TF analysis methods. The phase information discarded in the conventional spectrogram has been adopted for the estimation of IF and group delay, and a temporal-spectral dominance spectrogram has subsequently been constructed by investigating the TF spreads of the computed time-corrected IF components. The proposed dominance measure enables the extraction of signal components corresponding to ASs from noisy RS signals at high noise levels. A new set of TF features has also been proposed to quantify the shapes of the obtained TF contours, and therefore strongly enhances the identification of multicomponent signals such as polyphonic wheezes. An overall accuracy of 92.4±2.9% for the classification of real RS recordings shows the promising performance of the presented method.
Affiliation(s)
- Feng Jin
- Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada.
7
Parthasarathy A, Bartlett EL. Age-related auditory deficits in temporal processing in F-344 rats. Neuroscience 2011; 192:619-30. [PMID: 21723376 DOI: 10.1016/j.neuroscience.2011.06.042] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 04/24/2011] [Revised: 06/12/2011] [Accepted: 06/14/2011] [Indexed: 11/30/2022]
Abstract
Older human listeners demonstrate perceptual deficits in temporal processing even when audibility has been controlled. These age-related auditory deficits in temporal processing are thought to originate in the central auditory pathway. Precise temporal processing is necessary to detect and discriminate auditory cues such as modulation frequency, modulation depth and envelope shape which are critical for perception of speech and environmental sounds. This study aims to further understanding of temporal processing in aging using non-invasive electrophysiological measurements. Amplitude modulation following responses (AMFRs) and frequency modulation following responses (FMFRs) were recorded from aged (92-95-weeks old) and young (9-12-weeks old) Fischer-344 (F-344) rats for sinusoidally amplitude modulated (sAM) tones, sinusoidally frequency modulated (sFM) tones and ramped and damped amplitude modulation (AM) stimuli which differ in their envelope shapes. The modulation depth for the sAM and sFM stimuli and envelope shape for the ramped and damped stimuli were systematically varied. There was a monotonic decrease in AMFR and FMFR amplitudes with decreases in modulation depth across age for sAM and sFM stimuli. There was no significant difference between the response amplitudes of the young and aged animals for the largest modulation depths. However, a reduction in modulation depth resulted in a significant decrease in the response amplitudes and higher modulation detection thresholds for sAM and sFM stimuli with age. The aged animals showed significantly lower response amplitudes for ramped stimuli but not for damped stimuli. Cross correlating the responses with the ramped, symmetric, or damped stimulus envelopes revealed a decreased fidelity in encoding envelope shapes with age. These results indicate that age related temporal processing deficits become apparent only with reduced modulation depths or when discriminating envelope shapes. This has implications for psychophysical or diagnostic testing as well as for constraining potential cellular and network mechanisms responsible for these deficits.
Affiliation(s)
- A Parthasarathy
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
8
Byrne AJ, Viemeister NF, Stellmack MA. Discrimination of temporally asymmetric modulation with triangular envelopes on a broadband-noise carrier (L). The Journal of the Acoustical Society of America 2011; 129:593-596. [PMID: 21361417 PMCID: PMC3070990 DOI: 10.1121/1.3531838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/20/2010] [Revised: 11/18/2010] [Accepted: 11/26/2010] [Indexed: 05/30/2023]
Abstract
Highly detectable, time-reversed triangular amplitude modulation, with linear increases and decreases in amplitude, was used in an adaptive task to measure just-noticeable differences for changes in the direction of envelope temporal asymmetry for different modulation depths (m = 1.0 and 0.5) and rates (8, 16, and 32 Hz). Thresholds were analyzed using three different measures of the modulator's shape based on (1) the change in the position of the peak within a cycle, (2) the change in the slope of the modulator's increasing amplitude portion, and (3) the change in slope measured in units of amplitude per unit cycle rather than amplitude per unit time. The amplitude per unit cycle measure resulted in the best fit to all the data, and predicted additional data that were gathered with roved modulation frequency. The results suggest that a time normalization process may be involved in the perception and discrimination of envelope shape.
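The difference between the slope measured per unit time and per unit cycle can be made concrete with a small sketch. The parameterization below (a modulator that rises linearly from 1 - m to 1 + m at a given peak position within the cycle, then falls back) is an assumption for illustration and is not taken from the paper; the function names are hypothetical.

```python
def triangular_modulator(phase, peak_position, depth):
    """Envelope value at a cycle phase in [0, 1) for an asymmetric triangle."""
    if not 0.0 < peak_position < 1.0:
        raise ValueError("peak_position must lie strictly between 0 and 1")
    phase = phase % 1.0
    if phase < peak_position:
        tri = -1.0 + 2.0 * phase / peak_position  # rising portion
    else:
        tri = 1.0 - 2.0 * (phase - peak_position) / (1.0 - peak_position)  # falling portion
    return 1.0 + depth * tri


def rising_slope_per_second(peak_position, depth, rate_hz):
    """Slope of the rising portion in amplitude units per second."""
    return 2.0 * depth * rate_hz / peak_position


def rising_slope_per_cycle(peak_position, depth):
    """Slope of the rising portion in amplitude units per modulation cycle."""
    return 2.0 * depth / peak_position


for rate_hz in (8.0, 16.0, 32.0):  # modulation rates from the abstract
    print(rate_hz,
          rising_slope_per_second(0.25, 1.0, rate_hz),
          rising_slope_per_cycle(0.25, 1.0))
```

For a fixed peak position the per-second slope grows with modulation rate while the per-cycle slope stays constant, which is the sense in which a per-cycle measure normalizes for time. Under this parameterization, time-reversing the modulator corresponds to moving the peak from position p to 1 - p.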
Affiliation(s)
- Andrew J Byrne
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, USA.
9
Nelken I. Processing of complex sounds in the auditory system. Curr Opin Neurobiol 2008; 18:413-7. [PMID: 18805485 DOI: 10.1016/j.conb.2008.08.014] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Received: 07/23/2008] [Revised: 08/23/2008] [Accepted: 08/26/2008] [Indexed: 12/01/2022]
Abstract
The coding of complex sounds in the early auditory system has a 'standard model' based on the known physiology of the cochlea and main brainstem pathways. This model accounts for a wide range of perceptual capabilities. It is generally accepted that high cortical areas encode abstract qualities such as spatial location or speech sound identity. Between the early and late auditory system, the role of primary auditory cortex (A1) is still debated. A1 is clearly much more than a 'whiteboard' of acoustic information: neurons in A1 have complex response properties, showing sensitivity to both low-level and high-level features of sounds.
Affiliation(s)
- Israel Nelken
- Department of Neurobiology, Silberman Institute of Life Sciences, and the Interdisciplinary Center for Neural Computation (ICNC), Hebrew University, Safra Campus, Jerusalem 91904, Israel.
10
DiGiovanni JJ, Schlauch RS. Mechanisms Responsible for Differences in Perceived Duration for Rising-Intensity and Falling-Intensity Sounds. Ecological Psychology 2007. [DOI: 10.1080/10407410701432329] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022]
11
Abstract
Two experiments demonstrate that the perceived durations of sounds as long as 1 sec are influenced by the sounds' amplitude envelopes, extending Schlauch, Ries, and DiGiovanni's (2001) observations on sounds of 200-msec duration. Sounds with a monotonic decay (i.e., damped sounds) are heard as substantially shorter than both steady sounds and those with a monotonic increase of level (i.e., ramped sounds). Neither a reaction time (Experiments 1 and 2) nor a staircase (Experiment 2) procedure supported a sensory explanation for these different subjective durations. The results are compatible with the suggestion of Stecker and Hafter (2000) that listeners exclude part of the tails of damped sounds in the computation of their subjective durations.
Affiliation(s)
- Massimo Grassi
- Dipartimento di Psicologia Generale, Università di Padova, via Venezia 8, 35131 Padova, Italy.
12
Irino T, Patterson RD, Kawahara H. Speech Segregation Using an Auditory Vocoder With Event-Synchronous Enhancements. IEEE Transactions on Audio, Speech, and Language Processing 2006; 14:2212-2221. [PMID: 20191101 PMCID: PMC2828642 DOI: 10.1109/tasl.2006.872611] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
We propose a new method to segregate concurrent speech sounds using an auditory version of a channel vocoder. The auditory representation of sound, referred to as an "auditory image," preserves fine temporal information, unlike conventional window-based processing systems. This makes it possible to segregate speech sources with an event synchronous procedure. Fundamental frequency information is used to estimate the sequence of glottal pulse times for a target speaker, and to repress the glottal events of other speakers. The procedure leads to robust extraction of the target speech and effective segregation even when the signal-to-noise ratio is as low as 0 dB. Moreover, the segregation performance remains high when the speech contains jitter, or when the estimate of the fundamental frequency F0 is inaccurate. This contrasts with conventional comb-filter methods where errors in F0 estimation produce a marked reduction in performance. We compared the new method to a comb-filter method using a cross-correlation measure and perceptual recognition experiments. The results suggest that the new method has the potential to supplant comb-filter and harmonic-selection methods for speech enhancement.
Affiliation(s)
- Toshio Irino
- Faculty of Systems Engineering, Wakayama University, Wakayama 640-8510, Japan
- Roy D. Patterson
- Centre for Neural Basis of Hearing, Department of Physiology, Development, and Neuroscience, University of Cambridge, Cambridge CB2 3EG, U.K.
- Hideki Kawahara
- Faculty of Systems Engineering, Wakayama University, Wakayama 640-8510, Japan
13
van Dinther R, Patterson RD. Perception of acoustic scale and size in musical instrument sounds. The Journal of the Acoustical Society of America 2006; 120:2158-76. [PMID: 17069313 PMCID: PMC2821800 DOI: 10.1121/1.2338295] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 05/12/2023]
Abstract
There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception.
Affiliation(s)
- Ralph van Dinther
- Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG UK.
14
Seither-Preisler A, Patterson RD, Krumbholz K, Seither S, Lütkenhöner B. From noise to pitch: transient and sustained responses of the auditory evoked field. Hear Res 2006; 218:50-63. [PMID: 16814971 DOI: 10.1016/j.heares.2006.04.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 01/27/2006] [Revised: 04/22/2006] [Accepted: 04/27/2006] [Indexed: 11/22/2022]
Abstract
In recent magnetoencephalographic studies, we established a novel component of the auditory evoked field, which is elicited by a transition from noise to pitch in the absence of a change in energy. It is referred to as the 'pitch onset response'. To extend our understanding of pitch-related neural activity, we compared transient and sustained auditory evoked fields in response to a 2000-ms segment of noise and a subsequent 1000-ms segment of regular interval sound (RIS). RIS provokes the same long-term spectral representation in the auditory system as noise, but is distinguished by a definite pitch, the salience of which depends on the degree of temporal regularity. The stimuli were presented at three steps of increasing regularity and two spectral bandwidths. The auditory evoked fields were recorded from both cerebral hemispheres of twelve subjects with a 37-channel magnetoencephalographic system. Both the transient and the sustained components evoked by noise and RIS were sensitive to spectral bandwidth. Moreover, the pitch salience of the RIS systematically affected the pitch onset response, the sustained field, and the off-response. This indicates that the underlying neural generators reflect the emergence, persistence and offset of perceptual attributes derived from the temporal regularity of a sound.
Affiliation(s)
- A Seither-Preisler
- Department of Experimental Audiology, ENT Clinic, Münster University Hospital, Kardinal-von-Galen-Ring 10, D-48149 Münster, Germany.
15
Seither-Preisler A, Patterson R, Krumbholz K, Seither S, Lütkenhöner B. Evidence of pitch processing in the N100m component of the auditory evoked field. Hear Res 2006; 213:88-98. [PMID: 16464550 DOI: 10.1016/j.heares.2006.01.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 06/16/2005] [Revised: 12/23/2005] [Accepted: 01/02/2006] [Indexed: 11/19/2022]
Abstract
The latency of the N100m component of the auditory evoked field (AEF) is sensitive to the period and spectrum of a sound. However, little attention was paid so far to the wave shape at stimulus onset, which might have biased previous results. This problem was fixed in the present study by aligning the first major peaks in the acoustic waveforms. The stimuli were harmonic tones (spectral range: 800-5000 Hz) with periods corresponding to 100, 200, 400, and 800 Hz. The frequency components were in sine, alternating or random phase. Simulations with a computational model suggest that the auditory-nerve activity is strongly affected by both the period and the relative phase of the stimulus, whereas the output of the more central pitch processor only depends on the period. Our AEF data, recorded from the right hemisphere of seven subjects, are consistent with the latter prediction: The latency of the N100m depends on the period, but not on the relative phase of the stimulus components. This suggests that the N100m reflects temporal pitch extraction, not necessarily implying that the underlying generators are directly involved in this analysis.
Affiliation(s)
- Annemarie Seither-Preisler
- Department of Experimental Audiology, ENT Clinic, Münster University Hospital, Kardinal von Galen-Ring 10, D-48129 Münster, Germany.
16
Krumbholz K, Bleeck S, Patterson RD, Senokozlieva M, Seither-Preisler A, Lütkenhöner B. The effect of cross-channel synchrony on the perception of temporal regularity. The Journal of the Acoustical Society of America 2005; 118:946-54. [PMID: 16158650 DOI: 10.1121/1.1941090] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/04/2023]
Abstract
Temporal models of pitch are based on the assumption that the auditory system measures the time intervals between neural events, and that pitch corresponds to the most common time interval. The current experiments were designed to test whether time intervals are analyzed independently in each peripheral channel, or whether the time-interval analysis in one channel is affected by synchronous activity in other channels. Regular and irregular click trains were filtered into narrow frequency bands to produce target and flanker stimuli. The threshold for discriminating a regular target from an irregular distracter click train was measured in the presence of an irregular masker click train in the target band, as a function of the frequency separation between the target band and a flanker band. The flanker click train was either regular or irregular. The threshold for detecting the regular target was 5-7 dB lower when the flanker was regular. The data indicate that the detection of temporal regularity (and thus, pitch) involves cross-channel processes that can operate over widely separated channels. Model simulations suggest that these cross-channel processes occur after the time-interval extraction stage and that they depend on the similarity, or consistency, of the time-interval patterns in the relevant channels.
Affiliation(s)
- Katrin Krumbholz
- Institute of Medicine (IME), Research Center Jülich, D-52425 Jülich, Germany.
17
Gardner TJ, Magnasco MO. Instantaneous frequency decomposition: an application to spectrally sparse sounds with fast frequency modulations. The Journal of the Acoustical Society of America 2005; 117:2896-903. [PMID: 15957760 DOI: 10.1121/1.1863072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/03/2023]
Abstract
Classical time-frequency analysis is based on the amplitude responses of bandpass filters, discarding phase information. Instantaneous frequency analysis, in contrast, is based on the derivatives of these phases. This method of frequency calculation is of interest for its high precision and for reasons of similarity to cochlear encoding of sound. This article describes a methodology for high resolution analysis of sparse sounds, based on instantaneous frequencies. In this method, a comparison between tonotopic and instantaneous frequency information is introduced to select filter positions that are well matched to the signal. Second, a cross-check that compares frequency estimates from neighboring channels is used to optimize filter bandwidth, and to signal the quality of the analysis. These cross-checks lead to an optimal time-frequency representation without requiring any prior information about the signal. When applied to a signal that is sufficiently sparse, the method decomposes the signal into separate time-frequency contours that are tracked with high precision. Alternatively, if the signal is spectrally too dense, neighboring channels generate inconsistent estimates, a feature that allows the method to assess its own validity in particular contexts. Similar optimization principles may be present in cochlear encoding.
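The idea of reading frequency from the derivative of phase can be sketched in a few lines. This is not the authors' method: it skips the filter bank and the cross-channel checks, and simply assumes that a complex-valued (analytic) output of a single channel is already available. The sample rate and test tone are hypothetical.

```python
import cmath
import math


def instantaneous_frequency(analytic_samples, sample_rate):
    """Frequency in Hz from the phase advance between consecutive complex samples."""
    freqs = []
    for previous, current in zip(analytic_samples, analytic_samples[1:]):
        # The phase of current * conj(previous) is the wrapped phase difference.
        phase_step = cmath.phase(current * previous.conjugate())
        freqs.append(phase_step * sample_rate / (2.0 * math.pi))
    return freqs


SAMPLE_RATE = 8000.0  # hypothetical
TRUE_HZ = 440.0       # hypothetical test tone

tone = [cmath.exp(2j * math.pi * TRUE_HZ * n / SAMPLE_RATE) for n in range(64)]
estimates = instantaneous_frequency(tone, SAMPLE_RATE)
print(estimates[0])
```

Taking the phase of the product, rather than subtracting two phase angles, avoids explicit unwrapping as long as the true phase step stays below half a cycle per sample.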
Affiliation(s)
- T J Gardner
- Laboratory of Mathematical Physics, The Rockefeller University, 1230 York Ave, New York, New York 10021, USA.
18
Krumbholz K, Patterson RD, Nobbe A, Fastl H. Microsecond temporal resolution in monaural hearing without spectral cues? The Journal of the Acoustical Society of America 2003; 113:2790-2800. [PMID: 12765396 DOI: 10.1121/1.1547438] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 05/24/2023]
Abstract
The auditory system encodes the timing of peaks in basilar-membrane motion with exquisite precision, and perceptual models of binaural processing indicate that the limit of temporal resolution in humans is as little as 10-20 microseconds. In these binaural studies, pairs of continuous sounds with microsecond differences are presented simultaneously, one sound to each ear. In this paper, a monaural masking experiment is described in which pairs of continuous sounds with microsecond time differences were combined and presented to both ears. The stimuli were matched in terms of the excitation patterns they produced, and a perceptual model of monaural processing indicates that the limit of temporal resolution in this case is similar to that in the binaural system.
Affiliation(s)
- Katrin Krumbholz
- Centre for the Neural Basis of Hearing, Department of Physiology, University of Cambridge, Downing Street, Cambridge CB2 3EG, United Kingdom
19
Krumbholz K, Patterson RD, Nobbe A. Asymmetry of masking between noise and iterated rippled noise: evidence for time-interval processing in the auditory system. The Journal of the Acoustical Society of America 2001; 110:2096-2107. [PMID: 11681387 DOI: 10.1121/1.1395583] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 05/23/2023]
Abstract
This study describes the masking asymmetry between noise and iterated rippled noise (IRN) as a function of spectral region and the IRN delay. Masking asymmetry refers to the fact that noise masks IRN much more effectively than IRN masks noise, even when the stimuli occupy the same spectral region. Detection thresholds for IRN masked by noise and for noise masked by IRN were measured with an adaptive two-alternative, forced choice (2AFC) procedure with signal level as the adaptive parameter. Masker level was randomly varied within a 10-dB range in order to reduce the salience of loudness as a cue for detection. The stimuli were filtered into frequency bands, 2.2-kHz wide, with lower cutoff frequencies ranging from 0.8 to 6.4 kHz. IRN was generated with 16 iterations and with varying delays. The reciprocal of the delay was 16, 32, 64, or 128 Hz. When the reciprocal of the IRN delay was within the pitch range, i.e., above 30 Hz, there was a substantial masking asymmetry between IRN and noise for all filter cutoff frequencies; threshold for IRN masked by noise was about 10 dB larger than threshold for noise masked by IRN. For the 16-Hz IRN, the masking asymmetry decreased progressively with increasing filter cutoff frequency, from about 9 dB for the lowest cutoff frequency to less than 1 dB for the highest cutoff frequency. This suggests that masking asymmetry may be determined by different cues for delays within and below the pitch range. The fact that masking asymmetry exists for conditions that combine very long IRN delays with very high filter cutoff frequencies means that it is unlikely that models based on the excitation patterns of the stimuli would be successful in explaining the threshold data. A range of time-domain models of auditory processing that focus on the time intervals in phase-locked neural activity patterns is reviewed. 
Most of these models were successful in accounting for the basic masking asymmetry between IRN and noise for conditions within the pitch range, and one of the models produced an exceptionally good fit to the data.
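Iterated rippled noise is produced by repeatedly delaying a noise and adding it back, which builds up temporal regularity at the delay. The sketch below illustrates that construction for the 16-iteration, 64-Hz case mentioned in the abstract. The abstract does not state the network configuration or gain, so the unity-gain variant in which each stage adds a delayed copy of the current signal to itself is an assumption, as are the 16 kHz sampling rate, the 1-s duration, and the helper names.

```python
import random

def iterated_rippled_noise(noise, delay_samples, iterations):
    """Apply a delay-and-add stage `iterations` times (unity gain, zero-padded delay)."""
    signal = list(noise)
    for _ in range(iterations):
        delayed = [0.0] * delay_samples + signal[:-delay_samples]
        signal = [s + d for s, d in zip(signal, delayed)]
    return signal

def normalized_autocorrelation(x, lag):
    """Lagged product sum of x divided by its energy."""
    num = sum(a * b for a, b in zip(x, x[lag:]))
    den = sum(a * a for a in x)
    return num / den

fs = 16000                     # assumed sampling rate (Hz)
delay_samples = fs // 64       # delay whose reciprocal is 64 Hz (250 samples)
rng = random.Random(0)         # fixed seed so the sketch is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]   # 1 s of Gaussian noise
irn = iterated_rippled_noise(noise, delay_samples, iterations=16)

noise_peak = normalized_autocorrelation(noise, delay_samples)
irn_peak = normalized_autocorrelation(irn, delay_samples)
```

With these settings `irn_peak` comes out close to 1 while `noise_peak` stays near 0. This is the kind of time-interval regularity that the time-domain models reviewed in the study can exploit, even when the stimuli occupy the same spectral region.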
Affiliation(s)
- K Krumbholz
- Centre for the Neural Basis of Hearing, Department of Physiology, University of Cambridge, United Kingdom
20
Neuert V, Pressnitzer D, Patterson RD, Winter IM. The responses of single units in the inferior colliculus of the guinea pig to damped and ramped sinusoids. Hear Res 2001; 159:36-52. [PMID: 11520633 DOI: 10.1016/s0378-5955(01)00318-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
Temporal asymmetry can have a pronounced effect on the perception of a sinusoid. For instance, if a sinusoid is amplitude modulated by a decaying exponential that restarts every 50 ms (a damped sinusoid), listeners report a two-component percept: a tonal component corresponding to the carrier and a drumming component corresponding to the envelope modulation period. When the amplitude modulation is reversed in time (a ramped sinusoid), the perception changes markedly; the tonal component increases while the drumming component decreases. The long-term Fourier energy spectra are identical for damped and ramped sinusoids with the same exponential half-life. Modelling studies suggest that this perceptual asymmetry must occur central to the peripheral stages of auditory processing (Patterson and Irino, 1998). To test this hypothesis, we have recorded the responses of single units in the inferior colliculus of the anaesthetised guinea pig. We divided single units into three groups: onset, on-sustained and sustained, based on their temporal adaptation properties to suprathreshold tone bursts at the unit's best frequency. The asymmetry observed in the neural responses of single units was quantified in two ways: a simple total spike count measure and a ratio of the tallest bin of the modulation period histogram to the total number of spikes. Responses were more diverse than those observed with similar stimuli in a previous study in the ventral cochlear nucleus (Pressnitzer et al., 2000). The main results were: (1) The shape of the responses of on-sustained units to ramped sinusoids resembled the shape of the responses to damped sinusoids. This is in contrast to the response shapes in the VCN, which were always similar to the stimulating sinusoid. (2) Units classified as onsets often responded only to the damped stimuli.
(3) All units display significant asymmetry in discharge rate for at least one of the half-lives tested and 20% showed significant response asymmetry over all of the half-lives tested. (4) A summary population measure of temporal asymmetry based on total spike count reveals the same pattern of results as that obtained psychophysically using the same stimuli (Patterson, 1994a).
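The stimulus construction described in this abstract can be sketched directly: a carrier multiplied by an exponential decay that restarts every 50 ms gives a damped sinusoid, and reversing it in time gives the ramped version. The example below is a minimal sketch under stated assumptions; the 1-kHz carrier, 4-ms half-life, 8 kHz sampling rate, and all function names are hypothetical and are not taken from the study. It also checks, at a few DFT bins, the abstract's statement that the two stimuli share the same long-term energy spectrum.

```python
import cmath
import math

def damped_sinusoid(fs, carrier_hz, half_life_s, period_s, n_periods):
    """Carrier multiplied by an exponential decay that restarts every period."""
    period_len = round(period_s * fs)
    samples = []
    for n in range(period_len * n_periods):
        t_in_period = (n % period_len) / fs
        envelope = 0.5 ** (t_in_period / half_life_s)   # halves every half-life
        samples.append(envelope * math.sin(2 * math.pi * carrier_hz * n / fs))
    return samples

def dft_magnitude(x, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n_total = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_total)
                   for n, v in enumerate(x)))

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

fs = 8000   # assumed sampling rate (Hz)
damped = damped_sinusoid(fs, carrier_hz=1000.0, half_life_s=0.004,
                         period_s=0.050, n_periods=2)
ramped = damped[::-1]          # time reversal turns each decay into a rise

half = round(0.050 * fs) // 2  # number of samples in half an envelope period
```

Time reversal leaves every DFT magnitude unchanged for a real signal, so `dft_magnitude(damped, k)` and `dft_magnitude(ramped, k)` agree for any bin `k`, while the energy within each envelope period moves from the first half (damped) to the second half (ramped).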
Affiliation(s)
- V Neuert
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Cambridge, UK
21
Schlauch RS, Ries DT, DiGiovanni JJ. Duration discrimination and subjective duration for ramped and damped sounds. The Journal of the Acoustical Society of America 2001; 109:2880-2887. [PMID: 11425130 DOI: 10.1121/1.1372913] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.7] [Indexed: 05/23/2023]
Abstract
The perception of stimuli with ramped envelopes (gradual attack and abrupt decay) and damped envelopes (abrupt attack and gradual decay) was studied in subjective and objective tasks. Magnitude estimation (ME) of perceived duration was measured for broadband noise, 1.0-kHz, and 8.0-kHz tones for durations between 10 and 200 ms. Damped sounds were judged to be shorter than ramped sounds. Matching experiments between sounds with ramped, damped, and rectangular envelopes also showed that damped sounds are perceived to be shorter than ramped sounds, and, additionally, the reason for the effect is a result of the damped sound being judged shorter than a rectangular-gated sound rather than the ramped sound being judged longer than a rectangular-gated sound. These matching studies also demonstrate that the size of the effect is larger for tones (factor of 2.0) than for broadband noise (factor of 1.5). There are two plausible explanations for the finding that damped sounds are judged to be shorter than ramped or rectangular-gated sounds: (1) the abrupt offset at a high level of the ramped sound (or a rectangular-gated sound) results in a persistence of perception (forward masking) that is considered in judgments of the subjective duration; and (2) listeners may ignore a portion of the decay of a damped sound because they consider it an "echo" [Stecker and Hafter, J. Acoust. Soc. Am. 107, 3358-3368 (2000)]. In another experiment, duration discrimination for broadband noise with ramped, damped, and rectangular envelopes was studied as a function of duration (10 to 100 ms) to determine if differences in perceived duration are associated with the size of measured Weber fractions. A forced-choice adaptive procedure was used. Duration discrimination was poorer for noise with ramped envelopes than for noise with damped or rectangular envelopes. This result is inconsistent with differences in perceived duration and no explanation was readily apparent.
Affiliation(s)
- R S Schlauch
- Department of Communication Disorders, University of Minnesota, Minneapolis 55455, USA
22
Lu T, Liang L, Wang X. Neural representations of temporally asymmetric stimuli in the auditory cortex of awake primates. J Neurophysiol 2001; 85:2364-80. [PMID: 11387383 DOI: 10.1152/jn.2001.85.6.2364] [Citation(s) in RCA: 127] [Impact Index Per Article: 5.3] [Indexed: 11/22/2022] Open
Abstract
The representation of rapid acoustic transients by the auditory cortex is a fundamental issue that is still unresolved. Auditory cortical neurons have been shown to be limited in their stimulus-synchronized responses, yet the perceptual performances of humans and animals in discriminating temporal variations in complex sounds are better than what existing neurophysiological data would predict. This study investigated the neural representation of temporally asymmetric stimuli in the primary auditory cortex of awake marmoset monkeys. The stimuli, ramped and damped sinusoids, were systematically manipulated (by means of half-life of the exponential envelope) within a cortical neuron's presumed temporal integration window. The main findings of this study are as follows: 1) temporal asymmetry in ramped and damped sinusoids with a short period (25 ms) was clearly reflected by average discharge rate but not necessarily by temporal discharge patterns of auditory cortical neurons. There was considerable response specificity to these stimuli such that some neurons were strongly responsive to a ramped sinusoid but almost completely unresponsive to its damped counterpart or vice versa. Of 181 neurons studied, 140 (77%) showed significant response asymmetry in at least one of the tested half-life values of the exponential envelope. Forty-six neurons showed significant response asymmetry over all half-lives tested. Sustained firing, commonly observed under awake conditions, contributed to greater response asymmetry than that of onset responses in many neurons. 2) A greater proportion of the neurons (32/46) that exhibited significant overall response asymmetry showed stronger responses to the ramped sinusoids than to the damped sinusoids, possibly contributing to the difference in the perceived loudness between these two classes of sounds. 
3) The asymmetry preference of a neuron to ramped or damped sinusoids did not appear to be correlated with its characteristic frequency or minimum response latency, suggesting that this is a general phenomenon that exists across populations of cortical neurons. Moreover, the intensity of the stimuli did not have significant effects on the measure of the asymmetry preference based on discharge rate. 4) A population measure of response preference, based on discharge rate, of cortical neurons to the temporally asymmetric stimuli was qualitatively similar to the performance of human listeners in discriminating ramped versus damped sinusoids at different half-life values. These findings suggest that rapid acoustic transients embedded in complex sounds can be represented by discharge rates of cortical neurons instead of or in the absence of stimulus-synchronized discharges.
Affiliation(s)
- T Lu
- Laboratory of Auditory Neurophysiology, Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
23
Pressnitzer D, Winter IM, Patterson RD. The responses of single units in the ventral cochlear nucleus of the guinea pig to damped and ramped sinusoids. Hear Res 2000; 149:155-66. [PMID: 11033255 DOI: 10.1016/s0378-5955(00)00175-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
Abstract
Human listeners hear an asymmetry in the perception of damped and ramped sinusoids; the partial loudness of the envelope component is greater than the partial loudness of the carrier component for damped sinusoids. Here we show that an asymmetry also occurs in the physiological responses of most units in the ventral cochlear nucleus to these same sounds. The activity elicited by damped sinusoids is mainly restricted to the beginning of each envelope period, which is not the case for ramped sinusoids. This can be quantified by computing the ratio of the tallest bin of the modulation period histogram to the total number of spikes (the peak-to-total ratio, p/t). Damped sinusoids produce a higher p/t than ramped sinusoids, which demonstrates physiological temporal asymmetry. It is also the case that ramped sinusoids typically elicit more spikes than damped sinusoids. The physiological asymmetry occurs where the perceptual asymmetry is present. It is maximal at modulation half-lives of 4 and 16 ms, greatly reduced at 1 ms and absent at 64 ms. Different unit types exhibit differing degrees of temporal asymmetry. Onset units produce the greatest p/t asymmetry, primary-like units produce the least asymmetry and chopper units are in-between. With regard to total spike count, the maximal asymmetry occurs with chopper units. If primary-like units are assumed to reflect the activity in primary auditory nerve fibres, then there is enhancement of temporal asymmetry in the ventral cochlear nucleus by both onset and chopper units.
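The peak-to-total ratio defined in this abstract is the count in the tallest bin of the modulation period histogram divided by the total number of spikes. A minimal sketch of that calculation follows. The 1-ms bin width, the 50-ms period, the function name `peak_to_total`, and the spike times are illustrative assumptions; the spike times are made-up numbers chosen to show the calculation and are not data from the study.

```python
def peak_to_total(spike_times_ms, period_ms, bin_width_ms=1.0):
    """Tallest bin of the modulation period histogram divided by total spikes."""
    n_bins = round(period_ms / bin_width_ms)
    histogram = [0] * n_bins
    for t in spike_times_ms:
        phase = t % period_ms                        # fold onto one envelope period
        index = min(int(phase / bin_width_ms), n_bins - 1)
        histogram[index] += 1
    total = sum(histogram)
    return max(histogram) / total if total else 0.0

# Made-up spike times (ms): clustered at period onset vs. spread over the period.
damped_like = [1.0, 1.5, 51.0, 51.5, 101.0, 130.0]
ramped_like = [10.0, 25.0, 40.0, 65.0, 80.0, 95.0]

pt_damped = peak_to_total(damped_like, period_ms=50.0)
pt_ramped = peak_to_total(ramped_like, period_ms=50.0)
```

Spikes concentrated at the start of each envelope period fall into a single histogram bin and give a high ratio, whereas spikes spread across the period give a low one. This matches the direction of the damped-versus-ramped difference reported in the abstract.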
Affiliation(s)
- D Pressnitzer
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Downing Street, Cambridge CB2 3EG, UK.
24
Derleth RP, Dau T. On the role of envelope fluctuation processing in spectral masking. The Journal of the Acoustical Society of America 2000; 108:285-296. [PMID: 10923892 DOI: 10.1121/1.429464] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 05/23/2023]
Abstract
This study examines the role of temporal cues in spectral masking, such as beats and intrinsic envelope fluctuations. Predictions from the modulation-filterbank model developed by Dau et al. [J. Acoust. Soc. Am. 102, 2906-2919 (1997)] are compared to average masking patterns from Moore et al. [J. Acoust. Soc. Am. 104, 1023-1038 (1998)]. In these experiments, tones and narrow-band noises have been used as the signal and the masker, so that all four signal-masker combinations are considered. In addition, model predictions are compared with new experimental data in conditions of notched-noise masking, where the masker consisted of two narrow-band noises whose bandwidth and frequency separation were varied systematically. The model uses a peripheral filtering stage with linear and symmetric Gammatone filters, an adaptation stage that includes a static compressive nonlinearity for stationary input stimuli and a higher sensitivity for envelope fluctuation, and a modulation filterbank that analyzes the output for each peripheral channel. For low and medium masker levels, the model accounts very well for the masking patterns in all signal-masker conditions, as well as for the notched-noise conditions. In contrast, predictions from a version of the model that acts like an energy detector account for only some of the notched-noise data, and generally do not account for the shape of the masking patterns. For a high masker level, the simulations suggest the use of asymmetric filters, with a steeper high-frequency slope than is used in the linear model, consistent with results from previous studies. In addition, several nonlinear effects become apparent at this masker level, which cannot be accounted for by the current model.
Affiliation(s)
- R P Derleth
- Carl-von-Ossietzky Universität Oldenburg, AG Medizinische Physik, Graduiertenkolleg Psychoakustik, Germany
25
Stecker GC, Hafter ER. An effect of temporal asymmetry on loudness. The Journal of the Acoustical Society of America 2000; 107:3358-3368. [PMID: 10875381 DOI: 10.1121/1.429407] [Citation(s) in RCA: 45] [Impact Index Per Article: 1.8] [Indexed: 05/23/2023]
Abstract
A set of experiments was conducted to examine the loudness of sounds with temporally asymmetric amplitude envelopes. Envelopes were generated with fast-attack/slow-decay characteristics to produce F-S (or "fast-slow") stimuli, while temporally reversed versions of these same envelopes produced corresponding S-F ("slow-fast") stimuli. For sinusoidal (330-6000 Hz) and broadband noise carriers, S-F stimuli were louder than F-S stimuli of equal energy. The magnitude of this effect was sensitive to stimulus order, with the largest differences between F-S and S-F loudness occurring after exposure to a preceding F-S stimulus. These results are not compatible with automatic gain control, power-spectrum models of loudness, or predictions obtained using the auditory image model [Patterson et al., J. Acoust. Soc. Am. 98, 1890-1894 (1995)]. Rather, they are comparable to phenomena of perceptual constancy, and may be related to the parsing of auditory input into direct and reverberant sound.
Affiliation(s)
- G C Stecker
- Department of Psychology, University of California at Berkeley, 94720-1650, USA
26
Patterson RD. Auditory images. How complex sounds are represented in the auditory system. ACTA ACUST UNITED AC 2000. [DOI: 10.1250/ast.21.183] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Affiliation(s)
- Roy D. Patterson
- Centre for the Neural Basis of Hearing, Physiology Department, Cambridge University, Downing Site, Cambridge, CB2 3EG, United Kingdom