1. Ou Y, Allen EJ, Kay KN, Oxenham AJ. Cortical substrates of perceptual confusion between pitch and timbre. bioRxiv 2025:2025.02.19.639197. PMID: 40027705; PMCID: PMC11870464; DOI: 10.1101/2025.02.19.639197
Abstract
Pitch and timbre are two fundamental perceptual attributes of sound that help us distinguish voices in speech and appreciate music. Brightness, one of the primary dimensions of timbre, is governed by different acoustic parameters compared to pitch, but the two can be confused perceptually when varied simultaneously. Here we combine human behavior and fMRI to provide evidence of a potential neural substrate to explain this important but poorly understood perceptual confusion. We identify orderly mappings of both pitch and brightness within auditory cortex and reveal two independent lines of evidence for cortical confusion between them. First, the preferred pitch of individual voxels decreases systematically as brightness increases, and vice versa, consistent with predictions based on perceptual confusion. Second, pitch and brightness mapping share a common high-low-high gradient across auditory cortex, implying a shared trajectory of cortical activation for changes in each dimension. The results provide a cortical substrate at both local and global scales for an established auditory perceptual phenomenon that is thought to reflect efficient coding of features ubiquitous in natural sound statistics.
2. Maruyama H, Motoyoshi I. Two-stage spectral space and the perceptual properties of sound textures. J Acoust Soc Am 2025; 157:2067-2076. PMID: 40130951; DOI: 10.1121/10.0036219
Abstract
Textural sounds, such as wind, flowing water, and footsteps, are common in the natural environment. Recent studies have shown that the perception of auditory textures can be described and synthesized from multiple classes of time-averaged statistics, or from the linear spectra and energy spectra of input sounds. These findings raise the possibility that explicit perceptual properties of a textural sound, such as heaviness and complexity, could be predicted from the two-stage spectra. In the present study, rating data were collected for 17 different perceptual properties across 325 real-world sounds, and the relationship between the ratings and the two-stage spectral characteristics was investigated. The analysis showed that the ratings for each property were strongly and systematically correlated with specific frequency bands in the two-stage spectral space. A subsequent experiment further demonstrated that manipulating power at critical frequency bands significantly alters the perceived property of natural sounds in the predicted direction. The results suggest that the perceptual impression of sound texture depends strongly on the power distribution across first- and second-order acoustic filters in the early auditory system.
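As an illustration of the two-stage description invoked here (time-averaged band power, plus the modulation or energy spectrum of each subband envelope), the sketch below computes both stages for an arbitrary test signal. The band spacing, filter order, and envelope method are placeholder assumptions, not the authors' exact analysis pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def two_stage_spectra(x, fs, n_bands=30, fmin=80.0, fmax=6000.0):
    """Stage 1: time-averaged power in log-spaced frequency bands.
    Stage 2: power spectrum of each band's amplitude envelope (modulation spectrum)."""
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    band_power, mod_spectra = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                    # subband amplitude envelope
        band_power.append(np.mean(band ** 2))          # first-stage (linear) spectrum
        mod = np.abs(np.fft.rfft(env - env.mean())) ** 2
        mod_spectra.append(mod)                        # second-stage (energy) spectrum
    return edges, np.array(band_power), np.array(mod_spectra)

fs = 16000
t = np.arange(2 * fs) / fs
x = np.random.randn(t.size) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))  # 4 Hz modulated noise
edges, stage1, stage2 = two_stage_spectra(x, fs)
```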
Affiliation(s)
- Hironori Maruyama
- Graduate School of Arts and Sciences, The University of Tokyo, Meguro-ku, Tokyo, 153-8902, Japan
- Japan Society for the Promotion of Science (JSPS), Chiyoda-ku, Tokyo, 102-0083, Japan
- Isamu Motoyoshi
- Graduate School of Arts and Sciences, The University of Tokyo, Meguro-ku, Tokyo, 153-8902, Japan
3. Shorey AE, King CJ, Whiteford KL, Stilp CE. Musical training is not associated with spectral context effects in instrument sound categorization. Atten Percept Psychophys 2024; 86:991-1007. PMID: 38216848; DOI: 10.3758/s13414-023-02839-6
Abstract
Musicians display a variety of auditory perceptual benefits relative to people with little or no musical training; these benefits are collectively referred to as the "musician advantage." Importantly, musicians consistently outperform nonmusicians for tasks relating to pitch, but there are mixed reports as to musicians outperforming nonmusicians for timbre-related tasks. Due to their experience manipulating the timbre of their instrument or voice in performance, we hypothesized that musicians would be more sensitive to acoustic context effects stemming from the spectral changes in timbre across a musical context passage (played by a string quintet then filtered) and a target instrument sound (French horn or tenor saxophone; Experiment 1). Additionally, we investigated the role of a musician's primary instrument of instruction by recruiting French horn and tenor saxophone players to also complete this task (Experiment 2). Consistent with the musician advantage literature, musicians exhibited superior pitch discrimination to nonmusicians. Contrary to our main hypothesis, there was no difference between musicians and nonmusicians in how spectral context effects shaped instrument sound categorization. Thus, musicians may only outperform nonmusicians for some auditory skills relevant to music (e.g., pitch perception) but not others (e.g., timbre perception via spectral differences).
Affiliation(s)
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY, 40292, USA.
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY, 40292, USA.
- Kelly L Whiteford
- Department of Psychology, University of Minnesota, Minneapolis, MN, 55455, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY, 40292, USA
4. Giordano BL, Esposito M, Valente G, Formisano E. Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds. Nat Neurosci 2023; 26:664-672. PMID: 36928634; PMCID: PMC10076214; DOI: 10.1038/s41593-023-01285-9
Abstract
Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploited a model comparison framework and contrasted the ability of acoustic, semantic (continuous and categorical) and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T human auditory cortex functional magnetic resonance imaging responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses similarly to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.
Affiliation(s)
- Bruno L Giordano
- Institut de Neurosciences de La Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France.
- Michele Esposito
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Giancarlo Valente
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Faculty of Science and Engineering, Maastricht University, Maastricht, the Netherlands; Brightlands Institute for Smart Society (BISS), Maastricht University, Maastricht, the Netherlands
5. He F, Stevenson IH, Escabí MA. Two stages of bandwidth scaling drives efficient neural coding of natural sounds. PLoS Comput Biol 2023; 19:e1010862. PMID: 36787338; PMCID: PMC9970106; DOI: 10.1371/journal.pcbi.1010862
Abstract
Theories of efficient coding propose that the auditory system is optimized for the statistical structure of natural sounds, yet the transformations underlying optimal acoustic representations are not well understood. Using a database of natural sounds including human speech and a physiologically inspired auditory model, we explore the consequences of peripheral (cochlear) and mid-level (auditory midbrain) filter tuning transformations on the representation of natural sound spectra and modulation statistics. Whereas Fourier-based sound decompositions have constant time-frequency resolution at all frequencies, cochlear and auditory midbrain filter bandwidths increase in proportion to the filter center frequency. This form of bandwidth scaling produces a systematic decrease in spectral resolution and increase in temporal resolution with increasing frequency. Here we demonstrate that cochlear bandwidth scaling produces a frequency-dependent gain that counteracts the tendency of natural sound power to decrease with frequency, resulting in a whitened output representation. Similarly, bandwidth scaling in mid-level auditory filters further enhances the representation of natural sounds by producing a whitened modulation power spectrum (MPS) with higher modulation entropy than both the cochlear outputs and the conventional Fourier MPS. These findings suggest that the tuning characteristics of the peripheral and mid-level auditory system together produce a whitened output representation in three dimensions (frequency, temporal and spectral modulation) that reduces redundancies and allows for a more efficient use of neural resources. This hierarchical multi-stage tuning strategy is thus likely optimized to extract available information and may underlie perceptual sensitivity to natural sounds.
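The whitening argument sketched in this abstract can be checked with a toy computation: when sound power falls off roughly as 1/f and filter bandwidths grow in proportion to center frequency, each filter collects approximately the same amount of power. The 1/f spectrum and the 25% proportional bandwidth below are illustrative assumptions, not the paper's fitted cochlear parameters.

```python
import numpy as np

f = np.linspace(50.0, 8000.0, 20000)          # frequency axis (Hz)
df = f[1] - f[0]
psd = 1.0 / f                                  # toy natural-sound spectrum (power ~ 1/f)

centers = np.geomspace(100.0, 6000.0, 25)
const_bw = np.full_like(centers, 100.0)        # Fourier-like: fixed 100 Hz bandwidth
prop_bw = 0.25 * centers                       # cochlear-like: bandwidth ~ 25% of center

def band_power(fc, bw):
    """Power collected by a rectangular filter of width bw centered at fc."""
    mask = (f >= fc - bw / 2) & (f <= fc + bw / 2)
    return np.sum(psd[mask]) * df

p_const = np.array([band_power(fc, bw) for fc, bw in zip(centers, const_bw)])
p_prop = np.array([band_power(fc, bw) for fc, bw in zip(centers, prop_bw)])

# Constant-bandwidth outputs fall with center frequency; proportional-bandwidth
# outputs stay roughly flat, i.e., the representation is whitened.
print(p_const / p_const[0])
print(p_prop / p_prop[0])
```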
Affiliation(s)
- Fengrong He
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Ian H. Stevenson
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- The Connecticut Institute for Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Monty A. Escabí
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- The Connecticut Institute for Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
6. McAdams S, Thoret E, Wang G, Montrey M. Timbral cues for learning to generalize musical instrument identity across pitch register. J Acoust Soc Am 2023; 153:797. PMID: 36859162; DOI: 10.1121/10.0017100
Abstract
Timbre provides an important cue to identify musical instruments. Many timbral attributes covary with other parameters like pitch. This study explores listeners' ability to construct categories of instrumental sound sources from sounds that vary in pitch. Nonmusicians identified 11 instruments from the woodwind, brass, percussion, and plucked and bowed string families. In experiment 1, they were trained to identify instruments playing a pitch of C4, and in experiments 2 and 3, they were trained with a five-tone sequence (F#3-F#4), exposing them to the way timbre varies with pitch. Participants were required to reach a threshold of 75% correct identification in training. In the testing phase, successful listeners heard single tones (experiments 1 and 2) or three-tone sequences (A3-D#4; experiment 3) across each instrument's full pitch range to test their ability to generalize identification from the learned sound(s). Identification generalization over pitch varies a great deal across instruments. No significant differences were found between single-pitch and multi-pitch training or testing conditions. Identification rates can be predicted moderately well by spectrograms or modulation spectra. These results suggest that listeners use the most relevant acoustical invariance to identify musical instrument sounds, drawing also on previous experience with the tested instruments.
Affiliation(s)
- Stephen McAdams
- Schulich School of Music, McGill University, Montreal, Québec H3A 1E3, Canada
- Etienne Thoret
- Aix-Marseille University, Centre National de la Recherche Scientifique, Perception Representations Image Sound Music Laboratory, Unité Mixte de Recherche 7061, Laboratoire d'Informatique et Systèmes, Unité Mixte de Recherche 7020, 13009 Marseille, France
- Grace Wang
- Cognitive Science Program, McGill University, Montreal, Québec H3A 3R1, Canada
- Marcel Montrey
- Department of Psychology, McGill University, Montreal, Québec H3A 1G1, Canada
7. Micallef Grimaud A, Eerola T. Emotional expression through musical cues: A comparison of production and perception approaches. PLoS One 2022; 17:e0279605. PMID: 36584186; PMCID: PMC9803112; DOI: 10.1371/journal.pone.0279605
Abstract
Multiple approaches have been used to investigate how musical cues are used to shape different emotions in music. The most prominent approach is a perception study, where musical stimuli varying in cue levels are assessed by participants in terms of their conveyed emotion. However, this approach limits the number of cues and combinations simultaneously investigated, since each variation produces another musical piece to be evaluated. Another less used approach is a production approach, where participants use cues to change the emotion conveyed in music, allowing participants to explore a larger number of cue combinations than the former approach. These approaches provide different levels of accuracy and economy for identifying how cues are used to convey different emotions in music. However, do these approaches provide converging results? This paper's aims are two-fold. The role of seven musical cues (tempo, pitch, dynamics, brightness, articulation, mode, and instrumentation) in communicating seven emotions (sadness, joy, calmness, anger, fear, power, and surprise) in music is investigated. Additionally, this paper explores whether the two approaches will yield similar findings on how the cues are used to shape different emotions in music. The first experiment utilises a production approach where participants adjust the cues in real-time to convey target emotions. The second experiment uses a perception approach where participants rate pre-rendered systematic variations of the stimuli for all emotions. Overall, the cues operated similarly in the majority (32/49) of cue-emotion combinations across both experiments, with the most variance produced by the dynamics and instrumentation cues. A comparison of the prediction accuracy rates of cue combinations representing the intended emotions found that prediction rates in Experiment 1 were higher than the ones obtained in Experiment 2, suggesting that a production approach may be a more efficient method to explore how cues are used to shape different emotions in music.
Affiliation(s)
- Tuomas Eerola
- Department of Music, Music and Science Lab, Durham University, Durham, United Kingdom
8. Ogino M, Hamada N, Mitsukura Y. Simultaneous multiple-stimulus auditory brain-computer interface with semi-supervised learning and prior probability distribution tuning. J Neural Eng 2022; 19. PMID: 36317357; DOI: 10.1088/1741-2552/ac9edd
Abstract
Objective. Auditory brain-computer interfaces (BCIs) enable users to select commands based on the brain activity elicited by auditory stimuli. However, existing auditory BCI paradigms cannot increase the number of available commands without decreasing the selection speed, because each stimulus needs to be presented independently and sequentially under the standard oddball paradigm. To solve this problem, we propose a double-stimulus paradigm that simultaneously presents multiple auditory stimuli. Approach. To extend an existing auditory BCI paradigm, the most discriminable sound was chosen following a subjective assessment. The new sound was located on the right-hand side and presented simultaneously with an existing sound from the left-hand side. A total of six sounds were used for implementing the auditory BCI with a 6 × 6 letter matrix. We employed semi-supervised learning (SSL) and prior probability distribution tuning to improve the accuracy of the paradigm. The SSL method involved updating the classifier weights, and their prior probability distributions were adjusted using the following three types of distributions: uniform, empirical, and extended empirical (e-empirical). The performance was evaluated based on the BCI accuracy and information transfer rate (ITR). Main results. The double-stimulus paradigm resulted in a BCI accuracy of 67.89 ± 11.46% and an ITR of 2.67 ± 1.09 bits/min, in the absence of SSL and with the uniform distribution. The proposed combination of SSL with the e-empirical distribution improved the BCI accuracy and ITR to 74.59 ± 12.12% and 3.37 ± 1.27 bits/min, respectively. The event-related potential analysis revealed that contralateral and right-hemispheric dominances contributed to the BCI performance improvement. Significance. Our study demonstrated that a BCI based on multiple simultaneous auditory stimuli, incorporating SSL and the e-empirical prior distribution, can increase the number of commands without sacrificing typing speed beyond the acceptable level of accuracy.
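For context on the ITR figures quoted above, the widely used Wolpaw formula gives bits per selection from the number of commands and the accuracy; scaled by an assumed selection time (the paper's exact timing is not reproduced here), it yields values of the same order as those reported.

```python
import math

def wolpaw_itr(n_classes: int, accuracy: float, seconds_per_selection: float) -> float:
    """Information transfer rate (bits/min) under the Wolpaw et al. definition."""
    p, n = accuracy, n_classes
    bits = math.log2(n)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / seconds_per_selection

# 36 commands (6 x 6 letter matrix), 74.59% accuracy, assumed 60 s per selection.
print(round(wolpaw_itr(36, 0.7459, 60.0), 2), "bits/min")
```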
Affiliation(s)
- Mikito Ogino
- Graduate School of Science and Technology, Keio University, Yokohama, Kanagawa, Japan
- Nozomu Hamada
- Faculty of Science and Technology, Keio University, Yokohama, Kanagawa, Japan
- Yasue Mitsukura
- Faculty of Science and Technology, Keio University, Yokohama, Kanagawa, Japan
9. Giordano BL, de Miranda Azevedo R, Plasencia-Calaña Y, Formisano E, Dumontier M. What do we mean with sound semantics, exactly? A survey of taxonomies and ontologies of everyday sounds. Front Psychol 2022; 13:964209. PMID: 36312201; PMCID: PMC9601315; DOI: 10.3389/fpsyg.2022.964209
Abstract
Taxonomies and ontologies for the characterization of everyday sounds have been developed in several research fields, including auditory cognition, soundscape research, artificial hearing, sound design, and medicine. Here, we surveyed 36 such knowledge organization systems, which we identified through a systematic literature search. To evaluate the semantic domains covered by these systems within a homogeneous framework, we introduced a comprehensive set of verbal sound descriptors (sound source properties; attributes of sensation; sound signal descriptors; onomatopoeias; music genres), which we used to manually label the surveyed descriptor classes. We reveal that most taxonomies and ontologies were developed to characterize higher-level semantic relations between sound sources in terms of the sound-generating objects and actions involved (what/how), or in terms of the environmental context (where). This indicates the current lack of a comprehensive ontology of everyday sounds that simultaneously covers all semantic aspects of the relations between sounds. Such an ontology may have a wide range of applications and purposes, ranging from extending our scientific knowledge of auditory processes in the real world to developing artificial hearing systems.
Affiliation(s)
- Bruno L. Giordano
- Institut des Neurosciences de La Timone, CNRS UMR 7289 – Université Aix-Marseille, Marseille, France
- Ricardo de Miranda Azevedo
- Faculty of Science and Engineering, Institute of Data Science, Maastricht University, Maastricht, Netherlands
- Yenisel Plasencia-Calaña
- Faculty of Science and Engineering, BISS (Brightlands Institute for Smart Society) Institute, Maastricht University, Maastricht, Netherlands
- Elia Formisano
- Faculty of Science and Engineering, BISS (Brightlands Institute for Smart Society) Institute, Maastricht University, Maastricht, Netherlands
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
- Michel Dumontier
- Faculty of Science and Engineering, Institute of Data Science, Maastricht University, Maastricht, Netherlands
- Faculty of Science and Engineering, BISS (Brightlands Institute for Smart Society) Institute, Maastricht University, Maastricht, Netherlands
10. Acoustic Descriptors for Characterization of Musical Timbre Using the Fast Fourier Transform. Electronics 2022. DOI: 10.3390/electronics11091405
Abstract
The quantitative assessment of musical timbre in an audio recording is still an open issue. Evaluating musical timbre allows not only the establishment of precise musical parameters but also the recognition and classification of musical instruments and the assessment of the musical quality of a recording. In this paper, we present a minimum set of dimensionless descriptors, motivated by musical acoustics and computed from the spectra obtained by the Fast Fourier Transform (FFT), which allows the timbre of wooden aerophones (bassoon, clarinet, transverse flute, and oboe) to be described using individual sound recordings of the musical tempered scale. We postulate that the proposed descriptors are sufficient to describe the timbral characteristics of the aerophones studied, allowing their recognition using the acoustic spectral signature. We believe that this approach can be further extended with multidimensional unsupervised machine learning techniques, such as clustering, to obtain new insights into timbre characterization.
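The paper's specific descriptor set is not reproduced here, but the general recipe (an FFT magnitude spectrum followed by scale-free ratios computed over it) can be sketched as follows. The two descriptors shown, spectral flatness and spectral roll-off normalized by the fundamental, are illustrative choices rather than the authors' definitions.

```python
import numpy as np

def dimensionless_descriptors(x, fs, f0):
    """Example dimensionless descriptors computed from an FFT magnitude spectrum."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)))) + 1e-12
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    flatness = np.exp(np.mean(np.log(mag))) / np.mean(mag)      # geometric / arithmetic mean
    cum = np.cumsum(mag ** 2)
    rolloff = freqs[np.searchsorted(cum, 0.95 * cum[-1])]       # 95% energy roll-off (Hz)
    return {"spectral_flatness": flatness, "rolloff_over_f0": rolloff / f0}

fs, f0 = 44100, 440.0
t = np.arange(int(0.5 * fs)) / fs
tone = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 8))  # simple harmonic tone
print(dimensionless_descriptors(tone, fs, f0))
```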
11. Xu Y, Wang W, Cui H, Xu M, Li M. Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy. EURASIP J Audio Speech Music Process 2022; 2022:8. PMID: 35440938; PMCID: PMC9011380; DOI: 10.1186/s13636-022-00240-z
Abstract
Humans can recognize someone's identity through their voice and describe the timbral phenomena of voices. Likewise, the singing voice also has timbral phenomena. In vocal pedagogy, vocal teachers listen and then describe the timbral phenomena of their student's singing voice. In this study, in order to enable machines to describe the singing voice from the vocal pedagogy point of view, we perform a task called paralinguistic singing attribute recognition. To achieve this goal, we first construct and publish an open-source dataset named the Singing Voice Quality and Technique Database (SVQTD) for supervised learning. All the audio clips in SVQTD are downloaded from YouTube and processed by music source separation and silence detection. For annotation, seven paralinguistic singing attributes commonly used in vocal pedagogy are adopted as the labels. Furthermore, to explore different supervised machine learning algorithms for classifying each paralinguistic singing attribute, we adopt three main frameworks, namely openSMILE features with support vector machine (SF-SVM), end-to-end deep learning (E2EDL), and deep embedding with support vector machine (DE-SVM). Our methods are based on existing frameworks commonly employed in other paralinguistic speech attribute recognition tasks. In SF-SVM, we separately use the feature set of the INTERSPEECH 2009 Challenge and that of the INTERSPEECH 2016 Challenge as the SVM classifier's input. In E2EDL, the end-to-end framework separately utilizes the ResNet and transformer encoder as feature extractors. In particular, to handle two-dimensional spectrogram input for a transformer, we adopt a sliced multi-head self-attention (SMSA) mechanism. In the DE-SVM, we use the representation extracted from the E2EDL model as the input of the SVM classifier. Experimental results on SVQTD show no absolute winner between E2EDL and the DE-SVM, which means that the back-end SVM classifier with the representation learned by E2E as input does not necessarily improve the performance. However, the DE-SVM that utilizes the ResNet as the feature extractor achieves the best average UAR, with an average 16% improvement over that of the SF-SVM with INTERSPEECH's hand-crafted feature set.
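A minimal stand-in for the SF-SVM framework described above: a linear SVM trained on acoustic feature vectors and scored with unweighted average recall (UAR). The feature matrix and labels below are random placeholders, not SVQTD data or an actual openSMILE feature set.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 384))              # placeholder acoustic feature vectors
y = rng.integers(0, 2, size=400)             # placeholder binary attribute labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_tr, y_tr)

# Unweighted average recall (UAR) = mean of per-class recalls.
uar = recall_score(y_te, clf.predict(X_te), average="macro")
print(f"UAR = {uar:.3f}")
```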
Affiliation(s)
- Yanze Xu
- Data Science Research Center, Duke Kunshan University, Kunshan, China
- Weiqing Wang
- Data Science Research Center, Duke Kunshan University, Kunshan, China
- Huahua Cui
- Advanced Computing East China Sub-Center, Suzhou, China
- Mingyang Xu
- Advanced Computing East China Sub-Center, Suzhou, China
- Ming Li
- Data Science Research Center, Duke Kunshan University, Kunshan, China
12.
Abstract
Does the brain perceive song as speech with melody? A new study using intracranial recordings and functional brain imaging in humans suggests that it does not. Instead, singing, instrumental music, and speech are represented by different neural populations.
Affiliation(s)
- Liberty S Hamilton
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX 78712, USA; Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA.
13. Reymore L, Beauvais-Lacasse E, Smith BK, McAdams S. Modeling Noise-Related Timbre Semantic Categories of Orchestral Instrument Sounds With Audio Features, Pitch Register, and Instrument Family. Front Psychol 2022; 13:796422. PMID: 35432090; PMCID: PMC9010607; DOI: 10.3389/fpsyg.2022.796422
Abstract
Audio features such as inharmonicity, noisiness, and spectral roll-off have been identified as correlates of "noisy" sounds. However, such features are likely involved in the experience of multiple semantic timbre categories of varied meaning and valence. This paper examines the relationships of stimulus properties and audio features with the semantic timbre categories raspy/grainy/rough, harsh/noisy, and airy/breathy. Participants (n = 153) rated a random subset of 52 stimuli from a set of 156 approximately 2-s orchestral instrument sounds representing varied instrument families (woodwinds, brass, strings, percussion), registers (octaves 2 through 6, where middle C is in octave 4), and both traditional and extended playing techniques (e.g., flutter-tonguing, bowing at the bridge). Stimuli were rated on the three semantic categories of interest, as well as on perceived playing exertion and emotional valence. Correlational analyses demonstrated a strong negative relationship between positive valence and perceived physical exertion. Exploratory linear mixed models revealed significant effects of extended technique and pitch register on valence, the perception of physical exertion, raspy/grainy/rough, and harsh/noisy. Instrument family was significantly related to ratings of airy/breathy. With an updated version of the Timbre Toolbox (R-2021 A), we used 44 summary audio features, extracted from the stimuli using spectral and harmonic representations, as input for various models built to predict mean semantic ratings for each sound on the three semantic categories, on perceived exertion, and on valence. Random Forest models predicting semantic ratings from audio features outperformed Partial Least-Squares Regression models, consistent with previous results suggesting that non-linear methods are advantageous in timbre semantic predictions using audio features. Relative Variable Importance measures from the models among the three semantic categories demonstrate that although these related semantic categories are associated in part with overlapping features, they can be differentiated through individual patterns of audio feature relationships.
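The model comparison described above (Random Forest versus Partial Least-Squares Regression for predicting mean semantic ratings from audio descriptors) can be set up along the following lines; the synthetic feature matrix and target are placeholders, not the Timbre Toolbox descriptors or the actual ratings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(156, 44))                      # placeholder: 156 sounds x 44 descriptors
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=156)  # mildly non-linear target

rf = RandomForestRegressor(n_estimators=500, random_state=0)
pls = PLSRegression(n_components=5)

# Cross-validated R^2 lets the non-linear and linear models be compared on equal footing.
print("Random Forest R^2:", cross_val_score(rf, X, y, cv=5, scoring="r2").mean())
print("PLS regression R^2:", cross_val_score(pls, X, y, cv=5, scoring="r2").mean())
```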
Affiliation(s)
- Lindsey Reymore
- Department of Music Research, Schulich School of Music, McGill University, Montreal, QC, Canada
14. Heggli OA, Konvalinka I, Kringelbach ML, Vuust P. A metastable attractor model of self-other integration (MEAMSO) in rhythmic synchronization. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200332. PMID: 34420393; DOI: 10.1098/rstb.2020.0332
Abstract
Human interaction is often accompanied by synchronized bodily rhythms. Such synchronization may emerge spontaneously as when a crowd's applause turns into a steady beat, be encouraged as in nursery rhymes, or be intentional as in the case of playing music together. The latter has been extensively studied using joint finger-tapping paradigms as a simplified version of rhythmic interpersonal synchronization. A key finding is that synchronization in such cases is multifaceted, with synchronized behaviour resting upon different synchronization strategies such as mutual adaptation, leading-following and leading-leading. However, there are multiple open questions regarding the mechanism behind these strategies and how they develop dynamically over time. Here, we propose a metastable attractor model of self-other integration (MEAMSO). This model conceptualizes dyadic rhythmic interpersonal synchronization as a process of integrating and segregating signals of self and other. Perceived sounds are continuously evaluated as either being attributed to self-produced or other-produced actions. The model entails a metastable system with two particular attractor states: one where an individual maintains two separate predictive models for self- and other-produced actions, and the other where these two predictive models integrate into one. The MEAMSO explains the three known synchronization strategies and makes testable predictions about the dynamics of interpersonal synchronization both in behaviour and the brain. This article is part of the theme issue 'Synchrony and rhythm interaction: from the brain to behavioural ecology'.
Affiliation(s)
- Ole Adrian Heggli
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University and the Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark
- Ivana Konvalinka
- SINe Lab, Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Kongens Lyngby, Denmark
- Morten L Kringelbach
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University and the Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark; Centre for Eudaimonia and Human Flourishing, Department of Psychiatry, University of Oxford, Oxford, UK
- Peter Vuust
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University and the Royal Academy of Music Aarhus/Aalborg, Aarhus, Denmark
15. Doi H, Yamaguchi K, Sugisaki S. Timbral perception is influenced by unconscious presentation of hands playing musical instruments. Q J Exp Psychol (Hove) 2021; 75:1186-1191. PMID: 34507501; DOI: 10.1177/17470218211048032
Abstract
Timbre is an integral dimension of musical sound quality, and people accumulate knowledge about the timbre of sounds generated by various musical instruments throughout their lives. Recent studies have proposed the possibility that musical sound is crossmodally integrated with visual information related to the sound. However, little is known about the influence of visual information on musical timbre perception. The present study investigated the automaticity of crossmodal integration between musical timbre and visual images of hands playing musical instruments. In the experiment, an image of hands playing piano or violin, or a control scrambled image, was presented to participants unconsciously. Simultaneously, participants heard intermediate sounds synthesised by morphing piano and violin sounds at the same note. The participants answered whether the musical tone sounded like a piano or a violin. The results revealed that participants were more likely to perceive the sound as a violin when an image of hands playing a violin was presented unconsciously than when an image of hands playing a piano was presented. This finding indicates that timbral perception of musical sound is influenced by visual information about musical performance without conscious awareness, supporting the automaticity of crossmodal integration in musical timbre perception.
Affiliation(s)
- Hirokazu Doi
- School of Science and Engineering, Kokushikan University, Tokyo, Japan
- Kazuki Yamaguchi
- School of Science and Engineering, Kokushikan University, Tokyo, Japan
- Shoma Sugisaki
- School of Science and Engineering, Kokushikan University, Tokyo, Japan
16. Colonel JT, Reiss J. Reverse engineering of a recording mix with differentiable digital signal processing. J Acoust Soc Am 2021; 150:608. PMID: 34340521; DOI: 10.1121/10.0005622
Abstract
A method to retrieve the parameters used to create a multitrack mix using only raw tracks and the stereo mixdown is presented. This method is able to model linear time-invariant effects such as gain, pan, equalisation, delay, and reverb. Nonlinear effects, such as distortion and compression, are not considered in this work. The optimization procedure used is the stochastic gradient descent with the aid of differentiable digital signal processing modules. This method allows for a fully interpretable representation of the mixing signal chain by explicitly modelling the audio effects rather than using differentiable blackbox modules. Two reverb module architectures are proposed, a "stereo reverb" model and an "individual reverb" model, and each is discussed. Objective feature measures are taken of the outputs of the two architectures when tasked with estimating a target mix and compared against a stereo gain mix baseline. A listening study is performed to measure how closely the two architectures can perceptually match a reference mix when compared to a stereo gain mix. Results show that the stereo reverb model performs best on objective measures and there is no statistically significant difference between the participants' perception of the stereo reverb model and reference mixes.
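The optimization idea described above can be pictured on its simplest sub-problem, recovering per-track gains by gradient-based optimization through a differentiable mix; this toy PyTorch sketch with random signals is only an illustration of the principle, not the authors' implementation, which also models pan, equalisation, delay, and reverb.

```python
import torch

torch.manual_seed(0)
n_tracks, n_samples = 4, 44100
tracks = torch.randn(n_tracks, n_samples)                 # stand-in raw tracks
true_gains = torch.tensor([0.8, 0.3, 0.5, 1.1])
mixdown = (true_gains.unsqueeze(1) * tracks).sum(dim=0)   # reference (mono) mixdown

gains = torch.zeros(n_tracks, requires_grad=True)          # mix parameters to recover
opt = torch.optim.Adam([gains], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    est = (gains.unsqueeze(1) * tracks).sum(dim=0)          # differentiable gain stage
    loss = torch.mean((est - mixdown) ** 2)                  # match the reference mix
    loss.backward()
    opt.step()

print(gains.detach())   # should approach the true gains [0.8, 0.3, 0.5, 1.1]
```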
Affiliation(s)
- Joseph T Colonel
- Centre for Digital Music, Queen Mary University of London, London, United Kingdom
- Joshua Reiss
- Centre for Digital Music, Queen Mary University of London, London, United Kingdom
17. Siedenburg K, Jacobsen S, Reuter C. Spectral envelope position and shape in sustained musical instrument sounds. J Acoust Soc Am 2021; 149:3715. PMID: 34241486; DOI: 10.1121/10.0005088
Abstract
It has been argued that the relative position of spectral envelopes along the frequency axis serves as a cue for musical instrument size (e.g., violin vs viola) and that the shape of the spectral envelope encodes family identity (violin vs flute). It is further known that fundamental frequency (F0), F0-register for specific instruments, and dynamic level strongly affect spectral properties of acoustical instrument sounds. However, the associations between these factors have not been rigorously quantified for a representative set of musical instruments. Here, we analyzed 5640 sounds from 50 sustained orchestral instruments sampled across their entire range of F0s at three dynamic levels. Regression of spectral centroid (SC) values that index envelope position indicated that smaller instruments possessed higher SC values for a majority of instrument classes (families), but SC also correlated with F0 and was strongly and consistently affected by the dynamic level. Instrument classification using relatively low-dimensional cepstral audio descriptors allowed for discrimination between instrument classes with accuracies beyond 80%. Envelope shape became much less indicative of instrument class whenever the classification problem involved generalization to different dynamic levels or F0-registers. These analyses confirm that spectral envelopes encode information about instrument size and family identity and highlight their dependence on F0(-register) and dynamic level.
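The spectral centroid (SC) used here as the index of spectral envelope position has a standard amplitude-weighted-mean definition; the minimal computation below, on two synthetic tones with the same F0 but different spectral slopes, shows how a flatter harmonic roll-off raises the centroid.

```python
import numpy as np

def spectral_centroid(x, fs):
    """Amplitude-weighted mean frequency of the magnitude spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)

fs = 44100
t = np.arange(int(0.5 * fs)) / fs
# Same F0 (220 Hz), different harmonic roll-offs: 1/k^2 (duller) vs 1/k (brighter).
dull = sum(np.sin(2 * np.pi * k * 220 * t) / k ** 2 for k in range(1, 20))
bright = sum(np.sin(2 * np.pi * k * 220 * t) / k for k in range(1, 20))
print(spectral_centroid(dull, fs), spectral_centroid(bright, fs))
```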
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, 26129 Oldenburg, Germany
- Simon Jacobsen
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, 26129 Oldenburg, Germany
- Christoph Reuter
- Department of Musicology, University of Vienna, 1090 Vienna, Austria
18. Lostanlen V, El-Hajj C, Rossignol M, Lafay G, Andén J, Lagrange M. Time-frequency scattering accurately models auditory similarities between instrumental playing techniques. EURASIP J Audio Speech Music Process 2021; 2021:3. PMID: 33488686; PMCID: PMC7801324; DOI: 10.1186/s13636-020-00187-z
Abstract
Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human participants to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time-frequency scattering features to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of 99.0% ± 1. An ablation study demonstrates that removing either the joint time-frequency scattering transform or the metric learning algorithm noticeably degrades performance.
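The metric-learning step mentioned above can be pictured with a generic triplet hinge loss over feature vectors, as in the sketch below; this is not the LMNN implementation or the scattering features used in the paper, just the large-margin idea they build on.

```python
import numpy as np

def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Penalize the anchor being closer to the negative (different cluster)
    than to the positive (same cluster) by less than the margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
a = rng.normal(size=16)                  # placeholder feature vector of an anchor note
p = a + 0.1 * rng.normal(size=16)        # note from the same perceptual cluster
n = rng.normal(size=16)                  # note from a different cluster
print(triplet_hinge_loss(a, p, n))
```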
Affiliation(s)
- Vincent Lostanlen
- LS2N, CNRS, Centrale Nantes, Nantes University, 1, rue de la Noe, Nantes, 44000 France
- Christian El-Hajj
- LS2N, CNRS, Centrale Nantes, Nantes University, 1, rue de la Noe, Nantes, 44000 France
- Joakim Andén
- Department of Mathematics, KTH Royal Institute of Technology, Lindstedtsvägen 25, Stockholm, SE-100 44 Sweden
- Center for Computational Mathematics, Flatiron Institute, 162 5th Avenue, New York, 10010 NY USA
- Mathieu Lagrange
- LS2N, CNRS, Centrale Nantes, Nantes University, 1, rue de la Noe, Nantes, 44000 France
19. Orofacial Trauma on the Anterior Zone of a Trumpet's Player Maxilla: Concept of the Oral Rehabilitation-A Case Report. Int J Environ Res Public Health 2020; 17:ijerph17249423. PMID: 33339137; PMCID: PMC7765605; DOI: 10.3390/ijerph17249423
Abstract
Background: The occurrence of an orofacial trauma can give rise to health, social, economic, and professional problems. A 13-year-old boy suffered avulsion of teeth 11 and 21, which were lost at the scene. Methods: Three intraoral appliances were manufactured as part of the rehabilitation procedure: a Hawley appliance with a central expansion screw and two central incisors (1), a trumpet edentulous anterior tooth appliance (2), and a customized splint (3). The aim was to objectively assess the sound quality of the trumpet player with these new devices in terms of its spectral, temporal, and spectro-temporal audio properties. A linear-frequency-response microphone was adopted for precision measurement of pitch, loudness, and timbre descriptors. Results: Pitch deviations may result from the different intraoral appliances due to the alteration of the mouth cavity, namely the area occupied and the modification of, and interaction with, the anatomy. This investigation supports the finding that the intraoral appliance occupying the least volume is the best solution in terms of sound quality. Conclusions: Young wind instrumentalists should have dental impressions of their teeth made, so that their dentist has the most reliable anatomy of the natural teeth in case of an orofacial trauma. Likewise, their sound quality should be registered regularly to provide standard parameters for comparison.
20. Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre. Nat Hum Behav 2020; 5:369-377. PMID: 33257878; DOI: 10.1038/s41562-020-00987-5
Abstract
Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from published studies between 1977 and 2016 and observe that original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.
21. Saitis C, Siedenburg K. Brightness perception for musical instrument sounds: Relation to timbre dissimilarity and source-cause categories. J Acoust Soc Am 2020; 148:2256. PMID: 33138535; DOI: 10.1121/10.0002275
Abstract
Timbre dissimilarity of orchestral sounds is well-known to be multidimensional, with attack time and spectral centroid representing its two most robust acoustical correlates. The centroid dimension is traditionally considered as reflecting timbral brightness. However, the question of whether multiple continuous acoustical and/or categorical cues influence brightness perception has not been addressed comprehensively. A triangulation approach was used to examine the dimensionality of timbral brightness, its robustness across different psychoacoustical contexts, and relation to perception of the sounds' source-cause. Listeners compared 14 acoustic instrument sounds in three distinct tasks that collected general dissimilarity, brightness dissimilarity, and direct multi-stimulus brightness ratings. Results confirmed that brightness is a robust unitary auditory dimension, with direct ratings recovering the centroid dimension of general dissimilarity. When a two-dimensional space of brightness dissimilarity was considered, its second dimension correlated with the attack-time dimension of general dissimilarity, which was interpreted as reflecting a potential infiltration of the latter into brightness dissimilarity. Dissimilarity data were further modeled using partial least-squares regression with audio descriptors as predictors. Adding predictors derived from instrument family and the type of resonator and excitation did not improve the model fit, indicating that brightness perception is underpinned primarily by acoustical rather than source-cause cues.
Affiliation(s)
- Charalampos Saitis
- Audio Communication Group, TU Berlin, Einsteinufer 17c, D-10587 Berlin, Germany
- Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
22. Analysis and Modeling of Timbre Perception Features in Musical Sounds. Appl Sci (Basel) 2020. DOI: 10.3390/app10030789
Abstract
A novel technique is proposed for the analysis and modeling of timbre perception features, including a new terminology system for evaluating timbre in musical instruments. The terminology database consists of 16 expert and novice evaluation terms, including five pairs with opposite polarity. In addition, a material library containing 72 samples (including 37 Chinese orchestral instruments, 11 Chinese minority instruments, and 24 Western orchestral instruments) and a 54-sample objective acoustic parameter set were developed as part of the study. The method of successive categories was applied to each term for subjective assessment. A mathematical model of timbre perception features (i.e., bright or dark, raspy or mellow, sharp or vigorous, coarse or pure, and hoarse or consonant) was then developed for the first time using linear regression, support vector regression, a neural network, and random forest algorithms. Experimental results showed that the proposed model accurately predicted these attributes. Finally, an improved technique for 3D timbre space construction is proposed. Auditory perception attributes for this 3D timbre space were determined by analyzing the correlation between each spatial dimension and the 16 timbre evaluation terms.
23. Marty N, Marty M, Pfeuty M. Relative contribution of pitch and brightness to the auditory kappa effect. Psychol Res 2019; 85:55-67. PMID: 31440814; DOI: 10.1007/s00426-019-01233-y
Abstract
Pitch height is known to interfere with temporal judgment. This is the case in the auditory kappa effect, in which the relative degree of pitch distance separating two tones extends the perceived duration of the inter-onset interval (IOI). However, pitch variations that result from manipulations of the fundamental frequency of tones are associated with variations of the spectral centroid, which is related to perceived brightness. The present study aimed at determining the relative contribution of pitch and brightness to the auditory kappa effect. Forty-eight participants performed an AXB paradigm (tone X was shifted to be closer to either tone A or B) in three conditions: the three tones varied in both pitch and brightness (PB condition), pitch varied but brightness was fixed (P condition), or brightness varied but pitch was fixed (B condition). Pitch and brightness were modified by manipulating the fundamental frequency (F0) and the spectral centroid of the tones, respectively. In each condition, the percentage of trials in which the first IOI was perceived as shorter increased as X was closer (in pitch and/or brightness) to A. Furthermore, the magnitude of the effect was larger in the PB than in the P condition, while it did not differ between the PB and B conditions, suggesting that brightness may contribute more than pitch height to the auditory kappa effect. This study provides the first evidence that auditory brightness interferes with duration judgment and highlights the importance of jointly considering the roles of pitch height and brightness in future studies on auditory temporal processing.
Affiliation(s)
- Nicolas Marty
- Sorbonne University, 75000, Paris, France
- University of Bourgogne Franche-Comté, LEAD, UMR 5022, CNRS, 21000, Dijon, France
- Maxime Marty
- University of Bordeaux, INCIA, UMR 5287, CNRS, 146 rue Leo Saignat, 33076, Bordeaux, France
- Micha Pfeuty
- University of Bordeaux, INCIA, UMR 5287, CNRS, 146 rue Leo Saignat, 33076, Bordeaux, France.
24. Ogg M, Slevc LR. Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds. Front Psychol 2019; 10:1594. PMID: 31379658; PMCID: PMC6650748; DOI: 10.3389/fpsyg.2019.01594
Abstract
Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener's ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions as well as analyses of item-pair-level responses as a function of different acoustic qualities were used to understand what acoustic features informed participants' responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners' dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within or between sound category judgments. Dissimilarity ratings largely paralleled sound identification performance, however the results of these tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
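The multidimensional scaling solutions mentioned above start from a pairwise dissimilarity matrix; a minimal version with scikit-learn is shown below, using a random symmetric matrix as a stand-in for the averaged ratings over the 36 sound tokens.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, size=(36, 36))     # stand-in for averaged dissimilarity ratings
dissim = (d + d.T) / 2.0                      # symmetrize
np.fill_diagonal(dissim, 0.0)                 # zero self-dissimilarity

# Embed the 36 sound tokens in a low-dimensional perceptual space.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords.shape)   # (36, 3)
```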
Affiliation(s)
- Mattson Ogg
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
- L. Robert Slevc
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
25. Lim HP, Sanderson P. A comparison of two designs for earcons conveying pulse oximetry information. Appl Ergon 2019; 78:110-119. PMID: 31046941; DOI: 10.1016/j.apergo.2019.01.013
Abstract
We performed a randomised controlled trial comparing two kinds of earcons that could provide intermittent pulse oximetry information about a patient's oxygen saturation (SpO2) and heart rate (HR). Timbre-earcons represented SpO2 levels with different levels of timbre, and pitch-earcons with different levels of pitch. Both kinds of earcons represented HR with tremolo. Participants using pitch-earcons identified SpO2 levels alone, and both SpO2 plus HR levels, significantly better than participants using timbre-earcons: p < .001 in both cases. However, there was no difference between earcon conditions in how effectively HR was identified, p = .422. For both kinds of earcons, identification of SpO2 levels was more compromised by simultaneous changes in HR than identification of HR levels was compromised by simultaneous changes in SpO2, suggesting asymmetric integrality. Overall, pitch-earcons may provide a better intermittent auditory pulse oximetry display than timbre-earcons, especially for clinical contexts when quiet is needed.
Collapse
Affiliation(s)
- Hai-Ping Lim
- School of Psychology, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Penelope Sanderson
- School of Psychology, The University of Queensland, St Lucia, QLD, 4072, Australia; School of Information Technology and Electrical Engineering, The University of Queensland, St Lucia, QLD, Australia; School of Clinical Medicine, The University of Queensland, St Lucia, QLD, 4072, Australia.
| |
Collapse
|
26
|
Giraldo S, Waddell G, Nou I, Ortega A, Mayor O, Perez A, Williamon A, Ramirez R. Automatic Assessment of Tone Quality in Violin Music Performance. Front Psychol 2019; 10:334. [PMID: 30930804 PMCID: PMC6427949 DOI: 10.3389/fpsyg.2019.00334] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Accepted: 02/04/2019] [Indexed: 11/13/2022] Open
Abstract
The automatic assessment of music performance has become an area of increasing interest due to the growing number of technology-enhanced music learning systems. In most of these systems, the assessment of musical performance is based on pitch and onset accuracy, but very few pay attention to other important aspects of performance, such as sound quality or timbre. This is particularly true in violin education, where the quality of timbre plays a significant role in the assessment of musical performances. However, obtaining quantifiable criteria for the assessment of timbre quality is challenging, as it relies on consensus among the subjective interpretations of experts. We present an approach to assess the quality of timbre in violin performances using machine learning techniques. We collected audio recordings of several tone qualities and performed perceptual tests to find correlations among different timbre dimensions. We processed the audio recordings to extract acoustic features for training tone-quality models. Correlations among the extracted features were analyzed, and feature information for discriminating different timbre qualities was investigated. A real-time feedback system designed for pedagogical use was implemented in which users can train their own timbre models to assess and receive feedback on their performances.
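The general shape of such a feature-plus-classifier approach is sketched below: hand-rolled spectral descriptors feed a support vector machine. The specific features, labels, and model are assumptions for demonstration; the cited system's actual feature set and learning method may differ.

```python
# Toy tone-quality classifier: spectral centroid and flatness -> SVM on random labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def spectral_features(signal, fs=44100):
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    flatness = np.exp(np.mean(np.log(spec + 1e-12))) / (np.mean(spec) + 1e-12)
    return np.array([centroid, flatness])

rng = np.random.default_rng(1)
X = np.vstack([spectral_features(rng.standard_normal(4096)) for _ in range(40)])
y = rng.integers(0, 2, 40)            # e.g., 0 = "harsh", 1 = "resonant" (toy labels)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```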
Collapse
Affiliation(s)
- Sergio Giraldo
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
| | - George Waddell
- Centre for Performance Science, Royal College of Music, London, United Kingdom
- Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Ignasi Nou
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
| | - Ariadna Ortega
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
| | - Oscar Mayor
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
| | - Alfonso Perez
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
| | - Aaron Williamon
- Centre for Performance Science, Royal College of Music, London, United Kingdom
- Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Rafael Ramirez
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
| |
Collapse
|
27
|
Cortical Correlates of Attention to Auditory Features. J Neurosci 2019; 39:3292-3300. [PMID: 30804086 DOI: 10.1523/jneurosci.0588-18.2019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2018] [Revised: 02/12/2019] [Accepted: 02/13/2019] [Indexed: 11/21/2022] Open
Abstract
Pitch and timbre are two primary features of auditory perception that are generally considered independent. However, an increase in pitch (produced by a change in fundamental frequency) can be confused with an increase in brightness (an attribute of timbre related to spectral centroid) and vice versa. Previous work indicates that pitch and timbre are processed in overlapping regions of the auditory cortex, but are separable to some extent via multivoxel pattern analysis. Here, we tested whether attention to one or other feature increases the spatial separation of their cortical representations and if attention can enhance the cortical representation of these features in the absence of any physical change in the stimulus. Ten human subjects (four female, six male) listened to pairs of tone triplets varying in pitch, timbre, or both and judged which tone triplet had the higher pitch or brighter timbre. Variations in each feature engaged common auditory regions with no clear distinctions at a univariate level. Attending to one did not improve the separability of the neural representations of pitch and timbre at the univariate level. At the multivariate level, the classifier performed above chance in distinguishing between conditions in which pitch or timbre was discriminated. The results confirm that the computations underlying pitch and timbre perception are subserved by strongly overlapping cortical regions, but reveal that attention to one or other feature leads to distinguishable activation patterns even in the absence of physical differences in the stimuli.
SIGNIFICANCE STATEMENT: Although pitch and timbre are generally thought of as independent auditory features of a sound, pitch height and timbral brightness can be confused for one another. This study shows that pitch and timbre variations are represented in overlapping regions of auditory cortex, but that they produce distinguishable patterns of activation. Most importantly, the patterns of activation can be distinguished based on whether subjects attended to pitch or timbre even when the stimuli remained physically identical. The results therefore show that variations in pitch and timbre are represented by overlapping neural networks, but that attention to different features of the same sound can lead to distinguishable patterns of activation.
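A multivoxel pattern analysis of the kind referred to above typically reduces to cross-validated classification of per-trial voxel patterns. The sketch below shows that analysis shape on simulated data; the trial counts, voxel counts, and labels are placeholders, not the study's fMRI data.

```python
# MVPA-style sketch: linear classification of voxel patterns by attended feature (simulated data).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_voxels = 80, 500
patterns = rng.standard_normal((n_trials, n_voxels))   # trial x voxel activation estimates
attended = rng.integers(0, 2, n_trials)                # 0 = attend pitch, 1 = attend timbre

clf = LinearSVC(C=1.0, max_iter=5000)
acc = cross_val_score(clf, patterns, attended, cv=5)
print("mean cross-validated accuracy:", acc.mean())
```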
Collapse
|
28
|
Piazza EA, Theunissen FE, Wessel D, Whitney D. Rapid Adaptation to the Timbre of Natural Sounds. Sci Rep 2018; 8:13826. [PMID: 30218053 PMCID: PMC6138731 DOI: 10.1038/s41598-018-32018-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2017] [Accepted: 08/29/2018] [Indexed: 11/09/2022] Open
Abstract
Timbre, the unique quality of a sound that points to its source, allows us to quickly identify a loved one's voice in a crowd and distinguish a buzzy, bright trumpet from a warm cello. Despite its importance for perceiving the richness of auditory objects, timbre is a relatively poorly understood feature of sounds. Here we demonstrate for the first time that listeners adapt to the timbre of a wide variety of natural sounds. For each of several sound classes, participants were repeatedly exposed to two sounds (e.g., clarinet and oboe, male and female voice) that formed the endpoints of a morphed continuum. Adaptation to timbre resulted in consistent perceptual aftereffects, such that hearing sound A significantly altered perception of a neutral morph between A and B, making it sound more like B. Furthermore, these aftereffects were robust to moderate pitch changes, suggesting that adaptation to timbral features used for object identification drives these effects, analogous to face adaptation in vision.
Collapse
Affiliation(s)
- Elise A Piazza
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA; Vision Science Graduate Group, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Frédéric E Theunissen
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - David Wessel
- Department of Music, University of California, Berkeley, Berkeley, CA 94720, USA; Center for New Music and Audio Technologies, University of California, Berkeley, Berkeley, CA 94720, USA
| | - David Whitney
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA; Vision Science Graduate Group, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
| |
Collapse
|
29
|
Ogg M, Slevc LR, Idsardi WJ. The time course of sound category identification: Insights from acoustic features. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:3459. [PMID: 29289109 DOI: 10.1121/1.5014057] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Humans have an impressive, automatic capacity for identifying and organizing sounds in their environment. However, little is known about the timescales that sound identification functions on, or the acoustic features that listeners use to identify auditory objects. To better understand the temporal and acoustic dynamics of sound category identification, two go/no-go perceptual gating studies were conducted. Participants heard speech, musical instrument, and human-environmental sounds ranging from 12.5 to 200 ms in duration. Listeners could reliably identify sound categories with just 25 ms of duration. In experiment 1, participants' performance on instrument sounds showed a distinct processing advantage at shorter durations. Experiment 2 revealed that this advantage was largely dependent on regularities in instrument onset characteristics relative to the spectrotemporal complexity of environmental sounds and speech. Models of participant responses indicated that listeners used spectral, temporal, noise, and pitch cues in the task. Aspects of spectral centroid were associated with responses for all categories, while noisiness and spectral flatness were associated with environmental and instrument responses, respectively. Responses for speech and environmental sounds were also associated with spectral features that varied over time. Experiment 2 indicated that variability in fundamental frequency was useful in identifying steady state speech and instrument stimuli.
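The frame-wise acoustic descriptors named above (spectral centroid, spectral flatness, and related cues) can be computed directly from a short-time Fourier transform, as in the sketch below. The window sizes and the noise stand-in signal are assumptions for illustration, not the parameters of the cited experiments.

```python
# Frame-wise spectral centroid and flatness from an STFT (illustrative parameters).
import numpy as np
from scipy.signal import stft

def framewise_descriptors(x, fs=44100):
    f, _, Z = stft(x, fs=fs, nperseg=1024)
    mag = np.abs(Z) + 1e-12
    centroid = (f[:, None] * mag).sum(axis=0) / mag.sum(axis=0)   # Hz, per frame
    flatness = np.exp(np.log(mag).mean(axis=0)) / mag.mean(axis=0)
    return centroid, flatness

x = np.random.default_rng(3).standard_normal(44100)   # 1 s of noise as a stand-in signal
centroid, flatness = framewise_descriptors(x)
print(centroid.mean(), flatness.mean())
```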
Collapse
Affiliation(s)
- Mattson Ogg
- Neuroscience and Cognitive Science Program, University of Maryland, 4090 Union Drive, College Park, Maryland 20742, USA
| | - L Robert Slevc
- Department of Psychology, University of Maryland, 4094 Campus Drive, College Park, Maryland 20742, USA
| | - William J Idsardi
- Department of Linguistics, University of Maryland, 1401 Marie Mount Hall, College Park, Maryland 20742, USA
| |
Collapse
|
30
|
Encoding of natural timbre dimensions in human auditory cortex. Neuroimage 2017; 166:60-70. [PMID: 29080711 DOI: 10.1016/j.neuroimage.2017.10.050] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2017] [Revised: 10/19/2017] [Accepted: 10/24/2017] [Indexed: 11/22/2022] Open
Abstract
Timbre, or sound quality, is a crucial but poorly understood dimension of auditory perception that is important in describing speech, music, and environmental sounds. The present study investigates the cortical representation of different timbral dimensions. Encoding models have typically incorporated the physical characteristics of sounds as features when attempting to understand their neural representation with functional MRI. Here we test an encoding model that is based on five subjectively derived dimensions of timbre to predict cortical responses to natural orchestral sounds. Results show that this timbre model can outperform other models based on spectral characteristics, and can perform as well as a complex joint spectrotemporal modulation model. In cortical regions at the medial border of Heschl's gyrus, bilaterally, and regions at its posterior adjacency in the right hemisphere, the timbre model outperforms even the complex joint spectrotemporal modulation model. These findings suggest that the responses of cortical neuronal populations in auditory cortex may reflect the encoding of perceptual timbre dimensions.
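An encoding model of the kind described above is, at its core, a regularized regression from stimulus descriptors to a voxel response, scored by cross-validated prediction accuracy. The sketch below uses simulated data and five generic descriptors standing in for the subjective timbre dimensions; it is not the cited study's pipeline.

```python
# Encoding-model sketch: ridge regression from stimulus features to a simulated voxel response.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
n_stimuli, n_dims = 42, 5
X = rng.standard_normal((n_stimuli, n_dims))                # stimulus x descriptor matrix
true_w = rng.standard_normal(n_dims)
voxel = X @ true_w + 0.5 * rng.standard_normal(n_stimuli)   # simulated voxel response

model = RidgeCV(alphas=np.logspace(-2, 3, 20))
pred = cross_val_predict(model, X, voxel, cv=6)
print("prediction correlation:", np.corrcoef(pred, voxel)[0, 1])
```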
Collapse
|
31
|
Piazza EA, Iordan MC, Lew-Williams C. Mothers Consistently Alter Their Unique Vocal Fingerprints When Communicating with Infants. Curr Biol 2017; 27:3162-3167.e3. [PMID: 29033333 PMCID: PMC5656453 DOI: 10.1016/j.cub.2017.08.074] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2017] [Revised: 07/25/2017] [Accepted: 08/30/2017] [Indexed: 11/25/2022]
Abstract
The voice is the most direct link we have to others' minds, allowing us to communicate using a rich variety of speech cues [1, 2]. This link is particularly critical early in life as parents draw infants into the structure of their environment using infant-directed speech (IDS), a communicative code with unique pitch and rhythmic characteristics relative to adult-directed speech (ADS) [3, 4]. To begin breaking into language, infants must discern subtle statistical differences about people and voices in order to direct their attention toward the most relevant signals. Here, we uncover a new defining feature of IDS: mothers significantly alter statistical properties of vocal timbre when speaking to their infants. Timbre, the tone color or unique quality of a sound, is a spectral fingerprint that helps us instantly identify and classify sound sources, such as individual people and musical instruments [5-7]. We recorded 24 mothers' naturalistic speech while they interacted with their infants and with adult experimenters in their native language. Half of the participants were English speakers, and half were not. Using a support vector machine classifier, we found that mothers consistently shifted their timbre between ADS and IDS. Importantly, this shift was similar across languages, suggesting that such alterations of timbre may be universal. These findings have theoretical implications for understanding how infants tune in to their local communicative environments. Moreover, our classification algorithm for identifying infant-directed timbre has direct translational implications for speech recognition technology.
Collapse
Affiliation(s)
- Elise A Piazza
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Psychology, Princeton University, Princeton, NJ 08544, USA.
| | - Marius Cătălin Iordan
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA; Department of Psychology, Princeton University, Princeton, NJ 08544, USA
| | - Casey Lew-Williams
- Department of Psychology, Princeton University, Princeton, NJ 08544, USA
| |
Collapse
|
32
|
Thoret E, Depalle P, McAdams S. Perceptually Salient Regions of the Modulation Power Spectrum for Musical Instrument Identification. Front Psychol 2017; 8:587. [PMID: 28450846 PMCID: PMC5390014 DOI: 10.3389/fpsyg.2017.00587] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2016] [Accepted: 03/29/2017] [Indexed: 11/21/2022] Open
Abstract
The ability of a listener to recognize sound sources, and in particular musical instruments from the sounds they produce, raises the question of determining the acoustical information used to achieve such a task. It is now well known that the shapes of the temporal and spectral envelopes are crucial to the recognition of a musical instrument. More recently, Modulation Power Spectra (MPS) have been shown to be a representation that potentially explains the perception of musical instrument sounds. Nevertheless, the question of which specific regions of this representation characterize a musical instrument is still open. An identification task was applied to two subsets of musical instruments: tuba, trombone, cello, saxophone, and clarinet on the one hand, and marimba, vibraphone, guitar, harp, and viola pizzicato on the other. The sounds were processed with filtered spectrotemporal modulations with 2D Gaussian windows. The most relevant regions of this representation for instrument identification were determined for each instrument and reveal the regions essential for their identification. The method used here is based on a “molecular approach,” the so-called bubbles method. Globally, the instruments were correctly identified and the lower values of spectrotemporal modulations are the most important regions of the MPS for recognizing instruments. Interestingly, instruments that were confused with each other led to non-overlapping regions and were confused when they were filtered in the most salient region of the other instrument. These results suggest that musical instrument timbres are characterized by specific spectrotemporal modulations, information which could contribute to music information retrieval tasks such as automatic source recognition.
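A modulation power spectrum such as the one referred to above can be approximated as the 2-D Fourier transform of a log-magnitude spectrogram, with a 2-D Gaussian window selecting one region of (spectral, temporal) modulation, as in the sketch below. All window sizes and parameters are placeholders, not those of the cited experiments.

```python
# Illustrative MPS computation and a single Gaussian "bubble" over the modulation plane.
import numpy as np
from scipy.signal import stft

def modulation_power_spectrum(x, fs=44100):
    _, _, Z = stft(x, fs=fs, nperseg=512, noverlap=384)
    log_spec = np.log(np.abs(Z) + 1e-12)
    return np.abs(np.fft.fftshift(np.fft.fft2(log_spec))) ** 2

def gaussian_window(shape, center, width):
    rows, cols = np.indices(shape)
    return np.exp(-(((rows - center[0]) / width[0]) ** 2
                    + ((cols - center[1]) / width[1]) ** 2) / 2)

x = np.random.default_rng(5).standard_normal(44100)          # 1 s of noise as a stand-in
mps = modulation_power_spectrum(x)
bubble = gaussian_window(mps.shape,
                         center=(mps.shape[0] // 2, mps.shape[1] // 2),
                         width=(10, 10))
print((mps * bubble).shape)
```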
Collapse
Affiliation(s)
- Etienne Thoret
- Schulich School of Music, McGill University, Montreal, QC, Canada
| | | | - Stephen McAdams
- Schulich School of Music, McGill University, Montreal, QC, Canada
| |
Collapse
|
33
|
Thoret E, Depalle P, McAdams S. Perceptually salient spectrotemporal modulations for recognition of sustained musical instruments. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:EL478. [PMID: 28039992 DOI: 10.1121/1.4971204] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Modulation Power Spectra include dimensions of spectral and temporal modulation that contribute significantly to the perception of musical instrument timbres. Nevertheless, it remains unknown whether each instrument's identity is characterized by specific regions in this representation. A recognition task was applied to tuba, trombone, cello, saxophone, and clarinet sounds resynthesized with filtered spectrotemporal modulations. The most relevant parts of this representation for instrument identification were determined for each instrument. In addition, instruments that were confused with each other led to non-overlapping spectrotemporal modulation regions, suggesting that musical instrument timbres are characterized by specific spectrotemporal modulations.
Collapse
Affiliation(s)
- Etienne Thoret
- Schulich School of Music, McGill University, Montreal, Quebec, Canada
| | - Philippe Depalle
- Schulich School of Music, McGill University, Montreal, Quebec, Canada
| | - Stephen McAdams
- Schulich School of Music, McGill University, Montreal, Quebec, Canada
| |
Collapse
|
34
|
Abstract
Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users’ speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users’ speech and music perception, bimodal listening may partially compensate for these deficits.
Collapse
Affiliation(s)
- Joseph D Crew
- University of Southern California, Los Angeles, CA, USA
| | | | - Qian-Jie Fu
- University of California-Los Angeles, CA, USA
| |
Collapse
|
35
|
Schelinski S, Roswandowitz C, von Kriegstein K. Voice identity processing in autism spectrum disorder. Autism Res 2016; 10:155-168. [PMID: 27404447 DOI: 10.1002/aur.1639] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2015] [Revised: 04/01/2016] [Accepted: 04/04/2016] [Indexed: 12/20/2022]
Abstract
People with autism spectrum disorder (ASD) have difficulties in identifying another person by face and voice. This might contribute considerably to the development of social cognition and interaction difficulties. The characteristics of the voice recognition deficit in ASD are unknown. Here, we used a comprehensive behavioral test battery to systematically investigate voice processing in high-functioning ASD (n = 16) and typically developed pair-wise matched controls (n = 16). The ASD group had particular difficulties with discriminating, learning, and recognizing unfamiliar voices, while recognizing famous voices was relatively intact. Tests on acoustic processing abilities showed that the ASD group had a specific deficit in vocal pitch perception that was dissociable from otherwise intact acoustic processing (i.e., musical pitch, musical, and vocal timbre perception). Our results allow a characterization of the voice recognition deficit in ASD: The findings indicate that in high-functioning ASD, the difficulty to recognize voices is particularly pronounced for learning novel voices and the recognition of unfamiliar people's voices. This pattern might be indicative of difficulties with integrating the acoustic characteristics of the voice into a coherent percept, a function that has been previously associated with voice-selective regions in the posterior superior temporal sulcus/gyrus of the human brain. Autism Res 2017, 10: 155-168. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Stefanie Schelinski
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany; Humboldt University of Berlin, Berlin, Germany
| | - Claudia Roswandowitz
- International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
| | | |
Collapse
|
36
|
Isnard V, Taffou M, Viaud-Delmon I, Suied C. Auditory Sketches: Very Sparse Representations of Sounds Are Still Recognizable. PLoS One 2016; 11:e0150313. [PMID: 26950589 PMCID: PMC4780819 DOI: 10.1371/journal.pone.0150313] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2015] [Accepted: 02/11/2016] [Indexed: 02/03/2023] Open
Abstract
Sounds in our environment like voices, animal calls or musical instruments are easily recognized by human listeners. Understanding the key features underlying this robust sound recognition is an important question in auditory science. Here, we studied the recognition by human listeners of new classes of sounds: acoustic and auditory sketches, sounds that are severely impoverished but still recognizable. Starting from a time-frequency representation, a sketch is obtained by keeping only sparse elements of the original signal, here, by means of a simple peak-picking algorithm. Two time-frequency representations were compared: a biologically grounded one, the auditory spectrogram, which simulates peripheral auditory filtering, and a simple acoustic spectrogram, based on a Fourier transform. Three degrees of sparsity were also investigated. Listeners were asked to recognize the category to which a sketch sound belongs: singing voices, bird calls, musical instruments, and vehicle engine noises. Results showed that, with the exception of voice sounds, very sparse representations of sounds (10 features, or energy peaks, per second) could be recognized above chance. No clear differences could be observed between the acoustic and the auditory sketches. For the voice sounds, however, a completely different pattern of results emerged, with at-chance or even below-chance recognition performances, suggesting that the important features of the voice, whatever they are, were removed by the sketch process. Overall, these perceptual results were well correlated with a model of auditory distances, based on spectro-temporal excitation patterns (STEPs). This study confirms the potential of these new classes of sounds, acoustic and auditory sketches, to study sound recognition.
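The peak-picking idea described above, keeping only the strongest time-frequency elements at a target density of features per second, can be sketched as follows. This is a stand-in for the general procedure, not the authors' algorithm, and the sampling rate, window length, and density are assumed values.

```python
# Sparse "sketch" of a sound: keep only the largest spectrogram bins at a given density.
import numpy as np
from scipy.signal import stft

def sparse_sketch(x, fs, peaks_per_second=10):
    f, t, Z = stft(x, fs=fs, nperseg=1024)
    mag = np.abs(Z)
    n_keep = max(1, int(peaks_per_second * (len(x) / fs)))
    threshold = np.sort(mag.ravel())[-n_keep]        # value of the n_keep-th largest bin
    sketch = np.where(mag >= threshold, Z, 0)        # zero out everything below it
    return f, t, sketch

fs = 16000
x = np.random.default_rng(6).standard_normal(fs)     # 1 s of noise as a stand-in signal
_, _, sketch = sparse_sketch(x, fs)
print("non-zero bins kept:", np.count_nonzero(sketch))
```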
Collapse
Affiliation(s)
- Vincent Isnard
- Espaces Acoustiques et Cognitifs, Sorbonne Universités, UPMC Univ Paris 06, CNRS, IRCAM, STMS, Paris, France
- Département Action et Cognition en Situation Opérationnelle, Institut de Recherche Biomédicale des Armées, Brétigny-sur-Orge, France

| | - Marine Taffou
- Espaces Acoustiques et Cognitifs, Sorbonne Universités, UPMC Univ Paris 06, CNRS, IRCAM, STMS, Paris, France
| | - Isabelle Viaud-Delmon
- Espaces Acoustiques et Cognitifs, Sorbonne Universités, UPMC Univ Paris 06, CNRS, IRCAM, STMS, Paris, France
| | - Clara Suied
- Département Action et Cognition en Situation Opérationnelle, Institut de Recherche Biomédicale des Armées, Brétigny-sur-Orge, France
| |
Collapse
|
37
|
Siedenburg K, Jones-Mollerup K, McAdams S. Acoustic and Categorical Dissimilarity of Musical Timbre: Evidence from Asymmetries Between Acoustic and Chimeric Sounds. Front Psychol 2016; 6:1977. [PMID: 26779086 PMCID: PMC4700179 DOI: 10.3389/fpsyg.2015.01977] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2015] [Accepted: 12/10/2015] [Indexed: 11/13/2022] Open
Abstract
This paper investigates the role of acoustic and categorical information in timbre dissimilarity ratings. Using a Gammatone-filterbank-based sound transformation, we created tones that were rated as less familiar than recorded tones from orchestral instruments and that were harder to associate with an unambiguous sound source (Experiment 1). A subset of transformed tones, a set of orchestral recordings, and a mixed set were then rated on pairwise dissimilarity (Experiment 2A). We observed that recorded instrument timbres clustered into subsets that distinguished timbres according to acoustic and categorical properties. For the subset of cross-category comparisons in the mixed set, we observed asymmetries in the distribution of ratings, as well as a stark decay of inter-rater agreement. These effects were replicated in a more robust within-subjects design (Experiment 2B) and cannot be explained by acoustic factors alone. We finally introduced a novel model of timbre dissimilarity based on partial least-squares regression that compared the contributions of both acoustic and categorical timbre descriptors. The best model fit (R2 = 0.88) was achieved when both types of descriptors were taken into account. These findings are interpreted as evidence for an interplay of acoustic and categorical information in timbre dissimilarity perception.
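The partial least-squares regression mentioned above amounts to predicting pairwise dissimilarities from a combined matrix of acoustic and categorical descriptors. The sketch below shows that analysis shape on simulated data; the descriptor sets and pair counts are invented placeholders, not the study's materials.

```python
# PLS-regression sketch: predict dissimilarity ratings from acoustic + categorical descriptors.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
n_pairs = 300
acoustic = rng.standard_normal((n_pairs, 6))        # e.g., differences in spectral descriptors
categorical = rng.integers(0, 2, (n_pairs, 2))      # e.g., same/different source-category flags
X = np.hstack([acoustic, categorical]).astype(float)
y = X @ rng.standard_normal(X.shape[1]) + 0.3 * rng.standard_normal(n_pairs)

pls = PLSRegression(n_components=3)
pls.fit(X, y)
print("R^2:", round(pls.score(X, y), 3))            # coefficient of determination on the fit
```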
Collapse
Affiliation(s)
- Kai Siedenburg
- Centre for Interdisciplinary Research in Music Media and Technology, Schulich School of Music, McGill University, Montreal, QC, Canada; Signal Processing Group, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4All, University of Oldenburg, Oldenburg, Germany
| | - Kiray Jones-Mollerup
- Centre for Interdisciplinary Research in Music Media and Technology, Schulich School of Music, McGill University, Montreal, QC, Canada
| | - Stephen McAdams
- Centre for Interdisciplinary Research in Music Media and Technology, Schulich School of Music, McGill University, Montreal, QC, Canada
| |
Collapse
|
38
|
Andrillon T, Kouider S, Agus T, Pressnitzer D. Perceptual Learning of Acoustic Noise Generates Memory-Evoked Potentials. Curr Biol 2015; 25:2823-2829. [DOI: 10.1016/j.cub.2015.09.027] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2015] [Revised: 07/22/2015] [Accepted: 09/09/2015] [Indexed: 11/16/2022]
|
39
|
Töpken S, Verhey JL, Weber R. Perceptual space, pleasantness and periodicity of multi-tone sounds. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2015; 138:288-298. [PMID: 26233029 DOI: 10.1121/1.4922783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Technical sounds often contain several tonal components, forming a multi-tone sound. The present study investigates the perception of multi-tone sounds consisting of two harmonic complexes with different fundamental frequencies and combination tones with frequencies equal to sums of integer multiples of the two fundamentals. The experimental parameter is the ratio between the two fundamental frequencies ρ. A total of 15 synthetic multi-tone sounds are rated by 37 participants. In the first experiment, the perceptual space is assessed based on 16 adjective scales using categorical scaling. The resulting perceptual space has the four dimensions (i) pleasant, (ii) power, (iii) temporal structure, and (iv) spectral content of the sounds. In the second experiment, the pleasantness is measured with a paired comparison test. The data consistently show that sounds based on ratios of small integers (e.g., ρ=4:3) are significantly less pleasant than sounds with ratios based on large integers, which were constructed by slightly detuning a ratio of small integers. The repetition rate derived from an autocorrelation analysis of the stimuli turns out to be a good predictor of the (un-)pleasantness sensation.
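The repetition-rate predictor mentioned above can be illustrated by locating the main non-zero-lag peak of an autocorrelation function, as in the sketch below. The fundamental frequencies, harmonic counts, and lag limits are example values, not the stimuli from the cited experiments.

```python
# Repetition-rate estimate from the autocorrelation of a two-fundamental multi-tone signal.
import numpy as np
from scipy.signal import correlate

fs = 44100
t = np.arange(fs) / fs
f1, f2 = 200.0, 150.0                                # e.g., a 4:3 fundamental ratio
x = sum(np.sin(2 * np.pi * k * f1 * t) for k in range(1, 4))
x += sum(np.sin(2 * np.pi * k * f2 * t) for k in range(1, 4))

ac = correlate(x, x, mode="full", method="fft")[len(x) - 1:]   # one-sided autocorrelation
min_lag = int(fs / 1000)                             # ignore lags shorter than 1 ms
peak_lag = min_lag + np.argmax(ac[min_lag:])
print("estimated repetition rate (Hz):", fs / peak_lag)        # ~50 Hz for 200 and 150 Hz
```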
Collapse
Affiliation(s)
- Stephan Töpken
- Acoustics Group, Department of Medical Physics and Acoustics, Carl von Ossietzky University, Carl-von-Ossietzky-Str. 9-11, 26111 Oldenburg, Germany
| | - Jesko L Verhey
- Department of Experimental Audiology, Otto von Guericke University, Leipziger Straße 44, 39120 Magdeburg, Germany
| | - Reinhard Weber
- Acoustics Group, Department of Medical Physics and Acoustics, Carl von Ossietzky University, Carl-von-Ossietzky-Str. 9-11, 26111 Oldenburg, Germany
| |
Collapse
|
40
|
|
41
|
|
42
|
Cousineau M, Carcagno S, Demany L, Pressnitzer D. What is a melody? On the relationship between pitch and brightness of timbre. Front Syst Neurosci 2014; 7:127. [PMID: 24478638 PMCID: PMC3894522 DOI: 10.3389/fnsys.2013.00127] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2013] [Accepted: 12/25/2013] [Indexed: 11/13/2022] Open
Abstract
Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners’ task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities.
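The brightness manipulation described above, two simultaneous pure tones an octave apart with their relative level varied, can be sketched as a simple stimulus generator. The specific frequency and level values are placeholders, not the calibrated steps from the cited study.

```python
# Two-tone stimulus with brightness controlled by the relative level of the octave component.
import numpy as np

def octave_pair(f0=440.0, upper_rel_db=0.0, dur=0.3, fs=44100):
    t = np.arange(int(dur * fs)) / fs
    lower = np.sin(2 * np.pi * f0 * t)
    upper = 10 ** (upper_rel_db / 20) * np.sin(2 * np.pi * 2 * f0 * t)
    tone = lower + upper
    return 0.5 * tone / np.abs(tone).max()           # normalize to a safe playback level

dull = octave_pair(upper_rel_db=-12.0)               # weaker octave component: darker timbre
bright = octave_pair(upper_rel_db=0.0)               # equal-level octave component: brighter timbre
print(dull.shape, bright.shape)
```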
Collapse
Affiliation(s)
- Marion Cousineau
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, University of Montreal, Montreal, QC, Canada
| | | | | | - Daniel Pressnitzer
- Laboratoire des Systèmes Perceptifs, CNRS UMR 8248, Paris, France; Département d'études cognitives, École normale supérieure, Paris, France
| |
Collapse
|
43
|
Town SM, Bizley JK. Neural and behavioral investigations into timbre perception. Front Syst Neurosci 2013; 7:88. [PMID: 24312021 PMCID: PMC3826062 DOI: 10.3389/fnsys.2013.00088] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2013] [Accepted: 10/27/2013] [Indexed: 11/23/2022] Open
Abstract
Timbre is the attribute that distinguishes sounds of equal pitch, loudness and duration. It contributes to our perception and discrimination of different vowels and consonants in speech, instruments in music and environmental sounds. Here we begin by reviewing human timbre perception and the spectral and temporal acoustic features that give rise to timbre in speech, musical and environmental sounds. We also consider the perception of timbre by animals, both in the case of human vowels and non-human vocalizations. We then explore the neural representation of timbre, first within the peripheral auditory system and later at the level of the auditory cortex. We examine the neural networks that are implicated in timbre perception and the computations that may be performed in auditory cortex to enable listeners to extract information about timbre. We consider whether single neurons in auditory cortex are capable of representing spectral timbre independently of changes in other perceptual attributes and the mechanisms that may shape neural sensitivity to timbre. Finally, we conclude by outlining some of the questions that remain about the role of neural mechanisms in behavior and consider some potentially fruitful avenues for future research.
Collapse
|