1. McMullin MA, Kumar R, Higgins NC, Gygi B, Elhilali M, Snyder JS. Preliminary Evidence for Global Properties in Human Listeners During Natural Auditory Scene Perception. Open Mind (Camb) 2024; 8:333-365. PMID: 38571530; PMCID: PMC10990578; DOI: 10.1162/opmi_a_00131.
Abstract
Theories of auditory and visual scene analysis suggest the perception of scenes relies on the identification and segregation of objects within them, resembling a detail-oriented processing style. However, a more global process may occur while analyzing scenes, which has been evidenced in the visual domain. To our knowledge, a similar line of research has not been pursued in the auditory domain; therefore, we evaluated the contributions of high-level global and low-level acoustic information to auditory scene perception. An additional aim was to increase the field's ecological validity by using and making available a new collection of high-quality auditory scenes. Participants rated scenes on 8 global properties (e.g., open vs. enclosed), and an acoustic analysis evaluated which low-level features predicted the ratings. We submitted the acoustic measures and the average ratings of the global properties to separate exploratory factor analyses (EFAs). The EFA of the acoustic measures revealed a seven-factor structure explaining 57% of the variance in the data, while the EFA of the global property measures revealed a two-factor structure explaining 64%. Regression analyses revealed that each global property was predicted by at least one acoustic variable (R2 = 0.33-0.87). These findings were extended using deep neural network models, in which we examined correlations between human ratings of global properties and the deep embeddings of two computational models: an object-based model and a scene-based model. The results indicate that participants' ratings are more strongly explained by a global analysis of the scene setting, though the relationship between scene perception and auditory perception is multifaceted, with differing correlation patterns between the two models. Taken together, our results provide evidence for the ability to perceive auditory scenes from a global perspective.
Some of the acoustic measures predicted ratings of global scene perception, suggesting representations of auditory objects may be transformed through many stages of processing in the ventral auditory stream, similar to what has been proposed in the ventral visual stream. These findings and the open availability of our scene collection will make future studies on perception, attention, and memory for natural auditory scenes possible.
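The per-property regressions are ordinary least-squares fits summarized by R2. As a rough sketch of how such an R2 is computed (the data and variable names below are invented for illustration, not the study's materials):

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination for an ordinary least-squares fit,
    with an intercept column added to the predictor matrix X."""
    X1 = np.column_stack([np.ones(len(y)), X])     # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS fit
    resid = y - X1 @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data standing in for acoustic measures (X) and mean
# global-property ratings (y); purely illustrative.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(50, 3))
ratings = 2.0 * acoustic[:, 0] - 0.5 * acoustic[:, 2] \
    + rng.normal(scale=0.5, size=50)
print(r_squared(acoustic, ratings))
```

An R2 near 1 means the acoustic predictors account for most of the rating variance, as in the upper end of the reported 0.33-0.87 range.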
Affiliation(s)
- Rohit Kumar, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Nathan C. Higgins, Department of Communication Sciences & Disorders, University of South Florida, Tampa, FL, USA
- Brian Gygi, East Bay Institute for Research and Education, Martinez, CA, USA
- Mounya Elhilali, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Joel S. Snyder, Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV, USA
2. Wanke RD. Listening to Contemporary Art Music: A Morphodynamic Model of Cognition. J Cogn 2023; 6:32. PMID: 37426057; PMCID: PMC10327862; DOI: 10.5334/joc.280.
Abstract
This paper proposes that the perceptual and cognitive mechanisms involved when listening to certain genres of "sound-based" music, such as post-spectralism, glitch electronica, and electroacoustic music, and to various areas of sound art, are best understood within a connectionist cognitive framework described by morphodynamic theory. By analysing the specific characteristics of sound-based music, the paper explores how this kind of music works at the perceptual and cognitive levels. The sound patterns found in these pieces engage listeners more readily at a phenomenological level than through established long-term conceptual associations. They consist of a set of geometries in motion appearing to the listener as "image schemata", as they embody Gestalt and kinaesthetic principles portraying the forces and tensions of our being in the physical world (e.g., figure-background, near-far, superimposition, compulsion, blockage). In applying morphodynamic theory to the listening process involved in this kind of music, the paper discusses the results of a listening survey designed to investigate the functional isomorphism between sound patterns and image schemata. The results suggest that this music can be seen as a middle term, within a connectionist model, between the acoustic-physical world and the symbolic level. This perspective opens up new pathways for accessing this kind of music and leads to a more general understanding of today's modes of listening.
Affiliation(s)
- Riccardo D. Wanke, Centre for the Study of the Sociology and Aesthetics of Music (CESEM), Nova University Lisbon, Portugal
3. Dasdar S, Nasresfahani A, Kianfar N, Zarandi MM, Mobedshahi F, Dabiri S, Kouhi A. Perception of timbre in adult cochlear implant users: Comparison of Iranian and Western musical instruments. Cochlear Implants Int 2023; 24:27-34. PMID: 36495227; DOI: 10.1080/14670100.2022.2137909.
Abstract
OBJECTIVES: Cochlear implants (CIs) have dramatically improved speech perception for patients with sensorineural hearing impairment. However, listening to music remains a great challenge for them. This study examined the perception and appraisal of Iranian musical instruments compared with similar Western instruments.
METHODS: Eleven adult CI users and 25 normal-hearing (NH) individuals participated in this study. Musical stimuli from three commonly heard instrument pairs were prepared. Participants were asked to identify the instruments and rate their appraisal on a ten-point Likert scale (0 = dislike very much, 10 = like very much).
RESULTS: The instrument recognition rate was 40.6% among the CI users, and the mean appraisal score was 5.2 ± 2.7. NH listeners had non-significantly higher scores on both tasks, with a recognition rate of 50.0% and a mean appraisal score of 6.9 ± 1.5. Iranian instruments were better recognized in both groups. Regarding appraisal, the mean scores for the two instrument types were almost equal in the NH group, while CI users rated Iranian instruments higher.
CONCLUSION: In addition to being better recognized, Iranian instruments were also better appraised in the CI group. Iranian instruments provide suitable musical material for CI recipients and can be considered in rehabilitation programs.
Affiliation(s)
- Shayan Dasdar, Azam Nasresfahani, Nika Kianfar, Masoud Motesadi Zarandi, Farzad Mobedshahi, Sasan Dabiri, Ali Kouhi: Department of Cochlear Implant Center and Otorhinolaryngology, Tehran University of Medical Sciences, Tehran, Iran
4. Saberi K, Hickok G. A critical analysis of Lin et al.'s (2021) failure to observe forward entrainment in pitch discrimination. Eur J Neurosci 2022; 56:5191-5200. PMID: 35857282; PMCID: PMC9804316; DOI: 10.1111/ejn.15778.
Abstract
Forward entrainment refers to that part of the entrainment process that outlasts the entraining stimulus. Several studies have demonstrated psychophysical forward entrainment in a pitch-discrimination task. In a recent paper, Lin et al. (2021) challenged these findings by demonstrating that a sequence of 4 entraining pure tones does not affect the ability to determine whether a frequency modulated pulse, presented after termination of the entraining sequence, has swept up or down in frequency. They concluded that rhythmic sequences do not facilitate pitch discrimination. Here, we describe several methodological and stimulus design flaws in Lin et al.'s study that may explain their failure to observe forward entrainment in pitch discrimination.
Affiliation(s)
- Kourosh Saberi, Department of Cognitive Sciences, University of California, Irvine, CA, USA
- Gregory Hickok, Department of Cognitive Sciences and Department of Language Science, University of California, Irvine, CA, USA
5. Rosi V, Ravillion A, Houix O, Susini P. Best-worst scaling, an alternative method to assess perceptual sound qualities. JASA Express Lett 2022; 2:064404. PMID: 36154161; DOI: 10.1121/10.0011752.
Abstract
When designing sound evaluation experiments, researchers rely on listening test methods such as rating scales (RS). This work investigates the suitability of best-worst scaling (BWS) for the perceptual evaluation of sound qualities. To do so, 20 participants rated the "brightness" of a corpus of instrumental sounds (N = 100) with both RS and BWS methods. The results show that the BWS procedure is the fastest and that RS and BWS are equivalent in terms of performance. Interestingly, participants preferred BWS over RS. BWS is therefore a reliable alternative method for measuring perceptual sound qualities and could be used in many-sounds paradigms.
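BWS trials are commonly scored with best-minus-worst counts. A minimal sketch of that scoring (the item names and trials are invented, and the paper may use a different scoring model):

```python
from collections import Counter

def bws_scores(trials):
    """Count-based best-worst scaling: each item's score is the number
    of times it was chosen as best minus the number of times it was
    chosen as worst. `trials` is a list of (best_item, worst_item)."""
    best = Counter(b for b, _ in trials)
    worst = Counter(w for _, w in trials)
    items = set(best) | set(worst)
    return {i: best[i] - worst[i] for i in items}

# Hypothetical "brightness" judgments over instrument sounds.
trials = [("flute", "tuba"), ("violin", "tuba"), ("flute", "cello")]
print(sorted(bws_scores(trials).items()))
```

Items picked as best more often than worst end up with positive scores, yielding an interval-like brightness ordering from purely comparative judgments.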
Affiliation(s)
- Victor Rosi, Aliette Ravillion, Olivier Houix, Patrick Susini: Sound Perception and Design group, STMS, IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004 Paris, France
6. Wei Y, Gan L, Huang X. A Review of Research on the Neurocognition for Timbre Perception. Front Psychol 2022; 13:869475. PMID: 35422736; PMCID: PMC9001888; DOI: 10.3389/fpsyg.2022.869475.
Abstract
As one of the basic elements of acoustic events, timbre influences the brain together with other factors such as pitch and loudness. Research on timbre perception involves interdisciplinary fields, including physical acoustics, auditory psychology, neurocognitive science, and music theory. From the perspectives of psychology and physiology, this article summarizes the features and functions of timbre perception as well as their correlations, with a focus on multidimensional scaling methods for modeling timbre; it outlines the neurocognition and perception of timbre (including sensitivity, adaptability, and memory capability) and summarizes experimental findings (from EEG/ERP, fMRI, etc.) on the deeper, neurocognitive level of timbre perception. Potential problems in experiments on timbre perception and future possibilities are also discussed. Through sorting out the existing research contents, methods, and findings on timbre perception, this article aims to provide heuristic guidance for researchers in the related fields of timbre-perception psychology, physiology, and neural mechanisms. The study of timbre perception is likely to become essential in various fields, including neuroaesthetics, psychological intervention, artistic creation, and rehabilitation.
Affiliation(s)
- Yuyan Wei, Department of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Lin Gan, Department of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, China
- Xiangdong Huang, Department of Electrical and Information Engineering, Tianjin University, Tianjin, China
7. Sköld M, Bresin R. Sonification of Complex Spectral Structures. Front Neurosci 2022; 16:832265. PMID: 35360157; PMCID: PMC8960303; DOI: 10.3389/fnins.2022.832265.
Abstract
In this article, we present our work on the sonification of notated complex spectral structures. It is part of a larger research project on the design of a new notation system for representing sound-based musical structures. Complex spectral structures are notated with special symbols in the scores, which can be digitally rendered so that the user can hear key aspects of what has been notated. Hearing the notated data is significantly different from reading the same data and reveals the complexity hidden in its simplified notation. The digitally played score is not the music itself but can provide essential information about the music in ways that can only be obtained in sounding form. The playback needs to be designed so that the user can make relevant sonic readings of the sonified data. The sound notation system used here is an adaptation of Thoresen and Hedman's spectromorphological analysis notation. Symbols originally developed by Lasse Thoresen from Pierre Schaeffer's typo-morphology have in this system been adapted to display measurable spectral features of timbral structure for the composition and transcription of sound-based musical structures. Spectrum category symbols are placed over a spectral grand staff that combines indications of pitch and frequency values for the combined display of pitch-based and spectral material. Spectral features of a musical structure, such as spectral width and density, are represented as graphical symbols and sonically rendered. In perceptual experiments we verified that users can identify spectral notation parameters based on their sonification. This confirms the main principle of sonification: that data relations in one domain, in our case the notated representation of spectral features, are transformed into perceived relations in the audio domain, and back.
Affiliation(s)
- Mattias Sköld (corresponding author), Composition, Conducting and Music Theory, KMH Royal College of Music, Stockholm, Sweden; and Sound and Music Computing, KTH Royal Institute of Technology, Stockholm, Sweden
- Roberto Bresin, Sound and Music Computing, KTH Royal Institute of Technology, Stockholm, Sweden
8.
Abstract
White light can be decomposed into different colors, and a complex sound wave can be decomposed into its partials. While the physics behind transverse and longitudinal waves is quite different and several theories have been developed to investigate the complexity of colors and timbres, we can try to model their structural similarities through the language of categories. Then, we consider color mixing and color transition in painting, comparing them with timbre superposition and timbre morphing in orchestration and computer music in light of bicategories and bigroupoids. Colors and timbres can be a probe to investigate some relevant aspects of visual and auditory perception jointly with their connections. Thus, the use of categories proposed here aims to investigate color/timbre perception, influencing the computer science developments in this area.
9. Oh Y, Zuwala JC, Salvagno CM, Tilbrook GA. The Impact of Pitch and Timbre Cues on Auditory Grouping and Stream Segregation. Front Neurosci 2022; 15:725093. PMID: 35087369; PMCID: PMC8787191; DOI: 10.3389/fnins.2021.725093.
Abstract
In multi-talker listening environments, the combination of different voice streams may distort each source's individual message, causing deficits in comprehension. Voice characteristics such as pitch and timbre are major dimensions of auditory perception and play a vital role in grouping and segregating incoming sounds based on their acoustic properties. The current study investigated how pitch and timbre cues (determined by fundamental frequency, notated as F0, and spectral slope, respectively) affect perceptual integration and segregation of complex-tone sequences within an auditory streaming paradigm. Twenty normal-hearing listeners participated in a traditional auditory streaming experiment using two alternating sequences of harmonic tone complexes A and B while F0 and spectral slope were manipulated. Grouping ranges, the F0/spectral slope ranges over which auditory grouping occurs, were measured for various F0/spectral slope differences between tones A and B. Results demonstrated that the grouping ranges were maximal in the absence of F0/spectral slope differences between tones A and B and decreased twofold as the differences increased to ±1 semitone in F0 and ±1 dB/octave in spectral slope. In other words, increased differences in either F0 or spectral slope allowed listeners to more easily distinguish between the harmonic stimuli, and thus to group them together less. These findings suggest that pitch/timbre difference cues play an important role in how we perceive harmonic sounds in an auditory stream, reflecting our ability to group or segregate human voices in a multi-talker listening environment.
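A harmonic tone complex whose timbre is set by a spectral slope in dB/octave can be synthesized straightforwardly. The sketch below is our own simplified stand-in for this kind of stimulus, not the authors' code; function and parameter names are invented:

```python
import numpy as np

def harmonic_complex(f0, slope_db_per_oct, dur=0.5, sr=16000, n_harm=10):
    """Harmonic tone complex: partials at k*f0 whose levels fall off by
    `slope_db_per_oct` dB per octave above f0."""
    t = np.arange(int(dur * sr)) / sr
    x = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        # the k-th harmonic sits log2(k) octaves above f0
        gain_db = slope_db_per_oct * np.log2(k)
        x += 10 ** (gain_db / 20) * np.sin(2 * np.pi * k * f0 * t)
    return x / np.max(np.abs(x))  # peak-normalize

tone_a = harmonic_complex(200.0, -6.0)  # steeper rolloff: duller timbre
tone_b = harmonic_complex(200.0, -1.0)  # shallower rolloff: brighter timbre
```

Alternating two such tones while varying either F0 or the slope parameter reproduces the basic logic of the streaming manipulation described above.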
10. Kazazis S, Depalle P, McAdams S. Ordinal scaling of temporal audio descriptors and perceptual significance of attack temporal centroid in timbre spaces. J Acoust Soc Am 2021; 150:3461. PMID: 34852574; DOI: 10.1121/10.0006788.
Abstract
Temporal audio features play an important role in timbre perception and sound identification. An experiment was conducted to test whether listeners are able to rank order synthesized stimuli over a wide range of feature values restricted within the range of instrument sounds. The following audio descriptors were tested: attack and decay time, temporal centroid with fixed attack and decay time, and inharmonicity. The results indicate that these descriptors are susceptible to ordinal scaling. The spectral envelope played an important role when ordering stimuli with various inharmonicity levels, whereas the shape of the amplitude envelope was an important parameter when ordering stimuli with different attack and decay times. Linear amplitude envelopes made the ordering of attack times easier and caused the least amount of confusion among listeners, whereas exponential envelopes were more effective when ordering decay times. Although there were many confusions in ordering short attack and decay times, listeners performed well in ordering temporal centroids even at very short attack and decay times. A meta-analysis of six timbre spaces was therefore conducted to test the explanatory power of attack time versus the attack temporal centroid along a perceptual dimension. The results indicate that attack temporal centroid has greater overall explanatory power than attack time itself.
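Temporal centroid, the amplitude-weighted average time of the envelope, is the descriptor at issue here. A minimal sketch under one common discrete-time definition (the envelopes below are toy examples of our own):

```python
import numpy as np

def temporal_centroid(env, sr):
    """Amplitude-weighted mean time of an amplitude envelope `env`
    sampled at rate `sr` Hz."""
    t = np.arange(len(env)) / sr
    return float(np.sum(t * env) / np.sum(env))

sr = 1000
t = np.arange(sr) / sr                # 1-second envelope
linear_attack = t                     # amplitude concentrated late
exponential_decay = np.exp(-5 * t)    # amplitude concentrated early
print(temporal_centroid(linear_attack, sr))     # past the midpoint
print(temporal_centroid(exponential_decay, sr)) # well before the midpoint
```

Two envelopes of identical duration can thus have very different temporal centroids, which is why the centroid can carry perceptual information beyond attack time alone.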
Affiliation(s)
- Savvas Kazazis, Philippe Depalle, Stephen McAdams: Schulich School of Music, McGill University, 555 Sherbrooke Street West, Montreal, Quebec H3A 1E3, Canada
11. Doi H, Yamaguchi K, Sugisaki S. Timbral perception is influenced by unconscious presentation of hands playing musical instruments. Q J Exp Psychol (Hove) 2021; 75:1186-1191. PMID: 34507501; DOI: 10.1177/17470218211048032.
Abstract
Timbre is an integral dimension of musical sound quality, and people accumulate knowledge about the timbre of sounds generated by various musical instruments throughout their lives. Recent studies have proposed that musical sound is crossmodally integrated with related visual information. However, little is known about the influence of visual information on musical timbre perception. The present study investigated the automaticity of crossmodal integration between musical timbre and visual images of hands playing musical instruments. In the experiment, an image of hands playing piano or violin, or a scrambled control image, was presented to participants unconsciously. Simultaneously, participants heard intermediate sounds synthesised by morphing piano and violin sounds of the same note. The participants answered whether the musical tone sounded like a piano or a violin. The results revealed that participants were more likely to perceive a violin sound when an image of violin playing was presented unconsciously than when an image of piano playing was presented. This finding indicates that timbral perception of musical sound is influenced by visual information about musical performance without conscious awareness, supporting the automaticity of crossmodal integration in musical timbre perception.
Affiliation(s)
- Hirokazu Doi, Kazuki Yamaguchi, Shoma Sugisaki: School of Science and Engineering, Kokushikan University, Tokyo, Japan
12. Chen G, Hu Z, Guan N, Wang X. Finding therapeutic music for anxiety using scoring model. Int J Intell Syst 2021. DOI: 10.1002/int.22460.
Affiliation(s)
- Gong Chen, Shenzhen Research Institute, The Hong Kong Polytechnic University, Shenzhen, China; and Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, China
- Zhejing Hu, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, China
- Nianhong Guan, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Xiaoying Wang, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
13. Kazazis S, Depalle P, McAdams S. Ordinal scaling of timbre-related spectral audio descriptors. J Acoust Soc Am 2021; 149:3785. PMID: 34241417; DOI: 10.1121/10.0005058.
Abstract
A psychophysical experiment was conducted to perceptually validate several spectral audio features through ordinal scaling: spectral centroid, spectral spread, spectral skewness, odd-to-even harmonic ratio, spectral slope, and harmonic spectral deviation. Several sets of stimuli per audio feature were synthesized at different fundamental frequencies and spectral centroids by controlling (wherever possible) each spectral feature independently of the others, thus isolating the effect that each feature had on the stimulus rankings within each sound set. Listeners were overall able to order stimuli varying along all the spectral features tested when presented with an appropriate spacing of feature values. For specific cases of stimuli in which the ordering task partially failed, psychophysical interpretations are provided to explain listeners' confusions. The results of the ordinal scaling experiment outline trajectories of spectral features that correspond to listeners' perceptions and suggest a number of sound synthesis parameters that could carry timbral contour information.
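Several of the descriptors validated here (spectral centroid, spread, skewness) have standard moment-based definitions over the magnitude spectrum. A minimal sketch of the first three, on a toy three-partial spectrum of our own:

```python
import numpy as np

def spectral_moments(mag, freqs):
    """Centroid, spread, and skewness of a magnitude spectrum, treating
    the normalized magnitudes as a probability distribution over `freqs`."""
    p = mag / np.sum(mag)
    centroid = np.sum(freqs * p)                       # 1st moment
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))  # 2nd (std dev)
    skew = np.sum((freqs - centroid) ** 3 * p) / spread ** 3  # 3rd, normalized
    return centroid, spread, skew

# Toy spectrum: three partials, the highest one strongest.
freqs = np.array([100.0, 200.0, 300.0])
mag = np.array([1.0, 1.0, 2.0])
c, s, k = spectral_moments(mag, freqs)
```

Shifting energy toward higher partials raises the centroid, and an asymmetric energy distribution shows up in the sign of the skewness, which is the kind of independent feature variation the stimuli above were built to isolate.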
Affiliation(s)
- Savvas Kazazis, Philippe Depalle, Stephen McAdams: Schulich School of Music, McGill University, 555 Sherbrooke Street West, Montreal, Quebec H3A 1E3, Canada
14.
15. Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre. Nat Hum Behav 2020; 5:369-377. PMID: 33257878; DOI: 10.1038/s41562-020-00987-5.
Abstract
Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from published studies between 1977 and 2016 and observe that original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.
16.
Abstract
INTRODUCTION: Cochlear implants (CIs) are biomedical devices that restore sound perception for people with severe-to-profound sensorineural hearing loss. Most postlingually deafened CI users are able to achieve excellent speech recognition in quiet environments. However, current CI sound processors remain limited in their ability to deliver fine spectrotemporal information, making it difficult for CI users to perceive complex sounds. Limited access to complex acoustic cues such as music, environmental sounds, lexical tones, and voice emotion may have significant ramifications for quality of life, social development, and community interactions.
AREAS COVERED: The purpose of this review article is to summarize the literature on CIs and music perception, with an emphasis on music training in pediatric CI recipients. The findings have implications for our understanding of noninvasive, accessible methods for improving auditory processing and may help advance our ability to improve sound quality and performance for implantees.
EXPERT OPINION: Music training, particularly in the pediatric population, may continue to enhance auditory processing even after performance plateaus. The effects of these training programs appear generalizable to non-trained musical tasks, speech prosody, and emotion perception. Future studies should employ rigorous control groups involving a non-musical acoustic intervention, standardized auditory stimuli, and the provision of feedback.
Affiliation(s)
- Nicole T. Jiam, Charles Limb: Department of Otolaryngology-Head and Neck Surgery, University of California San Francisco School of Medicine, San Francisco, CA, USA
17. Saitis C, Siedenburg K. Brightness perception for musical instrument sounds: Relation to timbre dissimilarity and source-cause categories. J Acoust Soc Am 2020; 148:2256. PMID: 33138535; DOI: 10.1121/10.0002275.
Abstract
Timbre dissimilarity of orchestral sounds is well-known to be multidimensional, with attack time and spectral centroid representing its two most robust acoustical correlates. The centroid dimension is traditionally considered as reflecting timbral brightness. However, the question of whether multiple continuous acoustical and/or categorical cues influence brightness perception has not been addressed comprehensively. A triangulation approach was used to examine the dimensionality of timbral brightness, its robustness across different psychoacoustical contexts, and relation to perception of the sounds' source-cause. Listeners compared 14 acoustic instrument sounds in three distinct tasks that collected general dissimilarity, brightness dissimilarity, and direct multi-stimulus brightness ratings. Results confirmed that brightness is a robust unitary auditory dimension, with direct ratings recovering the centroid dimension of general dissimilarity. When a two-dimensional space of brightness dissimilarity was considered, its second dimension correlated with the attack-time dimension of general dissimilarity, which was interpreted as reflecting a potential infiltration of the latter into brightness dissimilarity. Dissimilarity data were further modeled using partial least-squares regression with audio descriptors as predictors. Adding predictors derived from instrument family and the type of resonator and excitation did not improve the model fit, indicating that brightness perception is underpinned primarily by acoustical rather than source-cause cues.
Affiliation(s)
- Charalampos Saitis
- Audio Communication Group, TU Berlin, Einsteinufer 17c, D-10587 Berlin, Germany
- Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
18
Cellists' sound quality is shaped by their primary postural behavior. Sci Rep 2020; 10:13882. [PMID: 32807898 PMCID: PMC7431865 DOI: 10.1038/s41598-020-70705-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Accepted: 07/27/2020] [Indexed: 11/25/2022] Open
Abstract
During the last 20 years, the role of musicians’ body movements has emerged as a central question in instrument practice: Why do musicians make so many postural movements, for instance, with their torsos and heads, while playing musical instruments? The musical significance of such ancillary gestures is still an enigma and therefore remains a major pedagogical challenge, since one does not know if these movements should be considered essential embodied skills that improve musical expressivity. Although previous studies established clear connections between musicians’ body movements and musical structures (particularly for clarinet, piano or violin performances), no evidence of direct relationships between body movements and the quality of the produced timbre has ever been found. In this study, focusing on the area of bowed-string instruments, we address the problem by showing that cellists use a set of primary postural directions to develop fluid kinematic bow features (velocity, acceleration) that prevent the production of poor quality (i.e., harsh, shrill, whistling) sounds. By comparing the body-related angles between normal and posturally constrained playing situations, our results reveal that the chest rotation and vertical inclination made by cellists act as coordinative support for the kinematics of the bowing gesture. These findings support the experimental works of Alexander, especially those that showed the role of head movements with respect to the upper torso (the so-called primary control) in ensuring the smooth transmission of fine motor control in musicians all the way to the produced sound. More generally, our research highlights the importance of focusing on this fundamental postural sense to improve the quality of human activities across different domains (music, dance, sports, rehabilitation, working positions, etc.).
19
The Timbre Perception Test (TPT): A new interactive musical assessment tool to measure timbre perception ability. Atten Percept Psychophys 2020; 82:3658-3675. [PMID: 32529570 PMCID: PMC7536169 DOI: 10.3758/s13414-020-02058-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
To date, tests that measure individual differences in the ability to perceive musical timbre are scarce in the published literature. The lack of such a tool limits research on how timbre, a primary attribute of sound, is perceived and processed among individuals. The current paper describes the development of the Timbre Perception Test (TPT), in which participants use a slider to reproduce heard auditory stimuli that vary along three important dimensions of timbre: envelope, spectral flux, and spectral centroid. With a sample of 95 participants, the TPT was calibrated and validated against measures of related abilities and examined for its reliability. The results indicate that a short version (8 minutes) of the TPT has good explanatory support from a factor analysis model, acceptable internal reliability (α = .69, ωt = .70), good test–retest reliability (r = .79) and substantial correlations with self-reported general musical sophistication (ρ = .63) and pitch discrimination (ρ = .56), as well as somewhat lower correlations with duration discrimination (ρ = .27) and musical instrument discrimination abilities (ρ = .33). Overall, the TPT represents a robust tool to measure an individual's timbre perception ability. Furthermore, the use of sliders to perform a reproductive task has been shown to be an effective approach in threshold testing. The current version of the TPT is openly available for research purposes.
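For readers unfamiliar with the internal-reliability figure reported above (α = .69), Cronbach's alpha can be computed from a respondents-by-items score matrix. The sketch below is a generic textbook implementation, not the TPT scoring code:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Three identical items are perfectly consistent, so alpha equals 1
rng = np.random.default_rng(1)
scores = np.repeat(rng.normal(size=(50, 1)), 3, axis=1)
alpha = cronbach_alpha(scores)
```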
20
Schutz M, Gillard J. On the generalization of tones: A detailed exploration of non-speech auditory perception stimuli. Sci Rep 2020; 10:9520. [PMID: 32533008 PMCID: PMC7293323 DOI: 10.1038/s41598-020-63132-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2019] [Accepted: 03/13/2020] [Indexed: 11/09/2022] Open
Abstract
The dynamic changes in natural sounds’ temporal structures convey important event-relevant information. However, prominent researchers have previously expressed concern that non-speech auditory perception research disproportionately uses simplistic stimuli lacking the temporal variation found in natural sounds. A growing body of work now demonstrates that some conclusions and models derived from experiments using simplistic tones fail to generalize, raising important questions about the types of stimuli used to assess the auditory system. To explore the issue empirically, we conducted a novel, large-scale survey of non-speech auditory perception research from four prominent journals. A detailed analysis of 1017 experiments from 443 articles reveals that 89% of stimuli employ amplitude envelopes lacking the dynamic variations characteristic of non-speech sounds heard outside the laboratory. Given differences in task outcomes and even the underlying perceptual strategies evoked by dynamic vs. invariant amplitude envelopes, this raises important questions of broad relevance to psychologists and neuroscientists alike. This lack of exploration of a property increasingly recognized as playing a crucial role in perception suggests future research using stimuli with time-varying amplitude envelopes holds significant potential for furthering our understanding of the auditory system’s basic processing capabilities.
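To make the surveyed contrast concrete, the sketch below (a hypothetical illustration, not a stimulus from the surveyed studies) builds a typical invariant laboratory tone and a percussive tone with a time-varying, exponentially decaying amplitude envelope from the same carrier:

```python
import numpy as np

sr = 44100
dur = 0.5
t = np.arange(int(sr * dur)) / sr
carrier = np.sin(2 * np.pi * 440 * t)

# Invariant envelope: constant amplitude with abrupt on/off (typical lab tone)
flat_tone = carrier * 1.0

# Time-varying envelope: exponential decay, as in struck or plucked sounds
decaying_tone = carrier * np.exp(-8.0 * t)

# The decaying tone concentrates its energy at the onset
half = len(t) // 2
early = np.sum(decaying_tone[:half] ** 2)
late = np.sum(decaying_tone[half:] ** 2)
assert early > 10 * late
```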
Affiliation(s)
- Michael Schutz
- School of the Arts, McMaster University, Hamilton, Canada; Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Canada
- Jessica Gillard
- Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Canada
21
Erickson ML, Faulkner K, Johnstone PM, Hedrick MS, Stone T. Multidimensional Timbre Spaces of Cochlear Implant Vocoded and Non-vocoded Synthetic Female Singing Voices. Front Neurosci 2020; 14:307. [PMID: 32372904 PMCID: PMC7179674 DOI: 10.3389/fnins.2020.00307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2019] [Accepted: 03/16/2020] [Indexed: 12/04/2022] Open
Abstract
Many post-lingually deafened cochlear implant (CI) users report that they no longer enjoy listening to music, which could possibly contribute to a perceived reduction in quality of life. One aspect of music perception, vocal timbre perception, may be difficult for CI users because they may not be able to use the same timbral cues available to normal hearing listeners. Vocal tract resonance frequencies have been shown to provide perceptual cues to voice categories such as baritone, tenor, mezzo-soprano, and soprano, while changes in glottal source spectral slope are believed to be related to perception of vocal quality dimensions such as fluty vs. brassy. As a first step toward understanding vocal timbre perception in CI users, we employed an 8-channel noise-band vocoder to test how vocoding can alter the timbral perception of female synthetic sung vowels across pitches. Non-vocoded and vocoded stimuli were synthesized with vibrato using 3 excitation source spectral slopes and 3 vocal tract transfer functions (mezzo-soprano, intermediate, soprano) at the pitches C4, B4, and F5. Six multi-dimensional scaling experiments were conducted: C4 not vocoded, C4 vocoded, B4 not vocoded, B4 vocoded, F5 not vocoded, and F5 vocoded. At the pitch C4, for both non-vocoded and vocoded conditions, dimension 1 grouped stimuli according to voice category and was most strongly predicted by spectral centroid from 0 to 2 kHz. While dimension 2 grouped stimuli according to excitation source spectral slope, it was organized slightly differently and predicted by different acoustic parameters in the non-vocoded and vocoded conditions. For pitches B4 and F5 spectral centroid from 0 to 2 kHz most strongly predicted dimension 1. However, while dimension 1 separated all 3 voice categories in the vocoded condition, dimension 1 only separated the soprano stimuli from the intermediate and mezzo-soprano stimuli in the non-vocoded condition. 
While it is unclear how these results predict timbre perception in CI listeners, in general they suggest that some aspects of vocal timbre perception may be preserved under vocoding.
Affiliation(s)
- Molly L. Erickson
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, TN, United States
22
Analysis and Modeling of Timbre Perception Features in Musical Sounds. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10030789] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
A novel technique is proposed for the analysis and modeling of timbre perception features, including a new terminology system for evaluating timbre in musical instruments. The terminology system consists of 16 expert and novice evaluation terms, including five pairs with opposite polarity. In addition, a material library containing 72 samples (including 37 Chinese orchestral instruments, 11 Chinese minority instruments, and 24 Western orchestral instruments) and a 54-sample objective acoustic parameter set were developed as part of the study. The method of successive categories was applied to each term for subjective assessment. A mathematical model of timbre perception features (i.e., bright or dark, raspy or mellow, sharp or vigorous, coarse or pure, and hoarse or consonant) was then developed for the first time using linear regression, support vector regression, a neural network, and random forest algorithms. Experimental results showed the proposed model accurately predicted these attributes. Finally, an improved technique for 3D timbre space construction is proposed. Auditory perception attributes for this 3D timbre space were determined by analyzing the correlation between each spatial dimension and the 16 timbre evaluation terms.
23
24
Siedenburg K, Schädler MR, Hülsmeier D. Modeling the onset advantage in musical instrument recognition. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:EL523. [PMID: 31893751 DOI: 10.1121/1.5141369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/16/2019] [Accepted: 11/27/2019] [Indexed: 06/10/2023]
Abstract
Sound onsets provide particularly valuable cues for musical instrument identification by human listeners. It has remained unclear whether this onset advantage is due to enhanced perceptual encoding or the richness of acoustical information during onsets. Here this issue was approached by modeling a recent study on instrument identification from tone excerpts [Siedenburg (2019). J. Acoust. Soc. Am. 145(2), 1078-1087]. A simple Hidden Markov Model classifier with separable Gabor filterbank features simulated human performance and replicated the onset advantage observed previously for human listeners. These results provide evidence that the onset advantage may be driven by the distinct acoustic qualities of onsets.
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Marc René Schädler
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- David Hülsmeier
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
25
Marty N, Marty M, Pfeuty M. Relative contribution of pitch and brightness to the auditory kappa effect. PSYCHOLOGICAL RESEARCH 2019; 85:55-67. [PMID: 31440814 DOI: 10.1007/s00426-019-01233-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2019] [Accepted: 07/22/2019] [Indexed: 11/25/2022]
Abstract
Pitch height is known to interfere with temporal judgment. This is the case in the auditory kappa effect, in which the relative degree of pitch distance separating two tones extends the perceived duration of the inter-onset interval (IOI). However, pitch variations which result from manipulations of the fundamental frequency of tones are associated with variations of the spectral centroid, which is related to perceived brightness. The present study aimed at determining the relative contribution of pitch and brightness to the auditory kappa effect. Forty-eight participants performed an AXB paradigm (tone X was shifted to be closer to either tone A or B) in three conditions: the three tones varied in both pitch and brightness (PB condition), pitch varied but brightness was fixed (P condition), or brightness varied but pitch was fixed (B condition). Pitch and brightness were modified by manipulating the fundamental frequency (F0) and the spectral centroid of the tones, respectively. In each condition, the percentage of trials in which the first IOI was perceived as shorter increased as X was closer (in pitch and/or brightness) to A. Furthermore, the magnitude of the effect was larger in the PB than in the P condition, while it did not differ between the PB and B conditions, suggesting that brightness contributes more than pitch height to the auditory kappa effect. This study provides the first evidence that auditory brightness interferes with duration judgment and highlights the importance of jointly considering pitch height and brightness in future studies on auditory temporal processing.
Affiliation(s)
- Nicolas Marty
- Sorbonne University, 75000, Paris, France
- University of Bourgogne Franche-Comté, LEAD, UMR 5022, CNRS, 21000, Dijon, France
- Maxime Marty
- University of Bordeaux, INCIA, UMR 5287, CNRS, 146 rue Leo Saignat, 33076, Bordeaux, France
- Micha Pfeuty
- University of Bordeaux, INCIA, UMR 5287, CNRS, 146 rue Leo Saignat, 33076, Bordeaux, France.
26
Osses Vecchi A, Kohlrausch A, Chaigne A. Perceptual similarity between piano notes: Experimental method applicable to reverberant and non-reverberant sounds. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:1024. [PMID: 31472553 DOI: 10.1121/1.5121311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/31/2019] [Accepted: 07/22/2019] [Indexed: 06/10/2023]
Abstract
In this paper an experimental method to quantify perceptual differences between acoustic stimuli is presented. The experiments are implemented as a signal-in-noise task, where two sounds are to be discriminated. By adjusting the signal-to-noise ratio (SNR) the difficulty of the sound discrimination is manipulated. If two sounds are very similar already, a low level of added noise (high SNR) makes the discrimination task difficult. For more dissimilar sounds, a higher amount of noise (lower SNR) is needed to affect discriminability. In other words, a strong correlation between SNR and similarity is expected. The experimental noises are generated to have similar spectro-temporal properties to those of the test stimuli. As a study case, the suggested method was used to evaluate recordings of one note played on seven Viennese pianos using (1) non-reverberant sounds (as recorded) and (2) reverberant sounds, where reverberation was added by means of digital convolution. The experimental results of the suggested method were compared with a similarity experiment using the method of triadic comparisons. The results of both methods were significantly correlated with each other.
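The core manipulation described above, adjusting the SNR of a signal-in-noise mixture, can be sketched generically (an illustration of the technique, not the authors' stimulus-generation code): scale the noise so the mixture reaches a requested signal-to-noise ratio in decibels.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale noise so signal + noise has the requested SNR in dB."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_p_noise / p_noise)
    return signal + scaled_noise

rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
noise = rng.standard_normal(44100)
mix = mix_at_snr(sig, noise, snr_db=0.0)  # equal signal and noise power
```

Lowering `snr_db` adds more noise and makes a discrimination between two such mixtures harder, which is the correlate of similarity the method exploits.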
Affiliation(s)
- Alejandro Osses Vecchi
- Human-Technology Interaction group, Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, 5600MB Eindhoven, The Netherlands
- Armin Kohlrausch
- Human-Technology Interaction group, Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, 5600MB Eindhoven, The Netherlands
- Antoine Chaigne
- Institute of Music Acoustics, University of Music and Performing Arts, Vienna, Austria
27
Burns T, Rajan R. A Mathematical Approach to Correlating Objective Spectro-Temporal Features of Non-linguistic Sounds With Their Subjective Perceptions in Humans. Front Neurosci 2019; 13:794. [PMID: 31417350 PMCID: PMC6685481 DOI: 10.3389/fnins.2019.00794] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2019] [Accepted: 07/16/2019] [Indexed: 11/13/2022] Open
Abstract
Non-linguistic sounds (NLSs) are a core feature of our everyday life and many evoke powerful cognitive and emotional outcomes. The subjective perception of NLSs by humans has occasionally been defined for single percepts, e.g., their pleasantness, whereas many NLSs evoke multiple perceptions. There has also been little attempt to determine if NLS perceptions can be predicted from objective spectro-temporal features. We therefore examined three human perceptions well-established in previous NLS studies ("Complexity," "Pleasantness," and "Familiarity"), and the accuracy of identification, for a large NLS database, and related these four measures to objective spectro-temporal NLS features, defined using rigorous mathematical descriptors including stimulus entropic and algorithmic complexity measures, peaks-related measures, fractal dimension estimates, and various spectral measures (mean spectral centroid, power in discrete frequency ranges, harmonicity, spectral flatness, and spectral structure). We mapped the perceptions to the spectro-temporal measures individually and in combinations, using complex multivariate analyses including principal component analyses and agglomerative hierarchical clustering.
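One of the spectral descriptors named above, spectral flatness, has a standard definition: the geometric mean of the power spectrum divided by its arithmetic mean, near 1 for noise-like spectra and near 0 for tonal ones. The sketch below is a generic implementation of that definition, not the authors' pipeline:

```python
import numpy as np

def spectral_flatness(signal):
    """Geometric mean over arithmetic mean of the power spectrum (0..1)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    power = power[power > 0]  # drop exact zeros before taking logs
    geometric = np.exp(np.mean(np.log(power)))
    return geometric / np.mean(power)

# White noise is noise-like (flatness well above 0); a tone whose energy
# leaks across only a few bins is strongly tonal (flatness near 0)
rng = np.random.default_rng(0)
n = 4096
assert spectral_flatness(rng.standard_normal(n)) > 0.3
assert spectral_flatness(np.sin(2 * np.pi * 100.5 * np.arange(n) / n)) < 0.01
```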
Affiliation(s)
- Ramesh Rajan
- Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
28
Mehrabi A, Dixon S, Sandler M. Vocal imitation of percussion sounds: On the perceptual similarity between imitations and imitated sounds. PLoS One 2019; 14:e0219955. [PMID: 31344080 PMCID: PMC6657857 DOI: 10.1371/journal.pone.0219955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2017] [Accepted: 07/06/2019] [Indexed: 11/19/2022] Open
Abstract
Recent studies have demonstrated the effectiveness of the voice for communicating sonic ideas, and the accuracy with which it can be used to imitate acoustic instruments, synthesised sounds and environmental sounds. However, there has been little research on vocal imitation of percussion sounds, particularly concerning the perceptual similarity between imitations and the sounds being imitated. In the present study we address this by investigating how accurately musicians can vocally imitate percussion sounds, in terms of whether listeners consider the imitations 'more similar' to the imitated sounds than to other same-category sounds. In a vocal production task, 14 musicians imitated 30 drum sounds from five categories (cymbals, hats, kicks, snares, toms). Listeners were then asked to rate the similarity between the imitations and same-category drum sounds via a web-based listening test. We found that imitated sounds received the highest similarity ratings for 16 of the 30 sounds. The similarity between a given drum sound and its imitation was generally rated higher than for imitations of another same-category sound; however, for some drum categories (snares and toms) certain sounds were consistently considered most similar to the imitations, irrespective of the sound being imitated. Finally, we apply an existing auditory-image-based measure of perceptual similarity between same-category drum sounds to model the similarity ratings using linear mixed-effects regression. The results indicate that this measure is a good predictor of perceptual similarity between imitations and imitated sounds, when compared to acoustic features containing only temporal or spectral features.
Affiliation(s)
- Adib Mehrabi
- Department of Linguistics, Queen Mary University of London, London, England
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, England
- Simon Dixon
- Department of Linguistics, Queen Mary University of London, London, England
- Mark Sandler
- Department of Linguistics, Queen Mary University of London, London, England
29
Ogg M, Slevc LR. Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds. Front Psychol 2019; 10:1594. [PMID: 31379658 PMCID: PMC6650748 DOI: 10.3389/fpsyg.2019.01594] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2019] [Accepted: 06/25/2019] [Indexed: 11/13/2022] Open
Abstract
Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener's ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions as well as analyses of item-pair-level responses as a function of different acoustic qualities were used to understand what acoustic features informed participants' responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners' dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within or between sound category judgments. 
Dissimilarity ratings largely paralleled sound identification performance; however, the results of these tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
Affiliation(s)
- Mattson Ogg
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
- L. Robert Slevc
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
30
Lim HP, Sanderson P. A comparison of two designs for earcons conveying pulse oximetry information. APPLIED ERGONOMICS 2019; 78:110-119. [PMID: 31046941 DOI: 10.1016/j.apergo.2019.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Revised: 12/20/2018] [Accepted: 01/27/2019] [Indexed: 06/09/2023]
Abstract
We performed a randomised controlled trial comparing two kinds of earcons that could provide intermittent pulse oximetry information about a patient's oxygen saturation (SpO2) and heart rate (HR). Timbre-earcons represented SpO2 levels with different levels of timbre, and pitch-earcons with different levels of pitch. Both kinds of earcons represented HR with tremolo. Participants using pitch-earcons identified SpO2 levels alone, and both SpO2 plus HR levels, significantly better than participants using timbre-earcons: p < .001 in both cases. However, there was no difference between earcon conditions in how effectively HR was identified, p = .422. For both kinds of earcons, identification of SpO2 levels was more compromised by simultaneous changes in HR than identification of HR levels was compromised by simultaneous changes in SpO2, suggesting asymmetric integrality. Overall, pitch-earcons may provide a better intermittent auditory pulse oximetry display than timbre-earcons, especially in clinical contexts where quiet is needed.
Affiliation(s)
- Hai-Ping Lim
- School of Psychology, The University of Queensland, St Lucia, QLD, 4072, Australia
- Penelope Sanderson
- School of Psychology, The University of Queensland, St Lucia, QLD, 4072, Australia; School of Information Technology and Electrical Engineering, The University of Queensland, St Lucia, QLD, Australia; School of Clinical Medicine, The University of Queensland, St Lucia, QLD, 4072, Australia.
31
Giraldo S, Waddell G, Nou I, Ortega A, Mayor O, Perez A, Williamon A, Ramirez R. Automatic Assessment of Tone Quality in Violin Music Performance. Front Psychol 2019; 10:334. [PMID: 30930804 PMCID: PMC6427949 DOI: 10.3389/fpsyg.2019.00334] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Accepted: 02/04/2019] [Indexed: 11/13/2022] Open
Abstract
The automatic assessment of music performance has become an area of increasing interest due to the growing number of technology-enhanced music learning systems. In most of these systems, the assessment of musical performance is based on pitch and onset accuracy, but very few pay attention to other important aspects of performance, such as sound quality or timbre. This is particularly true in violin education, where the quality of timbre plays a significant role in the assessment of musical performances. However, obtaining quantifiable criteria for the assessment of timbre quality is challenging, as it relies on consensus among the subjective interpretations of experts. We present an approach to assess the quality of timbre in violin performances using machine learning techniques. We collected audio recordings of several tone qualities and performed perceptual tests to find correlations among different timbre dimensions. We processed the audio recordings to extract acoustic features for training tone-quality models. Correlations among the extracted features were analyzed and feature information for discriminating different timbre qualities were investigated. A real-time feedback system designed for pedagogical use was implemented in which users can train their own timbre models to assess and receive feedback on their performances.
Affiliation(s)
- Sergio Giraldo
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
- George Waddell
- Centre for Performance Science, Royal College of Music, London, United Kingdom; Faculty of Medicine, Imperial College London, London, United Kingdom
- Ignasi Nou
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
- Ariadna Ortega
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
- Oscar Mayor
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
- Alfonso Perez
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
- Aaron Williamon
- Centre for Performance Science, Royal College of Music, London, United Kingdom; Faculty of Medicine, Imperial College London, London, United Kingdom
- Rafael Ramirez
- Music Technology Group, Music and Machine Learning Lab, Department of Communications and Technology, Pompeu Fabra University, Barcelona, Spain
32
Cortical Correlates of Attention to Auditory Features. J Neurosci 2019; 39:3292-3300. [PMID: 30804086 DOI: 10.1523/jneurosci.0588-18.2019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2018] [Revised: 02/12/2019] [Accepted: 02/13/2019] [Indexed: 11/21/2022] Open
Abstract
Pitch and timbre are two primary features of auditory perception that are generally considered independent. However, an increase in pitch (produced by a change in fundamental frequency) can be confused with an increase in brightness (an attribute of timbre related to spectral centroid) and vice versa. Previous work indicates that pitch and timbre are processed in overlapping regions of the auditory cortex, but are separable to some extent via multivoxel pattern analysis. Here, we tested whether attention to one or other feature increases the spatial separation of their cortical representations and if attention can enhance the cortical representation of these features in the absence of any physical change in the stimulus. Ten human subjects (four female, six male) listened to pairs of tone triplets varying in pitch, timbre, or both and judged which tone triplet had the higher pitch or brighter timbre. Variations in each feature engaged common auditory regions with no clear distinctions at a univariate level. Attending to one did not improve the separability of the neural representations of pitch and timbre at the univariate level. At the multivariate level, the classifier performed above chance in distinguishing between conditions in which pitch or timbre was discriminated. The results confirm that the computations underlying pitch and timbre perception are subserved by strongly overlapping cortical regions, but reveal that attention to one or other feature leads to distinguishable activation patterns even in the absence of physical differences in the stimuli.
SIGNIFICANCE STATEMENT Although pitch and timbre are generally thought of as independent auditory features of a sound, pitch height and timbral brightness can be confused for one another. This study shows that pitch and timbre variations are represented in overlapping regions of auditory cortex, but that they produce distinguishable patterns of activation.
Most importantly, the patterns of activation can be distinguished based on whether subjects attended to pitch or timbre even when the stimuli remained physically identical. The results therefore show that variations in pitch and timbre are represented by overlapping neural networks, but that attention to different features of the same sound can lead to distinguishable patterns of activation.
|
33
|
Siedenburg K. Specifying the perceptual relevance of onset transients for musical instrument identification. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:1078. [PMID: 30823780 DOI: 10.1121/1.5091778] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/12/2018] [Accepted: 02/04/2019] [Indexed: 06/09/2023]
Abstract
Sound onsets are commonly considered to play a privileged role in the identification of musical instruments, but the underlying acoustic features remain unclear. By using sounds resynthesized with and without rapidly varying transients (not to be confused with the onset as a whole), this study set out to specify precisely the role of transients and quasi-stationary components in the perception of musical instrument sounds. In experiment 1, listeners were trained to identify ten instruments from 250 ms sounds. In a subsequent test phase, listeners identified instruments from 64 ms segments of sounds presented with or without transient components, taken either from the onset or from the middle portion of the sounds. The omission of transient components at the onset impaired overall identification accuracy by only 6%, even though experiment 2 suggested that their omission was discriminable. Shifting the position of the gate from the onset to the middle portion of the tone impaired overall identification accuracy by 25%. Taken together, these findings confirm the prominent status of onsets in musical instrument identification, but suggest that rapidly varying transients are less indicative of instrument identity than the relatively slow buildup of sinusoidal components during onsets.
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
|
34
|
Interaction Between Pitch and Timbre Perception in Normal-Hearing Listeners and Cochlear Implant Users. J Assoc Res Otolaryngol 2018; 20:57-72. [PMID: 30377852 DOI: 10.1007/s10162-018-00701-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2018] [Accepted: 10/07/2018] [Indexed: 10/28/2022] Open
Abstract
Despite their mutually exclusive definitions, pitch and timbre perception interact with each other in normal-hearing (NH) listeners. Cochlear implant (CI) users have worse than normal pitch and timbre perception. However, the pitch-timbre interaction with CIs is not well understood. This study tested the interaction between pitch and sharpness (an aspect of timbre) perception related to the fundamental frequency (F0) and spectral slope of harmonic complex tones, respectively, in both NH listeners and CI users. In experiment 1, the F0 (and spectral slope) difference limens (DLs) were measured with a fixed spectral slope (and F0) and 20-dB amplitude roving. Then, the F0 and spectral slope were varied congruently or incongruently by the same multiple of individual DLs to assess the pitch and sharpness ranking sensitivity. Both NH and CI subjects had significantly higher pitch and sharpness ranking sensitivity with congruent than with incongruent F0 and spectral slope variations, and showed a similar symmetric interaction between pitch and timbre perception. In experiment 2, CI users' melodic contour identification (MCI) was tested in three spectral slope (no, congruent, and incongruent spectral slope variations by the same multiple of individual DLs as the F0 variations) and two amplitude conditions (0- and 20-dB amplitude roving). When there was no amplitude roving, the MCI scores were significantly higher with congruent than with no, and in turn than with incongruent spectral slope variations. The 20-dB amplitude roving significantly reduced the overall MCI scores and the effect of spectral slope variations. These results reflected a confusion between higher (or lower) pitch and sharper (or duller) timbre and offered important implications for understanding and enhancing pitch and timbre perception with CIs.
|
35
|
Li H, Chen K, Wang X, Gao Y, Yu W. A perceptual dissimilarities based nonlinear sound quality model for range hood noise. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:2300. [PMID: 30404500 DOI: 10.1121/1.5064280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/31/2017] [Accepted: 09/27/2018] [Indexed: 06/08/2023]
Abstract
The application of sound quality in household appliances has gradually increased in recent years. In addition to modeling algorithms, appropriate acoustic metrics that characterize product sounds also play an important role in developing models. In this study, an artificial neural network based sound quality model for range hood noise was established, with prior metric selection by multidimensional scaling (MDS) analysis of perceptual dissimilarities. First, sounds in different environments, speeds, and positions were recorded, and their annoyance was evaluated by grouped anchor semantic differential subjective jury testing. Then, the timbre space underlying the dissimilarity judgments was analyzed by CLASCAL, an accurate MDS algorithm. Each dimension of the space was well explained by selected metrics through stepwise regression. Finally, a sound quality model was established based on a back propagation neural network (BPNN). Results show that combining BPNN with CLASCAL yields a model that is both interpretable and able to capture nonlinearity, giving high accuracy. In addition, applying noise control to range hoods showed that both passive measures and active noise control (ANC) improve sound quality, with ANC systems giving the largest gains.
Affiliation(s)
- Han Li
- Department of Environmental Engineering, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, People's Republic of China
- Kean Chen
- Department of Environmental Engineering, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, People's Republic of China
- Xue Wang
- Department of Environmental Engineering, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, People's Republic of China
- Yan Gao
- Department of Environmental Engineering, School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, People's Republic of China
- Weiwei Yu
- Hangzhou Robam Appliances Company, Limited, Hangzhou 311100, People's Republic of China
|
36
|
Sims CR. Efficient coding explains the universal law of generalization in human perception. Science 2018; 360:652-656. [PMID: 29748284 DOI: 10.1126/science.aaq1118] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Accepted: 04/03/2018] [Indexed: 11/02/2022]
Abstract
Perceptual generalization and discrimination are fundamental cognitive abilities. For example, if a bird eats a poisonous butterfly, it will learn to avoid preying on that species again by generalizing its past experience to new perceptual stimuli. In cognitive science, the "universal law of generalization" seeks to explain this ability and states that generalization between stimuli will follow an exponential function of their distance in "psychological space." Here, I challenge existing theoretical explanations for the universal law and offer an alternative account based on the principle of efficient coding. I show that the universal law emerges inevitably from any information processing system (whether biological or artificial) that minimizes the cost of perceptual error subject to constraints on the ability to process or transmit information.
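The exponential form of the universal law is compact enough to state directly. A minimal sketch of the shape the law predicts (illustrative only, not Sims's efficient-coding derivation; the decay constant `k` is an assumed free parameter):

```python
import math

def generalization(d, k=1.0):
    """Universal law of generalization (Shepard): the strength of
    generalization between two stimuli decays exponentially with
    their distance d in psychological space. k is an assumed
    decay constant."""
    return math.exp(-k * d)

# The signature of the law: equal steps in psychological distance
# multiply the generalization gradient by a constant factor.
g = [generalization(d) for d in (0.0, 1.0, 2.0, 3.0)]
ratios = [g[i + 1] / g[i] for i in range(3)]  # all equal to e**-1
```

Plotted on a log axis, this gradient is a straight line, which is how the exponential form is typically recognized in confusion data.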
Affiliation(s)
- Chris R Sims
- Department of Cognitive Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
|
37
|
Tużnik P, Augustynowicz P, Francuz P. Electrophysiological correlates of timbre imagery and perception. Int J Psychophysiol 2018; 129:9-17. [DOI: 10.1016/j.ijpsycho.2018.05.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2018] [Revised: 05/07/2018] [Accepted: 05/10/2018] [Indexed: 11/28/2022]
|
38
|
Müller V, Klünter H, Fürstenberg D, Meister H, Walger M, Lang-Roth R. Examination of Prosody and Timbre Perception in Adults With Cochlear Implants Comparing Different Fine Structure Coding Strategies. Am J Audiol 2018. [PMID: 29536106 DOI: 10.1044/2017_aja-17-0046] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
PURPOSE: This study aimed to investigate whether adults with cochlear implants benefit from a change of fine structure (FS) coding strategies regarding the discrimination of prosodic speech cues, timbre cues, and the identification of natural instruments. The FS processing (FSP) coding strategy was compared to 2 settings of the FS4 strategy.
METHOD: A longitudinal crossover, double-blinded study was conducted. The study consisted of 2 parts, with 14 participants in the first part and 12 participants in the second part. Each part lasted 3 months, in which participants were alternately fitted with either the established FSP strategy or 1 of the 2 newly developed FS4 settings. Participants had to complete an intonation identification test; a timbre discrimination test in which 1 of 2 isolated cues changed, either the spectral centroid or the spectral irregularity; and an instrument identification test.
RESULTS: A significant effect was seen in the discrimination of spectral irregularity with 1 of the 2 FS4 settings. The improvement was seen in the FS4 setting in which the upper envelope channels had a low stimulation rate. This improvement was not seen with the FS4 setting that had a higher stimulation rate on the envelope channels.
CONCLUSIONS: In general, the FSP strategy and the 2 settings of the FS4 strategy provided similar levels of performance in the perception of prosody and timbre cues, as well as in the identification of instruments.
Affiliation(s)
- Verena Müller
- Clinic of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Centre, University of Cologne, Germany
- Heinz Klünter
- Clinic of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Centre, University of Cologne, Germany
- Dirk Fürstenberg
- Clinic of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Centre, University of Cologne, Germany
- Hartmut Meister
- Jean Uhrmacher Institute for Clinical ENT-Research, University of Cologne, Germany
- Martin Walger
- Clinic of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Centre, University of Cologne, Germany
- Jean Uhrmacher Institute for Clinical ENT-Research, University of Cologne, Germany
- Ruth Lang-Roth
- Clinic of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Centre, University of Cologne, Germany
|
39
|
Nie Y, Galvin JJ, Morikawa M, André V, Wheeler H, Fu QJ. Music and Speech Perception in Children Using Sung Speech. Trends Hear 2018; 22:2331216518766810. [PMID: 29609496 PMCID: PMC5888806 DOI: 10.1177/2331216518766810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training, participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet were significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Affiliation(s)
- Yingjiu Nie
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Michael Morikawa
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Victoria André
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Harley Wheeler
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Qian-Jie Fu
- Department of Head and Neck Surgery, University of California-Los Angeles, CA, USA
|
40
|
Incongruent pitch cues are associated with increased activation and functional connectivity in the frontal areas. Sci Rep 2018; 8:5206. [PMID: 29581445 PMCID: PMC5980092 DOI: 10.1038/s41598-018-23287-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2017] [Accepted: 03/08/2018] [Indexed: 12/03/2022] Open
Abstract
Pitch plays a crucial role in music and speech perception. Pitch perception is characterized by multiple perceptual dimensions, such as pitch height and chroma. Information provided by auditory signals that are related to these perceptual dimensions can be either congruent or incongruent. To create conflicting cues for pitch perception, we modified Shepard tones by varying the pitch height and pitch chroma dimensions in either the same or opposite directions. Our behavioral data showed that most listeners judged pitch changes based on pitch chroma, instead of pitch height, when incongruent information was provided. The reliance on pitch chroma resulted in a stable percept of upward or downward pitch shift, rather than alternating between two different percepts. Across the incongruent and congruent conditions, consistent activation was found in the bilateral superior temporal and inferior frontal areas. In addition, significantly stronger activation was observed in the inferior frontal areas during the incongruent compared to congruent conditions. Enhanced functional connectivity was found between the left temporal and bilateral frontal areas in the incongruent than congruent conditions. Increased intra-hemispheric and inter-hemispheric connectivity was also observed in the frontal areas. Our results suggest the involvement of the frontal lobe in top-down and bottom-up processes to generate a stable percept of pitch change with conflicting perceptual cues.
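The height/chroma conflict described above depends on the structure of Shepard tones: octave-spaced partials under a fixed spectral envelope. A generic construction is sketched below (illustrative parameter values, not the authors' stimulus settings):

```python
import math

def shepard_sample(base_freq, t, n_octaves=6, center_hz=960.0):
    """One sample of a Shepard tone: octave-spaced partials whose
    amplitudes follow a fixed Gaussian envelope over log-frequency.
    Chroma (the pitch class of base_freq) is well defined, while
    pitch height is ambiguous because the envelope, not the partials,
    fixes the spectral center of gravity."""
    sample = 0.0
    for k in range(n_octaves):
        f = base_freq * 2.0 ** k
        # weight each partial by its log-frequency distance (in octaves)
        # from the fixed spectral center
        w = math.exp(-0.5 * math.log2(f / center_hz) ** 2)
        sample += w * math.sin(2.0 * math.pi * f * t)
    return sample
```

Shifting `base_freq` changes chroma while the spectral envelope (the main height cue) stays put, which is what allows the two dimensions to be varied congruently or incongruently.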
|
41
|
Hamilton-Fletcher G, Wright TD, Ward J. Cross-Modal Correspondences Enhance Performance on a Colour-to-Sound Sensory Substitution Device. Multisens Res 2018; 29:337-63. [PMID: 29384607 DOI: 10.1163/22134808-00002519] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
Visual sensory substitution devices (SSDs) can represent visual characteristics through distinct patterns of sound, allowing a visually impaired user access to visual information. Previous SSDs have largely avoided colour, and those that do encode it have assigned sounds to colours in a largely unprincipled way. This study introduces a new tablet-based SSD termed the ‘Creole’ (so called because it combines tactile scanning with image sonification) and a new algorithm for converting colour to sound that is based on established cross-modal correspondences (intuitive mappings between different sensory dimensions). To test the utility of correspondences, we examined the colour–sound associative memory and object recognition abilities of sighted users who had their device coded either in line with or opposite to sound–colour correspondences. Users with the correspondence-based mappings showed improved colour memory and made fewer colour errors. Interestingly, the colour–sound mappings that provided the highest improvements during the associative memory task also saw the greatest gains for recognising realistic objects that also featured these colours, indicating a transfer of abilities from memory to recognition. These users were also marginally better at matching sounds to images varying in luminance, even though luminance was coded identically across the different versions of the device. These findings are discussed with relevance for both colour and correspondences for sensory substitution use.
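A correspondence-based mapping of the kind discussed can be sketched in a few lines. This is a hypothetical illustration, not the Creole's actual algorithm: the luminance-to-pitch direction (brighter maps to higher) is a well-documented cross-modal correspondence, but the function and frequency range here are assumptions:

```python
def luminance_to_pitch(luminance, f_lo=220.0, f_hi=880.0):
    """Map luminance in [0, 1] to a frequency in Hz, brighter -> higher.
    Hypothetical sketch; f_lo and f_hi are assumed bounds."""
    if not 0.0 <= luminance <= 1.0:
        raise ValueError("luminance must be in [0, 1]")
    # geometric interpolation: equal luminance steps map to equal
    # musical intervals rather than equal Hz steps
    return f_lo * (f_hi / f_lo) ** luminance

def opposite_mapping(luminance):
    """Inverting the axis yields an anti-correspondence coding,
    analogous in spirit to coding 'opposite' to the correspondences."""
    return luminance_to_pitch(1.0 - luminance)
```

The study's comparison of correspondence-based versus opposite codings amounts to testing whether mappings of the first kind are easier to learn and use than mappings of the second.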
|
42
|
Abstract
The effects of subunit formation on adult listeners’ ability to notice changes in a continuous spectral gradient of sound were studied. Results of this experiment support the idea that the auditory system processes information differently within a unit, and that this processing does not occur unless the perceptual system detects unit boundaries. In this experiment, silences were inserted into a continuously changing sound to cause the formation of short units. Listeners noticed the change earlier in conditions with silences inserted than in conditions where the transition was either unbroken or broken by loud noise bursts. Results are discussed in terms of two processes, one that accentuates stimulus properties present at moments of onset and offset, and a second that uses onsets and offsets to signal the beginnings and ends of units and reduces the change perceived within units.
|
43
|
Temporal and spectral contributions to musical instrument identification and discrimination among cochlear implant users. World J Otorhinolaryngol Head Neck Surg 2017; 2:148-156. [PMID: 29204560 PMCID: PMC5698532 DOI: 10.1016/j.wjorl.2016.09.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2016] [Revised: 08/16/2016] [Accepted: 09/05/2016] [Indexed: 11/20/2022] Open
Abstract
Objective: To investigate the contributions of envelope and fine structure to the perception of timbre by cochlear implant (CI) users as compared to normal-hearing (NH) listeners. Methods: This was a prospective cohort comparison study. Normal-hearing and cochlear implant patients were tested. Three experiments were performed in sound field using musical notes altered to affect the characteristic pitch of an instrument and the acoustic envelope. Experiment 1 assessed the ability to identify the instrument playing each note, while experiments 2 and 3 assessed the ability to discriminate the different stimuli. Results: Normal-hearing subjects performed better than CI subjects in all instrument identification tasks, reaching statistical significance for 4 of 5 stimulus conditions. Within the CI population, acoustic envelope modifications did not significantly affect instrument identification or discrimination. With envelope and pitch cues removed, fine structure discrimination performance was similar between normal-hearing and CI users for the majority of conditions, but some specific instrument comparisons were significantly more challenging for CI users. Conclusions: Cochlear implant users perform significantly worse than normal-hearing listeners on tasks of instrument identification. However, cochlear implant listeners can discriminate differences in envelope and some fine structure components of musical instrument sounds as well as normal-hearing listeners. The results indicate that certain fine structure cues are important for cochlear implant users' discrimination judgments, and these cues may therefore guide the association of a sound with a specific instrument during identification.
|
44
|
Encoding of natural timbre dimensions in human auditory cortex. Neuroimage 2017; 166:60-70. [PMID: 29080711 DOI: 10.1016/j.neuroimage.2017.10.050] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2017] [Revised: 10/19/2017] [Accepted: 10/24/2017] [Indexed: 11/22/2022] Open
Abstract
Timbre, or sound quality, is a crucial but poorly understood dimension of auditory perception that is important in describing speech, music, and environmental sounds. The present study investigates the cortical representation of different timbral dimensions. Encoding models have typically incorporated the physical characteristics of sounds as features when attempting to understand their neural representation with functional MRI. Here we test an encoding model that is based on five subjectively derived dimensions of timbre to predict cortical responses to natural orchestral sounds. Results show that this timbre model can outperform other models based on spectral characteristics, and can perform as well as a complex joint spectrotemporal modulation model. In cortical regions at the medial border of Heschl's gyrus bilaterally, and in posteriorly adjacent regions in the right hemisphere, the timbre model outperforms even the complex joint spectrotemporal modulation model. These findings suggest that the responses of cortical neuronal populations in auditory cortex may reflect the encoding of perceptual timbre dimensions.
|
45
|
Isaac AMC. Hubris to humility: Tonal volume and the fundamentality of psychophysical quantities. STUDIES IN HISTORY AND PHILOSOPHY OF SCIENCE 2017; 65-66:99-111. [PMID: 29195654 DOI: 10.1016/j.shpsa.2017.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/07/2015] [Revised: 06/01/2016] [Accepted: 06/20/2017] [Indexed: 06/07/2023]
Abstract
Psychophysics measures the attributes of perceptual experience. The question of whether some of these attributes should be interpreted as more fundamental, or "real," than others has been answered differently throughout its history. The operationism of Stevens and Boring answers "no," reacting to the perceived vacuity of earlier debates about fundamentality. The subsequent rise of multidimensional scaling (MDS) implicitly answers "yes" in its insistence that psychophysical data be represented in spaces of low dimensionality. I argue the return of fundamentality follows from a trend toward increasing epistemic humility. Operationism exhibited a kind of hubris in the constitutive role it assigned to the experimenter's presuppositions that is abandoned by the algorithmic methods of MDS. This broad epistemic trend is illustrated by following the trajectory of research on a particular candidate attribute: tonal volume.
Affiliation(s)
- Alistair M C Isaac
- University of Edinburgh, 3 Charles Street, Edinburgh EH8 9AD, United Kingdom.
|
46
|
Abstract
Brightness is an attribute often used by musicians when describing timbral characteristics. It is related to the spectral distribution of energy, as is sharpness, studied by Zwicker (Psychoacoustics: Facts and Models, 1990). In the current work, subjects adjusted the spectral slope and thus the spectral centroid (SC) of one of a pair of sounds to make it twice as bright as the other, so as to build a perceptual scale. The ratio of SC required to double brightness is a little less than 2 and decreases as the SC of the tones increases. For these tones, the ratio of brightness is statistically different from the ratio of sharpness calculated from published models.
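The spectral centroid (SC) invoked here has a standard definition: the amplitude-weighted mean frequency of the spectrum. A minimal sketch with illustrative harmonic spectra (not the study's actual stimuli):

```python
def spectral_centroid(freqs, mags):
    """Amplitude-weighted mean frequency of a spectrum, in Hz --
    the usual acoustic correlate of brightness."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

# Two harmonic tones on the same fundamental: the shallower the
# spectral slope, the higher the centroid, the brighter the tone.
harmonics = [220.0 * k for k in range(1, 9)]  # harmonics of A3
shallow = [1.0 / k for k in range(1, 9)]      # ~ -6 dB/octave slope
steep = [1.0 / k ** 2 for k in range(1, 9)]   # ~ -12 dB/octave slope
```

Adjusting the spectral slope, as the subjects did, moves the centroid continuously, which is what makes a brightness-doubling adjustment task well defined.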
|
47
|
Paté A, Le Carrou JL, Givois A, Roy A. Influence of plectrum shape and jack velocity on the sound of the harpsichord: An experimental study. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:1523. [PMID: 28372146 DOI: 10.1121/1.4976955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
A controversial discussion in the musical community concerns the ability of the harpsichord to produce sound level or timbre changes. The jack velocity (controlled in real time within a musical context) and the plectrum shape (modified by the musician or maker prior to the performance) appear to be the two control parameters at the disposal of harpsichord makers and players for shaping the sound. This article initiates the acoustical study of the control parameters of the harpsichord, presenting a framework for the investigation of these two parameters by means of experimental mechanics measurements. A robotic finger is used to produce repeatable plucks with various jack velocities and plectrum shapes. The plectrum bending, the vibrating string's initial conditions, and the radiated sound are recorded and analysed. First, results are obtained from measurements carried out on one string, for four plectrum shapes and four jack velocities. The plectrum shape has been found to influence the plectrum's bending behavior when interacting with the string; the string's initial conditions (position and velocity); and the resulting sound (sound level, spectral centroid, and decay time). The jack velocity does not have an influence on any of the measured quantities.
Affiliation(s)
- Arthur Paté
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, LAM/Institut d'Alembert, 4 place Jussieu, 75252 Paris Cedex 05, France
- Jean-Loïc Le Carrou
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, LAM/Institut d'Alembert, 4 place Jussieu, 75252 Paris Cedex 05, France
- Arthur Givois
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, LAM/Institut d'Alembert, 4 place Jussieu, 75252 Paris Cedex 05, France
- Alexandre Roy
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, LAM/Institut d'Alembert, 4 place Jussieu, 75252 Paris Cedex 05, France
|
48
|
Rozé J, Aramaki M, Kronland-Martinet R, Ystad S. Exploring the perceived harshness of cello sounds by morphing and synthesis techniques. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:2121. [PMID: 28372142 DOI: 10.1121/1.4978522] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Cello bowing requires a very fine control of the musicians' gestures to ensure the quality of the perceived sound. When the interaction between the bow hair and the string is optimal, the sound is perceived as broad and round. On the other hand, when the gestural control becomes more approximate, the sound quality deteriorates and often becomes harsh, shrill, and quavering. In this study, such a timbre degradation, often described by French cellists as harshness (décharnement), is investigated from both signal and perceptual perspectives. Harsh sounds were obtained from experienced cellists subjected to a postural constraint. A signal approach based on Gabor masks enabled us to capture the main dissimilarities between round and harsh sounds. Two complementary methods perceptually validated these signal features: First, a predictive regression model of the perceived harshness was built from sound continua obtained by a morphing technique. Next, the signal structures identified by the model were validated within a perceptual timbre space, obtained by multidimensional scaling analysis on pairs of synthesized stimuli controlled in harshness. The results revealed that the perceived harshness was due to a combination between a more chaotic harmonic behavior, a formantic emergence, and a weaker attack slope.
Affiliation(s)
- Jocelyn Rozé
- Aix Marseille Univ, CNRS, PRISM (Perception, Representation, Image, Sound, Music), 31 Chemin J. Aiguier, 13402 Marseille Cedex 20, France
- Mitsuko Aramaki
- Aix Marseille Univ, CNRS, PRISM (Perception, Representation, Image, Sound, Music), 31 Chemin J. Aiguier, 13402 Marseille Cedex 20, France
- Richard Kronland-Martinet
- Aix Marseille Univ, CNRS, PRISM (Perception, Representation, Image, Sound, Music), 31 Chemin J. Aiguier, 13402 Marseille Cedex 20, France
- Sølvi Ystad
- Aix Marseille Univ, CNRS, PRISM (Perception, Representation, Image, Sound, Music), 31 Chemin J. Aiguier, 13402 Marseille Cedex 20, France
|
49
|
Paté A, Boschi L, Dubois D, Le Carrou JL, Holtzman B. Auditory display of seismic data: On the use of experts' categorizations and verbal descriptions as heuristics for geoscience. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:2143. [PMID: 28372076 DOI: 10.1121/1.4978441] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Auditory display can complement visual representations in order to better interpret scientific data. A previous article showed that the free categorization of "audified seismic signals" performed by listeners can be explained by various geophysical parameters. The present article confirms this result and shows that listeners' cognitive representations can be used as heuristics for the characterization of seismic signals. Free sorting tests are conducted with audified seismic signals, with the earthquake/seismometer relative location, playback audification speed, and earthquake magnitude as controlled variables. The analysis is built on partitions (categories) and verbal comments (categorization criteria). Participants from different backgrounds (acousticians or geoscientists) are contrasted in order to investigate the role of the participants' expertise. Sounds resulting from different earthquake/station distances or azimuths, crustal structure and topography along the path of the seismic wave, and earthquake magnitude are found to (a) be sorted into different categories, and (b) elicit different verbal descriptions, mainly focused on the perceived number of events, frequency content, and background noise level. Building on these perceptual results, acoustic descriptors are computed and geophysical interpretations are proposed in order to match the verbal descriptions. Another result is the robustness of the categories with respect to the audification speed factor.
Affiliation(s)
- Arthur Paté
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, LAM/Institut d'Alembert, 4 place Jussieu, 75252 Paris Cedex 5, France
- Lapo Boschi
- Institut des Sciences de la Terre de Paris, Sorbonne Universités, University Pierre and Marie Curie Univ Paris 06, CNRS, Unité Mixte de Recherche 7193, F-75005 Paris, France
- Danièle Dubois
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, LAM/Institut d'Alembert, 4 place Jussieu, 75252 Paris Cedex 5, France
- Jean-Loïc Le Carrou
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, LAM/Institut d'Alembert, 4 place Jussieu, 75252 Paris Cedex 5, France
- Benjamin Holtzman
- Lamont Doherty Earth Observatory, Columbia University, Palisades, New York 10964, USA
50
Grossberg S. Towards solving the hard problem of consciousness: The varieties of brain resonances and the conscious experiences that they support. Neural Netw 2016; 87:38-95. [PMID: 28088645 DOI: 10.1016/j.neunet.2016.11.003] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 08/02/2016] [Revised: 10/21/2016] [Accepted: 11/20/2016] [Indexed: 10/20/2022]
Abstract
The hard problem of consciousness is the problem of explaining how we experience qualia or phenomenal experiences, such as seeing, hearing, and feeling, and knowing what they are. To solve this problem, a theory of consciousness needs to link brain to mind by modeling how emergent properties of several brain mechanisms interacting together embody detailed properties of individual conscious psychological experiences. This article summarizes evidence that Adaptive Resonance Theory, or ART, accomplishes this goal. ART is a cognitive and neural theory of how advanced brains autonomously learn to attend, recognize, and predict objects and events in a changing world. ART has predicted that "all conscious states are resonant states" as part of its specification of mechanistic links between processes of consciousness, learning, expectation, attention, resonance, and synchrony. It hereby provides functional and mechanistic explanations of data ranging from individual spikes and their synchronization to the dynamics of conscious perceptual, cognitive, and cognitive-emotional experiences. ART has reached sufficient maturity to begin classifying the brain resonances that support conscious experiences of seeing, hearing, feeling, and knowing. Psychological and neurobiological data in both normal individuals and clinical patients are clarified by this classification. This analysis also explains why not all resonances become conscious, and why not all brain dynamics are resonant. The global organization of the brain into computationally complementary cortical processing streams (complementary computing), and the organization of the cerebral cortex into characteristic layers of cells (laminar computing), figure prominently in these explanations of conscious and unconscious processes. Alternative models of consciousness are also discussed.
Affiliation(s)
- Stephen Grossberg
- Center for Adaptive Systems, Boston University, 677 Beacon Street, Boston, MA 02215, USA; Graduate Program in Cognitive and Neural Systems, Departments of Mathematics & Statistics, Psychological & Brain Sciences, and Biomedical Engineering, Boston University, 677 Beacon Street, Boston, MA 02215, USA.