1. Wycisk Y, Sander K, Kopiez R, Platz F, Preihs S, Peissig J. Wrapped into sound: Development of the Immersive Music Experience Inventory (IMEI). Front Psychol 2022; 13:951161. PMID: 36186277; PMCID: PMC9524455; DOI: 10.3389/fpsyg.2022.951161.
Abstract
Although virtual reality, video entertainment, and computer games depend on the three-dimensional reproduction of sound (including front, rear, and height channels), it remains unclear whether 3D-audio formats actually intensify the emotional listening experience. There is currently no valid inventory for the objective measurement of immersive listening experiences resulting from audio playback formats with increasing degrees of immersion (from mono to stereo, 5.1, and 3D). The development of the Immersive Music Experience Inventory (IMEI) could close this gap. An initial item list (N = 25) was derived from studies in virtual reality and spatial audio, supplemented by researcher-developed items and items extracted from historical descriptions. Psychometric evaluation was conducted in an online study (N = 222 valid cases). Participants (112 female; mean age = 38.6 years) were recruited via mailing lists (n = 34) and via a panel provider (n = 188). Under controlled headphone playback, participants listened to four songs/pieces, each in three formats: mono, stereo, and binaural 3D audio. The latent construct "immersive listening experience" was modeled using probabilistic test theory (item response theory, IRT), specifically many-facet Rasch measurement (MFRM). The specified MFRM model showed good fit (62.69% of explained variance). The final one-dimensional inventory consists of 10 items and will be made available in English and German.
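As a rough illustration of the measurement model named in this abstract: a (dichotomous) many-facet Rasch model expresses the probability of item endorsement as a logistic function of person ability minus item difficulty minus additional facet difficulties (here, hypothetically, a playback-format facet). The IMEI study used a polytomous rating-scale variant, so this is only a simplified sketch with made-up parameter values:

```python
import math

def mfrm_probability(theta, delta, facet):
    """Dichotomous many-facet Rasch model:
    logit P = person ability - item difficulty - facet difficulty."""
    logit = theta - delta - facet
    return 1.0 / (1.0 + math.exp(-logit))

# A listener with ability 1.0 logits, an item of difficulty 0.0, and a
# hypothetical format facet of -0.5 (an "easier" 3D-audio condition):
p = mfrm_probability(1.0, 0.0, -0.5)
```

When ability exactly matches the combined difficulties, the endorsement probability is 0.5; positive net logits push it toward 1.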
Affiliation(s)
- Yves Wycisk
- Institute for Musicology, Hanover University of Music, Drama, and Media, Hanover, Germany
- Kilian Sander
- Institute for Musicology, Hanover University of Music, Drama, and Media, Hanover, Germany
- Reinhard Kopiez
- Institute for Musicology, Hanover University of Music, Drama, and Media, Hanover, Germany
- *Correspondence: Reinhard Kopiez
- Friedrich Platz
- Institute for Musicology, Music Pedagogy and Aesthetic, State University of Music and Performing Arts Stuttgart, Stuttgart, Germany
- Stephan Preihs
- Institute of Communication Technology, Leibniz University Hanover, Hanover, Germany
- Jürgen Peissig
- Institute of Communication Technology, Leibniz University Hanover, Hanover, Germany
2. Robotham T, Rummukainen OS, Kurz M, Eckert M, Habets EAP. Comparing Direct and Indirect Methods of Audio Quality Evaluation in Virtual Reality Scenes of Varying Complexity. IEEE Trans Vis Comput Graph 2022; 28:2091-2101. PMID: 35167464; DOI: 10.1109/tvcg.2022.3150491.
Abstract
Many quality evaluation methods are used to assess uni-modal audio or video content without considering the perceptual, cognitive, and interactive aspects present in virtual reality (VR) settings. Consequently, little is known about how the employed evaluation method, the content, and subject behavior affect quality ratings in VR. This mixed between- and within-subjects study uses four subjective audio quality evaluation methods (viz., multiple-stimulus with and without reference for direct scaling, and rank-order elimination and pairwise comparison for indirect scaling) to investigate the factors contributing to quality ratings of real-time audio rendering in multi-modal 6-DoF VR. For each method (employed between subjects), two sets of conditions in five VR scenes were evaluated within subjects. The conditions targeted attributes relevant to binaural audio reproduction, using scenes with varying amounts of user interactivity. Our results show that all referenceless methods produce similar results for both condition sets. However, rank-order elimination proved to be the fastest method, required the least repetitive motion, and yielded the highest discrimination between spatial conditions. Scene complexity was a main effect in the results, with behavioral and task-load-index data implying that the more complex scenes and interactive aspects of 6-DoF VR can impede quality judgments.
3. Ifergan I, Rafaely B. On the selection of the number of beamformers in beamforming-based binaural reproduction. EURASIP J Audio Speech Music Process 2022; 2022:6. PMID: 35371191; PMCID: PMC8965231; DOI: 10.1186/s13636-022-00238-7.
Abstract
In recent years, spatial audio reproduction has been widely researched, with many studies focusing on headphone-based spatial reproduction. A popular format for spatial audio is higher order Ambisonics (HOA), where a spherical microphone array is typically used to obtain the HOA signals. When a spherical array is not available, beamforming-based binaural reproduction (BFBR) can be used, in which signals are captured with arrays of a general configuration. Although BFBR has been shown to be useful, no comprehensive studies of it have been presented, and so its limitations and other design aspects are not well understood. This paper takes an initial step towards developing a theory for BFBR and develops guidelines for selecting the number of beamformers. In particular, the average directivity factor of the microphone array is proposed as a measure for supporting this selection. The effect of head-related transfer function (HRTF) order truncation, which occurs when too many beamformer directions are used, is presented and studied. In addition, the relation between HOA-based binaural reproduction and BFBR is discussed through an analysis based on a spherical array. A simulation study is then presented, based on both a spherical and a planar array, demonstrating the proposed guidelines. A listening test verifies the perceptual attributes of the methods presented in this study. These results can be used for more informed beamformer design for BFBR.
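The directivity factor that motivates the paper's selection rule can be computed from a beamformer's weights, its look-direction steering vector, and the noise-field coherence matrix. The sketch below uses the standard single-beam definition (the paper proposes an average over beam directions) with a toy spatially-uncorrelated-noise example:

```python
import numpy as np

def directivity_factor(w, d, Gamma):
    """Directivity factor of a beamformer with weights w, look-direction
    steering vector d, and diffuse-field coherence matrix Gamma:
    DF = |w^H d|^2 / (w^H Gamma w)."""
    num = np.abs(np.vdot(w, d)) ** 2
    den = np.real(np.vdot(w, Gamma @ w))
    return num / den

# Toy example: for M spatially uncorrelated sensors (Gamma = I) with
# matched (delay-and-sum) weights, the DF equals the number of sensors.
M = 4
d = np.ones(M, dtype=complex)
w = d / M
df = directivity_factor(w, d, np.eye(M))  # -> 4.0
```

With a realistic diffuse-field coherence matrix in place of the identity, the same function quantifies how strongly each beam suppresses diffuse noise.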
Affiliation(s)
- Itay Ifergan
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be’er Sheva, Israel
- Boaz Rafaely
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be’er Sheva, Israel
4. Pelzer R, Dinakaran M, Brinkmann F, Lepa S, Grosche P, Weinzierl S. Head-related transfer function recommendation based on perceptual similarities and anthropometric features. J Acoust Soc Am 2020; 148:3809. PMID: 33379931; DOI: 10.1121/10.0002884.
Abstract
Individualization of head-related transfer functions (HRTFs) can improve the quality of binaural applications with respect to localization accuracy, coloration, and other aspects. Using anthropometric features (AFs) of the head, neck, and pinna for individualization is a promising approach to avoid elaborate acoustic measurements or numerical simulations. Previous studies on HRTF individualization analyzed the link between AFs and technical HRTF features; however, the perceptual relevance of specific errors is not always clear. Hence, the effects of AFs on perceptual qualities, namely the overall difference, coloration, and localization error, are explored directly here. To this end, a listening test was conducted in which subjects rated differences between their own HRTF and a set of nonindividual HRTFs. Based on these data, a machine learning model was developed to predict the perceived differences from ratios of a subject's individual AFs and those of the presented nonindividual AFs. Results show that perceived differences can be predicted well and that the HRTFs recommended by the models provide a clear improvement over generic or randomly selected HRTFs. In addition, the most relevant AFs for the prediction of each type of error were determined. The developed models are available under a free cultural license.
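The recommendation step described in this abstract can be sketched as follows. The feature construction (ratios of the subject's AFs to a candidate's AFs) follows the abstract; the stand-in `predict` model and all numbers are hypothetical, since the paper's trained models are not reproduced here:

```python
import numpy as np

def af_ratios(own_afs, candidate_afs):
    """Feature vector: element-wise ratios of the subject's anthropometric
    features (AFs) to those of a candidate (nonindividual) HRTF's subject."""
    return np.asarray(own_afs, dtype=float) / np.asarray(candidate_afs, dtype=float)

def recommend_hrtf(own_afs, database, predict_difference):
    """Return the index of the database HRTF whose predicted perceptual
    difference from the subject's own HRTF is smallest."""
    scores = [predict_difference(af_ratios(own_afs, c)) for c in database]
    return int(np.argmin(scores))

# Hypothetical stand-in for a trained model: penalize deviation of each
# AF ratio from 1 (i.e., from identical anthropometry).
predict = lambda r: float(np.sum((r - 1.0) ** 2))

own = [14.5, 6.1, 3.2]                     # made-up AFs (cm)
db = [[15.9, 5.4, 3.9], [14.6, 6.0, 3.3]]  # made-up candidate AFs (cm)
best = recommend_hrtf(own, db, predict)    # -> 1 (closer anthropometry)
```

A real system would replace `predict` with the learned mapping from AF ratios to rated overall difference, coloration, or localization error.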
Affiliation(s)
- Robert Pelzer
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, D-10587, Germany
- Manoj Dinakaran
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, D-10587, Germany
- Fabian Brinkmann
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, D-10587, Germany
- Steffen Lepa
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, D-10587, Germany
- Peter Grosche
- Huawei Technologies, Munich Research Centre, Riesstrasse 25, D-80992 Munich, Germany
- Stefan Weinzierl
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, D-10587, Germany
5. Gößling N, Marquardt D, Doclo S. Perceptual Evaluation of Binaural MVDR-Based Algorithms to Preserve the Interaural Coherence of Diffuse Noise Fields. Trends Hear 2020; 24:2331216520919573. PMID: 32339061; PMCID: PMC7225838; DOI: 10.1177/2331216520919573.
Abstract
Besides improving speech intelligibility in background noise, another important objective of noise reduction algorithms for binaural hearing devices is preserving the spatial impression for the listener. In this study, we evaluate the performance of several recently proposed noise reduction algorithms based on the binaural minimum-variance-distortionless-response (MVDR) beamformer, which trade off noise reduction performance against preservation of the interaural coherence (IC) in diffuse noise fields. Aiming at a perceptually optimized result, this trade-off is determined based on the IC discrimination ability of the human auditory system. The algorithms are evaluated with normal-hearing participants in an anechoic scenario and a reverberant cafeteria scenario, in terms of both speech intelligibility, using a matrix sentence test, and spatial quality, using a MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA). The results show that all the binaural noise reduction algorithms improve speech intelligibility compared with the unprocessed microphone signals; partially preserving the IC of the diffuse noise field leads to a significant improvement in perceived spatial quality compared with the binaural MVDR beamformer while hardly affecting speech intelligibility.
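For reference, the MVDR beamformer that these algorithms build on computes, per frequency, the weights w = Γ⁻¹a / (aᴴΓ⁻¹a) for noise covariance Γ and steering vector a; IC-preserving variants then trade the beamformer output against an unprocessed reference. The sketch below shows this generic structure, not the authors' specific algorithms:

```python
import numpy as np

def mvdr_weights(Gamma, a):
    """MVDR weights w = Gamma^-1 a / (a^H Gamma^-1 a). The distortionless
    constraint w^H a = 1 holds by construction."""
    Ga = np.linalg.solve(Gamma, a)
    return Ga / np.vdot(a, Ga)

def ic_preserving_output(y_mvdr, y_ref, mix):
    """Generic trade-off: mix the beamformer output with the unprocessed
    reference signal (mix = 0: pure MVDR; mix = 1: no processing)."""
    return (1.0 - mix) * y_mvdr + mix * y_ref

# Two-microphone toy example at a single frequency bin:
Gamma = np.array([[1.0, 0.3], [0.3, 1.0]], dtype=complex)  # noise covariance
a = np.array([1.0, 0.5j])                                  # steering vector
w = mvdr_weights(Gamma, a)                                 # satisfies w^H a = 1
```

In the evaluated algorithms the mixing is chosen so that the residual IC error stays below the listener's discrimination threshold; the scalar `mix` here only illustrates the shape of that trade-off.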
Affiliation(s)
- Nico Gößling
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg
- Daniel Marquardt
- Starkey Hearing Technologies, Eden Prairie, Minnesota, United States
- Simon Doclo
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg
6. Pausch F, Fels J. Localization Performance in a Binaural Real-Time Auralization System Extended to Research Hearing Aids. Trends Hear 2020; 24:2331216520908704. PMID: 32324491; PMCID: PMC7198834; DOI: 10.1177/2331216520908704.
Abstract
Auralization systems for auditory research should ideally be validated by perceptual experiments, as well as objective measures. This study employed perceptual tests to evaluate a recently proposed binaural real-time auralization system for hearing aid (HA) users. The dynamic localization of real sound sources was compared with that of virtualized ones, reproduced binaurally over headphones, loudspeakers with crosstalk cancellation (CTC) filters, research HAs, or combined via loudspeakers with CTC filters and research HAs under free-field conditions. System-inherent properties affecting localization cues were identified and their effects on overall horizontal localization, reversal rates, and angular error metrics were assessed. The general localization performance in combined reproduction was found to fall between what was measured for loudspeakers with CTC filters and research HAs alone. Reproduction via research HAs alone resulted in the highest reversal rates and angular errors. While combined reproduction helped decrease the reversal rates, no significant effect was observed on the angular error metrics. However, combined reproduction resulted in the same overall horizontal source localization performance as measured for real sound sources, while improving localization compared with reproduction over research HAs alone. Collectively, the results with respect to combined reproduction can be considered a performance indicator for future experiments involving HA users.
Affiliation(s)
- Florian Pausch
- Teaching and Research Area of Medical Acoustics, Institute of Technical Acoustics, RWTH Aachen University
- Janina Fels
- Teaching and Research Area of Medical Acoustics, Institute of Technical Acoustics, RWTH Aachen University
7. Jenny C, Reuter C. Usability of Individualized Head-Related Transfer Functions in Virtual Reality: Empirical Study With Perceptual Attributes in Sagittal Plane Sound Localization. JMIR Serious Games 2020; 8:e17576. PMID: 32897232; PMCID: PMC7509635; DOI: 10.2196/17576.
Abstract
BACKGROUND In order to present virtual sound sources spatially via headphones, head-related transfer functions (HRTFs) can be applied to audio signals. In this so-called binaural virtual acoustics, the spatial perception may be degraded if the HRTFs deviate from the true HRTFs of the listener. OBJECTIVE In this study, participants wearing virtual reality (VR) headsets performed a listening test on the 3D audio perception of virtual audiovisual scenes, thus enabling us to investigate the necessity and influence of HRTF individualization. Two hypotheses were investigated: first, general HRTFs lead to limitations of 3D audio perception in VR and, second, the localization model for stationary localization errors is transferable to nonindividualized HRTFs in more complex environments such as VR. METHODS For the evaluation, 39 subjects rated individualized and nonindividualized HRTFs in an audiovisual virtual scene on the basis of 5 perceptual qualities: localizability, front-back position, externalization, tone color, and realism. The VR listening experiment consisted of 2 tests: in the first test, subjects evaluated their own HRTF and the general HRTF from the Massachusetts Institute of Technology Knowles Electronics Manikin for Acoustic Research database, and in the second test, their own and 2 other nonindividualized HRTFs from the Acoustics Research Institute HRTF database. For the experiment, 2 subject-specific, nonindividualized HRTFs with a minimal and maximal localization error deviation were selected according to the localization model in sagittal planes. RESULTS With the Wilcoxon signed-rank test for the first test, analysis of variance for the second test, and a sample size of 78, the results were significant in all perceptual qualities, except for the front-back position between the own and the minimally deviant nonindividualized HRTF (P=.06). CONCLUSIONS Both hypotheses were accepted. Sounds filtered by individualized HRTFs are considered easier to localize, easier to externalize, more natural in timbre, and thus more realistic compared to sounds filtered by nonindividualized HRTFs.
Affiliation(s)
- Claudia Jenny
- Musicological Department, University of Vienna, Vienna, Austria
8. Uhrig S, Perkis A, Behne DM. Effects of speech transmission quality on sensory processing indicated by the cortical auditory evoked potential. J Neural Eng 2020; 17:046021. PMID: 32422617; DOI: 10.1088/1741-2552/ab93e1.
Abstract
OBJECTIVE Degradations of transmitted speech have been shown to affect perceptual and cognitive processing in human listeners, as indicated by the P3 component of the event-related brain potential (ERP). However, research suggests that previously observed P3 modulations might actually be traced back to earlier neural modulations in the time range of the P1-N1-P2 complex of the cortical auditory evoked potential (CAEP). This study investigates whether auditory sensory processing, as reflected by the P1-N1-P2 complex, is already systematically altered by speech quality degradations. APPROACH Electrophysiological data from two studies were analyzed to examine effects of speech transmission quality (high-quality, noisy, bandpass-filtered) for spoken words on amplitude and latency parameters of individual P1, N1 and P2 components. MAIN RESULTS In the resultant ERP waveforms, an initial P1-N1-P2 manifested at stimulus onset, while a second N1-P2 occurred within the ongoing stimulus. Bandpass-filtered versus high-quality word stimuli evoked a faster and larger initial N1 as well as a reduced initial P2, hence exhibiting effects as early as the sensory stage of auditory information processing. SIGNIFICANCE The results corroborate the existence of systematic quality-related modulations in the initial N1-P2, which may potentially have carried over into P3 modulations demonstrated by previous studies. In future psychophysiological speech quality assessments, rigorous control procedures are needed to ensure the validity of P3-based indication of speech transmission quality. An alternative CAEP-based assessment approach is discussed, which promises to be more efficient and less constrained than the established approach based on P3.
Affiliation(s)
- Stefan Uhrig
- Quality and Usability Lab, Technische Universität Berlin, D-10587 Berlin, Germany. Department of Electronic Systems, Norwegian University of Science and Technology, 7491 Trondheim, Norway. Author to whom any correspondence should be addressed
9. Rabini G, Lucin G, Pavani F. Certain, but incorrect: on the relation between subjective certainty and accuracy in sound localisation. Exp Brain Res 2020; 238:727-739. PMID: 32080750; DOI: 10.1007/s00221-020-05748-4.
Abstract
When asked to identify the position of a sound, listeners can report its perceived location as well as their subjective certainty about this spatial judgement. Yet, research to date focused primarily on measures of perceived location (e.g., accuracy and precision of pointing responses), neglecting instead the phenomenological experience of subjective spatial certainty. The present study aimed to investigate: (1) changes in subjective certainty about sound position induced by listening with one ear plugged (simulated monaural listening), compared to typical binaural listening and (2) the relation between subjective certainty about sound position and localisation accuracy. In two experiments (N = 20 each), participants localised single sounds delivered from one of 60 speakers hidden from view in front space. In each trial, they also provided a subjective rating of their spatial certainty about sound position. No feedback on response was provided. Overall, participants were mostly accurate and certain about sound position in binaural listening, whereas their accuracy and subjective certainty decreased in monaural listening. Interestingly, accuracy and certainty dissociated within single trials during monaural listening: in some trials participants were certain but incorrect, in others they were uncertain but correct. Furthermore, unlike accuracy, subjective certainty rapidly increased as a function of time during the monaural listening block. Finally, subjective certainty changed as a function of perceived location of the sound source. These novel findings reveal that listeners quickly update their subjective confidence on sound position, when they experience an altered listening condition, even in the absence of feedback. Furthermore, they document a dissociation between accuracy and subjective certainty when mapping auditory input to space.
Affiliation(s)
- Giuseppe Rabini
- Centre for Mind/Brain Sciences (CIMeC), University of Trento, Via Angelo Bettini 31, 38068, Rovereto, TN, Italy.
- Giulia Lucin
- Department of Psychology and Cognitive Sciences (DiPSCo), University of Trento, Via Angelo Bettini 84, 38068, Rovereto, TN, Italy
- Francesco Pavani
- Centre for Mind/Brain Sciences (CIMeC), University of Trento, Via Angelo Bettini 31, 38068, Rovereto, TN, Italy; Department of Psychology and Cognitive Sciences (DiPSCo), University of Trento, Via Angelo Bettini 84, 38068, Rovereto, TN, Italy; IMPACT, Centre de Recherche en Neuroscience Lyon (CRNL), Lyon, France
10. Brinkmann F, Aspöck L, Ackermann D, Lepa S, Vorländer M, Weinzierl S. A round robin on room acoustical simulation and auralization. J Acoust Soc Am 2019; 145:2746. PMID: 31046379; DOI: 10.1121/1.5096178.
Abstract
A round robin was conducted to evaluate the state of the art of room acoustic modeling software in both the physical and perceptual realms. The test was based on six acoustic scenes highlighting specific acoustic phenomena and on three complex, "real-world" spatial environments. The results demonstrate that most present simulation algorithms generate obvious model errors once the assumptions of geometrical acoustics are no longer met. As a consequence, they can neither provide a reliable pattern of early reflections nor a reliable prediction of room acoustic parameters outside a medium frequency range. In the perceptual domain, the algorithms under test could generate mostly plausible but not authentic auralizations, i.e., the difference between simulated and measured impulse responses of the same scene was always clearly audible. Most relevant for this perceptual difference are deviations in tone color and source position between measurement and simulation, which can to a large extent be traced back to the simplified use of random-incidence absorption and scattering coefficients and to shortcomings in the simulation of early reflections due to missing or insufficient modeling of diffraction.
Affiliation(s)
- Fabian Brinkmann
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, Berlin, D-10587, Germany
- Lukas Aspöck
- Institute of Technical Acoustics, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Kopernikusstraße 5, Aachen, D-52074, Germany
- David Ackermann
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, Berlin, D-10587, Germany
- Steffen Lepa
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, Berlin, D-10587, Germany
- Michael Vorländer
- Institute of Technical Acoustics, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Kopernikusstraße 5, Aachen, D-52074, Germany
- Stefan Weinzierl
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, Berlin, D-10587, Germany
11. Ahrens J, Andersson C. Perceptual evaluation of headphone auralization of rooms captured with spherical microphone arrays with respect to spaciousness and timbre. J Acoust Soc Am 2019; 145:2783. PMID: 31046319; DOI: 10.1121/1.5096164.
Abstract
A listening experiment is presented in which subjects rated the perceived differences, in terms of spaciousness and timbre, between a headphone-based, head-tracked dummy head auralization of a sound source in different rooms and a headphone-based, head-tracked auralization of a spherical microphone array recording of the same scenario. The underlying auralizations were based on measured impulse responses to assure equal conditions. Rigid-sphere arrays with different numbers of microphones, ranging from 50 up to 1202, were emulated through sequential measurements, and spherical harmonics orders of up to 12 were tested. The results show that the array auralizations are partially indistinguishable from the direct dummy head auralization at a spherical harmonics order of 8 or higher if the virtual sound source is located at a lateral position. No significant reduction of the perceived differences with increasing order is observed for frontal virtual sound sources; in this case, small differences with respect to both spaciousness and timbre persist. The evaluation of lowpass-filtered stimuli shows that the perceived differences occur exclusively at higher frequencies and can therefore be attributed to spatial aliasing. The room had only a minor effect on the results.
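The attribution of residual differences to spatial aliasing is consistent with the common rule of thumb that an order-N spherical array is accurate up to roughly kr ≈ N, i.e., up to f ≈ Nc/(2πr). A quick sketch (the 8.75 cm radius is an assumed, roughly head-sized example, not a value from the paper):

```python
import math

def aliasing_frequency(order, radius, c=343.0):
    """Rule-of-thumb upper frequency limit f = N * c / (2 * pi * r) up to
    which an order-N spherical array of radius r avoids spatial aliasing."""
    return order * c / (2.0 * math.pi * radius)

# Assumed radius of 0.0875 m: order 8 reaches roughly 5 kHz, so perceived
# differences above this limit would be expected to remain audible.
f8 = aliasing_frequency(8, 0.0875)
```

The limit scales linearly with order, which matches the finding that higher orders push the remaining differences toward higher frequencies.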
Affiliation(s)
- Jens Ahrens
- Audio Technology Group, Division of Applied Acoustics, Chalmers University of Technology, 412 96 Gothenburg, Sweden
- Carl Andersson
- Audio Technology Group, Division of Applied Acoustics, Chalmers University of Technology, 412 96 Gothenburg, Sweden
12. Interaural Level Difference Optimization of Binaural Ambisonic Rendering. Appl Sci (Basel) 2019. DOI: 10.3390/app9061226.
Abstract
Ambisonics is a spatial audio technique appropriate for dynamic binaural rendering due to its sound field rotation and transformation capabilities, which has made it popular for virtual reality applications. An issue with low-order Ambisonics is that interaural level differences (ILDs) are often reproduced with lower values when compared to head-related impulse responses (HRIRs), which reduces lateralization and spaciousness. This paper introduces a method of Ambisonic ILD Optimization (AIO), a pre-processing technique to bring the ILDs produced by virtual loudspeaker binaural Ambisonic rendering closer to those of HRIRs. AIO is evaluated objectively for Ambisonic orders up to fifth order versus a reference dataset of HRIRs for all locations on the sphere via estimated ILD and spectral difference, and perceptually through listening tests using both simple and complex scenes. Results conclude AIO produces an overall improvement for all tested orders of Ambisonics, though the benefits are greatest at first and second order.
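The ILD comparison against HRIRs described in this abstract can be illustrated with a simple broadband estimate from the energies of the two ear signals (a sketch with toy numbers; the paper's evaluation uses measured HRIR datasets over the full sphere):

```python
import numpy as np

def estimated_ild_db(hrir_left, hrir_right):
    """Broadband interaural level difference (dB) estimated from the
    energies of the left- and right-ear impulse responses."""
    e_l = np.sum(np.asarray(hrir_left, dtype=float) ** 2)
    e_r = np.sum(np.asarray(hrir_right, dtype=float) ** 2)
    return 10.0 * np.log10(e_l / e_r)

# A lateral source reaching the left ear at twice the amplitude of the
# right ear yields an ILD of about +6 dB:
left = np.array([0.0, 1.0, 0.5])
right = np.array([0.0, 0.5, 0.25])
ild = estimated_ild_db(left, right)  # -> ~6.02 dB
```

Low-order Ambisonic rendering tends to shrink this value relative to the HRIR reference; a pre-processing step like AIO aims to restore it.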
13. Schutte M, Ewert SD, Wiegrebe L. The percept of reverberation is not affected by visual room impression in virtual environments. J Acoust Soc Am 2019; 145:EL229. PMID: 31067971; DOI: 10.1121/1.5093642.
Abstract
Humans possess mechanisms to suppress distracting early sound reflections, summarized as the precedence effect. Recent work shows that precedence is affected by visual stimulation. This paper investigates possible effects of visual stimulation on the perception of later reflections, i.e., reverberation. In a highly immersive audio-visual virtual reality environment, subjects were asked to quantify reverberation in conditions where simultaneously presented auditory and visual stimuli either match in room identity, sound source azimuth, and sound source distance, or diverge in one of these aspects. While subjects reliably judged reverberation across acoustic environments, the visual room impression did not affect reverberation estimates.
Affiliation(s)
- Michael Schutte
- Division of Neurobiology, Department Biology II and Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Germany
- Stephan D Ewert
- Medical Physics and Cluster of Excellence Hearing4all, University of Oldenburg
- Lutz Wiegrebe
- Division of Neurobiology, Department Biology II and Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Germany
14. Pausch F, Aspöck L, Vorländer M, Fels J. An Extended Binaural Real-Time Auralization System With an Interface to Research Hearing Aids for Experiments on Subjects With Hearing Loss. Trends Hear 2018; 22:2331216518800871. PMID: 30322347; PMCID: PMC6195018; DOI: 10.1177/2331216518800871.
Abstract
Theory and implementation of acoustic virtual reality have matured and become a powerful tool for the simulation of entirely controllable virtual acoustic environments. Such virtual acoustic environments are relevant for various types of auditory experiments on subjects with normal hearing, facilitating flexible virtual scene generation and manipulation. When it comes to expanding the investigated group to subjects with hearing loss, choosing a reproduction system that properly integrates hearing aids into the virtual acoustic scene is crucial. Current loudspeaker-based spatial audio reproduction systems rely on different techniques to synthesize a surrounding sound field, providing various possibilities for adaptation and extension to applications in hearing aid-related research. Representing one option, the concept and implementation of an extended binaural real-time auralization system is presented here. This system is capable of generating complex virtual acoustic environments, including room acoustic simulations, which are reproduced via a combination of loudspeakers and research hearing aids. An objective evaluation covers the investigation of different system components, a simulation benchmark analysis assessing the processing performance, and end-to-end latency measurements.
Affiliation(s)
- Florian Pausch
- 1 Institute of Technical Acoustics, Teaching and Research Area of Medical Acoustics, RWTH Aachen University, Germany
- Lukas Aspöck
- 2 Institute of Technical Acoustics, RWTH Aachen University, Germany
- Janina Fels
- 1 Institute of Technical Acoustics, Teaching and Research Area of Medical Acoustics, RWTH Aachen University, Germany
15
Haeussler A, van de Par S. Crispness, speech intelligibility, and coloration of reverberant recordings played back in another reverberant room (Room-In-Room). J Acoust Soc Am 2019; 145:931. [PMID: 30823798] [DOI: 10.1121/1.5090103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
This work examines the acoustical and perceptual consequences of a transfer chain in which a sound recorded in one room is played back over a loudspeaker in another room. The total resulting "Room-In-Room" (RinR) response can be modelled as a convolution of the Room Impulse Responses of the first and second rooms. Due to the convolution, an increase in reverberation time and pulse density and a change in the temporal envelope of the early reflections can be observed, compared to a single room. In the spectral domain, the convolution results in an increase in spectral modulation strength, which is responsible for coloration. A listening test investigating the perceptual consequences of RinR found a decrease in perceived crispness due to reproduction in a playback room, especially for highly reverberant conditions. When, within normal-sized rooms, the reverberation time and total source-receiver distance were kept constant, the RinR and single-room conditions showed no reduction in crispness; a strong increase in perceived coloration, however, was measured. Furthermore, a decrease in speech intelligibility was found for RinR conditions compared to single rooms (an increase in Speech Reception Threshold of 2-3 dB).
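The convolution relationship described in this abstract can be sketched numerically. A minimal example, using toy synthetic impulse responses (the lengths, decay constants, and function names below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_rir(n_samples: int, decay: float) -> np.ndarray:
    """Crude stand-in for a measured Room Impulse Response:
    a direct path followed by exponentially decaying noise."""
    h = rng.standard_normal(n_samples) * np.exp(-decay * np.arange(n_samples))
    h[0] = 1.0  # direct sound
    return h

h_recording_room = toy_rir(4800, 1e-3)  # room the sound was recorded in
h_playback_room = toy_rir(4800, 2e-3)   # room the loudspeaker plays into

# The combined Room-In-Room response is the convolution of the two RIRs.
h_rinr = np.convolve(h_recording_room, h_playback_room)

# The combined response is longer than either single response
# (len(a) + len(b) - 1 samples), consistent with the increased
# reverberation time described in the abstract.
print(len(h_rinr))  # 9599
```

A recording made in the first room and reproduced in the second is then equivalent to convolving the dry source signal with `h_rinr`, which is what allows the single-response analysis above.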
Affiliation(s)
- Andreas Haeussler
- Acoustics Group and Cluster of Excellence Hearing4All, University Oldenburg, D-26111 Oldenburg, Germany
- Steven van de Par
- Acoustics Group and Cluster of Excellence Hearing4All, University Oldenburg, D-26111 Oldenburg, Germany
16
Weinzierl S, Lepa S, Ackermann D. A measuring instrument for the auditory perception of rooms: The Room Acoustical Quality Inventory (RAQI). J Acoust Soc Am 2018; 144:1245. [PMID: 30424659] [DOI: 10.1121/1.5051453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2]
Abstract
With the Room Acoustical Quality Inventory (RAQI), a measuring instrument for the perceptual space of performance venues for music and speech has been developed. First, a focus group of room acoustical experts determined relevant aspects of room acoustical impression in the form of a comprehensive list of 50 uni- and bipolar items in different categories. Then, n = 190 subjects rated their acoustical impression of 35 binaurally simulated rooms from 2 listening positions, with symphonic orchestra, solo trumpet, and dramatic speech as audio content. Subsequent exploratory and confirmatory factor analyses of the questionnaire data resulted in three possible solutions with four, six, and nine factors of room acoustical impression. The factor solutions, as well as the related RAQI items, were tested for reliability, validity, and several types of measurement invariance, and were cross-validated by a follow-up experiment with a subsample of 46% of the original participants, which provided re-test reliabilities and stability coefficients for all RAQI constructs. The resulting psychometrically evaluated measurement instrument can be used for room quality assessment, acoustical planning, and the further development of room acoustical parameters to predict primary acoustical qualities of venues for music and speech.
Affiliation(s)
- Stefan Weinzierl
- Audio Communication Group, Technische Universität Berlin, Einsteinufer 17c, Berlin, D-10587, Germany
- Steffen Lepa
- Audio Communication Group, Technische Universität Berlin, Einsteinufer 17c, Berlin, D-10587, Germany
- David Ackermann
- Audio Communication Group, Technische Universität Berlin, Einsteinufer 17c, Berlin, D-10587, Germany
17
Postma BNJ, Katz BFG. The influence of visual distance on the room-acoustic experience of auralizations. J Acoust Soc Am 2017; 142:3035. [PMID: 29195448] [DOI: 10.1121/1.5009554] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7]
Abstract
Auralizations have become more prevalent in architectural acoustics and virtual reality. Studies have shown that by employing a methodical calibration procedure, ecologically/perceptually valid auralizations can be obtained. Another study demonstrated a method for including dynamic voice directivity, with results indicating these auralizations were judged significantly more plausible than auralizations with static source orientations. With the increased plausibility of auralizations, it is possible to study room-acoustic experience employing virtual reality, having confidence that the results also apply to real-life situations. Few studies have examined the influence of visuals on room-acoustic experience. Using a virtual reality framework, this study investigated the influence of visuals on the room-acoustic experience of auralizations. Evaluations compared dynamic voice auralizations coherently matched with visualization positions to incoherently matched audio-visual pairs. Based on the results, the test population could be divided into three subgroups: (1) those who judged auralizations more acoustically distant with increased visual distance, (2) those who judged auralizations louder with increased visual distance, and (3) those whose audio judgment was uninfluenced by the visual stimulus.
Affiliation(s)
- Barteld N J Postma
- Audio Acoustics group, LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France
- Brian F G Katz
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Institut d'Alembert, Paris, France
18
Brinkmann F, Lindau A, Weinzierl S. On the authenticity of individual dynamic binaural synthesis. J Acoust Soc Am 2017; 142:1784. [PMID: 29092593] [DOI: 10.1121/1.5005606] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6]
Abstract
A simulation that is perceptually indistinguishable from the corresponding real sound field could be termed authentic. Using binaural technology, such a simulation would theoretically be achieved by reconstructing the sound pressure at a listener's ears. However, inevitable errors in measurement, rendering, and reproduction introduce audible degradations, as has been demonstrated in previous studies for anechoic environments and static binaural simulations (fixed head orientation). The current study investigated the authenticity of individual dynamic binaural simulations for three different acoustic environments (anechoic, dry, wet) using a highly sensitive listening test design. The results show that about half of the participants failed to reliably detect any differences for a speech stimulus, whereas all participants were able to do so for pulsed pink noise. Higher detection rates were observed in the anechoic condition compared to the reverberant spaces, while the source position had no significant effect. It is concluded that authenticity depends mainly on how comprehensively the audio content provides spectral cues and on the amount of reverberation, whereas the source position plays a minor role. This is confirmed by a broad qualitative evaluation, suggesting that the remaining differences mainly affect tone color rather than spatial, temporal, or dynamic qualities.
Affiliation(s)
- Fabian Brinkmann
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17 c, D-10587 Berlin, Germany
- Alexander Lindau
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17 c, D-10587 Berlin, Germany
- Stefan Weinzierl
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17 c, D-10587 Berlin, Germany
19
Nowak J, Klockgether S. Perception and prediction of apparent source width and listener envelopment in binaural spherical microphone array auralizations. J Acoust Soc Am 2017; 142:1634. [PMID: 28964092] [DOI: 10.1121/1.5003917] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3]
Abstract
This article deals with the assessment and prediction of reproduction quality when binaurally auralizing spherical microphone array data for room simulation applications. The auralization is perceptually assessed in a listening experiment using two attributes for spatial quality description, apparent source width (ASW) and listener envelopment (LEV), whereas the technical analysis employs a psychoacoustically motivated model for room acoustical perception (RAP) specifically designed to estimate ASW and LEV. Both analyses focus on the array configuration in terms of varying modal resolutions and its influence on spatial reproduction quality. The auralizations comprise three simulated environments: free-field sound fields as well as a dry and a reverberant room. Ten different audio signals are used as test material. Perceptual results show that the array configuration strongly influences the perception of ASW and LEV, and that this influence also depends on the reflection properties of the simulated room. The ASW and LEV predictions of the RAP model correlate well with the results of the listening experiment.
Affiliation(s)
- Johannes Nowak
- Acoustics Department, Fraunhofer-Institute for Media Technology IDMT, Ilmenau, Germany
- Stefan Klockgether
- Department of Medical Physics and Acoustics, Cluster of Excellence "Hearing4all," University of Oldenburg, Oldenburg, Germany
20
Ben-Hur Z, Brinkmann F, Sheaffer J, Weinzierl S, Rafaely B. Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J Acoust Soc Am 2017; 141:4087. [PMID: 28618825] [PMCID: PMC5457295] [DOI: 10.1121/1.4983652] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4]
Abstract
The synthesis of binaural signals from spherical microphone array recordings has recently been proposed. The limited spatial resolution of the reproduced signal due to order-limited reproduction has previously been investigated perceptually, showing spatial perception ramifications such as poor source localization and limited externalization. Furthermore, this spatial order limitation also has a detrimental effect on the frequency content of the signal and its perceived timbre, due to the rapid roll-off at high frequencies. In this paper, the underlying causes of this spectral roll-off are described mathematically and investigated numerically. A digital filter that equalizes the frequency spectrum of a low spatial order signal is introduced and evaluated. A comprehensive listening test was conducted to study the influence of the filter on the perception of the reproduced sound. Results indicate that the suggested filter is beneficial for restoring the timbral composition of order-truncated binaural signals, while conserving, and even improving, some spatial properties of the signal.
Affiliation(s)
- Zamir Ben-Hur
- Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
- Fabian Brinkmann
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, D-10587 Berlin, Germany
- Jonathan Sheaffer
- Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
- Stefan Weinzierl
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, D-10587 Berlin, Germany
- Boaz Rafaely
- Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
21
Kaplanis N, Bech S, Tervo S, Pätynen J, Lokki T, van Waterschoot T, Jensen SH. Perceptual aspects of reproduced sound in car cabin acoustics. J Acoust Soc Am 2017; 141:1459. [PMID: 28372066] [DOI: 10.1121/1.4976816] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3]
Abstract
An experiment was conducted to determine the perceptual effects of car cabin acoustics on the reproduced sound field. In-car measurements were conducted whilst the cabin's interior was physically modified. The captured sound fields were recreated in the laboratory using a three-dimensional loudspeaker array. A panel of expert assessors followed a rapid sensory analysis protocol, the flash profile, to perceptually characterize and evaluate 12 acoustical conditions of the car cabin using individually elicited attributes. A multivariate analysis revealed the panel's consensus and the identified perceptual constructs. Six perceptual constructs characterize the differences between the acoustical conditions of the cabin, related to bass, ambience, transparency, width and envelopment, brightness, and image focus. The current results indicate the importance of several acoustical properties of a car's interior for the perceived sound qualities. Moreover, they demonstrate the capacity of the applied methodology to assess spectral and spatial properties of automotive environments in laboratory settings using a time-efficient and flexible protocol.
Affiliation(s)
- Søren Bech
- Bang and Olufsen a/s, Peter Bang vej 15, Struer, DK-7600, Denmark
- Sakari Tervo
- Department of Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland
- Jukka Pätynen
- Department of Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland
- Tapio Lokki
- Department of Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Finland
- Toon van Waterschoot
- Department of Electrical Engineering (ESAT-STADIUS/ETC), KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Søren Holdt Jensen
- Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark
22
Simon LSR, Zacharov N, Katz BFG. Perceptual attributes for the comparison of head-related transfer functions. J Acoust Soc Am 2016; 140:3623. [PMID: 27908072] [DOI: 10.1121/1.4966115] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8]
Abstract
The benefit of using individual head-related transfer functions (HRTFs) in binaural audio is well documented with regard to improving localization precision. However, with the increased use of binaural audio in more complex scene renderings, cognitive studies, and virtual and augmented reality simulations, the perceptual impact of HRTF selection may go beyond simple localization. In this study, the authors develop a list of attributes that qualify the perceived differences between HRTFs, providing a qualitative understanding of the perceptual variance of non-individual binaural renderings. The list of attributes was designed using a Consensus Vocabulary Protocol elicitation method. Participants first followed an Individual Vocabulary Protocol elicitation procedure, describing the perceived differences between binaural stimuli based on binauralized extracts of multichannel productions. This was followed by an automated lexical reduction and a series of consensus group meetings during which participants agreed on a list of relevant attributes. Finally, the proposed list of attributes was evaluated through a listening test, leading to eight valid perceptual attributes for describing the perceptual dimensions affected by HRTF set variations.
Affiliation(s)
- Laurent S R Simon
- Audio Acoustics Group, LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France
- Brian F G Katz
- Audio Acoustics Group, LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France