1
Niu Y, Chen N, Zhu H, Jin J, Li G. Music-oriented auditory attention detection from electroencephalogram. Neurosci Lett 2024;818:137534. PMID: 37871827. DOI: 10.1016/j.neulet.2023.137534.
Abstract
Music-oriented auditory attention detection (AAD) aims to determine which instrument in polyphonic music a listener is paying attention to by analyzing the listener's electroencephalogram (EEG). However, existing linear models cannot effectively mimic the nonlinearity of the human brain, resulting in limited performance. Thus, a nonlinear music-oriented AAD model is proposed in this paper. First, an auditory feature and a musical feature are fused to represent the musical sources precisely and comprehensively. Second, the EEG is enhanced if the music stimuli are presented in stereo. Third, a neural network architecture is constructed to capture nonlinear and dynamic interactions between the EEG and the auditory stimuli. Finally, the musical source most similar to the EEG in the common embedding space is identified as the attended one. Experimental results demonstrate that the proposed model outperforms all baseline models. On 1-s decision windows, it reaches accuracies of 92.6% and 81.7% under mono duo and trio stimuli, respectively. Additionally, it can be easily extended to speech-oriented AAD. This work opens up new possibilities for studies on both brain neural activity decoding and music information retrieval.
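The final decision step described in this abstract (picking the musical source whose embedding lies closest to the EEG embedding in the common space) can be sketched as follows. The function names and the choice of cosine similarity are our own assumptions for illustration, not taken from the paper:

```python
# Sketch of the attended-source decision: the EEG segment and each candidate
# musical source are assumed to already be projected into a common embedding
# space; the source most similar to the EEG embedding is reported as attended.
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def attended_source(eeg_emb, source_embs):
    """Index of the candidate source embedding most similar to the EEG embedding."""
    sims = [cosine_similarity(eeg_emb, s) for s in source_embs]
    return int(np.argmax(sims))
```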
Affiliation(s)
- Yixiang Niu, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Ning Chen, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Hongqing Zhu, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jing Jin, Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China; Shenzhen Research Institute of East China University of Science and Technology, Shenzhen 518063, China
- Guangqiang Li, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
2
An Educational Guide through the FMP Notebooks for Teaching and Learning Fundamentals of Music Processing. Signals 2021. DOI: 10.3390/signals2020018.
Abstract
This paper provides a guide through the FMP notebooks, a comprehensive collection of educational material for teaching and learning fundamentals of music processing (FMP) with a particular focus on the audio domain. Organized in nine parts that consist of more than 100 individual notebooks, this collection discusses well-established topics in music information retrieval (MIR) such as beat tracking, chord recognition, music synchronization, audio fingerprinting, music segmentation, and source separation, to name a few. These MIR tasks provide motivating and tangible examples that students can hold onto when studying technical aspects in signal processing, information retrieval, or pattern analysis. The FMP notebooks comprise detailed textbook-like explanations of central techniques and algorithms combined with Python code examples that illustrate how to implement the methods. All components, including the introductions of MIR scenarios, illustrations, sound examples, technical concepts, mathematical details, and code examples, are integrated into a unified framework based on Jupyter notebooks. Providing a platform with many baseline implementations, the FMP notebooks are suited for conducting experiments and generating educational material for lectures, thus addressing students, teachers, and researchers. While giving a guide through the notebooks, this paper’s objective is to yield concrete examples on how to use the FMP notebooks to create an enriching, interactive, and interdisciplinary supplement for studies in science, technology, engineering, and mathematics. The FMP notebooks (including HTML exports) are publicly accessible under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
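As a flavor of the notebook style described above, here is a small self-contained snippet of the kind the collection pairs with its explanations. This example is our own illustration, not copied from the FMP notebooks: converting a frequency in Hz to a MIDI pitch number and its chroma class, a basic building block for tasks such as chord recognition.

```python
# Map a frequency in Hz to the nearest MIDI pitch and its chroma (pitch-class)
# index, using the standard equal-tempered tuning with A4 = 440 Hz = MIDI 69.
import math

def hz_to_midi(freq, a4=440.0):
    """Nearest MIDI pitch number for a frequency in Hz."""
    return round(69 + 12 * math.log2(freq / a4))

def chroma_class(freq):
    """Pitch class of a frequency: 0 = C, 1 = C#, ..., 11 = B."""
    return hz_to_midi(freq) % 12
```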
3
Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music. Applied Sciences 2019. DOI: 10.3390/app10010019.
Abstract
Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, which is particularly suited for Western classical music, is based on a nearest neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques: one based on classical principal component analysis, and the other based on neural networks with a triplet loss. Furthermore, we report on systematically conducted experiments with Western classical music recordings and discuss the trade-off between retrieval quality and embedding dimensionality. As one main result, we show that, using neural networks, one can reduce the audio shingles from 240 to fewer than 8 dimensions with only a moderate loss in retrieval accuracy. In addition, we present extended experiments with databases of different sizes and different query lengths to test the scalability and generalizability of the dimensionality reduction methods. We also provide a more detailed view into the retrieval problem by analyzing the distances that appear in the nearest neighbor search.
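The shingling idea underlying this approach can be sketched as follows: stack short sequences of 12-dimensional chroma frames into one high-dimensional vector (12 x 20 = 240, matching the dimensionality quoted above) and retrieve by nearest-neighbor search. The function names are our own, and the learned dimensionality reduction itself is omitted:

```python
# Build overlapping audio shingles from a chroma sequence and retrieve the
# closest shingle by Euclidean nearest-neighbor search.
import numpy as np

def make_shingles(chroma, length=20):
    """All overlapping shingles of `length` consecutive frames; chroma is (frames, 12)."""
    n = chroma.shape[0] - length + 1
    return np.stack([chroma[i:i + length].ravel() for i in range(n)])

def nearest_shingle(query, database):
    """Index of the database shingle with the smallest Euclidean distance to the query."""
    return int(np.argmin(np.linalg.norm(database - query, axis=1)))
```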
4
5
Maršík L, Martišek P, Pokorný J, Rusek M, Slaninová K, Martinovič J, Robine M, Hanna P, Bayle Y. KaraMIR: A Project for Cover Song Identification and Singing Voice Analysis Using a Karaoke Songs Dataset. International Journal of Semantic Computing 2018. DOI: 10.1142/s1793351x18400202.
Abstract
We introduce KaraMIR, a musical project dedicated to karaoke song analysis. Within KaraMIR, we define Kara1k, a dataset composed of 1000 cover songs provided by the Recisio KaraFun application and the corresponding 1000 songs by the original artists. Kara1k is mainly dedicated to cover song identification and singing voice analysis. For both tasks, Kara1k enables novel approaches, as each cover song is a studio-recorded song with the same arrangement as the original recording, but with different singers and musicians. Essentia, harmony-analyser, Marsyas, Vamp plugins, and YAAFE have been used to extract audio features for each track in Kara1k. We provide metadata such as the title, genre, original artist, year, and International Standard Recording Code, along with ground truths for the singer's gender, backing vocals, duets, and the lyrics' language. The KaraMIR project focuses on defining new problems and describing features and tools to solve them. We thus provide a comparison of traditional and new features for the cover song identification task, using statistical methods as well as the dynamic time warping method on chroma, MFCC, chord, key, and chord-distance features. A supporting experiment on the singer gender classification task is also proposed. The KaraMIR project website supports this continuing research.
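The dynamic time warping comparison mentioned above can be sketched generically. This is the textbook DTW recurrence on arbitrary feature sequences (for example chroma frames), not the authors' specific implementation or cost measure:

```python
# Classic dynamic time warping (DTW) distance between two feature sequences.
# Rows are frames; the frame-to-frame cost is the Euclidean distance.
import numpy as np

def dtw_distance(x, y):
    """DTW distance between sequences x (n frames) and y (m frames)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

DTW tolerates local tempo differences, which is why it suits comparing a cover version against an original with the same arrangement but different timing.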
Affiliation(s)
- Ladislav Maršík, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, Prague, Czech Republic
- Petr Martišek, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, Prague, Czech Republic
- Jaroslav Pokorný, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, Prague, Czech Republic
- Martin Rusek, IT4Innovations, VŠB – Technical University of Ostrava, 17. listopadu 15/2172, 708 33 Ostrava-Poruba, Czech Republic
- Kateřina Slaninová, IT4Innovations, VŠB – Technical University of Ostrava, 17. listopadu 15/2172, 708 33 Ostrava-Poruba, Czech Republic
- Jan Martinovič, IT4Innovations, VŠB – Technical University of Ostrava, 17. listopadu 15/2172, 708 33 Ostrava-Poruba, Czech Republic
- Matthias Robine, Univ. Bordeaux, LaBRI, UMR 5800, F-33400 Talence, France; CNRS, LaBRI, UMR 5800, F-33400 Talence, France
- Pierre Hanna, Univ. Bordeaux, LaBRI, UMR 5800, F-33400 Talence, France; CNRS, LaBRI, UMR 5800, F-33400 Talence, France
- Yann Bayle, Univ. Bordeaux, LaBRI, UMR 5800, F-33400 Talence, France; CNRS, LaBRI, UMR 5800, F-33400 Talence, France
6
Marczak R, Schott G, Hanna P. Postprocessing Gameplay Metrics for Gameplay Performance Segmentation Based on Audiovisual Analysis. IEEE Transactions on Computational Intelligence and AI in Games 2015. DOI: 10.1109/tciaig.2014.2382718.
7
Abstract
Creating emotionally sensitive machines would significantly enhance the interaction between humans and machines. In this chapter, we focus on enabling this ability for music. Music is extremely powerful at inducing emotions; if machines can apprehend the emotions in music, they gain a relevant competence for communicating with humans. We review theories of music and emotion, and detail different representations of musical emotions from the literature, together with related musical features. Then, we focus on techniques to detect emotion in music from audio content. As a proof of concept, we detail a machine learning method for building such a system. We also review current state-of-the-art results, provide evaluations, and give some insights into possible applications and future trends of these techniques.
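One representation commonly reviewed in this literature is the two-dimensional valence-arousal plane, where a predicted (valence, arousal) pair maps to a coarse emotion region. A minimal sketch follows; the four quadrant labels are common in the literature, but this particular mapping is our own illustration, not the chapter's method:

```python
# Map a (valence, arousal) prediction in [-1, 1]^2 to one of the four basic
# quadrants of the valence-arousal plane.
def emotion_quadrant(valence, arousal):
    """Coarse emotion label for a point in the valence-arousal plane."""
    if valence >= 0:
        return "happy/excited" if arousal >= 0 else "calm/content"
    return "angry/afraid" if arousal >= 0 else "sad/depressed"
```

A regression model predicting continuous valence and arousal from audio features can be turned into a classifier simply by composing it with such a mapping.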
8
Durrieu JL, Richard G, David B, Févotte C. Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals. IEEE Transactions on Audio, Speech, and Language Processing 2010. DOI: 10.1109/tasl.2010.2041114.
9
10
Yu Y, Joe K, Oria V, Moerchen F, Downie JS, Chen L. Multi-Version Music Search Using Acoustic Feature Union and Exact Soft Mapping. International Journal of Semantic Computing 2009. DOI: 10.1142/s1793351x09000732.
Abstract
Research on audio-based music retrieval has primarily concentrated on refining audio features to improve search quality, whereas much less work has been done on improving the time efficiency of music audio searches. Representing music audio documents in an indexable format provides a mechanism for achieving efficiency. To address this issue, this work suggests Exact Locality Sensitive Mapping (ELSM) to join concatenated feature sets and soft hash values. On this basis, we propose two audio-based music indexing techniques, ELSM and Soft Locality Sensitive Hash (SoftLSH), using an optimized Feature Union (FU) set of extracted audio features. Two contributions are made here. First, the principle of similarity-invariance is applied in summarizing audio feature sequences and utilized in training semantic audio representations based on regression. Second, soft hash values are pre-calculated to help locate the search range more accurately and improve the collision probability among similar features. Our algorithms are implemented in a demonstration system that shows how to retrieve and evaluate multi-version audio documents. Experimental evaluation on a real "multi-version" audio dataset confirms the practicality of ELSM and SoftLSH with FU, and shows that our algorithms are effective for both multi-version detection (online query, one query vs. multiple objects) and same-content detection (batch queries, multiple queries vs. one object).
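The locality-sensitive hashing idea that SoftLSH builds on can be sketched in its classical random-projection form. This is the standard sign-of-projection scheme, not the authors' ELSM/SoftLSH construction (which additionally uses pre-calculated soft hash values and exact mapping):

```python
# Classical random-projection LSH: one hash bit per random hyperplane, so
# vectors pointing in similar directions tend to collide in the same bucket.
import numpy as np

def lsh_hash(vec, planes):
    """Sign-of-projection hash key: one bit per hyperplane normal in `planes`."""
    return tuple(int(np.dot(p, vec) >= 0) for p in planes)

def build_index(vectors, planes):
    """Bucket vector indices by their hash key for sub-linear candidate lookup."""
    index = {}
    for i, v in enumerate(vectors):
        index.setdefault(lsh_hash(v, planes), []).append(i)
    return index
```

A query then only compares against the vectors in its own bucket (and possibly neighboring buckets) instead of scanning the whole database.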
Affiliation(s)
- Yi Yu, Department of Advanced Information and Computer Sciences, Nara Women's University, Kitauoya Nishi-machi, Nara 630-8506, Japan
- Kazuki Joe, Department of Advanced Information and Computer Sciences, Nara Women's University, Kitauoya Nishi-machi, Nara 630-8506, Japan
- Vincent Oria, Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
- Fabian Moerchen, Siemens Corporate Research, Integrated Data Systems, 755 College Road East, Princeton, NJ 08540, USA
- J. Stephen Downie, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, USA
- Lei Chen, Department of Computer Science, Hong Kong University of Science and Technology, HKSAR, China