1. Zhang C, Li H. Adoption of Artificial Intelligence Along with Gesture Interactive Robot in Musical Perception Education based on Deep Learning Method. Int J Hum Robot 2022. DOI: 10.1142/s0219843622400084.

2. Kuroda J, Koutaki G. Sensing Control Parameters of Flute from Microphone Sound Based on Machine Learning from Robotic Performer. Sensors 2022; 22:2074. PMID: 35271221; PMCID: PMC8914778; DOI: 10.3390/s22052074.
Abstract
When learning to play a musical instrument, it is important to improve the quality of self-practice. Many systems have been developed to assist practice. Some practice assistance systems use special sensors (pressure, flow, and motion sensors) to acquire the control parameters of the musical instrument and provide specific guidance. However, it is difficult to acquire the control parameters of wind instruments (e.g., saxophone or flute), such as flow and the angle between the player and the instrument, since it is not possible to place sensors in the mouth. In this paper, we propose a sensorless control parameter estimation system based on the recorded sound of a wind instrument using only machine learning. In the machine learning framework, many training samples that have both sound and correct labels are required. Therefore, we generated training samples using a robotic performer. This has two advantages: (1) it is easy to obtain many training samples with exhaustive control parameters, and (2) we can use the given control parameters of the robot as the correct labels. In addition to the samples generated by the robot, some human performance data were also used for training to construct an estimation model that enhanced the feature differences between robot and human performance. Finally, a flute control parameter estimation system was developed, and its estimation accuracy for eight novice flute players was evaluated using Spearman's rank correlation coefficient. The experimental results showed that the proposed system was able to estimate human control parameters with high accuracy.
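The evaluation metric named above, Spearman's rank correlation between estimated and ground-truth control parameters, can be sketched in plain Python (a minimal illustration, not the authors' code; ties are handled with average ranks):

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A perfectly monotone relationship between estimated and true parameters yields rho = 1.0; a perfectly reversed one yields -1.0.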

3. Bishop L, Jensenius AR, Laeng B. Musical and Bodily Predictors of Mental Effort in String Quartet Music: An Ecological Pupillometry Study of Performers and Listeners. Front Psychol 2021; 12:653021. PMID: 34262504; PMCID: PMC8274478; DOI: 10.3389/fpsyg.2021.653021.
Abstract
Music performance can be cognitively and physically demanding. These demands vary across the course of a performance as the content of the music changes. More demanding passages require performers to focus their attention more intensely, or expend greater “mental effort.” To date, it remains unclear what effect different cognitive-motor demands have on performers' mental effort. It is likewise unclear how fluctuations in mental effort compare between performers and perceivers of the same music. We used pupillometry to examine the effects of different cognitive-motor demands on the mental effort used by performers and perceivers of classical string quartet music. We collected pupillometry, motion capture, and audio-video recordings of a string quartet as they performed a rehearsal and a concert (for a live audience) in our lab. We then collected pupillometry data from a remote sample of musically trained listeners, who heard the audio recordings (without video) that we captured during the concert. We used a modelling approach to assess the effects of performers' bodily effort (head and arm motion; sound level; performers' ratings of technical difficulty), musical complexity (performers' ratings of harmonic complexity; a score-based measure of harmonic tension), and expressive difficulty (performers' ratings of expressive difficulty) on performers' and listeners' pupil diameters. Our results show stimulating effects of bodily effort and expressive difficulty on performers' pupil diameters, and stimulating effects of expressive difficulty on listeners' pupil diameters. We also observed negative effects of musical complexity on both performers and listeners, and negative effects of performers' bodily effort on listeners, which we suggest may reflect the complex relationships that these features share with other aspects of musical structure.
Looking across the concert, we found that both of the quartet violinists (who exchanged places halfway through the concert) showed more dilated pupils during their turns as 1st violinist than when playing as 2nd violinist, suggesting that they experienced greater arousal when “leading” the quartet in the 1st violin role. This study shows how eye tracking and motion capture technologies can be used in combination in an ecological setting to investigate cognitive processing in music performance.

Affiliation(s)
- Laura Bishop
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
- Alexander Refsum Jensenius
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
- Bruno Laeng
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway

4. Blanco AD, Tassani S, Ramirez R. Real-Time Sound and Motion Feedback for Violin Bow Technique Learning: A Controlled, Randomized Trial. Front Psychol 2021; 12:648479. PMID: 33981275; PMCID: PMC8107276; DOI: 10.3389/fpsyg.2021.648479.
Abstract
Producing a good sound on the violin is a complex task that requires coordination and spatiotemporal control of bowing gestures. The use of motion-capture technologies to improve performance or reduce injury risks is becoming widespread in kinesiology. Combining motion accuracy and sound quality feedback has the potential to become an important aid in violin learning. In this study, we evaluate motion-capture and sound-quality analysis technologies developed within TELMI, a technology-enhanced music learning project. We analyzed the sound and bow motion of 50 participants with no prior violin experience while they learned to produce a stable sound on the violin. Participants were divided into two groups: the experimental group (N = 24) received real-time visual feedback on both kinematics and sound quality, while participants in the control group (N = 26) practiced without any external help. An additional group of violin experts (N = 15) performed the same task for comparative purposes. After the practice session, all groups were evaluated in a transfer phase without feedback. In the practice phase, the experimental group improved their bowing kinematics in comparison to the control group, but at the expense of the sound quality of their performance. In the retention phase, the experimental group showed better results in sound quality, especially concerning control of sound dynamics. In addition, we found that the expert group improved the stability of their sound while using the technology. Overall, these results emphasize the importance of feedback technologies in learning complex tasks, such as musical instrument learning.

Affiliation(s)
- Angel David Blanco
- Music and Machine Learning Lab, Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Simone Tassani
- Multiscale and Computational Biomechanics and Mechanobiology Team, Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Rafael Ramirez
- Music and Machine Learning Lab, Department of Information and Communications Technologies, Universitat Pompeu Fabra, Barcelona, Spain

5. Dalmazzo D, Waddell G, Ramírez R. Applying Deep Learning Techniques to Estimate Patterns of Musical Gesture. Front Psychol 2021; 11:575971. PMID: 33469435; PMCID: PMC7813937; DOI: 10.3389/fpsyg.2020.575971.
Abstract
Repetitive practice is one of the most important factors in improving the performance of motor skills. This paper focuses on the analysis and classification of forearm gestures in the context of violin playing. We recorded five experts and three students performing eight traditional classical violin bow-strokes: martelé, staccato, detaché, ricochet, legato, trémolo, collé, and col legno. To record inertial motion information, we utilized the Myo sensor, which reports a multidimensional time-series signal. We synchronized inertial motion recordings with audio data to extract the spatiotemporal dynamics of each gesture. Applying state-of-the-art deep neural networks, we implemented and compared several architectures: convolutional neural network (CNN) models demonstrated recognition rates of 97.147%, 3DMultiHeaded_CNN models 98.553%, and CNN_LSTM models 99.234%. The collected data (quaternions from the bowing arm of a violinist) contained sufficient information to distinguish the bowing techniques studied, and deep learning methods were capable of learning the movement patterns that distinguish these techniques. Each of the learning algorithms investigated (CNN, 3DMultiHeaded_CNN, and CNN_LSTM) produced high classification accuracies, which supports the feasibility of training classifiers. The resulting classifiers may provide the foundation of a digital assistant that enhances musicians' time spent practicing alone, providing real-time feedback on the accuracy and consistency of their musical gestures in performance.

Affiliation(s)
- David Dalmazzo
- Music Technology Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- George Waddell
- Centre for Performance Science, Royal College of Music, London, United Kingdom; Faculty of Medicine, Imperial College London, London, United Kingdom
- Rafael Ramírez
- Music Technology Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain

6. New Interfaces and Approaches to Machine Learning When Classifying Gestures within Music. Entropy 2020; 22:1384. PMID: 33297582; PMCID: PMC7762429; DOI: 10.3390/e22121384.
Abstract
Interactive music uses wearable sensors (i.e., gestural interfaces, or GIs) and biometric datasets to reinvent traditional human–computer interaction and enhance music composition. In recent years, machine learning (ML) has become important for the art form, because ML helps process the complex biometric datasets from GIs when predicting musical actions (termed performance gestures). ML allows musicians to create novel interactions with digital media. Wekinator is a popular ML software package amongst artists, allowing users to train models through demonstration. It is built on the Waikato Environment for Knowledge Analysis (WEKA) framework, which is used to build supervised predictive models. Previous research has used biometric data from GIs to train specific ML models. However, previous research does not inform optimal ML model choice within music, nor does it compare model performance. Wekinator offers several ML models. To address this, we used Wekinator and the Myo armband GI to study three performance gestures for piano practice. We trained all models in Wekinator and investigated their accuracy, how gesture representation affects model accuracy, and whether optimisation can arise. Results show that neural networks are the strongest continuous classifiers, mapping behaviour differs amongst continuous models, optimisation can occur, and gesture representation disparately affects model mapping behaviour, impacting music practice.

7. Sun SW, Liu BY, Chang PC. Deep Learning-Based Violin Bowing Action Recognition. Sensors (Basel) 2020; 20:5732. PMID: 33050164; PMCID: PMC7601403; DOI: 10.3390/s20205732.
Abstract
We propose a violin bowing action recognition system that can accurately recognize distinct bowing actions in classical violin performance. The system recognizes bowing actions by analyzing signals from a depth camera and from inertial sensors worn by a violinist. The contribution of this study is threefold: (1) a dataset comprising violin bowing actions was constructed from data captured by a depth camera and multiple inertial sensors; (2) data augmentation was achieved for depth-frame data through rotation in three-dimensional world coordinates and for inertial sensing data through yaw, pitch, and roll angle transformations; and (3) bowing action classifiers were trained on different modalities, to compensate for the strengths and weaknesses of each modality, based on deep learning methods with a decision-level fusion process. In experiments, both the large external motions and the subtle local motions produced by violin bow manipulations were accurately recognized by the proposed system (average accuracy > 80%).
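The depth-frame augmentation described in contribution (2), rotating 3D points by yaw, pitch, and roll angles, can be sketched as follows (an illustrative reconstruction, not the authors' implementation; the Z-Y-X Euler convention is an assumption):

```python
import math

def rot_x(roll):
    c, s = math.cos(roll), math.sin(roll)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(pitch):
    c, s = math.cos(pitch), math.sin(pitch)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(yaw):
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def augment(points, yaw, pitch, roll):
    """Rotate every 3D point by the composed Z-Y-X rotation."""
    r = matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))
    return [[sum(r[i][k] * p[k] for k in range(3)) for i in range(3)]
            for p in points]
```

Sampling random yaw/pitch/roll angles and applying `augment` to a recorded point cloud produces rotated copies of the same gesture, which is the essence of this style of data augmentation.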

Affiliation(s)
- Shih-Wei Sun
- Department of New Media Art, Taipei National University of the Arts, Taipei 11201, Taiwan
- Computer Center, Taipei National University of the Arts, Taipei 11201, Taiwan
- Bao-Yun Liu
- Department of Communication Engineering, National Central University, Taoyuan 32001, Taiwan
- Pao-Chi Chang
- Department of Communication Engineering, National Central University, Taoyuan 32001, Taiwan

8.

Abstract
Learning to play and perform a musical instrument is a complex cognitive task, requiring high conscious control and coordination of an impressive number of cognitive and sensorimotor skills. For professional violinists, there exists a physical connection with the instrument that allows the player to continuously shape the sound through sophisticated bowing techniques and fine hand movements. Hence, it is not surprising that great importance in violin training is given to right-hand techniques, which are responsible for most of the sound produced. In this paper, our aim is to understand which motion features can be used to efficiently and effectively distinguish a professional performance from that of a student, without exploiting sound-based features. We collected and made freely available a dataset consisting of motion capture recordings of violinists with different skill levels performing different exercises covering different pedagogical and technical aspects. We then engineered dedicated features and trained a data-driven classifier to distinguish between two levels of violinist experience, namely beginners and experts. In accordance with the hierarchy present in the dataset, we study two different scenarios: extrapolation with respect to different exercises and with respect to different violinists. Furthermore, we study which features are the most predictive of the quality of a violinist, to corroborate the significance of the results. The results, both in terms of accuracy and insight into the cognitive problem, support the proposal and motivate the use of the proposed technique as a tool for students to monitor and enhance their home study and practice.

9. Hachaj T, Piekarczyk M. Evaluation of Pattern Recognition Methods for Head Gesture-Based Interface of a Virtual Reality Helmet Equipped with a Single IMU Sensor. Sensors 2019; 19:5408. PMID: 31817991; PMCID: PMC6960875; DOI: 10.3390/s19245408.
Abstract
The motivation of this paper is to examine the effectiveness of state-of-the-art and newly proposed motion capture pattern recognition methods in the task of head gesture classification. The head gestures are designed for a user interface that utilizes a virtual reality helmet equipped with an inertial measurement unit (IMU) sensor with a 6-axis accelerometer and gyroscope. We validate a classifier that uses Principal Components Analysis (PCA)-based features with various numbers of dimensions, a two-stage PCA-based method, a feedforward artificial neural network, and a random forest. Moreover, we also propose a Dynamic Time Warping (DTW) classifier trained with an extension of the DTW Barycenter Averaging (DBA) algorithm that utilizes quaternion averaging, and a bagged variation of the previous method (DTWb) in which many DTW classifiers vote. The evaluation has been performed on 975 head gesture recordings in seven classes acquired from 12 persons. The highest recognition rate in a leave-one-out test was obtained for DTWb: 0.975 (0.026 better than the best of the state-of-the-art methods to which we compared our approach). Among the most important applications of the proposed method is improving quality of life for people who are disabled below the neck, for example by supporting an assistive autonomous power chair with a head gesture interface, or remote-controlled interfaces in robotics.
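The DTW distance underlying these classifiers can be sketched as a standard dynamic-programming recurrence (a generic textbook version over 1D sequences, not the authors' quaternion-aware variant):

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two 1D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

A nearest-neighbour classifier then assigns a recorded gesture the label of the training template with the smallest DTW distance; the DBA extension described above replaces the scalar cost with a quaternion distance and averages aligned samples to build class templates.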