1.
Ullah R, Zhang S, Asif M, Wahab F. Multimodal learning-based speech enhancement and separation, recent innovations, new horizons, challenges and real-world applications. Comput Biol Med 2025;190:110082. PMID: 40174498. DOI: 10.1016/j.compbiomed.2025.110082.
Abstract
With the increasing global prevalence of disabling hearing loss, speech enhancement technologies have become crucial for overcoming communication barriers and improving the quality of life for those affected. Multimodal learning has emerged as a powerful approach for speech enhancement and separation, integrating information from various sensory modalities such as audio signals, visual cues, and textual data. Despite substantial progress, challenges remain in synchronizing modalities, ensuring model robustness, and achieving scalability for real-time applications. This paper provides a comprehensive review of the latest advances in the most promising strategy, multimodal learning for speech enhancement and separation. We underscore the limitations of various methods in noisy and dynamic real-world environments and demonstrate how multimodal systems leverage complementary information from lip movements, text transcripts, and even brain signals to enhance performance. Critical deep learning architectures are covered, such as Transformers, Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), and generative models like Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models. Various fusion strategies, including early and late fusion and attention mechanisms, are explored to address challenges in aligning and integrating multimodal inputs effectively. Furthermore, the paper explores important real-world applications in areas like automatic driver monitoring in autonomous vehicles, emotion recognition for mental health monitoring, augmented reality in interactive retail, smart surveillance for public safety, remote healthcare and telemedicine, and hearing assistive devices. Additionally, critical advanced procedures, comparisons, future challenges, and prospects are discussed to guide future research in multimodal learning for speech enhancement and separation, offering a roadmap for new horizons in this transformative field.
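The early fusion, late fusion, and attention-based fusion strategies surveyed in this abstract can be illustrated with a minimal NumPy sketch. The feature dimensions, the fixed late-fusion weights, and the mean-based relevance score are assumptions for illustration only, not methods from the reviewed paper:

```python
import numpy as np

def early_fusion(audio_feat, visual_feat):
    # Early fusion: concatenate modality features before joint processing.
    return np.concatenate([audio_feat, visual_feat], axis=-1)

def late_fusion(audio_pred, visual_pred, w_audio=0.6, w_visual=0.4):
    # Late fusion: combine per-modality predictions; weights are illustrative.
    return w_audio * audio_pred + w_visual * visual_pred

def attention_fusion(audio_feat, visual_feat):
    # Attention fusion: weight each modality by a (here, crude) relevance
    # score and take the softmax-weighted sum across modalities.
    feats = np.stack([audio_feat, visual_feat])          # (2, d)
    scores = feats.mean(axis=-1)                         # toy relevance scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over modalities
    return (weights[:, None] * feats).sum(axis=0)        # (d,)
```

In a trained system the attention scores would be produced by a learned module rather than a feature mean; the sketch only shows where each fusion strategy sits in the pipeline.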
Affiliation(s)
- Rizwan Ullah
- School of Mechanical Engineering, Dongguan University of Technology, Dongguan, 523808, PR China; School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou, 510640, PR China
- Shaohui Zhang
- School of Mechanical Engineering, Dongguan University of Technology, Dongguan, 523808, PR China.
- Muhammad Asif
- Department of Electrical Engineering, Main Campus, University of Science & Technology, Bannu, 28100, Pakistan
- Fazale Wahab
- Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, 230026, Anhui, PR China
2.
Magalhães SS, Lucas-Ochoa AM, Gonzalez-Cuello AM, Fernández-Villalba E, Pereira Toralles MB, Herrero MT. The mind-machine connection: adaptive information processing and new technologies promoting mental health in older adults. Neuroscientist 2025:10738584251318948. PMID: 39969013. DOI: 10.1177/10738584251318948.
Abstract
The human brain demonstrates exceptional adaptability, encompassing the ability to regulate emotions, exhibit cognitive flexibility, and generate behavioral responses, all supported by neuroplasticity. Brain-computer interfaces (BCIs) employ adaptive algorithms and machine learning techniques to adjust to variations in the user's brain activity, allowing customized interactions with external devices. Older adults may experience cognitive decline, which can affect their ability to learn and adapt to new technologies such as BCIs, yet both the human brain and the BCI demonstrate adaptability in their responses. The human brain is skilled at quickly switching between tasks and regulating emotions, while BCIs can modify signal-processing algorithms to accommodate changes in brain activity. Furthermore, both participate in knowledge acquisition: the brain strengthens cognitive abilities through exposure to new experiences, and the BCI improves performance through ongoing adjustment and refinement. Current research seeks to incorporate emotional states into BCI systems to improve the user experience, despite the human brain's own exceptional emotion-regulation abilities. Implemented well, BCIs for older adults could be effective, inclusive, and beneficial in improving their quality of life. This review aims to improve the understanding of brain-machine interfaces and their implications for mental health in older adults.
Affiliation(s)
- S S Magalhães
- Clinical and Experimental Neuroscience (NiCE-IMIB Pascual Parilla), Institute for Aging Research, School of Medicine, University of Murcia, Murcia, Spain
- Institute of Health Sciences, Postgraduate Program in Interactive Processes of Organs and Systems, Federal University of Bahia (UFBA) of Brazil, Salvador, Brazil
- A M Lucas-Ochoa
- Clinical and Experimental Neuroscience (NiCE-IMIB Pascual Parilla), Institute for Aging Research, School of Medicine, University of Murcia, Murcia, Spain
- A M Gonzalez-Cuello
- Clinical and Experimental Neuroscience (NiCE-IMIB Pascual Parilla), Institute for Aging Research, School of Medicine, University of Murcia, Murcia, Spain
- E Fernández-Villalba
- Clinical and Experimental Neuroscience (NiCE-IMIB Pascual Parilla), Institute for Aging Research, School of Medicine, University of Murcia, Murcia, Spain
- M B Pereira Toralles
- Institute of Health Sciences, Postgraduate Program in Interactive Processes of Organs and Systems, Federal University of Bahia (UFBA) of Brazil, Salvador, Brazil
- M T Herrero
- Clinical and Experimental Neuroscience (NiCE-IMIB Pascual Parilla), Institute for Aging Research, School of Medicine, University of Murcia, Murcia, Spain
3.
Rahman N, Khan DM, Masroor K, Arshad M, Rafiq A, Fahim SM. Advances in brain-computer interface for decoding speech imagery from EEG signals: a systematic review. Cogn Neurodyn 2024;18:3565-3583. PMID: 39712121. PMCID: PMC11655741. DOI: 10.1007/s11571-024-10167-0.
Abstract
Numerous individuals encounter challenges in verbal communication due to various factors, including physical disabilities, neurological disorders, and strokes. In response to this pressing need, technology has actively pursued solutions to bridge the communication gap, recognizing the inherent difficulties of verbal communication, particularly in contexts where traditional methods may be inadequate. Electroencephalography (EEG) has emerged as a primary non-invasive method for measuring brain activity, offering valuable insights from a cognitive neurodevelopmental perspective. It forms the basis for Brain-Computer Interfaces (BCIs) that provide a communication channel for individuals with neurological impairments, thereby empowering them to express themselves effectively. EEG-based BCIs, especially those adapted to decode imagined speech from EEG signals, represent a significant advancement in enabling individuals with speech disabilities to communicate through text or synthesized speech. By utilizing cognitive neurodevelopmental insights, researchers have developed innovative approaches for interpreting EEG signals and translating them into meaningful communication outputs. To aid researchers in effectively addressing this complex challenge, this review article synthesizes key findings from significant state-of-the-art studies. It delves into the methodologies employed by various researchers, including preprocessing techniques, feature extraction methods, and classification algorithms based on Deep Learning and Machine Learning approaches, as well as their integration. Furthermore, the review outlines potential avenues for future research, with the goal of advancing the practical implementation of EEG-based BCI systems for decoding imagined speech from a cognitive neurodevelopmental perspective.
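The pipeline stages this review surveys (preprocessing, feature extraction, classification) can be sketched with deliberately simple stand-ins: an FFT-mask band-pass filter, log band-power features, and a nearest-centroid classifier. The band choices, the 128 Hz sampling rate, and all function names are illustrative assumptions, not methods from any reviewed study:

```python
import numpy as np

FS = 128  # assumed sampling rate (Hz)

def bandpass(sig, lo, hi, fs=FS):
    # Preprocessing: crude band-pass filter by zeroing FFT bins outside [lo, hi].
    spec = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=sig.size)

def band_power(sig, lo, hi, fs=FS):
    # Feature extraction: log power in a frequency band.
    return np.log(np.mean(bandpass(sig, lo, hi, fs) ** 2) + 1e-12)

def features(trial):
    # trial: (channels, samples) -> theta/alpha/beta log power per channel.
    bands = [(4, 8), (8, 13), (13, 30)]
    return np.array([[band_power(ch, lo, hi) for lo, hi in bands]
                     for ch in trial]).ravel()

def fit_centroids(X, y):
    # Classification: store the mean feature vector of each class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    # Assign the class whose centroid is nearest in feature space.
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
```

Real speech-imagery decoders replace each stage with far stronger components (e.g., learned filters and deep classifiers), but the division of labor is the same.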
Affiliation(s)
- Nimra Rahman
- Department of Telecommunications Engineering, NED University of Engineering and Technology, Karachi, Sindh 75270 Pakistan
- Danish Mahmood Khan
- Department of Electronic Engineering, NED University of Engineering and Technology, Karachi, Sindh 75270 Pakistan
- Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, Selangor 47500 Petaling Jaya, Malaysia
- Komal Masroor
- Department of Telecommunications Engineering, NED University of Engineering and Technology, Karachi, Sindh 75270 Pakistan
- Mehak Arshad
- Department of Telecommunications Engineering, NED University of Engineering and Technology, Karachi, Sindh 75270 Pakistan
- Amna Rafiq
- Department of Telecommunications Engineering, NED University of Engineering and Technology, Karachi, Sindh 75270 Pakistan
- Syeda Maham Fahim
- Department of Telecommunications Engineering, NED University of Engineering and Technology, Karachi, Sindh 75270 Pakistan
4.
Lian S, Li Z. An end-to-end multi-task motor imagery EEG classification neural network based on dynamic fusion of spectral-temporal features. Comput Biol Med 2024;178:108727. PMID: 38897146. DOI: 10.1016/j.compbiomed.2024.108727.
Abstract
Electroencephalogram (EEG) brain-computer interfaces (BCIs) have the potential to provide new paradigms for controlling computers and devices. The accuracy of brain pattern classification in EEG BCI is directly affected by the quality of features extracted from EEG signals. Currently, feature extraction relies heavily on prior knowledge to engineer features (for example, from specific frequency bands); better extraction of EEG features is therefore an important research direction. In this work, we propose an end-to-end deep neural network that automatically finds and combines features for motor imagery (MI) based EEG BCI with 4 or more imagery classes (multi-task). First, spectral-domain features of EEG signals are learned by compact convolutional neural network (CCNN) layers. Then, gated recurrent unit (GRU) layers automatically learn temporal patterns. Lastly, an attention mechanism dynamically combines the extracted spectral-temporal features across EEG channels, reducing redundancy. We test our method on the BCI Competition IV-2a data set and a data set we collected. The average classification accuracy on 4-class BCI Competition IV-2a was 85.1% ± 6.19%, comparable to recent work in the field with low variability among participants; average classification accuracy on our 6-class data was 64.4% ± 8.35%. Our dynamic fusion of spectral-temporal features is end-to-end and has relatively few network parameters, and the experimental results show its effectiveness and potential.
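The three stages described in this abstract can be sketched in NumPy with untrained random weights: a temporal convolution standing in for the compact CNN layers, a GRU recurrence for temporal patterns, and softmax attention pooling across channels. All dimensions, initializations, and names below are illustrative assumptions, not the paper's architecture details:

```python
import numpy as np

rng = np.random.default_rng(0)
C, T, D, K, N_CLASSES = 4, 64, 8, 5, 4   # channels, samples, hidden, kernel, classes

kernel = rng.standard_normal(K)                  # stand-in for a learned CNN filter
Wz, Wr, Wh = (rng.standard_normal(D) for _ in range(3))             # GRU input weights
Uz, Ur, Uh = (0.1 * rng.standard_normal((D, D)) for _ in range(3))  # GRU recurrent weights
w_att = rng.standard_normal(D)                   # channel-attention scoring vector
W_out = rng.standard_normal((N_CLASSES, D))      # linear classifier head

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode_channel(sig):
    # Stage 1: temporal convolution (spectral feature learning stand-in).
    feat = np.convolve(sig, kernel, mode="valid")
    # Stage 2: GRU recurrence over the convolved sequence.
    h = np.zeros(D)
    for x in feat:
        z = sigmoid(Wz * x + Uz @ h)             # update gate
        r = sigmoid(Wr * x + Ur @ h)             # reset gate
        h_cand = np.tanh(Wh * x + Uh @ (r * h))  # candidate state
        h = (1 - z) * h + z * h_cand
    return h

def classify(eeg):
    # eeg: (C, T). Stage 3: attention pooling across channels, then classify.
    H = np.stack([encode_channel(ch) for ch in eeg])   # (C, D)
    scores = H @ w_att
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                               # softmax channel weights
    fused = alpha @ H                                  # (D,) fused feature
    return int(np.argmax(W_out @ fused))
```

In the actual model every weight above is learned jointly end-to-end; the sketch only shows how the spectral, temporal, and cross-channel attention stages compose.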
Affiliation(s)
- Shidong Lian
- School of Systems Science, Beijing Normal University, Beijing, China; International Academic Center of Complex Systems, Beijing Normal University, Zhuhai, China
- Zheng Li
- Center for Cognition and Neuroergonomics, State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Zhuhai, China; Department of Psychology, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai, China.
5.
Herbert C. Brain-computer interfaces and human factors: the role of language and cultural differences - still a missing gap? Front Hum Neurosci 2024;18:1305445. PMID: 38665897. PMCID: PMC11043545. DOI: 10.3389/fnhum.2024.1305445.
Abstract
Brain-computer interfaces (BCIs) aim at the non-invasive investigation of brain activity to support users' communication and interaction with their environment by means of brain-machine assisted technologies. Despite technological progress and promising research aimed at understanding the influence of human factors on BCI effectiveness, some topics remain unexplored. The aim of this article is to discuss why future BCI research should consider the language of the user, its embodied grounding in perception, action, and emotion, and its interaction with cultural differences in information processing. Based on evidence from recent studies, it is proposed that the detection of language abilities and language training are two main topics of enquiry for future BCI studies, extending communication among vulnerable and healthy BCI users from bench to bedside and into real-world applications. In addition, cultural differences shape perception, action, cognition, language, and emotion subjectively, behaviorally, and neuronally. BCI applications should therefore account for cultural differences in information processing to develop culture- and language-sensitive applications for different user groups, and should investigate the linguistic and cultural contexts in which a BCI will be used.
Affiliation(s)
- Cornelia Herbert
- Applied Emotion and Motivation Psychology, Institute of Psychology and Education, Ulm University, Ulm, Germany
6.
Giansanti D. An Umbrella Review of the Fusion of fMRI and AI in Autism. Diagnostics (Basel) 2023;13:3552. PMID: 38066793. PMCID: PMC10706112. DOI: 10.3390/diagnostics13233552.
Abstract
Functional magnetic resonance imaging (fMRI) is assuming an increasingly central role in autism diagnosis, and the integration of Artificial Intelligence (AI) into this realm of applications further contributes to its development. This study's objective is to analyze emerging themes in this domain through an umbrella review encompassing systematic reviews. The research methodology was based on a structured process for conducting a literature narrative review, using an umbrella review in PubMed and Scopus; rigorous criteria, a standard checklist, and a qualification process were meticulously applied. The findings include 20 systematic reviews that underscore key themes in autism research, particularly the significance of technological integration and the pivotal roles of fMRI and AI. The study also highlights the enigmatic role of oxytocin. While acknowledging the immense potential in this field, the review does not shy away from its significant challenges and limitations. Intriguingly, there is a growing emphasis on research and innovation in AI, whereas aspects related to integration into healthcare processes, such as regulation, acceptance, informed consent, and data security, receive comparatively less attention. Additionally, the integration of these findings into Personalized Medicine (PM) represents a promising yet relatively unexplored area within autism research. The study concludes by encouraging scholars to focus on the critical themes of health-domain integration, vital for the routine implementation of these applications.
Affiliation(s)
- Daniele Giansanti
- Centro Nazionale TISP, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161 Roma, Italy