1
Wuth J, Mahu R, Cohen I, Stern RM, Yoma NB. A unified beamforming and source separation model for static and dynamic human-robot interaction. JASA EXPRESS LETTERS 2024; 4:035203. [PMID: 38441431 DOI: 10.1121/10.0025238] [Received: 11/03/2023] [Accepted: 02/20/2024] [Indexed: 03/07/2024]
Abstract
This paper presents a unified model for combining beamforming and blind source separation (BSS). The validity of the model's assumptions is confirmed by accurately recovering target speech in noise when oracle information is available. On real static human-robot interaction (HRI) data, the proposed combination of BSS with the minimum-variance distortionless response (MVDR) beamformer yields a higher signal-to-noise ratio (SNR) than previous parallel and cascade systems that combine BSS and beamforming. In the difficult-to-model dynamic HRI environment, where the parallel combination is infeasible, the system provides an SNR gain 2.8 dB greater than that obtained with the cascade combination.
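For reference, the MVDR beamformer named in the abstract has the standard closed form w = R⁻¹d / (dᴴR⁻¹d), where R is the noise spatial covariance matrix and d is the steering vector toward the target. The sketch below computes these weights for a single frequency bin. It is a minimal illustration under stated assumptions: the function names, the four-microphone array, the steering vector, and the random noise covariance are all hypothetical placeholders, and the code shows only the textbook MVDR step, not the unified BSS-beamforming model proposed in the paper.

```python
import numpy as np


def mvdr_weights(noise_cov: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """Textbook MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    # Solve R x = d instead of forming R^{-1} explicitly (more numerically stable).
    r_inv_d = np.linalg.solve(noise_cov, steering)
    # Normalize so the target direction passes with unit gain: w^H d = 1.
    return r_inv_d / (steering.conj() @ r_inv_d)


def beamform(weights: np.ndarray, frame: np.ndarray) -> complex:
    """Beamformer output y = w^H x for one multichannel STFT bin."""
    return weights.conj() @ frame


# Hypothetical example: 4 microphones, one frequency bin.
rng = np.random.default_rng(0)
num_mics = 4
# Hypothetical far-field steering vector (unit-magnitude phase shifts).
d = np.exp(-1j * np.pi * 0.3 * np.arange(num_mics))
# Hypothetical noise covariance: random Hermitian positive-definite matrix.
a = rng.standard_normal((num_mics, num_mics)) + 1j * rng.standard_normal((num_mics, num_mics))
noise_cov = a @ a.conj().T + 0.1 * np.eye(num_mics)

w = mvdr_weights(noise_cov, d)
print(abs(beamform(w, d)))  # magnitude of the response toward the target direction
```

Among all weight vectors satisfying wᴴd = 1, these weights minimize the output noise power wᴴRw. In practice R would be estimated per frequency bin from noise-dominated frames, which this sketch does not attempt.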
Affiliation(s)
- Jorge Wuth
  - Speech Processing and Transmission Laboratory, Department of Electrical Engineering, University of Chile, Av. Tupper 2007, Santiago, Chile
- Rodrigo Mahu
  - Speech Processing and Transmission Laboratory, Department of Electrical Engineering, University of Chile, Av. Tupper 2007, Santiago, Chile
- Israel Cohen
  - Technion-Israel Institute of Technology, Haifa 3200003, Israel
- Richard M Stern
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Néstor Becerra Yoma
  - Speech Processing and Transmission Laboratory, Department of Electrical Engineering, University of Chile, Av. Tupper 2007, Santiago, Chile
2
Hou Z, Hu Q, Chen K, Cao Z, Lu J. Local spectral attention for full-band speech enhancement. JASA EXPRESS LETTERS 2023; 3:115201. [PMID: 37916951 DOI: 10.1121/10.0022268] [Received: 07/31/2023] [Accepted: 10/19/2023] [Indexed: 11/03/2023]
Abstract
Attention mechanisms have been widely used in speech enhancement (SE) because, in principle, they can effectively model the inherent dependencies of the signal in both the time and spectral domains. In this Letter, we find that attention over the entire frequency range hampers inference in full-band SE and may lead to excessive residual noise and speech degradation. To alleviate this problem, local spectral attention is introduced into full-band SE models by limiting the span of attention. Ablation tests on three full-band SE models show that local spectral attention can effectively improve overall performance.
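The core idea, restricting attention along the frequency axis to a local window, can be sketched as a banded mask applied to the attention scores before the softmax. This is a minimal single-head NumPy illustration under stated assumptions: the abstract does not give the window width, the number of heads, or how the mask is placed inside each of the three models, so the function name, the `span` parameter, and the example sizes are hypothetical.

```python
import numpy as np


def local_spectral_attention(q, k, v, span):
    """Single-head self-attention along the frequency axis, restricted to +/- span bins.

    q, k, v: arrays of shape (num_freq, dim) for one time frame.
    span: hypothetical half-width of the local window, in frequency bins (>= 0).
    """
    num_freq, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)  # (num_freq, num_freq) scaled dot products
    bins = np.arange(num_freq)
    # True where key bin j lies outside the window |i - j| <= span of query bin i.
    outside = np.abs(bins[:, None] - bins[None, :]) > span
    scores = np.where(outside, -np.inf, scores)
    # Row-wise softmax; the diagonal is always inside the window, so each row max is finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)  # exp(-inf) = 0 removes out-of-window bins
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


# Hypothetical example: 16 frequency bins, 8-dimensional features, window of +/- 2 bins.
rng = np.random.default_rng(1)
x = rng.standard_normal((16, 8))
out_local, attn_local = local_spectral_attention(x, x, x, span=2)
# A span covering every bin reduces to ordinary full-band attention.
out_full, attn_full = local_spectral_attention(x, x, x, span=15)
```

Setting `span` to at least `num_freq - 1` recovers attention over the entire frequency range, which is the baseline the Letter compares against; smaller values confine each bin to its spectral neighbourhood.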
Affiliation(s)
- Zhongshu Hou
  - Key Laboratory of Modern Acoustics, Nanjing University, Nanjing 210093, China
  - Nanjing University-Horizon Intelligent Audio Lab, Horizon Robotics, Beijing 100094, China
  - Nanjing Institute of Advanced Artificial Intelligence, Nanjing 210014, China
- Qinwen Hu
  - Key Laboratory of Modern Acoustics, Nanjing University, Nanjing 210093, China
  - Nanjing University-Horizon Intelligent Audio Lab, Horizon Robotics, Beijing 100094, China
  - Nanjing Institute of Advanced Artificial Intelligence, Nanjing 210014, China
- Kai Chen
  - Key Laboratory of Modern Acoustics, Nanjing University, Nanjing 210093, China
  - Nanjing University-Horizon Intelligent Audio Lab, Horizon Robotics, Beijing 100094, China
  - Nanjing Institute of Advanced Artificial Intelligence, Nanjing 210014, China
- Zhanzhong Cao
  - Nanjing Institute of Information Technology, Nanjing 210036, China
- Jing Lu
  - Key Laboratory of Modern Acoustics, Nanjing University, Nanjing 210093, China
  - Nanjing University-Horizon Intelligent Audio Lab, Horizon Robotics, Beijing 100094, China
  - Nanjing Institute of Advanced Artificial Intelligence, Nanjing 210014, China