1
Ni G, Xu Z, Bai Y, Zheng Q, Zhao R, Wu Y, Ming D. EEG-based assessment of temporal fine structure and envelope effect in Mandarin syllable and tone perception. Cereb Cortex 2023; 33:11287-11299. [PMID: 37804238 DOI: 10.1093/cercor/bhad366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Revised: 09/13/2023] [Accepted: 09/15/2023] [Indexed: 10/09/2023] Open
Abstract
In recent years, speech perception research has benefited from low-frequency rhythm entrainment tracking of the speech envelope. However, the roles of the speech envelope and the temporal fine structure in speech perception remain controversial, especially in Mandarin. This study aimed to examine how the perception of Mandarin syllables and tones depends on the speech envelope and the temporal fine structure. We recorded the electroencephalogram (EEG) of the subjects under three acoustic conditions constructed with sound chimera analysis: (i) the original speech, (ii) the speech envelope with sinusoidal modulation, and (iii) the temporal fine structure with the envelope of a non-speech (white noise) sound. We found that syllable perception mainly depended on the speech envelope, while tone perception depended on the temporal fine structure. The delta band was prominent, and the parietal and prefrontal lobes were the main activated brain areas, regardless of whether syllable or tone perception was involved. Finally, we decoded the spatiotemporal features of Mandarin perception from the microstate sequence. The spatiotemporal feature sequence of the EEG evoked by speech material was found to be specific, suggesting a new perspective for subsequent auditory brain-computer interfaces. These results provide a new scheme for the coding strategy of new hearing aids for native Mandarin speakers.
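The chimera construction mentioned in this abstract can be sketched as follows. This is a minimal single-band illustration, not the authors' pipeline: it assumes the envelope is the magnitude of the analytic signal and the fine structure is the cosine of its phase, and the names `analytic_signal`, `chimera`, `speech_like`, and `tone` are hypothetical. Published chimera methods usually split the signals into several frequency bands first, which is omitted here.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (the standard Hilbert-transform construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def chimera(envelope_source, fine_structure_source):
    """Combine the envelope of one signal with the fine structure of another."""
    envelope = np.abs(analytic_signal(envelope_source))
    fine_structure = np.cos(np.angle(analytic_signal(fine_structure_source)))
    return envelope * fine_structure

fs = 16000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of samples
# Stand-in for speech: a 500 Hz carrier with a slow 4 Hz amplitude modulation.
speech_like = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
tone = np.sin(2 * np.pi * 1000 * t)

# Envelope of the speech-like signal carried by the fine structure of the tone.
mixed = chimera(speech_like, tone)
```

Swapping the two arguments gives the complementary stimulus, with the fine structure of the speech-like signal and the flat envelope of the tone.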
Affiliation(s)
- Guangjian Ni
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
  - Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin 300392, China
- Zihao Xu
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Yanru Bai
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Qi Zheng
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Ran Zhao
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Yubo Wu
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
- Dong Ming
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
  - Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin 300392, China
2
Thakkar T, Kan A, Litovsky RY. Lateralization of interaural time differences with mixed rates of stimulation in bilateral cochlear implant listeners. J Acoust Soc Am 2023; 153:1912. [PMID: 37002065 PMCID: PMC10036141 DOI: 10.1121/10.0017603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 02/23/2023] [Accepted: 02/25/2023] [Indexed: 05/18/2023]
Abstract
While listeners with bilateral cochlear implants (BiCIs) are able to access information in both ears, they still struggle to perform well on spatial hearing tasks when compared to normal hearing listeners. This performance gap could be attributed to the high stimulation rates used for speech representation in clinical processors. Prior work has shown that spatial cues, such as interaural time differences (ITDs), are best conveyed at low rates. Further, BiCI listeners are sensitive to ITDs with a mixture of high and low rates. However, it remains unclear whether mixed-rate stimuli are perceived as unitary percepts and spatially mapped to intracranial locations. Here, electrical pulse trains were presented on five, interaurally pitch-matched electrode pairs using research processors, at either uniformly high rates, low rates, or mixed rates. Eight post-lingually deafened adults were tested on perceived intracranial lateralization of ITDs ranging from 50 to 1600 μs. Extent of lateralization depended on the location of low-rate stimulation along the electrode array: greatest in the low- and mixed-rate configurations, and smallest in the high-rate configuration. All but one listener perceived a unitary auditory object. These findings suggest that a mixed-rate processing strategy can result in good lateralization and convey a unitary auditory object with ITDs.
Affiliation(s)
- Tanvi Thakkar
  - Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, USA
- Alan Kan
  - School of Engineering, Macquarie University, New South Wales 2109, Australia
- Ruth Y Litovsky
  - Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, Wisconsin 53705, USA
3
Zhou H, Kan A, Yu G, Guo Z, Zheng N, Meng Q. Pitch perception with the temporal limits encoder for cochlear implants. IEEE Trans Neural Syst Rehabil Eng 2022; 30:2528-2539. [PMID: 36044501 DOI: 10.1109/tnsre.2022.3203079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
The temporal-limits-encoder (TLE) strategy has been proposed to enhance the representation of temporal fine structure (TFS) in cochlear implants (CIs), which is vital for many aspects of sound perception but is typically discarded by most modern CI strategies. TLE works by computing an envelope modulator that is within the temporal pitch limits of CI electric hearing. This paper examines the TFS information encoded by TLE and evaluates the salience and usefulness of this information in CI users. Two experiments were conducted to compare pitch perception performance of TLE versus the widely-used Advanced Combinational Encoder (ACE) strategy. Experiment 1 investigated whether TLE processing improved pitch discrimination compared to ACE. Experiment 2 parametrically examined the effect of changing the lower frequency limit of the TLE modulator on pitch ranking. In both experiments, F0 difference limens were measured with synthetic harmonic complex tones using an adaptive procedure. Signal analysis of the outputs of TLE and ACE strategies showed that TLE introduces important temporal pitch cues that are not available with ACE. Results showed an improvement in pitch discrimination with TLE when the acoustic input had a lower F0 frequency. No significant effect of lower frequency limit was observed for pitch ranking, though a lower limit did tend to provide better outcomes. These results suggest that the envelope modulation introduced by TLE can improve pitch perception for CI listeners.
4
Differential weighting of temporal envelope cues from the low-frequency region for Mandarin sentence recognition in noise. BMC Neurosci 2022; 23:35. [PMID: 35698039 PMCID: PMC9190152 DOI: 10.1186/s12868-022-00721-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2021] [Accepted: 06/01/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Temporal envelope cues are conveyed by cochlear implants (CIs) to patients with hearing loss to restore hearing. Although CIs enable users to communicate in clear listening environments, noisy environments still pose a problem. To improve speech-processing strategies used in Chinese CIs, we explored the relative contributions made by the temporal envelope in various frequency regions to Mandarin sentence recognition in noise. METHODS: Original speech material from the Mandarin version of the Hearing in Noise Test (MHINT) was mixed with speech-shaped noise (SSN), sinusoidally amplitude-modulated speech-shaped noise (SAM SSN), and sinusoidally amplitude-modulated (SAM) white noise (4 Hz), each at a +5 dB signal-to-noise ratio. Envelope information of the noise-corrupted speech material was extracted from 30 contiguous bands that were allocated to five frequency regions. The intelligibility of the noise-corrupted speech material (with temporal cues from one or two regions removed) was measured to estimate the relative weights of temporal envelope cues from the five frequency regions. RESULTS: In SSN, the mean weights of Regions 1-5 were 0.34, 0.19, 0.20, 0.16, and 0.11, respectively; in SAM SSN, the mean weights of Regions 1-5 were 0.34, 0.17, 0.24, 0.14, and 0.11, respectively; and in SAM white noise, the mean weights of Regions 1-5 were 0.46, 0.24, 0.22, 0.06, and 0.02, respectively. CONCLUSIONS: The results suggest that the temporal envelope in the low-frequency region transmits the greatest amount of information for Mandarin sentence recognition in all three types of noise, which differs from the perception strategy employed in clear listening environments.
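The masker construction described in the methods (4 Hz sinusoidal amplitude modulation, mixing at +5 dB SNR) could look roughly like the sketch below. The modulation depth, the power-based SNR definition, the sine-wave stand-in for speech, and the function names `sam_white_noise` and `mix_at_snr` are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def sam_white_noise(n_samples, fs, mod_rate_hz=4.0, mod_depth=1.0, seed=0):
    """White noise whose amplitude is modulated by a sinusoid (SAM noise)."""
    rng = np.random.default_rng(seed)
    carrier = rng.standard_normal(n_samples)
    t = np.arange(n_samples) / fs
    modulator = 1.0 + mod_depth * np.sin(2 * np.pi * mod_rate_hz * t)
    return modulator * carrier

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / 10 ** (snr_db / 10)
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise

fs = 16000                                        # assumed sampling rate in Hz
t = np.arange(fs) / fs
speech_stand_in = np.sin(2 * np.pi * 220 * t)     # placeholder for a real sentence
noise = sam_white_noise(len(speech_stand_in), fs)
mixture, scaled_noise = mix_at_snr(speech_stand_in, noise, snr_db=5.0)

# Check the SNR that was actually achieved, in dB.
achieved_snr_db = 10 * np.log10(np.mean(speech_stand_in ** 2) / np.mean(scaled_noise ** 2))
```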
5
Zheng Z, Li K, Feng G, Guo Y, Li Y, Xiao L, Liu C, He S, Zhang Z, Qian D, Feng Y. Relative Weights of Temporal Envelope Cues in Different Frequency Regions for Mandarin Vowel, Consonant, and Lexical Tone Recognition. Front Neurosci 2021; 15:744959. [PMID: 34924928 PMCID: PMC8678109 DOI: 10.3389/fnins.2021.744959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Accepted: 11/15/2021] [Indexed: 12/04/2022] Open
Abstract
Objectives: Mandarin-speaking users of cochlear implants (CI) perform more poorly than their English-speaking counterparts. This may be because present CI speech coding schemes are largely based on English. This study aims to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (vowel and consonant) and lexical tone recognition to provide information for speech coding schemes specific to Mandarin. Design: Eleven normal hearing subjects were studied using acoustic temporal E cues that were extracted from 30 contiguous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions, and their relative weights were calculated using the least-squares approach. Results: For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43–84.82%, 76.27–95.24%, and 96.58%, respectively; for consonant recognition 35.49–63.77%, 67.75–78.87%, and 87.87%; for lexical tone recognition 60.80–97.15%, 73.16–96.87%, and 96.73%. For frequency region 1 to frequency region 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition 0.10, 0.16, 0.18, 0.23, and 0.33; in lexical tone recognition 0.38, 0.18, 0.14, 0.16, and 0.14. Conclusion: The region that contributed most to vowel recognition was Region 2 (502–1,022 Hz), which contains first formant (F1) information; Region 5 (3,856–7,562 Hz) contributed most to consonant recognition; Region 1 (80–502 Hz), which contains fundamental frequency (F0) information, contributed most to lexical tone recognition.
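One way to read the least-squares weighting step is as a linear model in which each condition's score is the sum of the weights of the regions that were presented. The sketch below follows that reading; it is an assumption rather than the study's documented procedure, and the scores are synthetic, generated from made-up placeholder weights purely so the fit has something to recover.

```python
import itertools
import numpy as np

n_regions = 5

# Every condition presenting three, four, or five of the five regions.
conditions = [combo
              for k in (3, 4, 5)
              for combo in itertools.combinations(range(n_regions), k)]

# Design matrix: entry is 1 when the region is present in the condition.
design = np.zeros((len(conditions), n_regions))
for row, combo in enumerate(conditions):
    design[row, list(combo)] = 1.0

# Made-up weights used only to generate synthetic scores (not data from the study).
made_up_weights = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
synthetic_scores = design @ made_up_weights

# Least-squares estimate of the per-region contribution, normalised to sum to 1.
raw_weights, *_ = np.linalg.lstsq(design, synthetic_scores, rcond=None)
relative_weights = raw_weights / raw_weights.sum()
```

With real percent-correct scores the fit would not be exact, and the normalised weights would be read as relative importance rather than literal score contributions.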
Affiliation(s)
- Zhong Zheng
  - Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
  - Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Keyi Li
  - Sydney Institute of Language and Commerce, Shanghai University, Shanghai, China
- Gang Feng
  - Department of Graduate, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China
- Yang Guo
  - Ear, Nose, and Throat Institute and Otorhinolaryngology Department, Eye and ENT Hospital of Fudan University, Shanghai, China
- Yinan Li
  - Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
  - Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Lili Xiao
  - Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
  - Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Chengqi Liu
  - Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
  - Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Shouhuan He
  - Department of Otolaryngology, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
- Zhen Zhang
  - Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
  - Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Di Qian
  - Department of Otolaryngology, Shenzhen Longhua District People's Hospital, Shenzhen, China
- Yanmei Feng
  - Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
  - Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
6
Kang Y, Zheng N, Meng Q. Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants. Front Med (Lausanne) 2021; 8:740123. [PMID: 34820392 PMCID: PMC8606413 DOI: 10.3389/fmed.2021.740123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2021] [Accepted: 10/04/2021] [Indexed: 11/18/2022] Open
Abstract
The cochlea plays a key role in the transmission from acoustic vibration to neural stimulation upon which the brain perceives the sound. A cochlear implant (CI) is an auditory prosthesis that replaces the damaged cochlear hair cells to achieve acoustic-to-neural conversion. However, the CI is a very coarse bionic imitation of the normal cochlea. The highly resolved time-frequency-intensity information transmitted by the normal cochlea, which is vital to high-quality auditory perception such as speech perception in challenging environments, cannot be guaranteed by CIs. Although CI recipients with state-of-the-art commercial CI devices achieve good speech perception in quiet backgrounds, they usually suffer from poor speech perception in noisy environments. Therefore, noise suppression or speech enhancement (SE) is one of the most important technologies for CI. In this study, we introduce recent progress in deep learning (DL), mostly neural network (NN)-based SE front ends to CI, and discuss how the hearing properties of CI recipients could be utilized to optimize DL-based SE. In particular, different loss functions are introduced to supervise the NN training, and a set of objective and subjective experiments is presented. Results verify that CI recipients are more sensitive to residual noise than to SE-induced speech distortion, which has been common knowledge in CI research. Furthermore, speech reception threshold (SRT) in noise tests demonstrates that the intelligibility of the denoised speech can be significantly improved when the NN is trained with a loss function biased toward more noise suppression rather than one with equal attention to noise residue and speech distortion.
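A loss that trades speech distortion against noise residue can be sketched as a weighted sum of two mean-squared terms. The form below is a common textbook formulation for mask-based enhancement and is assumed here for illustration; the abstract does not give the paper's actual loss, and `tradeoff_loss` and `alpha` are hypothetical names.

```python
import numpy as np

def tradeoff_loss(mask, speech_mag, noise_mag, alpha):
    """Weighted sum of speech distortion and noise residue for a spectral mask.

    alpha near 1 penalises speech distortion more; alpha near 0 penalises
    residual noise more (the direction the abstract reports as helpful for CI users).
    """
    speech_distortion = np.mean((mask * speech_mag - speech_mag) ** 2)
    noise_residue = np.mean((mask * noise_mag) ** 2)
    return alpha * speech_distortion + (1.0 - alpha) * noise_residue

# Toy magnitude spectra for three time-frequency bins (illustrative values only).
speech_mag = np.array([1.0, 2.0, 3.0])
noise_mag = np.array([0.5, 0.5, 0.5])

pass_everything = np.ones(3)    # no suppression: no distortion, all noise remains
block_everything = np.zeros(3)  # full suppression: no noise, speech fully removed

loss_pass = tradeoff_loss(pass_everything, speech_mag, noise_mag, alpha=0.3)
loss_block = tradeoff_loss(block_everything, speech_mag, noise_mag, alpha=0.3)
```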
Affiliation(s)
- Yuyong Kang
  - Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Nengheng Zheng
  - Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
  - Pengcheng Laboratory, Shenzhen, China
- Qinglin Meng
  - Acoustics Laboratory, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
7
Kan A, Meng Q. The Temporal Limits Encoder as a Sound Coding Strategy for Bilateral Cochlear Implants. IEEE/ACM Trans Audio Speech Lang Process 2020; 29:265-273. [PMID: 33409339 PMCID: PMC7781292 DOI: 10.1109/taslp.2020.3039601] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 05/29/2023]
Abstract
The difference in binaural benefit between bilateral cochlear implant (CI) users and normal hearing (NH) listeners has typically been attributed to CI sound coding strategies not encoding the acoustic fine structure (FS) interaural time differences (ITD). The Temporal Limits Encoder (TLE) strategy is proposed as a potential way of improving binaural hearing benefits for CI users in noisy situations. TLE works by downward-transposition of mid-frequency band-limited channel information and can theoretically provide FS-ITD cues. In this work, the effect of choice of lower limit of the modulator in TLE was examined by measuring performance on a word recognition task and computing the magnitude of binaural benefit in bilateral CI users. Performance listening with the TLE strategy was compared with the commonly used Advanced Combinational Encoder (ACE) CI sound coding strategy. Results showed that setting the lower limit to ≥200 Hz maintained word recognition performance comparable to that of ACE. While most CI listeners exhibited a large binaural benefit (≥6 dB) in at least one of the conditions tested, there was no systematic relationship between the lower limit of the modulator and performance. These results indicate that the TLE strategy has potential to improve binaural hearing abilities in CI users but further work is needed to understand how binaural benefit can be maximized.
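Downward transposition of a band-limited signal, the operation this abstract attributes to TLE, can be illustrated with a single-sideband frequency shift of the analytic signal. This is a generic signal-processing sketch and not the TLE implementation; the shift amount, the test tone, and the names `analytic_signal` and `shift_down` are assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (the standard Hilbert-transform construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def shift_down(band_signal, shift_hz, fs):
    """Move every frequency component of band_signal down by shift_hz."""
    t = np.arange(len(band_signal)) / fs
    return np.real(analytic_signal(band_signal) * np.exp(-2j * np.pi * shift_hz * t))

fs = 16000                               # assumed sampling rate in Hz
t = np.arange(fs) / fs                   # one second, so FFT bins are 1 Hz apart
tone = np.sin(2 * np.pi * 1000 * t)      # stand-in for a mid-frequency channel

# Shift by 800 Hz so the 1000 Hz component lands at 200 Hz.
shifted = shift_down(tone, shift_hz=800, fs=fs)
peak_hz = np.argmax(np.abs(np.fft.rfft(shifted))) * fs / len(shifted)
```

In this toy case a 200 Hz result matches the lower limit that the abstract reports as preserving word recognition, but the choice of shift here is only illustrative.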
Affiliation(s)
- Alan Kan
  - Waisman Center, University of Wisconsin-Madison (at the time this work was conducted); now with the School of Engineering, Macquarie University, NSW 2109, Australia
- Qinglin Meng
  - Acoustics Laboratory, School of Physics and Optoelectronics, South China University of Technology, Guangzhou 510641, China
8
Zhou H, Wang N, Zheng N, Yu G, Meng Q. A New Approach for Noise Suppression in Cochlear Implants: A Single-Channel Noise Reduction Algorithm. Front Neurosci 2020; 14:301. [PMID: 32372902 PMCID: PMC7186595 DOI: 10.3389/fnins.2020.00301] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/30/2019] [Accepted: 03/16/2020] [Indexed: 12/11/2022] Open
Abstract
The cochlea “translates” the in-air vibrational acoustic “language” into the spikes of neural “language” that are then transmitted to the brain for auditory understanding and/or perception. During this intracochlear “translation” process, high resolution in time–frequency–intensity domains guarantees the high quality of the input neural information for the brain, which is vital for our outstanding hearing abilities. However, cochlear implants (CIs) have coarse artificial coding and interfaces, and CI users experience more challenges in common acoustic environments than their normal-hearing (NH) peers. Noise from sound sources that a listener has no interest in may be neglected by NH listeners, but it may distract a CI user. We discuss CI noise-suppression techniques and introduce noise management for a new implant system. The monaural signal-to-noise ratio estimation-based noise suppression algorithm “eVoice,” which is incorporated in the processors of Nurotron® Enduro™, was evaluated in two speech perception experiments. The results show that speech intelligibility in stationary speech-shaped noise can be significantly improved with eVoice. Similar results have been observed in other CI devices with single-channel noise reduction techniques. Specifically, the mean speech reception threshold decrease in the present study was 2.2 dB. The Nurotron community already has more than 10,000 users, and eVoice is a start for noise management in the new system. Future steps on non-stationary-noise suppression, spatial-source separation, bilateral hearing, microphone configuration, and environment specification are warranted. The existing evidence, including our research, suggests that noise-suppression techniques should be applied in CI systems. The artificial hearing of CI listeners requires more advanced signal processing techniques to reduce brain effort and increase intelligibility in noisy settings.
Affiliation(s)
- Huali Zhou
  - Acoustics Lab, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
- Nengheng Zheng
  - The Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Guangzheng Yu
  - Acoustics Lab, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
- Qinglin Meng
  - Acoustics Lab, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
9
Meng Q, Wang X, Cai Y, Kong F, Buck AN, Yu G, Zheng N, Schnupp JWH. Time-compression thresholds for Mandarin sentences in normal-hearing and cochlear implant listeners. Hear Res 2019; 374:58-68. [PMID: 30732921 DOI: 10.1016/j.heares.2019.01.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/07/2018] [Revised: 01/13/2019] [Accepted: 01/16/2019] [Indexed: 11/19/2022]
Abstract
Faster speech may facilitate more efficient communication, but if speech is too fast it becomes unintelligible. The maximum speeds at which Mandarin words were intelligible in a sentence context were quantified for normal hearing (NH) and cochlear implant (CI) listeners by measuring time-compression thresholds (TCTs) in an adaptive staircase procedure. In Experiment 1, both original and CI-vocoded time-compressed speech from the MSP (Mandarin speech perception) and MHINT (Mandarin hearing in noise test) corpora was presented to 10 NH subjects over headphones. In Experiment 2, original time-compressed speech was presented to 10 CI subjects and another 10 NH subjects through a loudspeaker in a soundproof room. Sentences were time-compressed without changing their spectral profile, and were presented up to three times within a single trial. At the end of each trial, the number of correctly identified words in the sentence was scored. A 50%-word recognition threshold was tracked in the psychophysical procedure. The observed median TCTs were very similar for MSP and MHINT speech. For NH listeners, median TCTs were around 16.7 syllables/s for normal speech, and 11.8 and 8.6 syllables/s respectively for 8 and 4 channel tone-carrier vocoded speech. For CI listeners, TCTs were only around 6.8 syllables/s. The interquartile range of the TCTs within each cohort was smaller than 3.0 syllables/s. Speech reception thresholds in noise were also measured in Experiment 2, and were found to be strongly correlated with TCTs for CI listeners. In conclusion, the Mandarin sentence TCTs were around 16.7 syllables/s for most NH subjects, but rarely faster than 10.0 syllables/s for CI listeners, which quantitatively illustrated upper limits of fast speech information processing with CIs.
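An adaptive staircase that tracks a 50% threshold, as described in this abstract, can be sketched with a simple one-up/one-down rule. The rule, the step size, the reversal averaging, and the deterministic simulated listener are all assumptions chosen for illustration; the abstract does not state which staircase variant the study used.

```python
def run_staircase(listener, start_rate, step, n_trials, n_reversals_to_average=6):
    """One-up/one-down staircase on speech rate (syllables/s).

    The rate goes up after a success and down after a failure, so the track
    settles around the rate at which the listener succeeds half of the time.
    """
    rate = start_rate
    reversal_rates = []
    last_direction = 0
    for _ in range(n_trials):
        succeeded = listener(rate)
        direction = 1 if succeeded else -1
        if last_direction != 0 and direction != last_direction:
            reversal_rates.append(rate)   # the track changed direction here
        last_direction = direction
        rate += direction * step
    if not reversal_rates:
        return rate                       # no reversal yet: report the current rate
    recent = reversal_rates[-n_reversals_to_average:]
    return sum(recent) / len(recent)

def simulated_listener(rate, true_threshold=10.0):
    """Hypothetical listener who succeeds whenever the rate is at or below threshold."""
    return rate <= true_threshold

estimated_threshold = run_staircase(simulated_listener, start_rate=4.0, step=0.5, n_trials=40)
```

A real listener responds probabilistically, so an actual track would wander more than this deterministic one, and the estimate would depend on the step size and the number of reversals averaged.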
Affiliation(s)
- Qinglin Meng
  - Acoustics Lab of School of Physics and Optoelectronics and State Key Laboratory of Subtropical Building Science, South China University of Technology, China
  - Hearing Research Group, Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Xianren Wang
  - Department of Otorhinolaryngology, The First Affiliated Hospital, Sun Yat-Sen University and Institute of Otorhinolaryngology, Sun Yat-Sen University, Guangzhou, China
- Yuexin Cai
  - Department of Otolaryngology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University and Department of Hearing and Speech Science, Xin Hua College of Sun Yat-Sen University, Guangzhou, China
- Fanhui Kong
  - The Guangdong Key Laboratory of Intelligent Information Processing, College of Information Engineering, Shenzhen University, China
- Alexa Nadezhda Buck
  - Hearing Research Group, Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Guangzheng Yu
  - Acoustics Lab of School of Physics and Optoelectronics and State Key Laboratory of Subtropical Building Science, South China University of Technology, China
- Nengheng Zheng
  - The Guangdong Key Laboratory of Intelligent Information Processing, College of Information Engineering, Shenzhen University, China
- Jan W H Schnupp
  - Hearing Research Group, Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
10
Meng Q, Zheng N, Li X. Loudness Contour Can Influence Mandarin Tone Recognition: Vocoder Simulation and Cochlear Implants. IEEE Trans Neural Syst Rehabil Eng 2016; 25:641-649. [PMID: 27448366 DOI: 10.1109/tnsre.2016.2593489] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
Lexical tone recognition with current cochlear implants (CI) remains unsatisfactory due to significantly degraded pitch-related acoustic cues, which dominate tone recognition by normal-hearing (NH) listeners. Several secondary cues (e.g., amplitude contour, duration, and spectral envelope) that influence tone recognition in NH listeners and CI users have been studied. This work proposes a loudness contour manipulation algorithm, namely Loudness-Tone (L-Tone), to investigate the effects of loudness contour on Mandarin tone recognition and the effectiveness of using a loudness cue to enhance tone recognition for CI users. With L-Tone, the intensity of sound samples is multiplied by gain values determined by instantaneous fundamental frequencies (F0s) and pre-defined gain-F0 mapping functions. Perceptual experiments were conducted with a four-channel noise-band vocoder simulation in NH listeners and with CI users. The results suggested that 1) loudness contour is a useful secondary cue for Mandarin tone recognition, especially when pitch cues are significantly degraded; 2) L-Tone can be used to improve Mandarin tone recognition in both simulated and actual CI hearing without a significant negative effect on vowel and consonant recognition. L-Tone is a promising algorithm for incorporation into real-time CI processing and off-line CI rehabilitation training software.
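The idea of multiplying samples by a gain derived from the instantaneous F0 can be sketched as below. The mapping shown (linear in dB over log-frequency, clipped to an F0 range, with unvoiced samples left unchanged) is purely an assumed example; the abstract does not describe the actual gain-F0 mapping functions, and `f0_to_gain`, `apply_f0_gain`, and all parameter values are hypothetical.

```python
import numpy as np

def f0_to_gain(f0_hz, f0_low=100.0, f0_high=300.0, max_gain_db=6.0):
    """Map an F0 track to linear gains: low F0 is attenuated, high F0 is boosted.

    Samples with F0 == 0 are treated as unvoiced and receive unity gain.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    gain_db = np.zeros_like(f0)
    clipped = np.clip(f0[voiced], f0_low, f0_high)
    position = np.log2(clipped / f0_low) / np.log2(f0_high / f0_low)  # 0..1
    gain_db[voiced] = (2.0 * position - 1.0) * max_gain_db
    return 10.0 ** (gain_db / 20.0)

def apply_f0_gain(samples, f0_track):
    """Scale each sample by the gain that its instantaneous F0 maps to."""
    return np.asarray(samples, dtype=float) * f0_to_gain(f0_track)

# A rising F0 contour (as in a rising tone) produces a rising loudness contour.
f0_track = np.array([0.0, 100.0, 150.0, 200.0, 300.0])
samples = np.ones(5)
shaped = apply_f0_gain(samples, f0_track)
```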
11
Su Q, Galvin JJ, Zhang G, Li Y, Fu QJ. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients. Trends Hear 2016; 20:2331216516654022. [PMID: 27363714 PMCID: PMC4959306 DOI: 10.1177/2331216516654022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022] Open
Abstract
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users.
Affiliation(s)
- Qiaotong Su
  - Department of Otolaryngology, Head and Neck Surgery, Beijing TongRen Hospital, Capital Medical University, Ministry of Education of China, Beijing, People's Republic of China
- John J Galvin
  - Department of Head and Neck Surgery, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Guoping Zhang
  - Department of Otolaryngology, Head and Neck Surgery, Beijing TongRen Hospital, Capital Medical University, Ministry of Education of China, Beijing, People's Republic of China
- Yongxin Li
  - Department of Otolaryngology, Head and Neck Surgery, Beijing TongRen Hospital, Capital Medical University, Ministry of Education of China, Beijing, People's Republic of China
- Qian-Jie Fu
  - Department of Head and Neck Surgery, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA