1. McMullin MA, Kumar R, Higgins NC, Gygi B, Elhilali M, Snyder JS. Preliminary Evidence for Global Properties in Human Listeners During Natural Auditory Scene Perception. Open Mind (Camb) 2024;8:333-365. PMID: 38571530; PMCID: PMC10990578; DOI: 10.1162/opmi_a_00131.
Abstract
Theories of auditory and visual scene analysis suggest the perception of scenes relies on the identification and segregation of objects within them, resembling a detail-oriented processing style. However, scene analysis may also involve a more global process, as has been evidenced in the visual domain. To our knowledge, a similar line of research has not been pursued in the auditory domain; therefore, we evaluated the contributions of high-level global and low-level acoustic information to auditory scene perception. An additional aim was to increase the field's ecological validity by using and making available a new collection of high-quality auditory scenes. Participants rated scenes on eight global properties (e.g., open vs. enclosed), and an acoustic analysis evaluated which low-level features predicted the ratings. We submitted the acoustic measures and average ratings of the global properties to separate exploratory factor analyses (EFAs). The EFA of the acoustic measures revealed a seven-factor structure explaining 57% of the variance in the data, while the EFA of the global property measures revealed a two-factor structure explaining 64% of the variance in the data. Regression analyses revealed that each global property was predicted by at least one acoustic variable (R² = 0.33-0.87). These findings were extended using deep neural network models, where we examined correlations between human ratings of global properties and deep embeddings of two computational models: an object-based model and a scene-based model. The results indicate that participants' ratings are more strongly explained by a global analysis of the scene setting, though the relationship between scene perception and auditory perception is multifaceted, with differing correlation patterns evident between the two models. Taken together, our results provide evidence for the ability to perceive auditory scenes from a global perspective. Some of the acoustic measures predicted ratings of global scene perception, suggesting representations of auditory objects may be transformed through many stages of processing in the ventral auditory stream, similar to what has been proposed in the ventral visual stream. These findings and the open availability of our scene collection will make future studies on perception, attention, and memory for natural auditory scenes possible.
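A minimal sketch of the analysis pipeline described above (factor analysis of the acoustic measures followed by per-property regressions), assuming scikit-learn and illustrative array shapes rather than the authors' actual code and data:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

# Illustrative shapes: 200 scenes x 20 acoustic measures, 8 global-property ratings
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(200, 20))   # stand-in for the real acoustic measures
ratings = rng.normal(size=(200, 8))     # stand-in for mean global-property ratings

# EFA over the acoustic measures (the paper reports a seven-factor solution)
efa = FactorAnalysis(n_components=7).fit(acoustic)
acoustic_factors = efa.transform(acoustic)

# Regress each global property on the acoustic factor scores, report R^2
for p in range(ratings.shape[1]):
    model = LinearRegression().fit(acoustic_factors, ratings[:, p])
    print(f"property {p}: R^2 = {model.score(acoustic_factors, ratings[:, p]):.2f}")
```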
Affiliation(s)
- Rohit Kumar: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Nathan C. Higgins: Department of Communication Sciences & Disorders, University of South Florida, Tampa, FL, USA
- Brian Gygi: East Bay Institute for Research and Education, Martinez, CA, USA
- Mounya Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Joel S. Snyder: Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV, USA
2. Englitz B, Akram S, Elhilali M, Shamma S. Decoding contextual influences on auditory perception from primary auditory cortex. bioRxiv 2023:2023.12.24.573229. PMID: 38187523; PMCID: PMC10769425; DOI: 10.1101/2023.12.24.573229.
Abstract
Perception can be highly dependent on stimulus context, but whether and how sensory areas encode the context remains uncertain. We used an ambiguous auditory stimulus - a tritone pair - to investigate the neural activity associated with a preceding contextual stimulus that strongly influenced whether the tritone pair was perceived as an ascending or a descending step in pitch. We recorded single-unit responses from a population of auditory cortical cells in awake ferrets listening to the tritone pairs preceded by the contextual stimulus. We find that the responses adapt locally to the contextual stimulus, consistent with human MEG recordings from the auditory cortex under the same conditions. Decoding the population responses demonstrates that pitch-change-selective cells can reliably predict the context-sensitive percept of the tritone pairs. Conversely, decoding the distances between the pitch representations predicts the opposite of the percept. The various percepts can be readily captured and explained by a neural model of cortical activity based on populations of adapting, pitch- and pitch-direction-selective cells, aligned with the neurophysiological responses. Together, these decoding and model results suggest that contextual influences on perception may already be encoded at the level of the primary sensory cortices, reflecting basic neural response properties commonly found in these areas.
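The population-decoding step could look roughly like the following sketch (a linear classifier on trial-by-neuron firing rates; the data layout and use of scikit-learn are illustrative assumptions, not the authors' pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
rates = rng.poisson(5.0, size=(300, 80)).astype(float)  # trials x neurons (stand-in)
percept = rng.integers(0, 2, size=300)                   # 0 = descending, 1 = ascending

# Linear decoder: can the population response predict the percept on held-out trials?
decoder = LogisticRegression(max_iter=1000)
acc = cross_val_score(decoder, rates, percept, cv=5).mean()
print(f"cross-validated decoding accuracy: {acc:.2f}")
```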
Affiliation(s)
- B Englitz: Institute for Systems Research, University of Maryland, College Park, MD, USA; Computational Neuroscience Lab, Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, The Netherlands
- S Akram: Research Data Science, Meta Platforms
- M Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- S Shamma: Institute for Systems Research, University of Maryland, College Park, MD, USA; Equipe Audition, Ecole Normale Supérieure, Paris, France
3. Kothinti SR, Elhilali M. Are acoustics enough? Semantic effects on auditory salience in natural scenes. Front Psychol 2023;14:1276237. PMID: 38098516; PMCID: PMC10720592; DOI: 10.3389/fpsyg.2023.1276237.
Abstract
Auditory salience is a fundamental property of a sound that allows it to grab a listener's attention regardless of their attentional state or behavioral goals. While previous research has shed light on acoustic factors influencing auditory salience, the semantic dimensions of this phenomenon have remained relatively unexplored, owing both to the complexity of measuring salience in audition and to the limited focus on complex natural scenes. In this study, we examine the relationship between acoustic, contextual, and semantic attributes and their impact on the auditory salience of natural audio scenes using a dichotic listening paradigm. The experiments present acoustic scenes in forward and backward directions; the backward presentation diminishes semantic effects, providing a counterpoint to the effects observed in forward scenes. The behavioral data collected from a crowd-sourced platform reveal a striking convergence in temporal salience maps for certain sound events, while marked disparities emerge in others. Our main hypothesis posits that differences in the perceptual salience of events are predominantly driven by semantic and contextual cues, particularly evident in those cases displaying substantial disparities between forward and backward presentations. Conversely, events exhibiting a high degree of alignment can largely be attributed to low-level acoustic attributes. To evaluate this hypothesis, we employ analytical techniques that combine rich low-level mappings from acoustic profiles with high-level embeddings extracted from a deep neural network. This integrated approach captures both acoustic and semantic attributes of acoustic scenes along with their temporal trajectories. The results demonstrate that perceptual salience reflects a careful interplay between low-level and high-level attributes that shapes which moments stand out in a natural soundscape. Furthermore, our findings underscore the important role of longer-term context as a critical component of auditory salience, enabling us to discern and adapt to temporal regularities within an acoustic scene. The experimental and model-based validation of semantic factors of salience paves the way for a more complete understanding of auditory salience. Ultimately, the empirical and computational analyses have implications for developing large-scale models for auditory salience and audio analytics.
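A hedged sketch of the combined low-level/high-level analysis, comparing how well acoustic features alone versus acoustic features plus deep embeddings predict behavioral salience (array contents and the ridge model are stand-ins, not the paper's pipeline):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
acoustic = rng.normal(size=(500, 12))   # loudness, pitch, flatness, ... per frame (stand-in)
embed = rng.normal(size=(500, 128))     # deep-network embeddings per frame (stand-in)
salience = rng.normal(size=500)         # behavioral salience measure (stand-in)

for name, X in [("acoustic only", acoustic),
                ("acoustic + embeddings", np.hstack([acoustic, embed]))]:
    r2 = cross_val_score(Ridge(alpha=1.0), X, salience, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.2f}")
```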
Affiliation(s)
- Mounya Elhilali: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, United States
4. Rennoll V, McLane I, Eisape A, Grant D, Hahn H, Elhilali M, West JE. Electrostatic Acoustic Sensor with an Impedance-Matched Diaphragm Characterized for Body Sound Monitoring. ACS Appl Bio Mater 2023;6:3241-3256. PMID: 37470762; PMCID: PMC10804910; DOI: 10.1021/acsabm.3c00359.
Abstract
Acoustic sensors capture more incident energy when their acoustic impedance closely matches the acoustic impedance of the medium being probed, such as skin or wood. The acoustic impedance of polymers can be controlled by selecting materials with appropriate densities and stiffnesses, as well as by adding ceramic nanoparticles. This study follows a statistical methodology to examine the impact of polymer type and nanoparticle addition on the fabrication of acoustic sensors with desired acoustic impedances in the range of 1-2.2 MRayls. Using a design-of-experiments approach, the proposed method measures sensors with diaphragms of varying impedances excited by acoustic vibrations traveling through wood, gelatin, and plastic. The sensor diaphragm is subsequently optimized for body sound monitoring, and the sensor's improved body sound coherence and airborne noise rejection are evaluated on an acoustic phantom in simulated noise environments and compared to electronic stethoscopes with onboard noise cancellation. The impedance-matched sensor demonstrates high sensitivity to body sounds, low sensitivity to airborne sound, a frequency response comparable to two state-of-the-art electronic stethoscopes, and the ability to capture lung and heart sounds from a real subject. Due to its small size, use of flexible materials, and rejection of airborne noise, the sensor provides an improved solution for wearable body sound monitoring, as well as for sensing from other media with acoustic impedances in the range of 1-2.2 MRayls, such as water and wood.
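The benefit of impedance matching can be made concrete with the standard normal-incidence power transmission coefficient T = 4 Z1 Z2 / (Z1 + Z2)^2; the sketch below compares a matched and a mismatched diaphragm against a skin-like medium (all impedance values are illustrative assumptions):

```python
def transmission(z1, z2):
    """Fraction of incident acoustic power transmitted across a planar boundary."""
    return 4 * z1 * z2 / (z1 + z2) ** 2

Z_SKIN = 1.5e6          # ~1.5 MRayl, a typical soft-tissue value (assumed)
Z_MATCHED = 1.6e6       # impedance-matched polymer diaphragm (assumed)
Z_MISMATCHED = 30e6     # stiff ceramic-like diaphragm (assumed)

print(f"matched:    {transmission(Z_SKIN, Z_MATCHED):.1%} of power transmitted")
print(f"mismatched: {transmission(Z_SKIN, Z_MISMATCHED):.1%} of power transmitted")
```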
Affiliation(s)
- Valerie Rennoll, Ian McLane, Adebayo Eisape, Drew Grant, Helena Hahn, Mounya Elhilali, and James E West: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
5. Kala A, McCollum ED, Elhilali M. Reference free auscultation quality metric and its trends. Biomed Signal Process Control 2023;85:104852. PMID: 38274002; PMCID: PMC10809975; DOI: 10.1016/j.bspc.2023.104852.
Abstract
Stethoscopes are used ubiquitously in clinical settings to 'listen' to lung sounds. The use of these systems in a variety of healthcare environments (hospitals, urgent care rooms, private offices, community sites, mobile clinics, etc.) presents a range of challenges in terms of ambient noise and distortions that prevent lung signals from being heard clearly or processed accurately by auscultation devices. With advances in technology, computerized techniques have been developed to automate analysis or access a digital rendering of lung sounds. However, most approaches are developed and tested in controlled environments and do not reflect the real-world conditions under which auscultation signals are typically acquired. Without a priori access to a recording of the ambient noise (for signal-to-noise estimation) or a reference signal that reflects the true undistorted lung sound, it is difficult to evaluate the quality of the lung signal and its potential clinical interpretability. The current study proposes an objective, reference-free Auscultation Quality Metric (AQM) which combines low-level signal attributes with high-level representational embeddings mapped to a nonlinear quality space to provide an independent evaluation of auscultation quality. This metric is carefully designed to judge the signal solely on its integrity relative to external distortions and masking effects, without confusing an adventitious breathing pattern for low-quality auscultation. The current study explores the robustness of the proposed AQM method across multiple clinical categorizations and different distortion types. It also evaluates the temporal sensitivity of this approach and its translational impact for deployment in digital auscultation devices.
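A loose sketch in the spirit of a reference-free quality score (low-level attributes combined with an embedding and a nonlinear regressor); the chosen attributes, model, and data are illustrative assumptions, not the paper's actual AQM:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def low_level_attributes(x):
    """Two simple attributes: RMS energy and spectral flatness (assumed choices)."""
    spec = np.abs(np.fft.rfft(x)) + 1e-12
    flatness = np.exp(np.mean(np.log(spec))) / np.mean(spec)
    return np.array([np.sqrt(np.mean(x ** 2)), flatness])

rng = np.random.default_rng(3)
clips = rng.normal(size=(100, 4000))     # 1 s auscultation clips at 4 kHz (stand-in)
embeddings = rng.normal(size=(100, 32))  # high-level embeddings (stand-in)
quality = rng.uniform(0, 1, size=100)    # training targets (stand-in)

# Map low-level + high-level features through a small nonlinear regressor
X = np.hstack([np.array([low_level_attributes(c) for c in clips]), embeddings])
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X, quality)
print("predicted quality of first clip:", model.predict(X[:1])[0])
```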
Affiliation(s)
- Annapurna Kala: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Eric D. McCollum: Global Program of Pediatric Respiratory Sciences, Eudowood Division of Pediatric Respiratory Sciences, Department of Pediatrics, Johns Hopkins School of Medicine, Baltimore, USA
- Mounya Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
6. Kala A, Elhilali M. Constrained Synthetic Sampling for Augmentation of Crackle Lung Sounds. Annu Int Conf IEEE Eng Med Biol Soc 2023;2023:1-5. PMID: 38083624; PMCID: PMC10823588; DOI: 10.1109/embc40787.2023.10340579.
Abstract
Crackles are explosive breathing patterns caused by lung air sacs filling with fluid and act as an indicator of a plethora of pulmonary diseases. Clinical studies suggest a strong correlation between the presence of these adventitious auscultations and mortality rate, especially in pediatric patients, underscoring the importance of their pathological indication. While clinically important, crackles occur rarely in breathing signals relative to other phases and abnormalities of lung sounds, imposing a considerable class imbalance on the development of learning methodologies for automated tracking and diagnosis of lung pathologies. The scarcity and clinical relevance of crackle sounds compel a need for exploring data augmentation techniques to enrich the space of crackle signals. Given their unique nature, the current study proposes a crackle-specific constrained synthetic sampling (CSS) augmentation that captures the geometric properties of crackles across different projected object spaces. We also outline a task-agnostic validation methodology that evaluates different augmentation techniques based on their goodness of fit relative to the space of original crackles. This evaluation considers both the separability of the manifold space generated by augmented data samples and a statistical distance of the synthesized data relative to the original. Compared to a range of augmentation techniques, the proposed constrained synthetic sampling of crackle sounds is shown to generate the samples most analogous to original crackle sounds, highlighting the importance of carefully considering the statistical constraints of the class under study.
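A rough sketch of interpolation-based minority-class augmentation with a distributional constraint; the specific constraint shown (staying within per-dimension percentile bounds in a PCA space) is an illustrative stand-in for the paper's geometric constraints:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
crackles = rng.normal(size=(40, 256))   # feature vectors of real crackles (stand-in)

pca = PCA(n_components=8).fit(crackles)
z = pca.transform(crackles)
lo, hi = np.percentile(z, [5, 95], axis=0)   # constraint region in projected space

synthetic = []
while len(synthetic) < 100:
    i, j = rng.choice(len(z), size=2, replace=False)
    cand = z[i] + rng.uniform() * (z[j] - z[i])   # interpolate between two real crackles
    if np.all((cand >= lo) & (cand <= hi)):       # keep only in-distribution candidates
        synthetic.append(cand)

augmented = pca.inverse_transform(np.array(synthetic))
print("synthesized", augmented.shape[0], "constrained crackle-like samples")
```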
7. Bellur A, Thakkar K, Elhilali M. Explicit-memory multiresolution adaptive framework for speech and music separation. EURASIP J Audio Speech Music Process 2023;2023:20. PMID: 37181589; PMCID: PMC10169896; DOI: 10.1186/s13636-023-00286-7.
Abstract
The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of the input and uses memory (or priors) to guide the selection of a target sound from the input mixture. Moreover, feedback mechanisms refine the memory constructs, further improving the selectivity for a particular sound object amidst dynamic backgrounds. The present study proposes a unified end-to-end computational framework that mimics these principles for sound source separation applied to both speech and music mixtures. While the problems of speech enhancement and music separation have often been tackled separately due to the constraints and specificities of each signal domain, the current work posits that common principles for sound source separation are domain-agnostic. In the proposed scheme, parallel and hierarchical convolutional paths map input mixtures onto redundant but distributed higher-dimensional subspaces and use the concept of temporal coherence to gate the selection of embeddings belonging to a target stream abstracted in memory. These explicit memories are further refined through self-feedback from incoming observations in order to improve the system's selectivity when faced with unknown backgrounds. The model yields stable source-separation outcomes for both speech and music mixtures and demonstrates the benefits of explicit memory as a powerful representation of priors that guide information selection from complex inputs.
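A simplified sketch of temporal-coherence gating, selecting embedding channels whose recent activity correlates with a memory trace of the target; the windowing, correlation measure, and threshold are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(5)
T, C = 400, 64
embeddings = rng.normal(size=(T, C))   # time x channel activations of the mixture (stand-in)
memory = rng.normal(size=T)            # memory trace of the target stream (stand-in)

win = 50
gated = np.zeros_like(embeddings)
for t in range(win, T):
    seg = embeddings[t - win:t]        # recent activity per channel
    ref = memory[t - win:t]
    # correlation of each channel with the memory trace over the window
    coh = np.array([np.corrcoef(seg[:, c], ref)[0, 1] for c in range(C)])
    gated[t] = embeddings[t] * (coh > 0.2)   # pass only channels coherent with memory

print("fraction of channel-time points retained:", (gated != 0).mean())
```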
Affiliation(s)
- Ashwin Bellur: Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Karan Thakkar: Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Mounya Elhilali: Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
8. Higgins NC, Scurry AN, Jiang F, Little DF, Alain C, Elhilali M, Snyder JS. Adaptation in the sensory cortex drives bistable switching during auditory stream segregation. Neurosci Conscious 2023;2023:niac019. PMID: 36751309; PMCID: PMC9899071; DOI: 10.1093/nc/niac019.
Abstract
Current theories of perception emphasize the role of neural adaptation, inhibitory competition, and noise as key components that lead to switches in perception. Supporting evidence comes from neurophysiological findings of specific neural signatures in modality-specific and supramodal brain areas that appear to be critical to switches in perception. We used functional magnetic resonance imaging to study brain activity around the time of switches in perception while participants listened to a bistable auditory stream segregation stimulus, which can be heard as one integrated stream of tones or two segregated streams of tones. The auditory thalamus showed more activity around the time of a switch from segregated to integrated compared to periods of stable perception of the integrated percept; in contrast, the rostral anterior cingulate cortex and the inferior parietal lobule showed more activity around the time of a switch from integrated to segregated compared to periods of stable perception of the segregated percept, consistent with prior findings of asymmetries in brain activity depending on the switch direction. In sound-responsive areas of the auditory cortex, neural activity increased in strength preceding switches in perception and declined in strength over time following them. Such dynamics in the auditory cortex are consistent with the role of adaptation proposed by computational models of visual and auditory bistable switching, whereby the strength of neural activity decreases following a switch in perception, eventually destabilizing the current percept enough to trigger a switch to an alternative percept.
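A compact sketch of the adaptation-driven switching dynamic referenced at the end of the abstract, in the style of standard mutual-inhibition models of bistable perception (all parameter values are illustrative, not fitted to the fMRI data):

```python
import numpy as np

def simulate(T=20000, dt=0.001, beta=3.0, g=2.0, tau=0.01, tau_a=2.0, noise=0.05):
    """Two percepts compete via mutual inhibition; slow adaptation destabilizes the winner."""
    rng = np.random.default_rng(6)
    r = np.array([0.6, 0.4])   # activity of 'integrated' and 'segregated' populations
    a = np.zeros(2)            # adaptation variables
    dominant = []
    for _ in range(T):
        inp = 1.0 - beta * r[::-1] - g * a + noise * rng.normal(size=2)
        drive = np.clip(inp, 0, None)        # threshold-linear activation
        r += dt / tau * (-r + drive)
        a += dt / tau_a * (-a + r)           # adaptation slowly tracks activity
        dominant.append(int(r[1] > r[0]))
    return np.array(dominant)

d = simulate()
print("number of perceptual switches:", int(np.abs(np.diff(d)).sum()))
```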
Affiliation(s)
- Nathan C Higgins: Department of Communication Sciences and Disorders, University of South Florida, 4202 E. Fowler Avenue, PCD1017, Tampa, FL 33620, USA
- Alexandra N Scurry: Department of Psychology, University of Nevada, 1664 N. Virginia Street, Mail Stop 0296, Reno, NV 89557, USA
- Fang Jiang: Department of Psychology, University of Nevada, 1664 N. Virginia Street, Mail Stop 0296, Reno, NV 89557, USA
- David F Little: Department of Electrical and Computer Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Claude Alain: Rotman Research Institute, Baycrest Health Sciences, 3560 Bathurst Street, Toronto, ON M6A 2E1, Canada
- Mounya Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Joel S Snyder: Department of Psychology, University of Nevada, 4505 Maryland Parkway, Mail Stop 5030, Las Vegas, NV 89154, USA
9. Rennoll V, McLane I, Elhilali M, West JE. Optimized Acoustic Phantom Design for Characterizing Body Sound Sensors. Sensors (Basel) 2022;22:9086. PMID: 36501787; PMCID: PMC9735779; DOI: 10.3390/s22239086.
Abstract
Many commercial and prototype devices are available for capturing body sounds that provide important information on the health of the lungs and heart; however, there is no agreed-upon standardized method to characterize and compare these devices. Acoustic phantoms are commonly used because they generate repeatable sounds that couple to devices through a material layer that mimics the characteristics of skin. While multiple acoustic phantoms have been presented in the literature, it is unclear how design elements, such as the driver type and coupling layer, impact the acoustical characteristics of the phantom and, therefore, the device being measured. Here, a design-of-experiments approach is used to compare the frequency responses of various phantom constructions. An acoustic phantom that uses a loudspeaker to generate sound and excite a gelatin layer supported by a grid is found to have a flatter and more uniform frequency response than other candidate designs with a sound exciter and plate support. When measured on the optimal acoustic phantom, three devices show more consistent measurements with added weight and differing positions than on a non-optimal phantom. Overall, the statistical models developed here provide greater insight into acoustic phantom design for improved device characterization.
10. Kala A, McCollum ED, Elhilali M. Implications of clinical variability on computer-aided lung auscultation classification. Annu Int Conf IEEE Eng Med Biol Soc 2022;2022:4421-4425. PMID: 36086501; DOI: 10.1109/embc48229.2022.9871393.
Abstract
Thanks to recent advances in digital stethoscopes and the rapid adoption of deep learning techniques, there has been tremendous progress in the field of Computerized Auscultation Analysis (CAA). Despite these promising leaps, the deployment of these technologies in real-world applications remains limited due to inherent challenges in properly interpreting clinical data, particularly auscultations. One of the limiting factors is the inherent ambiguity that comes with variability in clinical opinion, even among highly trained experts. This lack of unanimity is often ignored when developing machine learning techniques to automatically screen normal from abnormal lung signals, with most algorithms being developed and tested on highly curated datasets. To better understand the potential pitfalls this selective analysis could cause in deployment, the current work explores the impact of clinical opinion variability on algorithms trained on gold-standard data to detect adventitious patterns in lung sounds. The study shows that uncertainty in clinical opinion introduces far more variability and performance drop than outright disagreement among experts. The study also explores the feasibility of automatically flagging auscultation signals based on their estimated uncertainty, thereby recommending further reassessment as well as improving computer-aided analysis.
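A small sketch of flagging recordings by estimated uncertainty, here using the predictive entropy of a classifier's output probabilities (the entropy criterion and threshold are illustrative assumptions):

```python
import numpy as np

def predictive_entropy(p):
    """Entropy of a class-probability vector; high values mean an uncertain prediction."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

# Stand-in posterior probabilities (normal vs. adventitious) for a batch of recordings
probs = np.array([[0.97, 0.03],
                  [0.55, 0.45],
                  [0.50, 0.50]])

flag = predictive_entropy(probs) > 0.6   # flag near-chance predictions for reassessment
print("flag for expert reassessment:", flag)
```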
11. Park S, Han DK, Elhilali M. Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures. IEEE Trans Multimedia 2022;25:4573-4585. PMID: 37928617; PMCID: PMC10621403; DOI: 10.1109/tmm.2022.3178591.
Abstract
Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and the time boundaries of each sound event in a continuous recording. With advances in deep neural networks, there has been tremendous improvement in the performance of sound event detection systems, although at the expense of costly data collection and labeling efforts. In fact, current state-of-the-art methods employ supervised training that leverages large amounts of data samples and corresponding labels to facilitate identification of sound categories and time stamps of events. As an alternative, the current study proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training. Additionally, this paper explores post-processing that extracts sound-event intervals from the network's predictions, for further improvement in sound event detection performance. The proposed approach is evaluated on the sound event detection task of the DCASE2020 challenge. The results of these methods on both the "validation" and "public evaluation" sets of the DESED database show significant improvement compared to state-of-the-art systems in semi-supervised learning.
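A minimal sketch of the kind of post-processing mentioned: thresholding frame-level probabilities, smoothing with a median filter, and reading off contiguous intervals (parameter values are illustrative):

```python
import numpy as np
from scipy.signal import medfilt

def extract_events(frame_probs, thresh=0.5, kernel=7, hop_s=0.02):
    """Turn per-frame event probabilities into (onset, offset) times in seconds."""
    active = medfilt((frame_probs > thresh).astype(float), kernel).astype(bool)
    edges = np.diff(active.astype(int), prepend=0, append=0)
    onsets = np.where(edges == 1)[0]
    offsets = np.where(edges == -1)[0]
    return [(on * hop_s, off * hop_s) for on, off in zip(onsets, offsets)]

probs = np.concatenate([np.full(20, 0.1), np.full(30, 0.9), np.full(20, 0.1)])
print(extract_events(probs))   # -> one event spanning roughly 0.4 s to 1.0 s
```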
Affiliation(s)
- Sangwook Park: Department of Electronic Engineering, Gangneung-Wonju National University, Gangneung 25457, South Korea
- David K Han: Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, USA
- Mounya Elhilali: Department of Electrical and Computer Engineering (jointly with the Department of Psychology and Brain Sciences), Johns Hopkins University, Baltimore, MD 21218, USA
12. Park DE, Watson NL, Focht C, Feikin D, Hammitt LL, Brooks WA, Howie SRC, Kotloff KL, Levine OS, Madhi SA, Murdoch DR, O'Brien KL, Scott JAG, Thea DM, Amorninthapichet T, Awori J, Bunthi C, Ebruke B, Elhilali M, Higdon M, Hossain L, Jahan Y, Moore DP, Mulindwa J, Mwananyanda L, Naorat S, Prosperi C, Thamthitiwat S, Verwey C, Jablonski KA, Power MC, Young HA, Deloria Knoll M, McCollum ED. Digitally recorded and remotely classified lung auscultation compared with conventional stethoscope classifications among children aged 1-59 months enrolled in the Pneumonia Etiology Research for Child Health (PERCH) case-control study. BMJ Open Respir Res 2022;9(1):e001144. PMID: 35577452; PMCID: PMC9115042; DOI: 10.1136/bmjresp-2021-001144.
Abstract
BACKGROUND Diagnosis of pneumonia remains challenging. Digitally recorded and remote human classified lung sounds may offer benefits beyond conventional auscultation, but it is unclear whether classifications differ between the two approaches. We evaluated concordance between digital and conventional auscultation. METHODS We collected digitally recorded lung sounds, conventional auscultation classifications and clinical measures and samples from children with pneumonia (cases) in low-income and middle-income countries. Physicians remotely classified recordings as crackles, wheeze or uninterpretable. Conventional and digital auscultation concordance was evaluated among 383 pneumonia cases with concurrently (within 2 hours) collected conventional and digital auscultation classifications using prevalence-adjusted bias-adjusted kappa (PABAK). Using an expanded set of 737 cases that also incorporated the non-concurrently collected assessments, we evaluated whether associations between auscultation classifications and clinical or aetiological findings differed between conventional or digital auscultation using χ2 tests and logistic regression adjusted for age, sex and site. RESULTS Conventional and digital auscultation concordance was moderate for classifying crackles and/or wheeze versus neither crackles nor wheeze (PABAK=0.50), and fair for crackles-only versus not crackles-only (PABAK=0.30) and any wheeze versus no wheeze (PABAK=0.27). Crackles were more common on conventional auscultation, whereas wheeze was more frequent on digital auscultation. Compared with neither crackles nor wheeze, crackles-only on both conventional and digital auscultation was associated with abnormal chest radiographs (adjusted OR (aOR)=1.53, 95% CI 0.99 to 2.36; aOR=2.09, 95% CI 1.19 to 3.68, respectively); any wheeze was inversely associated with C-reactive protein >40 mg/L using conventional auscultation (aOR=0.50, 95% CI 0.27 to 0.92) and with very severe pneumonia using digital auscultation (aOR=0.67, 95% CI 0.46 to 0.97). Crackles-only on digital auscultation was associated with mortality compared with any wheeze (aOR=2.70, 95% CI 1.12 to 6.25). CONCLUSIONS Conventional auscultation and remotely-classified digital auscultation displayed moderate concordance for presence/absence of wheeze and crackles among cases. Conventional and digital auscultation may provide different classification patterns, but wheeze was associated with decreased clinical severity on both.
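For reference, the prevalence-adjusted bias-adjusted kappa reduces to PABAK = 2·Po - 1, where Po is the observed proportion of agreement; a tiny sketch with made-up classifications:

```python
import numpy as np

def pabak(rater_a, rater_b):
    """Prevalence-adjusted bias-adjusted kappa: 2 * observed agreement - 1."""
    observed_agreement = np.mean(np.asarray(rater_a) == np.asarray(rater_b))
    return 2 * observed_agreement - 1

# Stand-in binary classifications (1 = crackles/wheeze present) for 10 cases
conventional = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
digital      = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(f"PABAK = {pabak(conventional, digital):.2f}")   # 8/10 agreement -> 0.60
```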
Affiliation(s)
- Daniel E Park: Department of Environmental and Occupational Health, The George Washington University, Washington, District of Columbia, USA
- Daniel Feikin: Department of International Health, Johns Hopkins University International Vaccine Access Center, Baltimore, Maryland, USA
- Laura L Hammitt: Department of International Health, Johns Hopkins University International Vaccine Access Center, Baltimore, Maryland, USA; Kenya Medical Research Institute - Wellcome Trust Research Programme, Kilifi, Kenya
- W Abdullah Brooks: International Centre for Diarrhoeal Disease Research Bangladesh, Dhaka and Matlab, Bangladesh; Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Stephen R C Howie: Medical Research Council Unit, Basse, Gambia; Department of Paediatrics, The University of Auckland, Auckland, New Zealand
- Karen L Kotloff: Department of Pediatrics, University of Maryland Center for Vaccine Development, Baltimore, Maryland, USA
- Orin S Levine: Department of International Health, Johns Hopkins University International Vaccine Access Center, Baltimore, Maryland, USA; Bill & Melinda Gates Foundation, Seattle, Washington, USA
- Shabir A Madhi: South African Medical Research Council Vaccines and Infectious Diseases Analytics Research Unit, University of the Witwatersrand, Johannesburg, Gauteng, South Africa; Department of Science and Innovation/National Research Foundation: Vaccine Preventable Diseases Unit, University of the Witwatersrand, Johannesburg, Gauteng, South Africa
- David R Murdoch: Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand; Microbiology Unit, Canterbury Health Laboratories, Christchurch, New Zealand
- Katherine L O'Brien: Department of International Health, Johns Hopkins University International Vaccine Access Center, Baltimore, Maryland, USA
- J Anthony G Scott: Kenya Medical Research Institute - Wellcome Trust Research Programme, Kilifi, Kenya; Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK
- Donald M Thea: Department of Global Health, Boston University School of Public Health, Boston, Massachusetts, USA
- Juliet Awori: Kenya Medical Research Institute - Wellcome Trust Research Programme, Kilifi, Kenya
- Charatdao Bunthi: Division of Global Health Protection, Thailand Ministry of Public Health – US CDC Collaboration, Royal Thai Government Ministry of Public Health, Bangkok, Thailand
- Bernard Ebruke: Medical Research Council Unit, Basse, Gambia; International Foundation Against Infectious Disease in Nigeria, Abuja, Nigeria
- Mounya Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Melissa Higdon: Department of International Health, Johns Hopkins University International Vaccine Access Center, Baltimore, Maryland, USA
- Lokman Hossain: International Centre for Diarrhoeal Disease Research Bangladesh, Dhaka and Matlab, Bangladesh
- Yasmin Jahan: International Centre for Diarrhoeal Disease Research Bangladesh, Dhaka and Matlab, Bangladesh
- David P Moore: South African Medical Research Council Vaccines and Infectious Diseases Analytics Research Unit, University of the Witwatersrand, Johannesburg, South Africa; Department of Paediatrics and Child Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Justin Mulindwa: Department of Paediatrics and Child Health, University Teaching Hospital, Lusaka, Zambia
- Lawrence Mwananyanda: Department of Global Health, Boston University School of Public Health, Boston, Massachusetts, USA; Right to Care - Zambia, Lusaka, Zambia
- Christine Prosperi: Department of International Health, Johns Hopkins University International Vaccine Access Center, Baltimore, Maryland, USA
- Somsak Thamthitiwat: Division of Global Health Protection, Thailand Ministry of Public Health – US CDC Collaboration, Royal Thai Government Ministry of Public Health, Nonthaburi, Thailand
- Charl Verwey: South African Medical Research Council Vaccines and Infectious Diseases Analytics Research Unit, University of the Witwatersrand, Johannesburg, Gauteng, South Africa; Department of Paediatrics and Child Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Melinda C Power: Department of Epidemiology, The George Washington University, Washington, District of Columbia, USA
- Heather A Young: Department of Epidemiology, The George Washington University, Washington, District of Columbia, USA
- Maria Deloria Knoll: Department of International Health, Johns Hopkins University International Vaccine Access Center, Baltimore, Maryland, USA
- Eric D McCollum: Global Program in Respiratory Sciences, Eudowood Division of Pediatric Respiratory Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, USA; Department of International Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
13. Allen KM, Salles A, Park S, Elhilali M, Moss CF. Effect of background clutter on neural discrimination in the bat auditory midbrain. J Neurophysiol 2021;126:1772-1782. PMID: 34669503; PMCID: PMC8794058; DOI: 10.1152/jn.00109.2021.
Abstract
The discrimination of complex sounds is a fundamental function of the auditory system. This operation must be robust in the presence of noise and acoustic clutter. Echolocating bats are auditory specialists that discriminate sonar objects in acoustically complex environments. Bats produce brief signals, interrupted by periods of silence, rendering echo snapshots of sonar objects. Sonar object discrimination requires that bats process spatially and temporally overlapping echoes to make split-second decisions. The mechanisms that enable this discrimination are not well understood, particularly in complex environments. We explored the neural underpinnings of sonar object discrimination in the presence of acoustic scattering caused by physical clutter. We performed electrophysiological recordings in the inferior colliculus (IC) of awake big brown bats in response to broadcasts of prerecorded echoes from physical objects. We acquired single-unit responses to echoes and discovered a subpopulation of IC neurons that encode acoustic features that can be used to discriminate between sonar objects. We further investigated the effects of environmental clutter on this population's encoding of acoustic features. We discovered that the effect of background clutter on sonar object discrimination is highly variable and depends on object properties and target-clutter spatiotemporal separation. In many conditions, clutter impaired discrimination of sonar objects. However, in some instances clutter enhanced acoustic features of echo returns, enabling higher levels of discrimination. This finding suggests that environmental clutter may augment acoustic cues used for sonar target discrimination and provides further evidence, in a growing body of literature, that noise is not universally detrimental to sensory encoding. NEW & NOTEWORTHY Bats are powerful animal models for investigating the encoding of auditory objects under acoustically challenging conditions. Although past work has considered the effect of acoustic clutter on sonar target detection, less is known about target discrimination in clutter. Our work shows that the neural encoding of auditory objects was affected by clutter in a distance-dependent manner. These findings advance knowledge on auditory object detection and discrimination and noise-dependent stimulus enhancement.
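A minimal sketch of quantifying single-unit discriminability between two sonar objects from spike counts using the standard d' sensitivity index (the stimulus conditions and counts are stand-ins):

```python
import numpy as np

def dprime(counts_a, counts_b):
    """d' between two spike-count distributions (higher = more discriminable)."""
    mu_a, mu_b = np.mean(counts_a), np.mean(counts_b)
    pooled_sd = np.sqrt(0.5 * (np.var(counts_a) + np.var(counts_b)))
    return abs(mu_a - mu_b) / pooled_sd

rng = np.random.default_rng(7)
obj1 = rng.poisson(8, size=50)            # spike counts to echoes of object 1 (stand-in)
obj2 = rng.poisson(12, size=50)           # spike counts to echoes of object 2 (stand-in)
obj2_clutter = rng.poisson(10, size=50)   # same object with background clutter (stand-in)

print(f"clean:   d' = {dprime(obj1, obj2):.2f}")
print(f"clutter: d' = {dprime(obj1, obj2_clutter):.2f}")   # clutter can reduce discriminability
```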
Affiliation(s)
- Kathryne M Allen: Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland
- Angeles Salles: Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland
- Sangwook Park: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland
- Mounya Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland
- Cynthia F Moss: Departments of Psychological and Brain Sciences, Neuroscience, and Mechanical Engineering, Johns Hopkins University, Baltimore, Maryland
14. Higgins NC, Monjaras AG, Yerkes BD, Little DF, Nave-Blodgett JE, Elhilali M, Snyder JS. Resetting of Auditory and Visual Segregation Occurs After Transient Stimuli of the Same Modality. Front Psychol 2021;12:720131. PMID: 34621219; PMCID: PMC8490814; DOI: 10.3389/fpsyg.2021.720131.
Abstract
In the presence of a continually changing sensory environment, maintaining stable but flexible awareness is paramount and requires continual organization of information. Determining which stimulus features belong together and which are separate is therefore one of the primary tasks of the sensory systems. It is unknown whether a global or a sensory-specific mechanism regulates the final perceptual outcome of this streaming process. To test the extent of modality independence in perceptual control, an auditory streaming experiment and a visual moving-plaid experiment were performed. Both were designed to evoke alternating perception of an integrated or segregated percept. In both experiments, transient auditory and visual distractor stimuli were presented in separate blocks, such that the distractors did not overlap in frequency or space with the streaming or plaid stimuli, respectively, thus preventing peripheral interference. When a distractor was presented in the opposite modality as the bistable stimulus (visual distractors during auditory streaming or auditory distractors during visual streaming), the probability of percept switching was not significantly different than when no distractor was presented. Conversely, significant differences in switch probability were observed following within-modality distractors, but only when the pre-distractor percept was segregated. Given the modality specificity of the distractor-induced resetting, the results suggest that conscious perception is at least partially controlled by modality-specific processing. The fact that the distractors did not have peripheral overlap with the bistable stimuli indicates that the perceptual reset arises at a locus where stimuli of different frequencies and spatial locations are integrated.
Affiliation(s)
- Nathan C Higgins: Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
- Ambar G Monjaras: Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
- Breanne D Yerkes: Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
- David F Little: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
- Mounya Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
- Joel S Snyder: Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
15.
Abstract
Salience is the quality of a sensory signal that attracts involuntary attention in humans. While it primarily reflects conspicuous physical attributes of a scene, our understanding of the processes underlying what makes a certain object or event salient remains limited. In the vision literature, experimental results, theoretical accounts, and large amounts of eye-tracking data using rich stimuli have shed light on some of the underpinnings of visual salience in the brain. In contrast, studies of auditory salience have lagged behind due to limitations in both the experimental designs and the stimulus datasets used to probe the question of salience in complex everyday soundscapes. In this work, we deploy an online platform to study salience using a dichotic listening paradigm with natural auditory stimuli. The study validates crowd-sourcing as a reliable platform for collecting behavioral responses to auditory salience by comparing experimental outcomes to findings acquired in a controlled laboratory setting. A model-based analysis demonstrates the benefits of extending behavioral measures of salience to a broader selection of auditory scenes and larger pools of subjects. Overall, this effort extends our current knowledge of auditory salience in everyday soundscapes and highlights the limitations of low-level acoustic attributes in capturing the richness of natural soundscapes.
Affiliation(s)
- Sandeep Reddy Kothinti: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland 21218, USA
- Nicholas Huang: Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, USA
- Mounya Elhilali: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland 21218, USA
16.
Abstract
The human brain extracts statistical regularities embedded in real-world scenes to sift through the complexity stemming from changing dynamics and entwined uncertainty along multiple perceptual dimensions (e.g., pitch, timbre, location). While there is evidence that sensory dynamics along different auditory dimensions are tracked independently by separate cortical networks, how these statistics are integrated to give rise to unified objects remains unknown, particularly in dynamic scenes that lack conspicuous coupling between features. Using tone sequences with stochastic regularities along spectral and spatial dimensions, this study examines behavioral and electrophysiological responses from human listeners (male and female) to changing statistics in auditory sequences and uses a computational model of predictive Bayesian inference to formulate multiple hypotheses for statistical integration across features. Neural responses reveal multiplexed brain responses reflecting both local statistics along individual features in frontocentral networks and global (object-level) processing in centroparietal networks. Independent tracking of local surprisal along each acoustic feature reveals linear modulation of neural responses, while global melody-level statistics follow a nonlinear integration of statistical beliefs across features to guide perception. Near-identical results are obtained in separate experiments along spectral and spatial acoustic dimensions, suggesting a common mechanism for statistical inference in the brain. Potential variations in statistical integration strategies and memory deployment shed light on individual variability between listeners in terms of behavioral efficacy and fidelity of neural encoding of stochastic change in acoustic sequences. SIGNIFICANCE STATEMENT The world around us is complex and ever changing: in everyday listening, sound sources evolve along multiple dimensions, such as pitch, timbre, and spatial location, and they exhibit emergent statistical properties that change over time. In the face of this complexity, the brain builds an internal representation of the external world by collecting statistics from the sensory input along multiple dimensions. Using a Bayesian predictive inference model, this work considers alternative hypotheses for how statistics are combined across sensory dimensions. Behavioral and neural responses from human listeners show that the brain multiplexes two representations, where local statistics along each feature linearly affect neural responses, and global statistics nonlinearly combine statistical beliefs across dimensions to shape perception of stochastic auditory sequences.
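A small sketch of the surprisal quantity such predictive models track, here for a running Gaussian predictor along a single acoustic feature (the generative assumptions are illustrative):

```python
import numpy as np

def sequential_surprisal(x, sigma=1.0, lam=0.9):
    """Surprisal -log p(x_t | past) under a running, exponentially weighted Gaussian mean."""
    mu, out = 0.0, []
    for xt in x:
        out.append(0.5 * ((xt - mu) / sigma) ** 2 + 0.5 * np.log(2 * np.pi * sigma ** 2))
        mu = lam * mu + (1 - lam) * xt   # update the belief about the mean
    return np.array(out)

rng = np.random.default_rng(8)
# A pitch-like sequence whose statistics change midway
seq = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
s = sequential_surprisal(seq)
print("mean surprisal before / right after the change:",
      s[:100].mean().round(2), s[100:110].mean().round(2))
```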
17. McLane I, Emmanouilidou D, West JE, Elhilali M. Design and Comparative Performance of a Robust Lung Auscultation System for Noisy Clinical Settings. IEEE J Biomed Health Inform 2021;25:2583-2594. PMID: 33534721; PMCID: PMC8374873; DOI: 10.1109/jbhi.2021.3056916.
Abstract
Chest auscultation is a widely used clinical tool for respiratory disease detection. The stethoscope has undergone a number of transformative enhancements since its invention, including the introduction of electronic systems in the last two decades. Nevertheless, stethoscopes remain riddled with a number of issues that limit their signal quality and diagnostic capability, rendering both traditional and electronic stethoscopes unusable in noisy or non-traditional environments (e.g., emergency rooms, rural clinics, ambulatory vehicles). This work outlines the design and validation of an advanced electronic stethoscope that dramatically reduces external noise contamination through hardware redesign and real-time, dynamic signal processing. The proposed system takes advantage of an acoustic sensor array, an external-facing microphone, and on-board processing to perform adaptive noise suppression. The proposed system is objectively compared to six commercially available acoustic and electronic devices in varying levels of simulated noisy clinical settings and quantified using two metrics that reflect perceptual audibility and statistical similarity: the normalized covariance measure (NCM) and the magnitude-squared coherence (MSC). The analyses highlight the major limitations of current stethoscopes and the significant improvements the proposed system makes in challenging settings by minimizing both distortion of lung sounds and contamination by ambient noise.
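One of the two metrics, the magnitude-squared coherence, can be computed with standard tools; a sketch with synthetic signals (the noise setup and band limits are illustrative):

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(9)
fs = 4000
t = np.arange(0, 5, 1 / fs)
reference = rng.normal(size=t.size)                    # clean lung-sound stand-in
recorded = reference + 0.5 * rng.normal(size=t.size)   # device output with ambient noise

# MSC near 1 means the recording faithfully tracks the reference at that frequency
f, msc = coherence(reference, recorded, fs=fs, nperseg=1024)
band = (f >= 100) & (f <= 1000)                        # band where lung sounds concentrate
print(f"mean MSC in 100-1000 Hz: {msc[band].mean():.2f}")
```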
18. Rennoll V, McLane I, Emmanouilidou D, West J, Elhilali M. Electronic Stethoscope Filtering Mimics the Perceived Sound Characteristics of Acoustic Stethoscope. IEEE J Biomed Health Inform 2021;25:1542-1549. PMID: 32870803; PMCID: PMC7917155; DOI: 10.1109/jbhi.2020.3020494.
Abstract
Electronic stethoscopes offer several advantages over conventional acoustic stethoscopes, including noise reduction, increased amplification, and the ability to store and transmit sounds. However, the acoustical characteristics of electronic and acoustic stethoscopes can differ significantly, introducing a barrier for clinicians transitioning to electronic stethoscopes. This work proposes a method to process lung sounds recorded by an electronic stethoscope such that the sounds are perceived to have been captured by an acoustic stethoscope. The proposed method calculates an electronic-to-acoustic stethoscope filter by measuring the difference between the average frequency responses of an acoustic and an electronic stethoscope to multiple lung sounds. To validate the method, a change detection experiment was conducted with 51 medical professionals to compare filtered electronic, unfiltered electronic, and acoustic stethoscope lung sounds. Participants were asked to detect when transitions occurred in sounds comprising several sections of the three types of recordings. Transitions between the filtered electronic and acoustic stethoscope sections were detected, on average, at chance (sensitivity index equal to zero) and were detected significantly less often than transitions between the unfiltered electronic and acoustic stethoscope sections, demonstrating the effectiveness of the method in filtering electronic stethoscope sounds to mimic an acoustic stethoscope. This processing could incentivize clinicians to adopt electronic stethoscopes by providing a means to shift between the sound characteristics of acoustic and electronic stethoscopes in a single device, allowing for a faster transition to new technology and greater appreciation of the electronic sound quality.
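A rough sketch of deriving such a filter from the two devices' average magnitude responses and realizing it as a linear-phase FIR (the averaging and filter-design choices are assumptions, not the paper's exact procedure):

```python
import numpy as np
from scipy.signal import welch, firwin2, lfilter

def average_magnitude(recordings, fs, nperseg=512):
    """Average Welch magnitude spectrum over a set of lung-sound recordings."""
    mags = []
    for r in recordings:
        f, pxx = welch(r, fs=fs, nperseg=nperseg)
        mags.append(np.sqrt(pxx))
    return f, np.mean(mags, axis=0)

rng = np.random.default_rng(10)
fs = 4000
acoustic_recs = [rng.normal(size=4 * fs) for _ in range(5)]     # stand-in recordings
electronic_recs = [rng.normal(size=4 * fs) for _ in range(5)]

f, mag_acoustic = average_magnitude(acoustic_recs, fs)
_, mag_electronic = average_magnitude(electronic_recs, fs)

# Desired gain: make the electronic response look like the acoustic one
gain = mag_acoustic / (mag_electronic + 1e-12)
fir = firwin2(255, f, gain, fs=fs)                  # linear-phase FIR realization

filtered = lfilter(fir, 1.0, electronic_recs[0])    # 'acoustified' electronic recording
print("designed FIR with", fir.size, "taps")
```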
19. Skerritt-Davis B, Elhilali M. Computational framework for investigating predictive processing in auditory perception. J Neurosci Methods 2021;360:109177. PMID: 33839191; DOI: 10.1016/j.jneumeth.2021.109177.
Abstract
BACKGROUND The brain tracks sound sources as they evolve in time, collecting contextual information to predict future sensory inputs. Previous work in predictive coding typically focuses on the perception of predictable stimuli, leaving open how these same neural processes operate in more complex, real-world environments containing randomness and uncertainty. NEW METHOD To facilitate investigation into the perception of less tightly controlled listening scenarios, we present a computational model as a tool to ask targeted questions about the underlying predictive processes that connect complex sensory inputs to listener behavior and neural responses. In the modeling framework, observed sound features (e.g., pitch) are tracked sequentially using Bayesian inference. Sufficient statistics are inferred from past observations at multiple time scales and used to make predictions about future observations while tracking the statistical structure of the sensory input. RESULTS Facets of the model are discussed in terms of their application to perceptual research, and examples taken from real-world audio demonstrate the model's flexibility in capturing a variety of statistical structures along various perceptual dimensions. COMPARISON WITH EXISTING METHODS Previous models are often targeted toward interpreting a particular experimental paradigm (e.g., the oddball paradigm), perceptual dimension (e.g., pitch processing), or task (e.g., speech segregation), thus limiting their ability to generalize to other domains. The presented model is designed as a flexible and practical tool for broad application. CONCLUSION The model is presented as a general framework for generating new hypotheses and guiding investigation into the neural processes underlying predictive coding of complex scenes.
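A compact sketch of the core loop such a framework implements: sequentially updating sufficient statistics of an observed feature with a forgetting factor and scoring each observation against the prediction (the conjugate-Gaussian choice is an illustrative simplification):

```python
import numpy as np

class GaussianTracker:
    """Sequentially tracks the mean/variance of a feature with exponential forgetting."""
    def __init__(self, lam=0.95):
        self.lam, self.n, self.s1, self.s2 = lam, 1e-6, 0.0, 1.0

    def step(self, x):
        # Predictive check of x against the current sufficient statistics
        mu = self.s1 / self.n if self.n > 1e-5 else 0.0
        var = max(self.s2 / self.n - mu ** 2, 1e-6)
        surprisal = 0.5 * ((x - mu) ** 2 / var + np.log(2 * np.pi * var))
        # Update sufficient statistics, discounting the past
        self.n = self.lam * self.n + 1
        self.s1 = self.lam * self.s1 + x
        self.s2 = self.lam * self.s2 + x ** 2
        return mu, var, surprisal

rng = np.random.default_rng(11)
tracker = GaussianTracker()
for x in np.concatenate([rng.normal(0, 1, 50), rng.normal(0, 3, 50)]):
    mu, var, s = tracker.step(x)
print(f"tracked variance after a variance increase: {var:.1f}")   # should approach ~9
```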
Affiliation(s)
- Mounya Elhilali: Johns Hopkins University, 3400 N Charles St, Baltimore, MD, USA
20. McCollum ED, Park DE, Watson NL, Fancourt NSS, Focht C, Baggett HC, Brooks WA, Howie SRC, Kotloff KL, Levine OS, Madhi SA, Murdoch DR, Scott JAG, Thea DM, Awori JO, Chipeta J, Chuananon S, DeLuca AN, Driscoll AJ, Ebruke BE, Elhilali M, Emmanouilidou D, Githua LP, Higdon MM, Hossain L, Jahan Y, Karron RA, Kyalo J, Moore DP, Mulindwa JM, Naorat S, Prosperi C, Verwey C, West JE, Knoll MD, O'Brien KL, Feikin DR, Hammitt LL. Digital auscultation in PERCH: Associations with chest radiography and pneumonia mortality in children. Pediatr Pulmonol 2020;55:3197-3208. PMID: 32852888; PMCID: PMC7692889; DOI: 10.1002/ppul.25046.
Abstract
BACKGROUND Whether digitally recorded lung sounds are associated with radiographic pneumonia or clinical outcomes among children in low-income and middle-income countries is unknown. We sought to address these knowledge gaps. METHODS We enrolled children aged 1-59 months hospitalized with pneumonia at eight African and Asian Pneumonia Etiology Research for Child Health sites in six countries, recorded digital stethoscope lung sounds, obtained chest radiographs, and collected clinical outcomes. Recordings were processed and classified into binary categories: positive or negative for adventitious lung sounds. Listening and reading panels classified recordings and radiographs. Associations between recording classifications and World Health Organization (WHO)-defined primary endpoint pneumonia on chest radiographs (radiographic pneumonia) or mortality were evaluated. We also examined case fatality among risk strata. RESULTS Among children without WHO danger signs, wheezing (without crackles) had a lower adjusted odds ratio (aOR) for radiographic pneumonia (0.35, 95% confidence interval (CI): 0.15, 0.82), compared to children with normal recordings. Neither crackle only (no wheeze) (aOR: 2.13, 95% CI: 0.91, 4.96) nor any wheeze (with or without crackle) (aOR: 0.63, 95% CI: 0.34, 1.15) was associated with radiographic pneumonia. Among children with WHO danger signs, no lung recording classification was independently associated with radiographic pneumonia, although trends toward greater odds of radiographic pneumonia were observed among children classified with crackle only (no wheeze) or any wheeze (with or without crackle). Among children without WHO danger signs, those with recorded wheezing had a lower case fatality than those without wheezing (3.8% vs. 9.1%, p = .03). CONCLUSIONS Among lower-risk children without WHO danger signs, digitally recorded wheezing is associated with lower odds of radiographic pneumonia and with lower mortality. Although further research is needed, these data indicate that, with further development, digital auscultation may eventually contribute to child pneumonia care.
Collapse
Affiliation(s)
- Eric D McCollum
- Global Program in Respiratory Sciences, Eudowood Division of Pediatric Respiratory Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, USA.,Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Daniel E Park
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.,Department of Epidemiology and Biostatistics, Milken Institute School of Public Health, George Washington University, Washington, District of Columbia, USA
| | | | - Nicholas S S Fancourt
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | | | - Henry C Baggett
- Global Disease Detection Center, US Centers for Disease Control and Prevention Collaboration, Thailand Ministry of Public Health, Mueang Nonthaburi, Nonthaburi, Thailand.,Division of Global Health Protection, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - W Abdullah Brooks
- Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.,International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka and Matlab, Bangladesh
| | - Stephen R C Howie
- Medical Research Council Unit, Basse, The Gambia.,Department of Paediatrics, University of Auckland, Auckland, New Zealand.,Centre for International Health, University of Otago, Dunedin, New Zealand
| | - Karen L Kotloff
- Division of Infectious Disease and Tropical Pediatrics, Department of Pediatrics, Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, Maryland
| | - Orin S Levine
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.,Bill & Melinda Gates Foundation, Seattle, Washington, USA
| | - Shabir A Madhi
- Medical Research Council: Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa.,Department of Science and Technology/National Research Foundation: Vaccine Preventable Diseases Unite, University of the Witwatersrand, Johannesburg, South Africa
| | - David R Murdoch
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand.,Microbiology Unit, Canterbury Health Laboratories, Christchurch, New Zealand
| | - J Anthony G Scott
- Kenya Medical Research Institute-Wellcome Trust Research Programme, Kilifi, Kenya.,Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK
| | - Donald M Thea
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts, USA
| | - Juliet O Awori
- Kenya Medical Research Institute-Wellcome Trust Research Programme, Kilifi, Kenya
| | - James Chipeta
- Department of Paediatrics and Child Health, University Teaching Hospital, Lusaka, Zambia
| | - Somchai Chuananon
- Global Disease Detection Center, US Centers for Disease Control and Prevention Collaboration, Thailand Ministry of Public Health, Mueang Nonthaburi, Nonthaburi, Thailand
| | - Andrea N DeLuca
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.,Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Amanda J Driscoll
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Bernard E Ebruke
- Medical Research Council Unit, Basse, The Gambia.,International Foundation Against Infectious Disease in Nigeria, Abuja, Nigeria
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Dimitra Emmanouilidou
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | | | - Melissa M Higdon
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Lokman Hossain
- International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka and Matlab, Bangladesh
| | - Yasmin Jahan
- International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka and Matlab, Bangladesh
| | - Ruth A Karron
- Department of International Health, Center for Immunization Research, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Joshua Kyalo
- Kenya Medical Research Institute-Wellcome Trust Research Programme, Kilifi, Kenya
| | - David P Moore
- Medical Research Council: Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa.,Department of Paediatrics, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Justin M Mulindwa
- Department of Paediatrics and Child Health, University Teaching Hospital, Lusaka, Zambia
| | - Sathapana Naorat
- Global Disease Detection Center, US Centers for Disease Control and Prevention Collaboration, Thailand Ministry of Public Health, Mueang Nonthaburi, Nonthaburi, Thailand
| | - Christine Prosperi
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Charl Verwey
- Medical Research Council: Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa.,Department of Paediatrics, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - James E West
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Maria Deloria Knoll
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Katherine L O'Brien
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Daniel R Feikin
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Laura L Hammitt
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.,Kenya Medical Research Institute-Wellcome Trust Research Programme, Kilifi, Kenya
| |
Collapse
|
21
|
Graceffo S, Husain A, Ahmed S, McCollum ED, Elhilali M. Validation of Auscultation Technologies using Objective and Clinical Comparisons. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020:992-997. [PMID: 33018152 DOI: 10.1109/embc44109.2020.9176456] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Technology is rapidly changing the health care industry. As new systems and devices are developed, validating their effectiveness in practice is not trivial, yet it is essential for assessing their technical and clinical capabilities. Digital auscultation devices are new technologies that are changing the landscape of diagnosis of lung and heart sounds and revamping the centuries-old original design of the stethoscope. Here, we propose a methodology to validate a newly developed digital stethoscope and compare its effectiveness against a market-accepted device, using a combination of signal properties and clinical assessments. Data from 100 pediatric patients are collected using both devices side by side at two clinical sites. Using the proposed methodology, we objectively compare the technical performance of the two devices and identify clinical situations where their performance differs. The proposed methodology offers a general approach to verifying a new digital auscultation device as clinically viable, while highlighting the importance of considering clinical conditions when performing these evaluations.
Collapse
|
22
|
Kala A, Husain A, McCollum ED, Elhilali M. An objective measure of signal quality for pediatric lung auscultations. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020:772-775. [PMID: 33018100 DOI: 10.1109/embc44109.2020.9176539] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
A stethoscope is a ubiquitous tool used to 'listen' to sounds from the chest in order to assess lung and heart conditions. With advances in health technologies, including digital devices and new wearable sensors, access to these sounds is becoming easier and more abundant; yet proper measures of signal quality do not exist. In this work, we develop an objective quality metric of lung sounds based on low-level and high-level features, in order to independently assess the integrity of the signal in the presence of interference from ambient sounds and other distortions. The proposed metric maps auscultation signals onto rich low-level features extracted directly from the signal, capturing its spectral and temporal characteristics. Complementing these signal-derived attributes, we propose high-level learnt embedding features extracted from a generative auto-encoder trained to map auscultation signals onto a representative space that best captures the inherent statistics of lung sounds. Integrating both low-level (signal-derived) and high-level (embedding) features yields a robust correlation of 0.85 when inferring the signal-to-noise ratio of recordings with varying quality levels. The method is validated on a large dataset of lung auscultations recorded in various clinical settings with controlled, varying degrees of noise interference. The proposed metric is also validated against the opinions of expert physicians in a blind listening test, further corroborating the efficacy of this method for quality assessment.
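A minimal sketch of the two-stream design follows: handcrafted spectral/temporal descriptors are concatenated with a learnt embedding and regressed onto SNR labels. PCA over average log-spectra stands in for the paper's generative auto-encoder, and the variable names (clips, snr_labels) are assumptions; enough clips are needed for the PCA fit.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

def low_level_features(x, fs):
    """Handcrafted spectral/temporal descriptors of one auscultation clip."""
    f, _, Z = stft(x, fs=fs, nperseg=512)
    mag = np.abs(Z) + 1e-12
    centroid = (f[:, None] * mag).sum(0) / mag.sum(0)      # spectral centroid
    flatness = np.exp(np.log(mag).mean(0)) / mag.mean(0)   # spectral flatness
    envelope = mag.sum(0)                                  # temporal envelope
    return np.array([centroid.mean(), centroid.std(), flatness.mean(),
                     envelope.std() / (envelope.mean() + 1e-12)])

def fit_quality_metric(clips, fs, snr_labels, n_embed=8):
    """Regress SNR from handcrafted features plus a learnt embedding.
    Requires at least n_embed clips for the PCA fit."""
    avg_logspec = np.array(
        [np.log(np.abs(stft(x, fs=fs, nperseg=512)[2]).mean(1) + 1e-12)
         for x in clips])
    pca = PCA(n_components=n_embed).fit(avg_logspec)       # embedding stand-in
    X = np.hstack([np.array([low_level_features(x, fs) for x in clips]),
                   pca.transform(avg_logspec)])
    return Ridge(alpha=1.0).fit(X, snr_labels), pca
```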
Collapse
|
23
|
Summers V, Grant KW, Walden BE, Cord MT, Surr RK, Elhilali M. Evaluation of A “Direct-Comparison” Approach to Automatic Switching In Omnidirectional/Directional Hearing Aids. J Am Acad Audiol 2020; 19:708-20. [DOI: 10.3766/jaaa.19.9.6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Background: Hearing aids today often provide both directional (DIR) and omnidirectional (OMNI) processing options with the currently active mode selected automatically by the device. The most common approach to automatic switching involves “acoustic scene analysis” where estimates of various acoustic properties of the listening environment (e.g., signal-to-noise ratio [SNR], overall sound level) are used as a basis for switching decisions.
Purpose: The current study was carried out to evaluate an alternative, “direct-comparison” approach to automatic switching that does not involve assumptions about how the listening environment may relate to microphone preferences. Predictions of microphone preference were based on whether DIR- or OMNI-processing of a given listening environment produced a closer match to a reference template representing the spectral and temporal modulations present in clean speech.
Research Design: A descriptive and correlational study. Predictions of OMNI/DIR preferences were determined based on degree of similarity between spectral and temporal modulations contained in a reference, clean-speech template, and in OMNI- and DIR-processed recordings of various listening environments. These predictions were compared with actual preference judgments (both real-world judgments and laboratory responses to the recordings).
Data Collection And Analysis: Predictions of microphone preference were based on whether DIR- or OMNI-processing of a given listening environment produced a closer match to a reference template representing clean speech. The template is the output of an auditory processing model that characterizes the spectral and temporal modulations associated with a given input signal (clean speech in this case). A modified version of the spectro-temporal modulation index (mSTMI) was used to compare the template to both DIR- and OMNI-processed versions of a given listening environment, as processed through the same auditory model. These analyses were carried out on recordings (originally collected by Walden et al., 2007) of OMNI- and DIR-processed speech produced in a range of everyday listening situations. Walden et al. reported OMNI/DIR preference judgments made by raters at the same time the field recordings were made and judgments based on laboratory presentations of these recordings to hearing-impaired and normal-hearing listeners. Preference predictions based on the mSTMI analyses were compared with both sets of preference judgments.
Results: The mSTMI analyses showed better than 92% accuracy in predicting the field preferences and 82–85% accuracy in predicting the laboratory preference judgments. OMNI processing tended to be favored over DIR processing in cases where the analysis indicated fairly similar mSTMI scores across the two processing modes. This is consistent with the common clinical assignment of OMNI mode as the default setting, most likely to be preferred in cases where neither mode produces a substantial improvement in SNR. Listeners experienced with switchable OMNI/DIR hearing aids were more likely than other listeners to favor the DIR mode in instances where mSTMI scores only slightly favored DIR processing.
Conclusions: A direct-comparison approach to OMNI/DIR mode selection was generally successful in predicting user preferences in a range of listening environments. Modifications that could further improve the approach's predictive accuracy are discussed.
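The direct-comparison rule lends itself to a compact sketch: compute a spectro-temporal modulation profile for each processed recording and pick the mode whose profile best matches the clean-speech template. The profile below (2-D FFT of a log-spectrogram) is a simplified proxy for the published mSTMI, and inputs are assumed to be equal-length arrays at a common sample rate.

```python
import numpy as np
from scipy.signal import stft

def modulation_profile(x, fs):
    """Spectro-temporal modulation energy of a signal: magnitude of the
    2-D FFT of its mean-removed log-spectrogram, normalized to unit norm."""
    _, _, Z = stft(x, fs=fs, nperseg=256, noverlap=192)
    logspec = np.log(np.abs(Z) + 1e-12)
    mod = np.abs(np.fft.fft2(logspec - logspec.mean()))
    return mod / (np.linalg.norm(mod) + 1e-12)

def preferred_mode(clean_speech, omni_rec, dir_rec, fs):
    """Direct-comparison rule: pick the processing mode whose modulation
    profile best matches the clean-speech template. All three signals are
    assumed to have equal length so the profiles share one shape."""
    template = modulation_profile(clean_speech, fs)
    scores = {mode: float((modulation_profile(rec, fs) * template).sum())
              for mode, rec in (("OMNI", omni_rec), ("DIR", dir_rec))}
    return max(scores, key=scores.get), scores
```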
Collapse
|
24
|
Kaya EM, Huang N, Elhilali M. Pitch, Timbre and Intensity Interdependently Modulate Neural Responses to Salient Sounds. Neuroscience 2020; 440:1-14. [PMID: 32445938 DOI: 10.1016/j.neuroscience.2020.05.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Revised: 04/28/2020] [Accepted: 05/10/2020] [Indexed: 01/31/2023]
Abstract
As we listen to everyday sounds, auditory perception is heavily shaped by interactions between acoustic attributes such as pitch, timbre and intensity, though it is not clear how such interactions affect judgments of acoustic salience in dynamic soundscapes. Salience perception is believed to rely on an internal brain model that tracks the evolution of acoustic characteristics of a scene and flags events that do not fit this model as salient. The current study explores how the interdependency between attributes of dynamic scenes affects the neural representation of this internal model and shapes encoding of salient events. Specifically, the study examines how deviations along combinations of acoustic attributes interact to modulate brain responses, and subsequently guide perception of certain sound events as salient given their context. Human volunteers focus their attention on a visual task and ignore acoustic melodies playing in the background while their brain activity is recorded using electroencephalography. Ambient sounds consist of musical melodies with probabilistically varying acoustic attributes. Salient notes embedded in these scenes deviate from the melody's statistical distribution along pitch, timbre and/or intensity. Brain responses to salient notes reveal that neural power in response to the melodic rhythm, as well as cross-trial phase alignment in the theta band, is modulated by the degree of salience of the notes, estimated across all acoustic attributes given their probabilistic context. These nonlinear neural effects across attributes strongly parallel the nonlinear behavioral interactions observed in perceptual judgments of auditory salience using similar dynamic melodies, suggesting a neural underpinning of the nonlinear interactions that underlie salience perception.
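The notion of estimating a note's salience "across all acoustic attributes given their probabilistic context" can be illustrated with a simple running-statistics sketch; the per-attribute z-scores and Euclidean pooling below are first-order stand-ins, since the paper's data point to a genuinely nonlinear combination rule.

```python
import numpy as np

def note_salience(notes, context=20):
    """Deviation-based salience of each note in a melody.

    notes: (n_notes, 3) array of per-note pitch, timbre, and intensity values.
    Each note is z-scored against the preceding `context` notes and the
    per-attribute deviations are pooled; the first `context` notes score 0.
    """
    notes = np.asarray(notes, dtype=float)
    scores = np.zeros(len(notes))
    for i in range(context, len(notes)):
        window = notes[i - context:i]
        z = (notes[i] - window.mean(0)) / (window.std(0) + 1e-9)
        scores[i] = np.linalg.norm(z)   # first-order (Euclidean) pooling
    return scores
```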
Collapse
Affiliation(s)
- Emine Merve Kaya
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering Johns Hopkins University, Baltimore, MD, USA
| | - Nicolas Huang
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering Johns Hopkins University, Baltimore, MD, USA
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering Johns Hopkins University, Baltimore, MD, USA.
| |
Collapse
|
25
|
Little DF, Snyder JS, Elhilali M. Ensemble modeling of auditory streaming reveals potential sources of bistability across the perceptual hierarchy. PLoS Comput Biol 2020; 16:e1007746. [PMID: 32275706 PMCID: PMC7185718 DOI: 10.1371/journal.pcbi.1007746] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Revised: 04/27/2020] [Accepted: 02/25/2020] [Indexed: 11/19/2022] Open
Abstract
Perceptual bistability, the spontaneous, irregular fluctuation of perception between two interpretations of a stimulus, occurs when observing a large variety of ambiguous stimulus configurations. This phenomenon has the potential to serve as a tool for, among other things, understanding how perceptual function varies across individuals, given the large individual differences that manifest during bistability. Yet it remains difficult to interpret the functional processes at work without knowing where bistability arises during perception. In this study we explore the hypothesis that bistability originates from multiple sources distributed across the perceptual hierarchy. We develop a hierarchical model of auditory processing comprising three distinct levels: a Peripheral, tonotopic analysis; a Central analysis computing features found more centrally in the auditory system; and an Object analysis, where sounds are segmented into different streams. We model bistable perception within this system by applying adaptation, inhibition and noise to one or all of the three levels of the hierarchy. We evaluate a large ensemble of variations of this hierarchical model, where each model has a different configuration of adaptation, inhibition and noise. This approach avoids the assumption that a single configuration must be invoked to explain the data. Each model is evaluated based on its ability to replicate two hallmarks of bistability during auditory streaming: the selectivity of bistability to specific stimulus configurations, and the characteristic log-normal pattern of perceptual switches. Consistent with a distributed origin, a broad range of model parameters across this hierarchy leads to a plausible form of perceptual bistability.
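The three ingredients the ensemble varies (adaptation, inhibition, noise) are enough to produce bistable switching even in a minimal two-population circuit, sketched below with assumed parameter values; the paper's model distributes these mechanisms across a three-level hierarchy, so this is only the canonical building block, not the published architecture.

```python
import numpy as np

def simulate_bistability(T=120.0, dt=1e-3, tau=0.01, tau_a=2.0,
                         beta=2.0, phi=1.0, noise=0.1, seed=0):
    """Two mutually inhibiting populations with slow adaptation and noise.
    The dominant population alternates irregularly, mimicking spontaneous
    perceptual switches; returns the sequence of percept durations (s)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    r = np.zeros((n, 2))
    r[0] = [0.6, 0.4]
    a = np.zeros(2)                                       # adaptation state
    f = lambda x: 1.0 / (1.0 + np.exp(-8.0 * (x - 0.5)))  # response gain
    for t in range(1, n):
        drive = 1.0 - beta * r[t - 1][::-1] - phi * a     # input minus inhibition
        r[t] = (r[t - 1] + dt / tau * (-r[t - 1] + f(drive))
                + noise * np.sqrt(dt) * rng.standard_normal(2))
        a += dt / tau_a * (-a + r[t])                     # slow adaptation
    dominant = (r[:, 0] > r[:, 1]).astype(int)
    switch_times = np.flatnonzero(np.diff(dominant)) * dt
    return np.diff(switch_times)

durations = simulate_bistability()
print(durations.mean() if len(durations) else "no switches")
```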
Collapse
Affiliation(s)
- David F. Little
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas; Las Vegas, Nevada, United States of America
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
Collapse
|
26
|
Huang N, Elhilali M. Push-pull competition between bottom-up and top-down auditory attention to natural soundscapes. eLife 2020; 9:52984. [PMID: 32196457 PMCID: PMC7083598 DOI: 10.7554/elife.52984] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Accepted: 02/13/2020] [Indexed: 12/17/2022] Open
Abstract
In everyday social environments, demands on attentional resources dynamically shift to balance our attention to targets of interest while alerting us to important objects in our surroundings. The current study uses electroencephalography to explore how the push-pull interaction between top-down and bottom-up attention manifests itself in dynamic auditory scenes. Using natural soundscapes as distractors while subjects attend to a controlled rhythmic sound sequence, we find that salient events in background scenes significantly suppress phase-locking and gamma responses to the attended sequence, countering the enhancement effects observed for attended targets. In line with a hypothesis of limited attentional resources, the modulation of neural activity by bottom-up attention is graded by the degree of salience of ambient events. The study also provides insights into the interplay between endogenous and exogenous attention during natural soundscapes, with both forms of attention engaging a common fronto-parietal network at different time lags.
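Phase-locking to an attended rhythm is commonly quantified with inter-trial phase coherence (ITPC); the generic sketch below (band-pass filter, Hilbert phase, circular mean across epochs) illustrates the measure, though it is not necessarily the paper's exact analysis pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def itpc(trials, fs, band=(4.0, 8.0)):
    """Inter-trial phase coherence of EEG epochs in a frequency band.

    trials: (n_trials, n_samples) array time-locked to the attended rhythm.
    Returns ITPC(t) in [0, 1]; values near 1 mean the band-limited phase is
    aligned across trials at that latency.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    phases = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))
    return np.abs(np.exp(1j * phases).mean(axis=0))

# Synthetic check: epochs sharing a 6 Hz component yield high theta ITPC.
fs = 256
t = np.arange(fs) / fs
trials = np.sin(2 * np.pi * 6 * t) + 0.8 * np.random.randn(40, fs)
print(itpc(trials, fs).mean())  # well above the ~0.14 chance level for 40 trials
```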
Collapse
Affiliation(s)
- Nicholas Huang
- Laboratory for Computational Audio Perception, Department of Electrical Engineering, Johns Hopkins University, Baltimore, United States
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical Engineering, Johns Hopkins University, Baltimore, United States
| |
Collapse
|
27
|
Abstract
One of the unique characteristics of human hearing is its ability to recognize acoustic objects even in the presence of severe noise and distortions. In this work, we explore two mechanisms underlying this ability: 1) redundant mapping of acoustic waveforms along distributed latent representations and 2) adaptive feedback based on prior knowledge to selectively attend to targets of interest. We propose a bio-mimetic account of acoustic object classification by developing a novel distributed deep belief network, validated on the task of robust acoustic object classification using the UrbanSound database. The proposed distributed belief network (DBN) encompasses an array of independent sub-networks trained generatively to capture different abstractions of natural sounds. A supervised classifier then performs a readout of this distributed mapping. The overall architecture not only matches the state-of-the-art system for acoustic object classification but also leads to significant improvement over the baseline in mismatched noisy conditions (31.4% relative improvement in 0 dB conditions). Furthermore, we incorporate mechanisms of attentional feedback that allow the DBN to deploy local memories of sound targets estimated at multiple views to bias network activation when attending to a particular object. This adaptive feedback results in further improvement of object classification in unseen noise conditions (relative improvement of 54% over the baseline in 0 dB conditions).
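The backbone of the architecture (independent generatively trained sub-networks whose activations are concatenated and read out by a supervised classifier) can be sketched with off-the-shelf components. BernoulliRBM here stands in for each generative sub-network and logistic regression for the readout; the attentional-feedback mechanism is not shown, and inputs are assumed scaled to [0, 1].

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

def train_distributed_readout(views, labels, n_hidden=64):
    """Train one generative sub-network per 'view' of the sound, then a
    supervised readout over the concatenated hidden activations.

    views: list of (n_samples, n_features) arrays, one per abstraction
    (e.g., spectrograms at different resolutions), scaled to [0, 1].
    """
    subnets = [BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                            n_iter=20, random_state=0).fit(v) for v in views]
    distributed = np.hstack([net.transform(v) for net, v in zip(subnets, views)])
    readout = LogisticRegression(max_iter=1000).fit(distributed, labels)
    return subnets, readout

def predict(subnets, readout, views):
    distributed = np.hstack([net.transform(v) for net, v in zip(subnets, views)])
    return readout.predict(distributed)
```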
Collapse
Affiliation(s)
- Ashwin Bellur
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| |
Collapse
|
28
|
Salles A, Park S, Sundar H, Macías S, Elhilali M, Moss CF. Neural Response Selectivity to Natural Sounds in the Bat Midbrain. Neuroscience 2020; 434:200-211. [PMID: 31918008 DOI: 10.1016/j.neuroscience.2019.11.047] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Revised: 11/27/2019] [Accepted: 11/28/2019] [Indexed: 11/29/2022]
Abstract
Little is known about the neural mechanisms that mediate differential action-selection responses to communication and echolocation calls in bats. For example, in the big brown bat, frequency-modulated (FM) food-claiming communication calls closely resemble FM echolocation calls; these signals guide social and orienting behaviors, respectively. Using advanced signal processing methods, we identified fine differences in the temporal structure of these natural sounds that appear key to auditory discrimination and behavioral decisions. We recorded extracellular potentials from single neurons in the midbrain inferior colliculus (IC) of passively listening animals, and compared responses to playbacks of acoustic signals used by bats for social communication and echolocation. We combined information obtained from spike counts and spike-triggered averages (STAs) to reveal a robust classification of neuron selectivity for communication or echolocation calls. These data highlight the importance of temporal acoustic structure for differentiating echolocation and food-claiming social calls and point to general mechanisms of natural sound processing across species.
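Of the two response descriptors combined in the classification, the spike-triggered average is the more involved; the sketch below shows the standard 1-D computation (average the stimulus window preceding each spike). The study's stimuli are natural calls, so the published analysis presumably operates on richer representations; treat this as the textbook version.

```python
import numpy as np

def spike_triggered_average(stimulus, spike_times, fs, window=0.02):
    """Average the stimulus snippet preceding each spike.

    stimulus: 1-D array sampled at fs (Hz); spike_times: spike times (s);
    window: how far back (s) to average. Spikes earlier than one full
    window are skipped.
    """
    n_win = int(window * fs)
    snippets = [stimulus[int(t * fs) - n_win:int(t * fs)]
                for t in spike_times if int(t * fs) >= n_win]
    return np.mean(snippets, axis=0)
```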
Collapse
Affiliation(s)
- Angeles Salles
- Department of Psychological and Brain Sciences, Johns Hopkins University, United States.
| | - Sangwook Park
- Department of Electrical and Computer Engineering, Johns Hopkins University, United States
| | - Harshavardhan Sundar
- Department of Electrical and Computer Engineering, Johns Hopkins University, United States
| | - Silvio Macías
- Department of Psychological and Brain Sciences, Johns Hopkins University, United States
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, United States
| | - Cynthia F Moss
- Department of Psychological and Brain Sciences, Johns Hopkins University, United States
| |
Collapse
|
29
|
Liu SC, Harris JG, Elhilali M, Slaney M. Editorial: Bio-inspired Audio Processing, Models and Systems. Front Neurosci 2019; 13:978. [PMID: 31572122 PMCID: PMC6753195 DOI: 10.3389/fnins.2019.00978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2019] [Accepted: 08/30/2019] [Indexed: 11/24/2022] Open
Affiliation(s)
- Shih-Chii Liu
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
| | - John G. Harris
- Department of Electrical & Computer Engineering, University of Florida, Gainesville, FL, United States
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, United States
| | - Malcolm Slaney
- Google, Mountain View, CA, United States
- *Correspondence: Malcolm Slaney
| |
Collapse
|
30
|
Elhilali M, West JE. The Stethoscope Gets Smart: Engineers from Johns Hopkins are giving the humble stethoscope an AI upgrade. IEEE Spectr 2019; 56:36-41. [PMID: 34588704 PMCID: PMC8478072 DOI: 10.1109/mspec.2019.8635815] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Affiliation(s)
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - James E West
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| |
Collapse
|
31
|
Abstract
Our current understanding of how the brain segregates auditory scenes into meaningful objects is in line with a Gestalt framework. These Gestalt principles suggest a theory of how different attributes of the soundscape are extracted and then bound together into separate groups that reflect different objects or streams present in the scene. These cues are thought to reflect the underlying statistical structure of natural sounds, much as the statistics of natural images are closely linked to the principles that guide figure-ground segregation and object segmentation in vision. In the present study, we leverage inference in stochastic neural networks to learn emergent grouping cues directly from natural soundscapes including speech, music and sounds in nature. The model learns a hierarchy of local and global spectro-temporal attributes reminiscent of the simultaneous and sequential Gestalt cues that underlie the organization of auditory scenes. These mappings operate at multiple time scales to analyze an incoming complex scene and are then fused using a Hebbian network that binds together coherent features into perceptually segregated auditory objects. The proposed architecture successfully emulates a wide range of well-established auditory scene segregation phenomena and quantifies the complementary role of segregation and binding cues in driving auditory scene segregation.
Collapse
Affiliation(s)
- Debmalya Chakrabarty
- Laboratory for Computational Audio Processing, Center for Speech and Language Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mounya Elhilali
- Laboratory for Computational Audio Processing, Center for Speech and Language Processing, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| |
Collapse
|
32
|
Abstract
To understand our surroundings, we effortlessly parse our sound environment into sound sources, extracting invariant information, or regularities, over time to build an internal representation of the world around us. Previous experimental work has shown the brain is sensitive to many types of regularities in sound, but theoretical models that capture the underlying principles of regularity tracking across diverse sequence structures have been few and far between. Existing efforts often focus on sound patterns rather than the stochastic nature of sequences. In the current study, we employ a perceptual model for regularity extraction based on a Bayesian framework that posits the brain collects statistical information over time. We show this model can be used to simulate various results from the literature with stimuli exhibiting a wide range of predictability. This model can provide a useful tool both for interpreting existing experimental results under a unified model and for providing predictions for new ones using more complex stimuli.
Collapse
Affiliation(s)
| | - Mounya Elhilali
- Johns Hopkins University, Baltimore, Maryland, United States.
| |
Collapse
|
33
|
Huang N, Slaney M, Elhilali M. Connecting Deep Neural Networks to Physical, Perceptual, and Electrophysiological Auditory Signals. Front Neurosci 2018; 12:532. [PMID: 30154688 PMCID: PMC6102345 DOI: 10.3389/fnins.2018.00532] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2018] [Accepted: 07/16/2018] [Indexed: 11/13/2022] Open
Abstract
Deep neural networks have recently been shown to capture the intricate transformation of signals from sensory profiles to semantic representations that facilitate recognition or discrimination of complex stimuli. In this vein, convolutional neural networks (CNNs) have been used very successfully in image and audio classification. Designed to imitate the hierarchical structure of the nervous system, CNNs contain layers of increasing complexity that transform the incoming signal into object-level representations. In this work, we employ a CNN trained for large-scale audio object classification to gain insights about the contribution of various audio representations that guide sound perception. The analysis contrasts activations of different layers of the CNN with acoustic features extracted directly from the scenes, perceptual salience obtained from behavioral responses of human listeners, and neural oscillations recorded by electroencephalography (EEG) in response to the same natural scenes. All three measures are tightly linked quantities believed to guide percepts of salience and object formation when listening to complex scenes. The results paint a picture of the intricate interplay between low-level and object-level representations in guiding auditory salience that is very much dependent on context and sound category.
Collapse
Affiliation(s)
- Nicholas Huang
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
| | - Malcolm Slaney
- Machine Hearing, Google AI, Google (United States), Mountain View, CA, United States
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
| |
Collapse
|
34
|
Abstract
Our ability to parse our acoustic environment relies on the brain's capacity to extract statistical regularities from surrounding sounds. Previous work in regularity extraction has predominantly focused on the brain's sensitivity to predictable patterns in sound sequences. However, natural sound environments are rarely completely predictable, often containing some level of randomness, yet the brain is able to effectively interpret its surroundings by extracting useful information from stochastic sounds. It has been previously shown that the brain is sensitive to the marginal lower-order statistics of sound sequences (i.e., mean and variance). In this work, we investigate the brain's sensitivity to higher-order statistics describing temporal dependencies between sound events through a series of change detection experiments, where listeners are asked to detect changes in randomness in the pitch of tone sequences. Behavioral data indicate listeners collect statistical estimates to process incoming sounds, and a perceptual model based on Bayesian inference shows a capacity in the brain to track higher-order statistics. Further analysis of individual subjects' behavior indicates an important role of perceptual constraints in listeners' ability to track these sensory statistics with high fidelity. In addition, the inference model facilitates analysis of neural electroencephalography (EEG) responses, anchoring the analysis relative to the statistics of each stochastic stimulus. This reveals both a deviance response and a change-related disruption in phase of the stimulus-locked response that follow the higher-order statistics. These results shed light on the brain's ability to process stochastic sound sequences.
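One concrete instance of a higher-order statistic describing temporal dependencies is the lag-1 correlation of a tone sequence; the sliding-window sketch below flags changes in this dependency even when the marginal mean and variance stay fixed. It is a heuristic illustration, not the Bayesian inference model used in the paper.

```python
import numpy as np

def randomness_change_score(pitch, half=25):
    """Score changes in the lag-1 correlation of a tone sequence by comparing
    adjacent windows; peaks mark candidate changes in temporal dependency
    even when the marginal mean and variance are unchanged."""
    def lag1(x):
        x = x - x.mean()
        return (x[:-1] * x[1:]).sum() / ((x[:-1] ** 2).sum() + 1e-12)

    scores = np.zeros(len(pitch))
    for i in range(half, len(pitch) - half):
        scores[i] = abs(lag1(pitch[i - half:i]) - lag1(pitch[i:i + half]))
    return scores

# Example: a correlated random walk switches to independent samples at n=100.
rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal(100)) * 0.5
iid = rng.standard_normal(100) * walk.std()
seq = np.concatenate([walk, iid])
print(randomness_change_score(seq).argmax())   # near the true change point
```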
Collapse
Affiliation(s)
- Benjamin Skerritt-Davis
- Electrical & Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Mounya Elhilali
- Electrical & Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
Collapse
|
36
|
McCollum ED, Park DE, Watson NL, Buck WC, Bunthi C, Devendra A, Ebruke BE, Elhilali M, Emmanouilidou D, Garcia-Prats AJ, Githinji L, Hossain L, Madhi SA, Moore DP, Mulindwa J, Olson D, Awori JO, Vandepitte WP, Verwey C, West JE, Knoll MD, O'Brien KL, Feikin DR, Hammit LL. Listening panel agreement and characteristics of lung sounds digitally recorded from children aged 1-59 months enrolled in the Pneumonia Etiology Research for Child Health (PERCH) case-control study. BMJ Open Respir Res 2017; 4:e000193. [PMID: 28883927 PMCID: PMC5531306 DOI: 10.1136/bmjresp-2017-000193] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2017] [Revised: 05/25/2017] [Accepted: 05/25/2017] [Indexed: 01/14/2023] Open
Abstract
INTRODUCTION Paediatric lung sound recordings can be systematically assessed, but methodological feasibility and validity are unknown, especially in developing countries. We examined the performance of acoustically interpreting recorded paediatric lung sounds and compared sound characteristics between cases and controls. METHODS Pneumonia Etiology Research for Child Health staff in six African and Asian sites recorded lung sounds with a digital stethoscope in cases and controls. Cases aged 1-59 months had WHO severe or very severe pneumonia; age-matched community controls did not. A listening panel assigned examination results of normal, crackle, wheeze, crackle and wheeze, or uninterpretable, with adjudication of discordant interpretations. Classifications were recategorised into any crackle, any wheeze or abnormal (any crackle or wheeze), and primary listener agreement (first two listeners) was analysed among interpretable examinations using the prevalence-adjusted, bias-adjusted kappa (PABAK). We examined predictors of disagreement with logistic regression and compared case and control lung sounds with descriptive statistics. RESULTS Primary listeners considered 89.5% of 792 case and 92.4% of 301 control recordings interpretable. Among interpretable recordings, listeners agreed on the presence or absence of any abnormality in 74.9% (PABAK 0.50) of cases and 69.8% (PABAK 0.40) of controls, presence/absence of crackles in 70.6% (PABAK 0.41) of cases and 82.4% (PABAK 0.65) of controls, and presence/absence of wheeze in 72.6% (PABAK 0.45) of cases and 73.8% (PABAK 0.48) of controls. Control status, tachypnoea, >3 uninterpretable chest positions, crying, upper airway noises and study site predicted listener disagreement. Among all interpretable examinations, 38.0% of cases and 84.9% of controls were normal (p<0.0001); wheezing was the most common sound (49.9%) in cases. CONCLUSIONS The listening panel and case-control data suggest our methodology is feasible and likely valid, and that small airway inflammation is common in WHO pneumonia. Digital auscultation may be an important future pneumonia diagnostic in developing countries.
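For two raters and the binary recategorisation used here, PABAK reduces to a linear rescaling of raw agreement, PABAK = 2*p_o - 1, which is easy to verify against the figures above:

```python
def pabak(observed_agreement):
    """Prevalence-adjusted, bias-adjusted kappa for two raters and two
    categories: a linear rescaling of the observed agreement p_o."""
    return 2 * observed_agreement - 1

# Checks against the values reported in the abstract:
print(round(pabak(0.749), 2))  # 0.50 (any abnormality, cases)
print(round(pabak(0.698), 2))  # 0.40 (any abnormality, controls)
print(round(pabak(0.824), 2))  # 0.65 (crackles, controls)
```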
Collapse
Affiliation(s)
- Eric D McCollum
- Eudowood Division of Pediatric Respiratory Sciences, Johns Hopkins School of Medicine, Baltimore, Maryland, USA,Department of International Health, Johns Hopkins Bloomberg School of Public Health, Dhaka, Bangladesh,Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Daniel E Park
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | | | - W Chris Buck
- Department of Pediatrics, University of California Los Angeles, Maputo, Mozambique
| | - Charatdao Bunthi
- International Emerging Infections Program, Global Disease Detection Center, Thailand Ministry of Public Health – US Centers for Disease Control and Prevention Collaboration, Nonthaburi, Thailand
| | | | | | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| | - Dimitra Emmanouilidou
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| | - Anthony J Garcia-Prats
- Department of Paediatrics and Child Health, Stellenbosch University, Tygerberg, South Africa
| | - Leah Githinji
- Division of Paediatric Pulmonology, University of Cape Town, Cape Town, South Africa
| | - Lokman Hossain
- Respiratory Vaccines, Center for Vaccine Sciences, icddr,b, Dhaka, Bangladesh
| | - Shabir A Madhi
- Medical Research Council, Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa,Department of Science and Technology/National Research Foundation, South African Research Chair: Vaccine Preventable Diseases, University of the Witwatersrand, Johannesburg, South Africa
| | - David P Moore
- Medical Research Council, Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa,Department of Paediatrics, University of the Witwatersrand, Chris Hani Baragwanath Academic Hospital, Johannesburg, South Africa
| | - Justin Mulindwa
- Department of Paediatrics and Child Health, University Teaching Hospital, Lusaka, Zambia
| | - Dan Olson
- Department of Pediatrics, Section of Infectious Disease, Center for Global Health, University of Colorado, Colorado, USA
| | - Juliet O Awori
- Kenya Medical Research Institute Wellcome Trust Research Programme, Kilifi, Kenya
| | - Warunee P Vandepitte
- Queen Sirikit National Institute of Child Health, Rangsit University, Bangkok, Thailand
| | - Charl Verwey
- Medical Research Council, Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa,Department of Paediatrics, University of the Witwatersrand, Chris Hani Baragwanath Academic Hospital, Johannesburg, South Africa
| | - James E West
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| | - Maria D Knoll
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Katherine L O'Brien
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Daniel R Feikin
- Department of International Health, International Vaccine Access Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA,Division of Viral Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Laura L Hammit
- Kenya Medical Research Institute Wellcome Trust Research Programme, Kilifi, Kenya
| |
Collapse
|
37
|
Abstract
GOAL Chest auscultation offers a non-invasive and low-cost tool for monitoring lung disease. However, it presents many shortcomings, including inter-listener variability, subjectivity, and vulnerability to noise and distortions. This work proposes a computer-aided approach to process lung signals acquired in the field under adverse noisy conditions, by improving the signal quality and offering automated identification of abnormal auscultations indicative of respiratory pathologies. METHODS The developed noise-suppression scheme eliminates ambient sounds, heart sounds, sensor artifacts, and crying contamination. The improved high-quality signal is then mapped onto a rich spectrotemporal feature space before being classified using a trained support-vector machine classifier. Individual signal frame decisions are then combined using an evaluation scheme, providing an overall patient-level decision for unseen patient records. RESULTS All methods are evaluated on a large dataset of 1000 enrolled children aged 1-59 months. The noise suppression scheme is shown to significantly improve signal quality, and the classification system achieves an accuracy of 86.7% in distinguishing normal from pathological sounds, far surpassing other state-of-the-art methods. CONCLUSION Computerized lung sound processing can benefit from the enforcement of advanced noise suppression. A fairly short processing window combined with detailed spectrotemporal features is recommended, in order to capture transient adventitious events without highlighting sharp noise occurrences. SIGNIFICANCE Unlike existing methodologies in the literature, the proposed work is not limited in scope or confined to laboratory settings: it validates a practical method for fully automated chest sound processing applicable to realistic and noisy auscultation settings.
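The frame-then-patient decision flow reads naturally as a two-stage pipeline; the sketch below uses crude spectrotemporal frame statistics, an SVM, and a fraction-of-frames vote. The feature choices, the 1 s frame length, and the aggregation threshold are illustrative assumptions, not the published configuration.

```python
import numpy as np
from scipy.signal import stft
from sklearn.svm import SVC

def frame_features(x, fs, frame_len=1.0):
    """Cut a (noise-suppressed) recording into frames and summarize each
    with coarse spectrotemporal statistics."""
    hop = int(frame_len * fs)
    feats = []
    for start in range(0, len(x) - hop + 1, hop):
        _, _, Z = stft(x[start:start + hop], fs=fs, nperseg=256)
        logmag = np.log(np.abs(Z) + 1e-12)
        feats.append([logmag.mean(), logmag.std(),
                      float(logmag.mean(axis=1).argmax()),             # dominant band
                      float(np.abs(np.diff(logmag, axis=1)).mean())])  # temporal flux
    return np.array(feats)

def patient_decision(clf, recordings, fs, threshold=0.5):
    """Aggregate frame-level SVM decisions into one patient-level label."""
    frames = np.vstack([frame_features(x, fs) for x in recordings])
    return int(clf.predict(frames).mean() >= threshold)

# clf = SVC(kernel="rbf").fit(training_frames, training_frame_labels)
```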
Collapse
|
38
|
Snyder JS, Elhilali M. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. [PMID: 28199022 PMCID: PMC5446279 DOI: 10.1111/nyas.13317] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Revised: 12/21/2016] [Accepted: 01/08/2017] [Indexed: 11/29/2022]
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds and conventional behavioral techniques to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth of the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and of concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
Collapse
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland
| |
Collapse
|
39
|
Abstract
Parsing natural acoustic scenes using computational methodologies poses many challenges. Given the rich and complex nature of the acoustic environment, data mismatch between training and test conditions is a major hurdle in data-driven audio processing systems. In contrast, the brain exhibits a remarkable ability to segment acoustic scenes with relative ease. When tackling challenging listening conditions that are often faced in everyday life, the biological system relies on a number of principles that allow it to effortlessly parse its rich soundscape. In the current study, we leverage a key principle employed by the auditory system: its ability to adapt the neural representation of its sensory input in a high-dimensional space. We propose a framework that mimics this process in a computational model for robust speech activity detection. The system employs a 2-D Gabor filter bank whose parameters are retuned offline to improve the separability between the feature representations of speech and nonspeech sounds. This retuning process, driven by feedback from statistical models of speech and nonspeech classes, attempts to minimize the misclassification risk of mismatched data with respect to the original statistical models. We hypothesize that this risk minimization procedure results in an emphasis of unique speech and nonspeech modulations in the high-dimensional space. We show that such an adapted system is indeed robust to other novel conditions, with a marked reduction in equal error rates for a variety of databases with additive and convolutive noise distortions. We discuss the lessons learned from biology with regard to adapting to an ever-changing acoustic environment and the impact on building truly intelligent audio processing systems.
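A 2-D Gabor filter bank over a log-spectrogram is straightforward to realize; the sketch below builds zero-mean Gabor kernels tuned to a few spectro-temporal modulation rates and returns per-filter energies. The offline retuning step driven by speech/nonspeech statistics is the paper's contribution and is not reproduced here; the kernel size and modulation rates are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_2d(omega_t, omega_f, sigma=4.0, size=15):
    """Zero-mean 2-D Gabor kernel tuned to a temporal modulation rate
    (omega_t, cycles/frame) and spectral scale (omega_f, cycles/channel)."""
    t = np.arange(size) - size // 2
    T, F = np.meshgrid(t, t)
    kernel = (np.exp(-(T**2 + F**2) / (2 * sigma**2))
              * np.cos(2 * np.pi * (omega_t * T + omega_f * F)))
    return kernel - kernel.mean()

def gabor_features(logspec, rates=(0.05, 0.1, 0.2), scales=(0.05, 0.1, 0.2)):
    """Per-filter energies of a log-spectrogram (freq x time, larger than
    the 15 x 15 kernels) under a small 2-D Gabor bank. Retuning these
    parameters offline is the adaptation step described above (not shown)."""
    return np.array([
        np.mean(convolve2d(logspec, gabor_2d(r, s), mode="valid") ** 2)
        for r in rates for s in scales])
```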
Collapse
Affiliation(s)
- Ashwin Bellur
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
| |
Collapse
|
40
|
Huang N, Elhilali M. Auditory salience using natural soundscapes. J Acoust Soc Am 2017; 141:2163. [PMID: 28372080 PMCID: PMC6909985 DOI: 10.1121/1.4979055] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/07/2016] [Revised: 03/09/2017] [Accepted: 03/10/2017] [Indexed: 05/26/2023]
Abstract
Salience describes the phenomenon by which an object stands out from a scene. While its underlying processes are extensively studied in vision, mechanisms of auditory salience remain largely unknown. Previous studies have used well-controlled auditory scenes to shed light on some of the acoustic attributes that drive the salience of sound events. Unfortunately, the use of constrained stimuli in addition to a lack of well-established benchmarks of salience judgments hampers the development of comprehensive theories of sensory-driven auditory attention. The present study explores auditory salience in a set of dynamic natural scenes. A behavioral measure of salience is collected by having human volunteers listen to two concurrent scenes and indicate continuously which one attracts their attention. By using natural scenes, the study takes a data-driven rather than experimenter-driven approach to exploring the parameters of auditory salience. The findings indicate that the space of auditory salience is multidimensional (spanning loudness, pitch, spectral shape, as well as other acoustic attributes), nonlinear and highly context-dependent. Importantly, the results indicate that contextual information about the entire scene over both short and long scales needs to be considered in order to properly account for perceptual judgments of salience.
Collapse
Affiliation(s)
- Nicholas Huang
- Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland 21218, USA
| |
Collapse
|
41
|
Abstract
Sounds in everyday life seldom appear in isolation. Both humans and machines are constantly flooded with a cacophony of sounds that need to be sorted through and scoured for relevant information, a phenomenon referred to as the 'cocktail party problem'. A key component in parsing acoustic scenes is the role of attention, which mediates perception and behaviour by focusing both sensory and cognitive resources on pertinent information in the stimulus space. The current article provides a review of modelling studies of auditory attention. The review highlights how the term attention refers to a multitude of behavioural and cognitive processes that can shape sensory processing. Attention can be modulated by 'bottom-up' sensory-driven factors, as well as 'top-down' task-specific goals, expectations and learned schemas. Essentially, it acts as a selection process, or processes, that focuses both sensory and cognitive resources on the most relevant events in the soundscape, with relevance being dictated by the stimulus itself (e.g. a loud explosion) or by a task at hand (e.g. listening for announcements in a busy airport). Recent computational models of auditory attention provide key insights into its role in facilitating perception in cluttered auditory scenes. This article is part of the themed issue 'Auditory and visual scene analysis'.
Collapse
Affiliation(s)
- Emine Merve Kaya
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
| |
Collapse
|
42
|
Carlin MA, Elhilali M. A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields. IEEE/ACM Trans Audio Speech Lang Process 2015; 23:2422-2433. [PMID: 29904642 PMCID: PMC5997283 DOI: 10.1109/taslp.2015.2481179] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes: it can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance the spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectro-temporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
Collapse
Affiliation(s)
- Michael A Carlin
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
| |
Collapse
|
43
|
Abstract
Gonadotropin-releasing hormone receptors (GnRHR) have been found in extrapituitary tissues, including the prostate, where they might exert a local effect on tissue growth. Degarelix is a GnRHR antagonist approved for use in patients with prostate cancer (PCa) who need androgen deprivation therapy. Slowing prostate cell growth is a goal shared by PCa and benign prostate hyperplasia (BPH) patients, yet the effect of degarelix on BPH cells had not previously been investigated. We therefore evaluated the direct effect of degarelix on human BPH primary cell growth. Gene expression studies performed with BPH (n=11), stage 0 (n=15), and PCa (n=65) human specimens demonstrated the presence of GNRHR1 and GNRHR2 and their respective endogenous peptide ligands. BPH-isolated epithelial and stromal cells were either cultured alone or co-cultured (1:4 or 4:1 ratio of epithelial to stromal cells) and subsequently treated with increasing concentrations of degarelix. Degarelix treatment induced a decrease in cell viability and cell proliferation rates, in parallel with an increase in apoptosis. Both epithelial and stromal BPH cells are sensitive to degarelix treatment and, interestingly, degarelix remains effective when the cells are grown in a co-culture microenvironment. In contrast to degarelix, the GnRHR agonists leuprolide and goserelin exerted no effect on the viability of BPH epithelial or stromal cells. In conclusion, (i) prostate tissues express GNRHR and are a potential target for degarelix; and (ii) degarelix directly inhibits BPH cell growth through a decrease in cell proliferation and an increase in apoptosis. Supporting information for this article is available online at http://www.thieme-connect.de/products.
Collapse
Affiliation(s)
- M Sakai
- The Research Institute of the McGill University Health Center, McGill University, Montréal, Québec, Canada
| | - M Elhilali
- The Research Institute of the McGill University Health Center, McGill University, Montréal, Québec, Canada
| | - V Papadopoulos
- The Research Institute of the McGill University Health Center, McGill University, Montréal, Québec, Canada
| |
Collapse
|
44
|
Abstract
To navigate complex acoustic environments, listeners adapt neural processes to focus on behaviorally relevant sounds in the acoustic foreground while minimizing the impact of distractors in the background, an ability referred to as top-down selective attention. Particularly striking examples of attention-driven plasticity have been reported in primary auditory cortex via dynamic reshaping of spectro-temporal receptive fields (STRFs). By enhancing the neural response to features of the foreground while suppressing responses to the background, STRFs can act as adaptive contrast matched filters that directly contribute to an improved cognitive segregation between behaviorally relevant and irrelevant sounds. In this study, we propose a novel discriminative framework for modeling attention-driven plasticity of STRFs in primary auditory cortex. The model describes a general strategy for cortical plasticity via an optimization that maximizes discriminability between the foreground and distractors while maintaining a degree of stability in the cortical representation. The first instantiation of the model describes a form of feature-based attention and yields STRF adaptation patterns consistent with a contrast matched filter previously reported in neurophysiological studies. An extension of the model captures a form of object-based attention, where top-down signals act on an abstracted representation of the sensory input characterized in the modulation domain. The object-based model makes explicit predictions in line with the limited neurophysiological data currently available and can be readily evaluated experimentally. Finally, we draw parallels between the model and anatomical circuits reported to be engaged during active attention. The proposed model strongly suggests an interpretation of attention-driven plasticity as a discriminative adaptation operating at the level of sensory cortex, in line with similar strategies previously described across different sensory modalities.
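The optimization can be pictured as gradient ascent on a foreground/background separation term plus a stability penalty that anchors the filter to its pre-task shape. The Python sketch below uses a deliberately simplified linear-separation objective, not the paper's actual discriminative criterion; the learning rate and penalty weight are assumptions.

    # Sketch of the general strategy (not the paper's exact objective): adapt
    # a filter w to increase the separation between mean foreground and mean
    # background responses, while a quadratic stability penalty keeps w near
    # the pre-task filter w0. Converges to w0 + dm / (2 * lam).
    import numpy as np

    def adapt_filter(w0, fg, bg, lam=1.0, lr=0.01, steps=500):
        """Maximize w.(mean_fg - mean_bg) - lam * ||w - w0||^2 by gradient ascent."""
        dm = fg.mean(axis=0) - bg.mean(axis=0)   # class separation direction
        w = w0.copy()
        for _ in range(steps):
            w += lr * (dm - 2 * lam * (w - w0))  # ascent step with stability term
        return w

    rng = np.random.default_rng(0)
    fg = rng.normal(1.0, 1.0, (500, 8))    # foreground feature responses
    bg = rng.normal(-1.0, 1.0, (500, 8))   # distractor feature responses
    w0 = np.ones(8) / 8
    w = adapt_filter(w0, fg, bg)
    print("separation before/after:",
          w0 @ (fg.mean(0) - bg.mean(0)), w @ (fg.mean(0) - bg.mean(0)))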
Collapse
Affiliation(s)
- Michael A Carlin
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| |
Collapse
|
45
|
Abstract
Although great strides have been achieved in computer-aided diagnosis (CAD) research, a major remaining problem is the ability to perform well in the presence of significant noise. In this work, we propose a mechanism for finding instances of potential interest in time series for further analysis. Adaptive Kalman filters are employed in parallel across different feature axes. Lung sounds recorded in noisy conditions serve as an example application, with spectro-temporal feature extraction used to capture the complex variabilities in the sound. We demonstrate that both disease indicators and distortion events can be detected, reducing long time-series signals to a sparse set of relevant events.
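A scalar random-walk Kalman filter with innovation gating captures the flavor of this screening step: frames whose prediction error is large relative to its expected variance are flagged as candidate events, one filter per feature axis. The state model, noise parameters, and threshold below are illustrative assumptions, not the paper's settings.

    # Minimal sketch of the screening idea: track one feature trajectory with
    # a scalar Kalman filter and flag frames whose normalized innovation
    # (prediction error over its predicted standard deviation) is large.
    import numpy as np

    def kalman_events(y, q=1e-3, r=1e-1, thresh=3.0):
        """Random-walk state model: x_t = x_{t-1} + w, observation y_t = x_t + v."""
        x, p = y[0], 1.0
        flags = np.zeros(len(y), dtype=bool)
        for t in range(1, len(y)):
            p += q                          # predict state variance
            s = p + r                       # innovation variance
            innov = y[t] - x
            flags[t] = abs(innov) / np.sqrt(s) > thresh
            k = p / s                       # Kalman gain
            x += k * innov                  # update state estimate
            p *= (1 - k)                    # update state variance
        return flags

    y = np.sin(np.linspace(0, 20, 1000)) * 0.1 + np.random.randn(1000) * 0.05
    y[600:610] += 2.0                       # injected transient "event"
    print("flagged frames:", np.flatnonzero(kalman_events(y))[:5])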
Collapse
|
46
|
Emmanouilidou D, McCollum ED, Park DE, Elhilali M. Adaptive Noise Suppression of Pediatric Lung Auscultations With Real Applications to Noisy Clinical Settings in Developing Countries. IEEE Trans Biomed Eng 2015; 62:2279-88. [PMID: 25879837 DOI: 10.1109/tbme.2015.2422698] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
GOAL: Chest auscultation constitutes a portable, low-cost tool widely used for respiratory disease detection. Though it offers a powerful means of pulmonary examination, it remains riddled with a number of issues that limit its diagnostic capability. In particular, patient agitation (especially in children), background chatter, and other environmental noises often contaminate the auscultation, affecting the clarity of the lung sound itself. This paper proposes an automated multiband denoising scheme for improving the quality of auscultation signals against heavy background contamination. METHODS: The algorithm works on a simple two-microphone setup, dynamically adapts to the background noise, and suppresses contaminations while preserving the lung sound content. The proposed scheme is tuned to trade off maximal noise suppression against maintaining the integrity of the lung signal, particularly its unknown adventitious components, which provide the most informative diagnostic value in lung pathology. RESULTS: The algorithm is applied to digital recordings obtained in the field in a busy clinic in West Africa and evaluated using objective signal fidelity measures and perceptual listening tests performed by a panel of licensed physicians. A strong preference for the enhanced sounds is revealed. SIGNIFICANCE: The strengths and benefits of the proposed method lie in its simple automated setup and its adaptive nature, both fundamental conditions for everyday clinical applicability. It can readily be extended to a real-time implementation and integrated with lung sound acquisition protocols.
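The two-microphone setup suggests a gain-based formulation: track the noise spectrum from an ambient reference microphone and attenuate time-frequency bins of the primary (chest-piece) signal accordingly. The Wiener-like gain rule, smoothing constant, and gain floor in the Python sketch below are assumptions for illustration, not the paper's algorithm.

    # Sketch of the two-microphone idea: estimate the noise power spectrum
    # from the reference channel with recursive smoothing, then apply a
    # Wiener-like gain (with a floor to limit artifacts) to the primary
    # channel's STFT before resynthesis.
    import numpy as np
    from scipy.signal import stft, istft

    def two_mic_suppress(primary, reference, fs, floor=0.1, alpha=0.9):
        f, t, P = stft(primary, fs=fs, nperseg=512)
        _, _, R = stft(reference, fs=fs, nperseg=512)
        noise = np.zeros(P.shape[0])
        out = np.empty_like(P)
        for i in range(P.shape[1]):
            noise = alpha * noise + (1 - alpha) * np.abs(R[:, i]) ** 2
            snr = np.maximum(np.abs(P[:, i]) ** 2 / (noise + 1e-12) - 1.0, 0.0)
            gain = np.maximum(snr / (snr + 1.0), floor)   # Wiener-like gain
            out[:, i] = gain * P[:, i]
        _, clean = istft(out, fs=fs, nperseg=512)
        return clean

    # Usage: clean = two_mic_suppress(chest_mic, ambient_mic, fs=8000)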
Collapse
|
47
|
Sell G, Suied C, Elhilali M, Shamma S. Perceptual susceptibility to acoustic manipulations in speaker discrimination. J Acoust Soc Am 2015; 137:911-922. [PMID: 25698023 PMCID: PMC5392054 DOI: 10.1121/1.4906826] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/14/2013] [Revised: 09/11/2014] [Accepted: 12/08/2014] [Indexed: 06/04/2023]
Abstract
Listeners' ability to discriminate unfamiliar voices is often susceptible to manipulations of the acoustic characteristics of the utterances. This vulnerability was quantified within a task in which participants determined whether two utterances were spoken by the same or different speakers. Results of this task were analyzed in relation to a set of established and novel parameters in order to form hypotheses about the role of those parameters in the decision process. Listener performance was first measured in a baseline task with unmodified stimuli, and then compared to responses with resynthesized stimuli under three conditions: (1) normalized mean pitch; (2) normalized duration; and (3) normalized linear predictive coefficients (LPCs). The results of these experiments suggest that perceptual speaker discrimination is robust to acoustic changes overall, though mean-pitch and LPC modifications are the most detrimental to a listener's ability to successfully identify same- or different-speaker pairings. This susceptibility was also found to depend in part on the specific speaker and utterances.
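The LPC condition replaces the spectral envelope of each utterance. As a reminder of what an LPC fit captures, here is a standard autocorrelation-method (Yule-Walker) estimate for a single frame in Python; the order, windowing, and test signal are assumptions, and the study's resynthesis pipeline is considerably more involved.

    # Autocorrelation-method LPC for one frame: solve the Yule-Walker
    # (symmetric Toeplitz) normal equations for the predictor coefficients.
    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc(frame, order=12):
        """LPC coefficients a[0..order-1]; predictor x[n] ~ sum_k a[k]*x[n-1-k]."""
        x = frame * np.hanning(len(frame))
        r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        return a

    fs = 16000
    t = np.arange(1024) / fs
    frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
    print(lpc(frame)[:4])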
Collapse
Affiliation(s)
- Gregory Sell
- Institute for Systems Research, Electrical and Computer Engineering Department, University of Maryland, College Park, Maryland 20742
| | - Clara Suied
- Institut de Recherche Biomédicale des Armées, Département Action et Cognition en Situation Opérationnelle, 91223 Brétigny sur Orge, France
| | - Mounya Elhilali
- Electrical Engineering Department, Johns Hopkins University, Baltimore, Maryland 21218
| | - Shihab Shamma
- Institute for Systems Research, Electrical and Computer Engineering Department, University of Maryland, College Park, Maryland 20742
| |
Collapse
|
48
|
Patil K, Elhilali M. Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases. EURASIP J Audio Speech Music Process 2015; 2015:27. [PMID: 30555520 PMCID: PMC6290678 DOI: 10.1186/s13636-015-0070-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) are best captured through an analysis that encompasses both time and frequency domains, with a focus on the modulations, or changes, in the signal in the spectro-temporal space. This representation mimics the spectro-temporal receptive field (STRF) analysis believed to underlie processing in the central mammalian auditory system, particularly at the level of primary auditory cortex. How well this STRF representation captures the timbral identity of musical instruments in continuous solo recordings remains unclear. The current work investigates the applicability of the STRF feature space for instrument recognition in solo musical phrases and explores approaches for leveraging knowledge from isolated musical notes for instrument recognition in solo recordings. The study presents an approach for parsing solo performances into their individual note constituents and adapting back-end support vector machine classifiers to generalize instrument recognition to off-the-shelf, commercially available solo recordings.
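The pairing of spectro-temporal features with a support vector machine back end can be sketched as follows; the crude 2-D modulation featurization and the random stand-in data are assumptions, not the paper's full STRF analysis or its note-parsing stage.

    # Sketch of the back-end idea: summarize each note or phrase by the
    # low-order magnitudes of its 2-D (spectro-temporal) modulation spectrum
    # and train an SVM on those feature vectors.
    import numpy as np
    from sklearn.svm import SVC

    def modulation_features(logspec, keep=8):
        """Low-order 2-D modulation magnitudes of a log-spectrogram."""
        M = np.abs(np.fft.rfft2(logspec))
        return M[:keep, :keep].ravel()

    # Usage with hypothetical data: rows of X would be modulation_features()
    # of training excerpts, y their instrument labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 64))
    y = np.repeat([0, 1], 20)
    clf = SVC(kernel="rbf", C=1.0).fit(X, y)
    print("train accuracy:", clf.score(X, y))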
Collapse
|
49
|
Abstract
A new approach for the segregation of monaural sound mixtures is presented, based on the principle of temporal coherence and using auditory cortical representations. Temporal coherence is the notion that perceived sources emit coherently modulated features that evoke highly coincident neural response patterns. By clustering the feature channels with coincident responses and reconstructing their input, one may segregate the underlying source from simultaneously interfering signals that are uncorrelated with it. The proposed algorithm requires no prior information or training on the sources. It can, however, gracefully incorporate cognitive functions and influences, such as memories of a target source or attention to a specific set of its attributes, so as to segregate it from its background. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of this ubiquitous and remarkable perceptual ability, and of its psychophysical manifestations in navigating complex sensory environments.

Humans and many animals can effortlessly navigate complex sensory environments, segregating and attending to one desired target source while suppressing distracting and interfering sources. In this paper, we present an algorithmic model that accomplishes this task with no prior information or training on complex signals such as speech mixtures, and speech in noise and music. The model accounts for this ability relying solely on the temporal coherence principle, the notion that perceived sources emit coherently modulated features that evoke coincident cortical response patterns. It further demonstrates how basic cortical mechanisms common to all sensory systems can implement the necessary representations, as well as the adaptive computations needed to maintain continuity by tracking the slowly changing characteristics of different sources in a scene.
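The grouping step can be caricatured in a few lines of Python: normalize channel envelopes, form their coincidence (correlation) matrix, and keep the channels coherent with an attended anchor channel. The anchor-plus-threshold rule below is a simplification standing in for the model's clustering and cortical dynamics; channel counts, noise levels, and the threshold are assumptions.

    # Minimal sketch of the temporal-coherence principle: channels whose
    # envelopes are coherently modulated group together; here, channels
    # sufficiently correlated with an "attended" anchor channel form the mask.
    import numpy as np

    def coherence_mask(env, anchor=0, thresh=0.5):
        """env: channels-by-frames envelopes -> boolean channel grouping."""
        Z = (env - env.mean(axis=1, keepdims=True)) \
            / (env.std(axis=1, keepdims=True) + 1e-9)
        C = Z @ Z.T / Z.shape[1]            # channel coincidence matrix
        return C[anchor] > thresh

    # Two "sources": channels 0-3 share one envelope, channels 4-7 another.
    rng = np.random.default_rng(1)
    a, b = rng.random(200), rng.random(200)
    env = np.vstack([a + 0.1 * rng.random(200) for _ in range(4)] +
                    [b + 0.1 * rng.random(200) for _ in range(4)])
    print(coherence_mask(env))   # expect channels 0-3 grouped apart from 4-7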
Collapse
Affiliation(s)
- Lakshmi Krishnan
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Shihab Shamma
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Département d'Etudes Cognitives, Ecole normale supérieure, Paris, France
| |
Collapse
|
50
|
Akram S, Englitz B, Elhilali M, Simon JZ, Shamma SA. Investigating the neural correlates of a streaming percept in an informational-masking paradigm. PLoS One 2014; 9:e114427. [PMID: 25490720 PMCID: PMC4260833 DOI: 10.1371/journal.pone.0114427] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2014] [Accepted: 11/10/2014] [Indexed: 11/19/2022] Open
Abstract
Humans routinely segregate a complex acoustic scene into different auditory streams, through the extraction of bottom-up perceptual cues and the use of top-down selective attention. To determine the neural mechanisms underlying this process, neural responses obtained through magnetoencephalography (MEG) were correlated with behavioral performance in the context of an informational masking paradigm. In half the trials, subjects were asked to detect frequency deviants in a target stream, consisting of a rhythmic tone sequence, embedded in a separate masker stream composed of a random cloud of tones. In the other half of the trials, subjects were exposed to identical stimuli but asked to perform a different task: to detect tone-length changes in the random cloud of tones. In order to verify that the normalized neural response to the target sequence served as an indicator of streaming, we correlated neural responses with behavioral performance under a variety of stimulus parameters (target tone rate, target tone frequency, and the “protection zone”, that is, the spectral area with no tones around the target frequency) and attentional states (changing the task objective while maintaining the same stimuli). In all conditions that facilitated target/masker streaming behaviorally, MEG normalized neural responses also changed in a manner consistent with the behavior. Thus, attending to the target stream caused a significant increase in the power and phase coherence of the responses in recording channels, which correlated with an increase in the behavioral performance of the listeners. Normalized neural target responses also increased as the protection zone widened and as the frequency of the target tones increased. Finally, when the target sequence rate increased, the buildup of the normalized neural responses was significantly faster, mirroring the accelerated buildup of the streaming percepts. Our data thus support close links between the perceptual and neural consequences of auditory stream segregation.
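One conventional way to quantify the phase coherence such analyses rely on is inter-trial phase coherence at the target repetition rate, sketched below in Python with synthetic data; the filtering, channel selection, and normalization used in the actual study differ.

    # Inter-trial phase coherence: each trial's response is reduced to its
    # Fourier phase at the target frequency; coherent phases across trials
    # give a value near 1, random phases a value near 0.
    import numpy as np

    def phase_coherence(trials, fs, freq):
        """trials: n_trials-by-n_samples; returns |mean unit phasor| at freq."""
        n = trials.shape[1]
        k = int(round(freq * n / fs))            # FFT bin of the target rate
        spectra = np.fft.rfft(trials, axis=1)[:, k]
        return np.abs(np.mean(spectra / (np.abs(spectra) + 1e-12)))

    fs, n = 500, 1000                            # 2 s of 500-Hz MEG-like data
    t = np.arange(n) / fs
    rng = np.random.default_rng(2)
    locked = np.sin(2 * np.pi * 4 * t) + rng.normal(0, 1, (30, n))  # 4-Hz locked
    unlocked = rng.normal(0, 1, (30, n))
    print(phase_coherence(locked, fs, 4.0), phase_coherence(unlocked, fs, 4.0))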
Collapse
Affiliation(s)
- Sahar Akram
- The Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
| | - Bernhard Englitz
- The Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Département d'Etudes Cognitives, Ecole normale supérieure, Paris, France
- Department of Neurophysiology, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Department of Biology, University of Maryland, College Park, Maryland, United States of America
| | - Shihab A. Shamma
- The Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Département d'Etudes Cognitives, Ecole normale supérieure, Paris, France
| |
Collapse
|