1. von Seth J, Aller M, Davis MH. Unimodal speech perception predicts stable individual differences in audiovisual benefit for phonemes, words and sentences. J Acoust Soc Am 2025; 157:1554-1576. [PMID: 40029090] [DOI: 10.1121/10.0034846]
Abstract
There are substantial individual differences in the benefit that can be obtained from visual cues during speech perception. Here, 113 normally hearing participants aged 18 to 60 years completed a three-part experiment investigating the reliability and predictors of individual audiovisual benefit for acoustically degraded speech. Audiovisual benefit was calculated as the relative intelligibility (at the individual level) of approximately matched (at the group level) auditory-only and audiovisual speech for materials at three levels of linguistic structure: meaningful sentences, monosyllabic words, and consonants in minimal syllables. This measure of audiovisual benefit was stable across sessions and materials, suggesting that a shared mechanism of audiovisual integration operates across levels of linguistic structure. Information transmission analyses suggested that this mechanism may be related to simple phonetic cue extraction: sentence-level audiovisual benefit was reliably predicted by the relative ability to discriminate place of articulation at the consonant level. Finally, whereas unimodal speech perception was related to cognitive measures (matrix reasoning and vocabulary) and demographics (age and gender), audiovisual benefit was predicted only by unimodal speech perceptual abilities: better lipreading ability and subclinically poorer hearing (speech reception thresholds) independently predicted enhanced audiovisual benefit. This work has implications for practices in quantifying audiovisual benefit and for research identifying strategies to enhance multimodal communication in hearing loss.
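A minimal sketch of the benefit measure described above — per-participant audiovisual benefit as the individual-level difference between audiovisual and auditory-only intelligibility for group-matched materials. The toy data, column names, and difference-score formulation are assumptions for illustration, not the authors' exact pipeline:

```python
import pandas as pd

# Hypothetical trial-level data: one row per participant x modality x trial;
# 'correct' codes report accuracy (1 = correct, 0 = incorrect).
trials = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "modality":    ["A", "A", "AV", "AV", "A", "A", "AV", "AV"],
    "correct":     [0, 1, 1, 1, 1, 0, 1, 1],
})

# Proportion correct per participant and modality, then the individual-level
# difference between audiovisual (AV) and auditory-only (A) intelligibility.
scores = trials.groupby(["participant", "modality"])["correct"].mean().unstack()
scores["av_benefit"] = scores["AV"] - scores["A"]
print(scores)
```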
Affiliation(s)
- Jacqueline von Seth
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- Máté Aller
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- Matthew H Davis
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
2. Herrmann B. Language-agnostic, Automated Assessment of Listeners' Speech Recall Using Large Language Models. Trends Hear 2025; 29:23312165251347131. [PMID: 40448324] [DOI: 10.1177/23312165251347131]
Abstract
Speech-comprehension difficulties are common among older people. Standard speech tests do not fully capture such difficulties because the tests poorly resemble the context-rich, story-like nature of ongoing conversation and are typically available only in a country's dominant/official language (e.g., English), leading to inaccurate scores for native speakers of other languages. Assessments of naturalistic, story-like speech in multiple languages require accurate, time-efficient scoring. The current research leverages modern large language models (LLMs) to automate the generation of high-quality spoken stories and the scoring of speech recall in different languages, tested with native English speakers and native speakers of 10 other languages. Participants listened to and freely recalled short stories (in quiet/clear conditions and in babble noise) in their native language. Scoring recall with LLM text-embeddings, and with LLM prompt engineering combined with semantic-similarity analyses, revealed sensitivity to known effects of temporal order, primacy/recency, and background noise, and high similarity of recall scores across languages. The work overcomes limitations associated with simple speech materials and with testing restricted to closed native-speaker groups, because recall data of varying length and detail can be mapped across languages with high accuracy. The full automation of speech generation and recall scoring provides an important step toward comprehension assessments of naturalistic speech with clinical applicability.
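The embedding-based scoring step can be sketched as follows. This is an illustration only: the multilingual model name and the recall threshold are assumptions, standing in for the LLM embeddings and calibration the study actually used.

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in multilingual embedding model (assumption; the study used LLM
# text-embeddings, whose exact provider/model is not reproduced here).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

story_units = [
    "The baker opened her shop before sunrise.",
    "A delivery of flour arrived late that morning.",
]
recall = "She opened the bakery very early, before the sun came up."

# Cosine similarity between the free-recall response and each story unit;
# a unit counts as recalled when similarity exceeds a threshold.
unit_emb = model.encode(story_units, convert_to_tensor=True)
recall_emb = model.encode(recall, convert_to_tensor=True)
sims = util.cos_sim(recall_emb, unit_emb)[0]

THRESHOLD = 0.5  # hypothetical cutoff; would need calibration against human codes
for unit, sim in zip(story_units, sims):
    print(f"{float(sim):.2f}  recalled={bool(sim > THRESHOLD)}  {unit}")
```

Because one multilingual embedding space covers all tested languages, unit-by-unit similarity scores remain comparable across languages, which is the property the recall-score comparison exploits.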
Affiliation(s)
- Björn Herrmann
- Rotman Research Institute, Baycrest Academy for Research and Education, North York, Ontario, Canada
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
3. Bent T, Baese-Berk M, Puckett B, Ryherd E, Perry S, Manley NA. Older adults' recognition of medical terminology in hospital noise. Cogn Res Princ Implic 2024; 9:79. [PMID: 39636386] [PMCID: PMC11621266] [DOI: 10.1186/s41235-024-00606-1]
Abstract
Word identification accuracy is modulated by many factors, including linguistic characteristics of words (frequent vs. infrequent), the listening environment (noisy vs. quiet), and listener-related differences (older vs. younger). Nearly all studies investigating these factors use high-familiarity words and noise signals that are either energetic maskers (e.g., white noise) or informational maskers composed of competing talkers (e.g., multitalker babble). Here, we expand on these findings by examining younger and older listeners' speech-in-noise perception for words varying in both frequency and familiarity in simulated hospital noise, which contains important non-speech information. The method was inspired by the real-world challenges aging patients can face in understanding less familiar medical terminology used by healthcare professionals in noisy hospital environments. Word familiarity data were collected from older and young adults for 800 medically related terms. Familiarity ratings were highly correlated between the two age groups. Older adults' transcription accuracy for sentences with medical terminology varying in familiarity and frequency was assessed across four listening conditions: hospital noise, speech-shaped noise, amplitude-modulated speech-shaped noise, and quiet. Listeners were less accurate in the noise conditions than in quiet and were more impacted by hospital noise than by either speech-shaped noise condition. Sentences with low-familiarity, low-frequency medical words presented in hospital noise were particularly detrimental for older adults compared with younger adults. The results inform our theoretical understanding of speech perception in noise and highlight the real-world consequences of older adults' difficulties with speech-in-noise, specifically with noise containing competing non-speech information.
Affiliation(s)
- Tessa Bent
- Department of Speech, Language and Hearing Sciences, Indiana University, 2631 E. Discovery Parkway, Bloomington, IN, 47408, USA.
- Brian Puckett
- Durham School of Architectural Engineering and Construction, University of Nebraska-Lincoln, Lincoln, USA
- Erica Ryherd
- Durham School of Architectural Engineering and Construction, University of Nebraska-Lincoln, Lincoln, USA
- Sydney Perry
- Department of Speech, Language and Hearing Sciences, Indiana University, 2631 E. Discovery Parkway, Bloomington, IN, 47408, USA
- Natalie A Manley
- Division of Geriatrics, Gerontology and Palliative Medicine, University of Nebraska Medical Center Department of Internal Medicine, Omaha, USA
4. Luo X, Zhou L, Adelgais K, Zhang Z. Assessing the Effectiveness of Automatic Speech Recognition Technology in Emergency Medicine Settings: A Comparative Study of Four AI-powered Engines. Research Square 2024:rs.3.rs-4727659. [PMID: 39184074] [PMCID: PMC11343293] [DOI: 10.21203/rs.3.rs-4727659/v1]
Abstract
Purpose: Cutting-edge automatic speech recognition (ASR) technology holds significant promise for transcribing and recognizing medical information during patient encounters, thereby enabling automatic, real-time clinical documentation that could significantly reduce clinicians' documentation burden. Nevertheless, the performance of current-generation ASR technology in analyzing conversations in noisy and dynamic medical settings, such as prehospital or Emergency Medical Services (EMS) care, lacks sufficient validation. This study explores the current technological limitations and future potential of deploying ASR technology for clinical documentation in fast-paced and noisy medical settings such as EMS. Methods: We evaluated four ASR engines: Google Speech-to-Text Clinical Conversation, OpenAI Speech-to-Text, Amazon Transcribe Medical, and Azure Speech-to-Text. The empirical data used for evaluation were 40 EMS simulation recordings. The transcribed texts were analyzed for accuracy against 23 Electronic Health Record (EHR) categories used in EMS, and the common types of transcription errors were also analyzed. Results: Among the four ASR engines, Google Speech-to-Text Clinical Conversation performed best. Across EHR categories, better performance was observed for "mental state" (F1 = 1.0), "allergies" (F1 = 0.917), "past medical history" (F1 = 0.804), "electrolytes" (F1 = 1.0), and "blood glucose level" (F1 = 0.813). However, all four engines performed poorly on certain critical categories, such as "treatment" (F1 = 0.650) and "medication" (F1 = 0.577). Conclusion: Current ASR solutions fall short of fully automating clinical documentation in EMS settings. Our findings highlight the need for further development of automated clinical documentation technology to improve recognition accuracy in time-critical and dynamic medical settings.
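The per-category scores reported here are standard F1 values. A minimal sketch of how an F1 per EHR category could be computed from extracted versus gold items; the category contents are invented for illustration:

```python
def category_f1(extracted: set, gold: set) -> float:
    """F1 for one EHR category: items the ASR pipeline recovered vs the gold chart."""
    tp = len(extracted & gold)
    if not extracted or not gold or tp == 0:
        return 0.0
    precision = tp / len(extracted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical "medication" category from one simulated EMS encounter.
gold = {"aspirin 324 mg", "nitroglycerin 0.4 mg", "ondansetron 4 mg"}
extracted = {"aspirin 324 mg", "ondansetron 4 mg", "fentanyl 50 mcg"}
print(round(category_f1(extracted, gold), 3))  # 0.667: 2 TP, 1 FP, 1 FN
```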
5. Suite L, Freiwirth G, Babel M. Receptive vocabulary predicts multilinguals' recognition skills in adverse listening conditions. J Acoust Soc Am 2023; 154:3916-3930. [PMID: 38126803] [DOI: 10.1121/10.0023960]
Abstract
Adverse listening conditions are known to affect bilingual listeners' intelligibility scores more than those of monolingual listeners. To advance theoretical understanding of the mechanisms underpinning bilinguals' challenges in adverse listening conditions, vocabulary size and language entropy were compared as predictors in a sentence transcription task with a heterogeneous multilingual population representative of a speech community. Adverse listening conditions were induced through noise type, bandwidth manipulations, and sentences varying in semantic predictability. Overall, the results confirm anticipated patterns with respect to sentence type, noise masking, and bandwidth: listeners showed better comprehension of semantically coherent utterances presented without masking and with the full spectrum. Crucially, listeners with larger receptive vocabularies and lower language entropy, a measure of the predictability of one's language use, showed improved performance in adverse listening conditions. Vocabulary size had a substantially larger effect size, indicating that it has more impact on performance in adverse listening conditions than bilingual language use does. These results suggest that the mechanism behind the bilingual disadvantage in adverse listening conditions may be rooted in bilinguals' smaller language-specific receptive vocabularies, offering a harmonious explanation for the challenges in adverse listening conditions experienced by monolinguals and multilinguals.
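Language entropy, the predictor compared here against vocabulary size, is the Shannon entropy of a person's language-use proportions. A minimal sketch, with usage profiles invented for illustration:

```python
import math

def language_entropy(proportions):
    """Shannon entropy (bits) of a language-use distribution: 0 for
    single-language use, higher for more balanced multilingual use."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(language_entropy([1.0]))            # 0.0   (effectively monolingual use)
print(language_entropy([0.5, 0.5]))       # 1.0   (balanced bilingual use)
print(language_entropy([0.7, 0.2, 0.1]))  # ~1.16 (unbalanced trilingual use)
```

Lower entropy corresponds to more predictable language use, which the study links to better performance in adverse conditions.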
Affiliation(s)
- Lexia Suite
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Galia Freiwirth
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Molly Babel
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
6. Zhao K, Farrell K, Mashiku M, Abay D, Tang K, Oberste MS, Burns CC. A search-based geographic metadata curation pipeline to refine sequencing institution information and support public health. Front Public Health 2023; 11:1254976. [PMID: 38035280] [PMCID: PMC10683794] [DOI: 10.3389/fpubh.2023.1254976]
Abstract
Background: The National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) has amassed a vast reservoir of genetic data since its inception in 2007. These public data hold immense potential for supporting pathogen surveillance and control. However, the lack of standardized metadata and inconsistent submission practices in SRA may impede the data's utility for public health. Methods: To address this issue, we introduce the Search-based Geographic Metadata Curation (SGMC) pipeline. SGMC used Python and web scraping to extract geographic data of sequencing institutions from NCBI SRA in the Cloud and its website, and then harnessed ChatGPT to refine the sequencing-institution and location assignments. To illustrate the pipeline's utility, we examined the geographic distribution of sequencing institutions relevant to polio eradication and categorized them by country. Results: SGMC identified 7,649 sequencing institutions and their global locations from a random selection of 2,321,044 SRA accessions. These institutions were distributed across 97 countries, with strong representation in the United States, the United Kingdom, and China. However, there was a lack of data from African, Central Asian, and Central American countries, indicating potential disparities in sequencing capabilities. Comparison with manually curated data for U.S. institutions revealed SGMC accuracy rates of 94.8% for institutions, 93.1% for countries, and 74.5% for geographic coordinates. Conclusion: SGMC may represent a novel approach, using a generative AI model to enhance geographic data (country and institution assignments) for large numbers of samples within SRA datasets. This information can be utilized to bolster public health endeavors.
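The ChatGPT refinement step might look like the following sketch using the OpenAI Python client; the model name, prompt wording, and output format are assumptions, not the SGMC pipeline's actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def refine_institution(raw_center_string: str) -> str:
    """Ask a chat model to normalize a free-text SRA submitting-center
    string into 'institution, country' (illustrative prompt only)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; the paper reports using ChatGPT
        messages=[{
            "role": "user",
            "content": ("Normalize this sequencing-center string to the form "
                        f"'institution, country': {raw_center_string!r}"),
        }],
    )
    return response.choices[0].message.content

print(refine_institution("CDC/NCIRD/DVD, Atlanta GA"))
```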
Affiliation(s)
- Kun Zhao
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Katie Farrell
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
- Melchizedek Mashiku
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
- Dawit Abay
- Cherokee Nation Businesses, Contracting Agency to the Division of Viral Diseases, Centers for Disease Control and Prevention, Catoosa, OK, United States
- Kevin Tang
- Division of Scientific Resources, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- M Steven Oberste
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Cara C Burns
- Division of Viral Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States
7. Constructing a Shariah Document Screening Prototype Based on Serverless Architecture. Computers 2023. [DOI: 10.3390/computers12030050]
Abstract
The aim of this research is to lay the groundwork for an Islamic banking document screening prototype based on a serverless architecture framework. The research first forms an algorithm for document matching based on the Vector Space Model (VSM) and adopts Levenshtein distance for similarity scoring. Product proposals serve as queries, and the central bank's policy documents serve as the corpus for document matching. Both the query and the corpus go through a preprocessing stage prior to similarity analysis. One set of queries is tested against two sets of corpora to compare similarity values. Finally, a prototype for Shariah document screening is built on a serverless architecture framework with a ReactJS interface. This research is the first attempt to introduce a Shariah document screening prototype based on serverless architecture technology, which would be useful to the Islamic financial industry in achieving Shariah-compliant business. Given the development of Fintech, the output of this research would complement existing Fintech applications that focus on ensuring the Islamic nature of businesses.
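A minimal sketch of the two matching ingredients named above — TF-IDF vectors with cosine similarity for the VSM step, plus a plain Levenshtein distance as the edit-based similarity; the clauses and query are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein edit distance (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Hypothetical policy clauses (corpus) and product-proposal text (query).
corpus = [
    "the bank shall disclose all profit sharing ratios to customers",
    "late payment charges must not compound on the outstanding balance",
]
query = "profit sharing ratios must be disclosed to all customers"

# VSM step: TF-IDF vectors, cosine similarity of the query to each clause.
vec = TfidfVectorizer().fit(corpus + [query])
cos = cosine_similarity(vec.transform([query]), vec.transform(corpus))[0]

# Edit-distance step: normalized Levenshtein similarity as a second signal.
lev = [1 - levenshtein(query, doc) / max(len(query), len(doc)) for doc in corpus]
for doc, c, l in zip(corpus, cos, lev):
    print(f"cosine={c:.2f}  lev_sim={l:.2f}  {doc}")
```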
8. Stark K, van Scherpenberg C, Obrig H, Abdel Rahman R. Web-based language production experiments: Semantic interference assessment is robust for spoken and typed response modalities. Behav Res Methods 2023; 55:236-262. [PMID: 35378676] [PMCID: PMC9918579] [DOI: 10.3758/s13428-021-01768-2]
Abstract
For experimental research on language production, temporal precision and high quality of the recorded audio files are imperative. These requirements pose a considerable challenge if language production is to be investigated online. However, online research has huge potential in terms of efficiency, ecological validity, and diversity of study populations in psycholinguistic and related research, also beyond the current situation. Here, we supply confirmatory evidence that language production can be investigated online and that reaction time (RT) distributions and error rates are similar for written naming responses (using the keyboard) and typical overt spoken responses. To assess semantic interference effects in both modalities, we performed two pre-registered experiments (n = 30 each) in online settings using the participants' web browsers. A cumulative semantic interference (CSI) paradigm was employed that required naming several exemplars of semantic categories within a seemingly unrelated sequence of objects; RT is expected to increase linearly for each additional exemplar of a category. In Experiment 1, the CSI effects on naming times described in lab-based studies were replicated. In Experiment 2, responses were typed on participants' computer keyboards, and the first correct key press was used for RT analysis. This novel response assessment yielded a qualitatively similar, very robust CSI effect. Besides technical ease of application, collecting typewritten responses and automatic data preprocessing substantially reduce the workload for language production research. The results of both experiments open new perspectives for research on RT effects in language experiments across a wide range of contexts. JavaScript- and R-based implementations for data collection and processing are available for download.
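The CSI prediction — RT increasing linearly with each additional within-category exemplar — reduces to estimating a slope over ordinal position. A minimal sketch with invented trial data (the published analyses are more elaborate, e.g., mixed-effects models):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical naming RTs (ms) by ordinal position of a category exemplar
# (1st..5th member of, say, 'birds' named over the experiment), two subjects.
data = pd.DataFrame({
    "rt":      [812, 845, 880, 905, 942, 798, 830, 861, 896, 925],
    "ord_pos": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "subject": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
})

# The CSI effect is the positive slope of RT on ordinal position.
model = smf.ols("rt ~ ord_pos", data).fit()
print(model.params["ord_pos"])  # estimated ms cost per additional exemplar
```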
Affiliation(s)
- Kirsten Stark
- Humboldt-Universität zu Berlin, Department of Neurocognitive Psychology, 10099, Berlin, Germany.
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Einstein Center for Neurosciences Berlin, Charitéplatz 1, 10117, Berlin, Germany.
- Humboldt-Universität zu Berlin, Berlin School of Mind and Brain, 10099, Berlin, Germany.
- Cornelia van Scherpenberg
- Humboldt-Universität zu Berlin, Berlin School of Mind and Brain, 10099, Berlin, Germany
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Clinic for Cognitive Neurology, University Hospital and Faculty of Medicine Leipzig, Leipzig, Germany
- Hellmuth Obrig
- Humboldt-Universität zu Berlin, Berlin School of Mind and Brain, 10099, Berlin, Germany
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Clinic for Cognitive Neurology, University Hospital and Faculty of Medicine Leipzig, Leipzig, Germany
- Rasha Abdel Rahman
- Humboldt-Universität zu Berlin, Department of Neurocognitive Psychology, 10099, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Einstein Center for Neurosciences Berlin, Charitéplatz 1, 10117, Berlin, Germany
- Humboldt-Universität zu Berlin, Berlin School of Mind and Brain, 10099, Berlin, Germany
9. Baese-Berk MM, Levi SV, Van Engen KJ. Intelligibility as a measure of speech perception: Current approaches, challenges, and recommendations. J Acoust Soc Am 2023; 153:68. [PMID: 36732227] [DOI: 10.1121/10.0016806]
Abstract
Intelligibility measures, which assess the number of words or phonemes a listener correctly transcribes or repeats, are commonly used metrics for speech perception research. While these measures have many benefits for researchers, they also come with a number of limitations. By pointing out the strengths and limitations of this approach, including how it fails to capture aspects of perception such as listening effort, this article argues that the role of intelligibility measures must be reconsidered in fields such as linguistics, communication disorders, and psychology. Recommendations for future work in this area are presented.
Affiliation(s)
- Susannah V Levi
- Department of Communicative Sciences and Disorders, New York University, New York, New York 10012, USA
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130, USA
10. Levy J, Vattikonda N, Haudenschild C, Christensen B, Vaickus L. Comparison of Machine-Learning Algorithms for the Prediction of Current Procedural Terminology (CPT) Codes from Pathology Reports. J Pathol Inform 2022; 13:3. [PMID: 35127232] [PMCID: PMC8802304] [DOI: 10.4103/jpi.jpi_52_21]
Abstract
BACKGROUND: Pathology reports serve as an auditable trail of a patient's clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields, as compared with exclusively using the diagnostic field, for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. METHODS: After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representations from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents using model explanation techniques. RESULTS: We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT. Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and the performance gains from additional subfields were large for the XGBoost model on primary CPT codes. Misclassifications of CPT codes occurred between codes of similar complexity, and misclassifications between pathologists were subspecialty related. CONCLUSIONS: Our approach generated CPT code predictions with an accuracy higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although the BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist compensation (RVUs).
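As one concrete instance of the comparison, a bag-of-words XGBoost classifier over report text can be sketched in a few lines; the miniature reports and binary label set are invented stand-ins for the 93,039-report corpus and five primary CPT codes:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

# Hypothetical miniature report texts (diagnostic field only).
reports = [
    "skin, left forearm, punch biopsy: basal cell carcinoma",
    "gallbladder, cholecystectomy: chronic cholecystitis with cholelithiasis",
    "skin, nose, shave biopsy: squamous cell carcinoma in situ",
    "appendix, appendectomy: acute appendicitis",
]
labels = [0, 1, 0, 1]  # indices into a CPT code list, e.g. [88305, 88304]

# TF-IDF features feeding a gradient-boosted tree classifier.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    XGBClassifier(n_estimators=50, max_depth=3),
)
clf.fit(reports, labels)
print(clf.predict(["skin, cheek, punch biopsy: nodular basal cell carcinoma"]))
```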
Affiliation(s)
- Joshua Levy
- Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Corresponding author at: Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, 1 Medical Center Drive, Borwell Building 4th Floor, Lebanon NH 03766, USA.
- Nishitha Vattikonda
- Thomas Jefferson High School for Science and Technology, Alexandria, VA, USA
- Brock Christensen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Community and Family Medicine, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Louis Vaickus
- Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
11. Adaptation to Social-Linguistic Associations in Audio-Visual Speech. Brain Sci 2022; 12:845. [PMID: 35884648] [PMCID: PMC9312963] [DOI: 10.3390/brainsci12070845]
Abstract
Listeners entertain hypotheses about how social characteristics affect a speaker's pronunciation. While some of these hypotheses may be representative of a demographic, thus facilitating spoken language processing, others may be erroneous stereotypes that impede comprehension. As a case in point, listeners' stereotypes about pairings of language and ethnicity in varieties of North American English can improve intelligibility and comprehension, or hinder these processes. Using audio-visual speech, this study examines how listeners adapt to speech in noise from four speakers who are representative of selected accent-ethnicity associations in the local speech community: an Asian English-L1 speaker, a white English-L1 speaker, an Asian English-L2 speaker, and a white English-L2 speaker. The results suggest that congruent accent-ethnicity associations facilitate adaptation and that the mainstream local accent is associated with a more diverse speech community.
12. Ratnanather JT, Wang LC, Bae SH, O'Neill ER, Sagi E, Tward DJ. Visualization of Speech Perception Analysis via Phoneme Alignment: A Pilot Study. Front Neurol 2022; 12:724800. [PMID: 35087462] [PMCID: PMC8787339] [DOI: 10.3389/fneur.2021.724800]
Abstract
Objective: Speech tests assess the ability of people with hearing loss to comprehend speech with a hearing aid or cochlear implant. The tests are usually at the word or sentence level; however, few tests analyze errors at the phoneme level, so there is a need for an automated program to visualize in real time the accuracy of phonemes in these tests. Method: The program reads in stimulus-response pairs and obtains their phonemic representations from an open-source digital pronouncing dictionary. The stimulus phonemes are aligned with the response phonemes via a modification of the Levenshtein minimum edit distance algorithm. Alignment is achieved via dynamic programming with modified costs for insertions, deletions, and substitutions based on phonological features. The accuracy for each phoneme is based on the F1-score. Accuracy is visualized with respect to place and manner (consonants) or height (vowels). Confusion matrices for the phonemes are used in an information transfer analysis of ten phonological features, and a histogram of the information transfer for the features over a frequency-like range is presented as a phonemegram. Results: The program was applied to two datasets. One consisted of test data at the sentence and word levels: stimulus-response sentence pairs from six volunteers with different degrees of hearing loss and modes of amplification were analyzed (four volunteers listened to sentences from a mobile auditory training app and two listened to sentences from a clinical speech test), and stimulus-response word pairs from three lists were also analyzed. The other dataset consisted of published stimulus-response pairs from experiments in which 31 participants with cochlear implants listened to 400 Basic English Lexicon sentences via different talkers at four SNR levels. In all cases, visualization was obtained in real time. Analysis of 12,400 actual and random pairs showed that the program was robust to the nature of the pairs. Conclusion: It is possible to automate the alignment of phonemes extracted from stimulus-response pairs from speech tests in real time. The alignment then makes it possible to visualize the accuracy of responses via phonological features in two ways. Such visualization of phoneme alignment and accuracy could aid clinicians and scientists.
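The alignment step can be sketched as a dynamic program whose substitution cost is discounted when two phonemes share a phonological feature; the two-feature table and cost values below are illustrative stand-ins for the program's fuller feature-based costs:

```python
# Illustrative place-of-articulation table (the real program uses a fuller
# phonological feature set derived from a pronouncing dictionary).
PLACE = {"P": "labial", "B": "labial", "T": "alveolar", "D": "alveolar"}

def sub_cost(p: str, q: str) -> float:
    if p == q:
        return 0.0
    shared = PLACE.get(p) is not None and PLACE.get(p) == PLACE.get(q)
    return 0.5 if shared else 1.0   # cheaper when a feature is shared

def align(stim, resp, indel=1.0):
    """Levenshtein-style DP over phoneme sequences, with backtrace."""
    n, m = len(stim), len(resp)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel
    for j in range(1, m + 1):
        D[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + indel,        # deletion
                          D[i][j - 1] + indel,        # insertion
                          D[i - 1][j - 1] + sub_cost(stim[i - 1], resp[j - 1]))
    pairs, i, j = [], n, m          # backtrace to aligned phoneme pairs
    while i > 0 or j > 0:
        if i and j and D[i][j] == D[i - 1][j - 1] + sub_cost(stim[i - 1], resp[j - 1]):
            pairs.append((stim[i - 1], resp[j - 1])); i -= 1; j -= 1
        elif i and D[i][j] == D[i - 1][j] + indel:
            pairs.append((stim[i - 1], "-")); i -= 1
        else:
            pairs.append(("-", resp[j - 1])); j -= 1
    return D[n][m], pairs[::-1]

# 'BAT' heard as 'PAD': both substitutions share place, so the alignment
# stays phoneme-for-phoneme instead of inserting and deleting.
print(align(["B", "AE", "T"], ["P", "AE", "D"]))
```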
Affiliation(s)
- J Tilak Ratnanather
- Center for Imaging Science and Institute for Computational Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Lydia C Wang
- Center for Imaging Science and Institute for Computational Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Seung-Ho Bae
- Center for Imaging Science and Institute for Computational Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Erin R O'Neill
- Center for Applied and Translational Sensory Sciences, University of Minnesota, Minneapolis, MN, United States
- Elad Sagi
- Department of Otolaryngology, New York University School of Medicine, New York, NY, United States
- Daniel J Tward
- Center for Imaging Science and Institute for Computational Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
- Departments of Computational Medicine and Neurology, University of California, Los Angeles, Los Angeles, CA, United States
13. Bent T, Holt RF, Van Engen KJ, Jamsek IA, Arzbecker LJ, Liang L, Brown E. How pronunciation distance impacts word recognition in children and adults. J Acoust Soc Am 2021; 150:4103. [PMID: 34972309] [DOI: 10.1121/10.0008930]
Abstract
Although unfamiliar accents can pose word identification challenges for children and adults, few studies have directly compared perception of multiple nonnative and regional accents or quantified how the extent of deviation from the ambient accent impacts word identification accuracy across development. To address these gaps, 5- to 7-year-old children's and adults' word identification accuracy was tested in quiet and noise for native (Midland American, British, Scottish), nonnative (German-, Mandarin-, Japanese-accented English), and bilingual (Hindi-English) varieties (one talker per accent). Each talker's pronunciation distance from the ambient dialect was quantified at the phoneme level using an adaptation of the Levenshtein algorithm. Whereas performance was worse on all non-ambient dialects than on the ambient one, interactions between talker and age (child vs. adult, or across age for the children) emerged only for a subset of talkers, and these did not fall along the native/nonnative divide. Levenshtein distances significantly predicted word recognition accuracy for adults and children in both listening environments, with similar impacts in quiet. In noise, children had more difficulty than adults in overcoming pronunciations that substantially deviated from ambient dialect norms. Future work should continue investigating how pronunciation distance impacts word recognition accuracy by incorporating distance metrics at other levels of analysis (e.g., phonetic, suprasegmental).
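A minimal sketch of the analysis logic — word accuracy regressed on phoneme-level Levenshtein distance, with a distance-by-environment interaction to capture the larger cost of deviation in noise; all values are invented for illustration:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical talker-level data: normalized phoneme-level Levenshtein
# distance from the ambient dialect and word identification accuracy,
# in quiet (noise=0) and in noise (noise=1).
talkers = pd.DataFrame({
    "lev_dist": [0.05, 0.12, 0.20, 0.31, 0.05, 0.12, 0.20, 0.31],
    "accuracy": [0.97, 0.91, 0.84, 0.72, 0.90, 0.80, 0.62, 0.41],
    "noise":    [0, 0, 0, 0, 1, 1, 1, 1],
})

# A negative lev_dist:noise coefficient would mirror the finding that
# deviation from ambient norms is costlier in noise.
model = smf.ols("accuracy ~ lev_dist * noise", talkers).fit()
print(model.params)
```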
Affiliation(s)
- Tessa Bent
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA
- Rachael F Holt
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Izabela A Jamsek
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Lian J Arzbecker
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Laura Liang
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Emma Brown
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA
14. The lrd package: An R package and Shiny application for processing lexical data. Behav Res Methods 2021; 54:2001-2024. [PMID: 34850358] [DOI: 10.3758/s13428-021-01718-y]
Abstract
Recall testing is a common assessment for gauging memory retrieval. Responses from these tests can be analyzed in several ways; however, the output generated from a recall study typically requires manual coding that can be time-intensive and error-prone before analyses can be conducted. To address this issue, this article introduces lrd (Lexical Response Data), a set of open-source tools for quickly and accurately processing lexical response data that can be used either from the R command line or through an R Shiny graphical user interface. First, we provide an overview of the package and include a step-by-step user guide for processing both cued- and free-recall responses. To validate lrd, we used it to recode output from cued-, free-, and sentence-recall studies with large samples and examined whether the original results replicated using lrd-scored data. We then assessed the inter-rater reliability and the sensitivity and specificity of the scoring algorithm relative to human-coded data. Overall, lrd is highly reliable and shows excellent sensitivity and specificity, indicating that recall data processed using this package are remarkably consistent with data processed by a human coder.
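The core scoring idea — match a typed response to its target while tolerating small spelling deviations, then summarize agreement with human codes — can be sketched in Python (lrd itself is an R package; the similarity cutoff here is an assumption):

```python
from difflib import SequenceMatcher

def score_response(response: str, target: str, cutoff: float = 0.8) -> int:
    """1 if the response matches the target up to minor misspellings, else 0."""
    ratio = SequenceMatcher(None, response.lower().strip(),
                            target.lower().strip()).ratio()
    return int(ratio >= cutoff)

# Hypothetical cued-recall responses scored against their targets.
pairs = [("cheeze", "cheese"), ("dog", "cat"), ("guitar", "guitar")]
auto = [score_response(r, t) for r, t in pairs]  # [1, 0, 1]
human = [1, 0, 1]                                # hypothetical human codes

# Sensitivity/specificity of automated vs human coding, as in the validation.
tp = sum(a == h == 1 for a, h in zip(auto, human))
tn = sum(a == h == 0 for a, h in zip(auto, human))
print(auto, tp / sum(human), tn / (len(human) - sum(human)))
```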