1. Shen FX, Baum ML, Martinez-Martin N, Miner AS, Abraham M, Brownstein CA, Cortez N, Evans BJ, Germine LT, Glahn DC, Grady C, Holm IA, Hurley EA, Kimble S, Lázaro-Muñoz G, Leary K, Marks M, Monette PJ, Onnela JP, O’Rourke PP, Rauch SL, Shachar C, Sen S, Vahia I, Vassy JL, Baker JT, Bierer BE, Silverman BC. Returning Individual Research Results from Digital Phenotyping in Psychiatry. Am J Bioeth 2024; 24:69-90. [PMID: 37155651] [PMCID: PMC10630534] [DOI: 10.1080/15265161.2023.2180109]
Abstract
Psychiatry is rapidly adopting digital phenotyping and artificial intelligence/machine learning tools to study mental illness based on tracking participants' locations, online activity, phone and text message usage, heart rate, sleep, physical activity, and more. Existing ethical frameworks for return of individual research results (IRRs) are inadequate to guide researchers on whether, when, and how to return this unprecedented number of potentially sensitive results about each participant's real-world behavior. To address this gap, we convened an interdisciplinary expert working group, supported by a National Institute of Mental Health grant. Building on established guidelines and the emerging norm of returning results in participant-centered research, we present a novel framework specific to the ethical, legal, and social implications of returning IRRs in digital phenotyping research. Our framework offers researchers, clinicians, and Institutional Review Boards (IRBs) urgently needed guidance, and the principles developed here in the context of psychiatry will be readily adaptable to other therapeutic areas.
Affiliation(s)
- Francis X. Shen
- Harvard Medical School
- Massachusetts General Hospital
- Harvard Law School
- Mason Marks
- Harvard Law School
- Florida State University College of Law
- Yale Law School
- Scott L. Rauch
- Harvard Medical School
- McLean Hospital
- Mass General Brigham
- Jason L. Vassy
- Harvard Medical School
- Brigham and Women’s Hospital
- VA Boston Healthcare System
- Barbara E. Bierer
- Harvard Medical School
- Brigham and Women’s Hospital
- Multi-Regional Clinical Trials Center of Brigham and Women’s Hospital and Harvard
2. Seyedi S, Griner E, Corbin L, Jiang Z, Roberts K, Iacobelli L, Milloy A, Boazak M, Bahrami Rad A, Abbasi A, Cotes RO, Clifford GD. Using HIPAA (Health Insurance Portability and Accountability Act)-Compliant Transcription Services for Virtual Psychiatric Interviews: Pilot Comparison Study. JMIR Ment Health 2023; 10:e48517. [PMID: 37906217] [PMCID: PMC10646674] [DOI: 10.2196/48517]
Abstract
BACKGROUND Automatic speech recognition (ASR) technology is increasingly being used for transcription in clinical contexts. Although there are numerous transcription services using ASR, few studies have compared the word error rate (WER) of different transcription services across diagnostic groups in a mental health setting. There has also been little research into the types of words ASR transcriptions mistakenly generate or omit. OBJECTIVE This study compared the WER of 3 ASR transcription services (Amazon Transcribe [Amazon.com, Inc], Zoom-Otter AI [Zoom Video Communications, Inc], and Whisper [OpenAI Inc]) in interviews across 2 clinical categories (controls and participants experiencing a variety of mental health conditions). These ASR transcription services were also compared with a commercial human transcription service, Rev (Rev.Com, Inc). Words erroneously inserted or omitted in the transcripts were systematically analyzed by their Linguistic Inquiry and Word Count categories. METHODS Participants completed a 1-time research psychiatric interview, which was recorded on a secure server. Transcriptions created by the research team were used as the gold standard from which WER was calculated. The interviewees were categorized into either the control group (n=18) or the mental health condition group (n=47) using the Mini-International Neuropsychiatric Interview. The total sample included 65 participants. Brunner-Munzel tests were used for comparing independent sets, such as the diagnostic groupings, and Wilcoxon signed rank tests were used for correlated samples when comparing the total sample between different transcription services. RESULTS There were significant differences between each ASR transcription service's WER (P<.001). Amazon Transcribe's output exhibited significantly lower WERs than Zoom-Otter AI's and Whisper's. ASR performance did not significantly differ across the 2 clinical categories within each service (P>.05). A comparison between the human transcription service output from Rev and the best-performing ASR (Amazon Transcribe) demonstrated a significant difference (P<.001), with Rev having a slightly lower median WER (7.6%, IQR 5.4%-11.35% vs 8.9%, IQR 6.9%-11.6%). Heat maps and spider plots were used to visualize the most common errors in Linguistic Inquiry and Word Count categories, which fell within 3 overarching categories: Conversation, Cognition, and Function. CONCLUSIONS Overall, consistent with previous literature, our results suggest that the WER gap between manual and automated transcription services may be narrowing as ASR services advance. These advances, coupled with reduced cost and turnaround time, may make ASR transcription a more viable option within health care settings. However, more research is required to determine whether errors in specific types of words affect the analysis and usability of these transcriptions, particularly for specific applications and across populations varying in clinical diagnosis, literacy level, accent, and cultural origin.
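The WER compared throughout this study is conventionally computed as the word-level edit distance between a gold-standard reference transcript and an ASR hypothesis, divided by the reference word count. A minimal sketch of that calculation (not the authors' code; the function name and sample strings are illustrative, and the reference is assumed non-empty):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word in a six-word reference yields a WER of 1/6.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

In practice a library such as jiwer is typically used for this, with text normalization (casing, punctuation, filler handling) applied before scoring, since those choices materially affect the reported WER.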
Affiliation(s)
- Salman Seyedi
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Emily Griner
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Lisette Corbin
- Department of Psychiatry, Duke University Health, Durham, NC, United States
- Zifan Jiang
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Kailey Roberts
- Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, United States
- Luca Iacobelli
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Aaron Milloy
- Infection Prevention Department, Emory Healthcare, Atlanta, GA, United States
- Mina Boazak
- Animo Sano Psychiatry, Durham, NC, United States
- Ali Bahrami Rad
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Ahmed Abbasi
- Department of Information Technology, Analytics, and Operations, University of Notre Dame, Notre Dame, IN, United States
- Robert O Cotes
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Gari D Clifford
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States
3. Malgaroli M, Hull TD, Zech JM, Althoff T. Natural language processing for mental health interventions: a systematic review and research framework. Transl Psychiatry 2023; 13:309. [PMID: 37798296] [PMCID: PMC10556019] [DOI: 10.1038/s41398-023-02592-2]
Abstract
Neuropsychiatric disorders pose a high societal cost, but their treatment is hindered by a lack of objective outcome and fidelity metrics. AI technologies, and specifically Natural Language Processing (NLP), have emerged as tools to study mental health interventions (MHI) at the level of their constituent conversations. However, NLP's potential to address clinical and research challenges remains unclear. We therefore conducted a preregistered systematic review of NLP-MHI studies using PRISMA guidelines (osf.io/s52jh) to evaluate their models and clinical applications and to identify biases and gaps. Candidate studies (n = 19,756), including peer-reviewed AI conference manuscripts, were collected up to January 2023 through PubMed, PsycINFO, Scopus, Google Scholar, and ArXiv. A total of 102 articles were included to investigate their computational characteristics (NLP algorithms, audio features, machine learning pipelines, outcome metrics), clinical characteristics (clinical ground truths, study samples, clinical focus), and limitations. Results indicate a rapid growth of NLP-MHI studies since 2019, characterized by increased sample sizes and use of large language models. Digital health platforms were the largest providers of MHI data. Ground truth for supervised learning models was based on clinician ratings (n = 31), patient self-report (n = 29), and annotations by raters (n = 26). Text-based features contributed more to model accuracy than audio markers. Patients' clinical presentation (n = 34), response to intervention (n = 11), intervention monitoring (n = 20), providers' characteristics (n = 12), relational dynamics (n = 14), and data preparation (n = 4) were the most commonly investigated clinical categories. Limitations of reviewed studies included lack of linguistic diversity, limited reproducibility, and population bias. A research framework (NLPxMHI) is developed and validated to assist computational and clinical researchers in addressing the remaining gaps in applying NLP to MHI, with the goal of improving clinical utility, data access, and fairness.
Affiliation(s)
- Matteo Malgaroli
- Department of Psychiatry, New York University, Grossman School of Medicine, New York, NY, 10016, USA.
- James M Zech
- Talkspace, New York, NY, 10025, USA
- Department of Psychology, Florida State University, Tallahassee, FL, 32306, USA
- Tim Althoff
- Department of Computer Science, University of Washington, Seattle, WA, 98195, USA
4. Lee TY, Li CC, Chou KR, Chung MH, Hsiao ST, Guo SL, Hung LY, Wu HT. Machine learning-based speech recognition system for nursing documentation - A pilot study. Int J Med Inform 2023; 178:105213. [PMID: 37690224] [DOI: 10.1016/j.ijmedinf.2023.105213]
Abstract
PURPOSE Considering the significant workload of nursing tasks, enhancing the efficiency of nursing documentation is imperative. This study aimed to evaluate the effectiveness of a machine learning-based speech recognition (SR) system, implemented in a psychiatry ward, in reducing the clinical workload associated with typing nursing records. METHODS The study was conducted between July 15, 2020, and June 30, 2021, at Cheng Hsin General Hospital in Taiwan. The language corpus was based on the existing records from the hospital nursing information system. The participating ward's nursing activities, clinical conversation, and accent data were also collected for deep learning-based SR-engine training. A total of 21 nurses participated in the evaluation of the SR system. Documentation time and recognition error rate were evaluated in parallel between SR-generated records and keyboard entry over 4 sessions. Any differences between SR and keyboard transcriptions were regarded as SR errors. FINDINGS A total of 200 records were obtained from the four evaluation sessions; at each session, 10 participants used SR and keyboard entry in parallel, and 5 entries were collected from each participant. Overall, the SR system processed 30,112 words in 32,456 s (0.928 words per second). The mean accuracy of the SR system improved after each session, from 87.06% in the first session to 95.07% in the fourth. CONCLUSION This pilot study demonstrated that our machine learning-based SR system has acceptable recognition accuracy and may reduce the documentation burden for nurses. However, potential errors in SR transcription should continually be recognized and corrected. Further studies are needed to improve the integration of SR into digital documentation of nursing records, in terms of both productivity and accuracy, across different clinical specialties.
Affiliation(s)
- Tso-Ying Lee
- Director of Nursing Research Center, Nursing Department, Taipei Medical University Hospital, Taipei, Taiwan; Associate Professor, School of Nursing, College of Nursing, Taipei Medical University, Taipei, Taiwan.
- Chin-Ching Li
- Assistant Professor, Department of Nursing, Mackay Medical College, New Taipei City, Taiwan
- Kuei-Ru Chou
- Professor, College of Nursing, Taipei Medical University, Taipei, Taiwan
- Min-Huey Chung
- Professor, College of Nursing, Taipei Medical University, Taipei, Taiwan
- Shu-Tai Hsiao
- Vice President, Taipei Medical University Hospital, Taipei, Taiwan
- Shu-Liu Guo
- Director of Nursing Department, Taipei Medical University Hospital, Taipei, Taiwan
- Lung-Yun Hung
- Head Nurse, Nursing Department, Cheng Hsin General Hospital, Taipei, Taiwan
- Hao-Ting Wu
- Head Nurse, Nursing Department, Cheng Hsin General Hospital, Taipei, Taiwan
5. Triantafyllopoulos A, Kathan A, Baird A, Christ L, Gebhard A, Gerczuk M, Karas V, Hübner T, Jing X, Liu S, Mallol-Ragolta A, Milling M, Ottl S, Semertzidou A, Rajamani ST, Yan T, Yang Z, Dineley J, Amiriparian S, Bartl-Pokorny KD, Batliner A, Pokorny FB, Schuller BW. HEAR4Health: a blueprint for making computer audition a staple of modern healthcare. Front Digit Health 2023; 5:1196079. [PMID: 37767523] [PMCID: PMC10520966] [DOI: 10.3389/fdgth.2023.1196079]
Abstract
Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems into their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in the field of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance with the ethical standards accorded to the field of medicine. Thus, we provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in the drive for improved healthcare systems.
Affiliation(s)
- Andreas Triantafyllopoulos
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Kathan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alice Baird
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Lukas Christ
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Gebhard
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Maurice Gerczuk
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Vincent Karas
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tobias Hübner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Xin Jing
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shuo Liu
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Adria Mallol-Ragolta
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Manuel Milling
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Sandra Ottl
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Anastasia Semertzidou
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tianhao Yan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Zijiang Yang
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Judith Dineley
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shahin Amiriparian
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Katrin D. Bartl-Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Anton Batliner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Florian B. Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Björn W. Schuller
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- GLAM – Group on Language, Audio, & Music, Imperial College London, London, United Kingdom
6. Miner AS, Fleming SL, Haque A, Fries JA, Althoff T, Wilfley DE, Agras WS, Milstein A, Hancock J, Asch SM, Stirman SW, Arnow BA, Shah NH. A computational approach to measure the linguistic characteristics of psychotherapy timing, responsiveness, and consistency. npj Mental Health Research 2022; 1:19. [PMID: 38609510] [PMCID: PMC10956022] [DOI: 10.1038/s44184-022-00020-9]
Abstract
Although individual psychotherapy is generally effective for a range of mental health conditions, little is known about the moment-to-moment language use of effective therapists. Increased access to computational power, coupled with a rise in computer-mediated communication (telehealth), makes feasible the large-scale analyses of language use during psychotherapy. Transparent methodological approaches are lacking, however. Here we present novel methods to increase the efficiency of efforts to examine language use in psychotherapy. We evaluate three important aspects of therapist language use - timing, responsiveness, and consistency - across five clinically relevant language domains: pronouns, time orientation, emotional polarity, therapist tactics, and paralinguistic style. We find therapist language is dynamic within sessions, responds to patient language, and relates to patient symptom diagnosis but not symptom severity. Our results demonstrate that analyzing therapist language at scale is feasible and may help answer longstanding questions about specific behaviors of effective therapists.
Affiliation(s)
- Adam S Miner
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA.
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.
- Scott L Fleming
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Albert Haque
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jason A Fries
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Tim Althoff
- Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Denise E Wilfley
- Departments of Psychiatry, Medicine, Pediatrics, and Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
- W Stewart Agras
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Arnold Milstein
- Clinical Excellence Research Center, Stanford University, Stanford, CA, USA
- Jeff Hancock
- Department of Communication, Stanford University, Stanford, CA, USA
- Steven M Asch
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Division of Primary Care and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Shannon Wiltsey Stirman
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- National Center for Posttraumatic Stress Disorders, Dissemination and Training Division, VA Palo Alto Healthcare System, Menlo Park, CA, USA
- Bruce A Arnow
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Nigam H Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Clinical Excellence Research Center, Stanford University, Stanford, CA, USA
- Technology and Digital Solutions, Stanford Healthcare, Stanford, CA, USA
7. Krüger J, Siegert I, Junne F. Künstliche Intelligenz für die Sprachanalyse in der Psychotherapie – Chancen und Risiken [Artificial intelligence for speech analysis in psychotherapy – opportunities and risks]. Psychother Psychosom Med Psychol 2022; 72:395-396. [DOI: 10.1055/a-1915-2589]
Affiliation(s)
- Julia Krüger
- Universitätsklinik für Psychosomatische Medizin und Psychotherapie, Medizinische Fakultät, Otto-von-Guericke-Universität Magdeburg
- Ingo Siegert
- Fachgebiet Mobile Dialogsysteme, Institut für Informations- und Kommunikationstechnik, Fakultät für Elektrotechnik, Otto-von-Guericke-Universität Magdeburg
- Florian Junne
- Universitätsklinik für Psychosomatische Medizin und Psychotherapie, Medizinische Fakultät, Otto-von-Guericke-Universität Magdeburg
8. Soroski T, da Cunha Vasco T, Newton-Mason S, Granby S, Lewis C, Harisinghani A, Rizzo M, Conati C, Murray G, Carenini G, Field TS, Jang H. Evaluating Web-Based Automatic Transcription for Alzheimer Speech Data: Transcript Comparison and Machine Learning Analysis. JMIR Aging 2022; 5:e33460. [PMID: 36129754] [PMCID: PMC9536526] [DOI: 10.2196/33460]
Abstract
Background Speech data for medical research can be collected noninvasively and in large volumes. Speech analysis has shown promise in diagnosing neurodegenerative disease. To effectively leverage speech data, transcription is important, as there is valuable information contained in lexical content. Manual transcription, while highly accurate, limits the potential scalability and cost savings associated with language-based screening. Objective To better understand the use of automatic transcription for classification of neurodegenerative disease, namely, Alzheimer disease (AD), mild cognitive impairment (MCI), or subjective memory complaints (SMC) versus healthy controls, we compared automatically generated transcripts against transcripts that went through manual correction. Methods We recruited individuals from a memory clinic (“patients”) with a diagnosis of mild-to-moderate AD (n=44, 30%), MCI (n=20, 13%), or SMC (n=8, 5%), as well as healthy controls (n=77, 52%) living in the community. Participants were asked to describe a standardized picture, read a paragraph, and recall a pleasant life experience. We compared transcripts generated using Google speech-to-text software to manually verified transcripts by examining transcription confidence scores, transcription error rates, and machine learning classification accuracy. For the classification tasks, logistic regression, Gaussian naive Bayes, and random forests were used. Results The transcription software showed higher confidence scores (P<.001) and lower error rates (P>.05) for speech from healthy controls compared with patients. Classification models using human-verified transcripts significantly (P<.001) outperformed automatically generated transcript models for both spontaneous speech tasks, whereas this comparison showed no difference in the reading task. Manually adding pauses to transcripts had no impact on classification performance, whereas manually correcting both spontaneous speech tasks led to significantly higher performance in the machine learning models. Conclusions We found that automatically transcribed speech data could be used to distinguish patients with a diagnosis of AD, MCI, or SMC from controls. We recommend a human verification step to improve the performance of automatic transcripts, especially for spontaneous tasks. Moreover, human verification can focus on correcting errors and adding punctuation to transcripts. However, manual addition of pauses is not needed, which can simplify the human verification step and allow more efficient processing of large volumes of speech data.
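The classification step described above (lexical features extracted from transcripts and fed to models such as logistic regression) can be illustrated with a self-contained toy sketch. This is not the study's pipeline: the feature set, training texts, and hyperparameters are invented for illustration, and the study used richer features and standard library implementations of its classifiers.

```python
import math

# Toy transcripts (invented, NOT the study's data): label 1 = patient, 0 = control.
PATIENTS = [
    "um the uh boy um is er taking um cookies",
    "uh the um girl is um reaching er for the um jar",
]
CONTROLS = [
    "the boy is taking cookies from the jar while the stool tips",
    "the mother is washing dishes and water spills onto the floor",
]

def features(text):
    """Simple lexical features: filler rate, pronoun rate, mean word length."""
    words = text.lower().split()
    n = max(len(words), 1)
    fillers = {"um", "uh", "er"}
    pronouns = {"i", "we", "he", "she", "they", "it"}
    return [
        sum(w in fillers for w in words) / n,
        sum(w in pronouns for w in words) / n,
        sum(len(w) for w in words) / n,
    ]

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of the logistic loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b))) >= 0.5

X = [features(t) for t in PATIENTS + CONTROLS]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

Because the toy classes are linearly separable on the filler-rate feature, the model fits the four training examples; real transcript data is far noisier, which is why the study compares several classifiers under cross-validation on both automatic and human-verified transcripts.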
Affiliation(s)
- Thomas Soroski
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Thiago da Cunha Vasco
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Sally Newton-Mason
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Saffrin Granby
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Caitlin Lewis
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Anuj Harisinghani
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Matteo Rizzo
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Cristina Conati
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Gabriel Murray
- School of Computing, University of the Fraser Valley, Abbotsford, BC, Canada
- Giuseppe Carenini
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Thalia S Field
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Hyeju Jang
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
9. Maharjan R, Doherty K, Rohani DA, Bækgaard P, Bardram JE. Experiences of a Speech-Enabled Conversational Agent for the Self-Report of Wellbeing Among People Living with Affective Disorders: An In-The-Wild Study. ACM Trans Interact Intell Syst 2022. [DOI: 10.1145/3484508]
Abstract
The growing commercial success of smart speaker devices following recent advancements in speech recognition technology has surfaced new opportunities for collecting self-reported health and wellbeing data. Speech-enabled conversational agents (CAs) in particular, deployed in home environments using just such systems, may offer increasingly intuitive and engaging means of self-report. To date, however, few real-world studies have examined users’ experiences of engaging in the self-report of mental health using such devices, nor the challenges of deploying these systems in the home context. With these aims in mind, this paper recounts findings from a four-week ‘in-the-wild’ study during which 20 individuals with depression or bipolar disorder used a speech-enabled CA named ‘Sofia’ to maintain a daily diary log, responding also to the WHO-5 wellbeing scale every two weeks. Thematic analysis of post-study interviews highlights actions taken by participants to overcome CAs’ limitations, diverse personifications of a speech-enabled agent, and unique forms of valuing of this system among users’ personal and social circles. These findings serve as initial evidence for the potential of CAs to support the self-report of mental health and wellbeing, while highlighting the need to address outstanding technical limitations in addition to design challenges of conversational pattern matching, filling unmet interpersonal gaps, and the use of self-report CAs in the at-home social context. Based on these insights, we discuss implications for the future design of CAs to support the self-report of mental health and wellbeing.
Affiliation(s)
- Raju Maharjan
- Department of Health Technology, Technical University of Denmark, Denmark
| | - Kevin Doherty
- Department of Health Technology, Technical University of Denmark, Denmark
| | - Darius Adam Rohani
- Department of Health Technology, Technical University of Denmark, Denmark
| | - Per Bækgaard
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Denmark
| | - Jakob E. Bardram
- Department of Health Technology, Technical University of Denmark, Denmark
| |
10
Flemotomos N, Martinez VR, Chen Z, Singla K, Ardulov V, Peri R, Caperton DD, Gibson J, Tanana MJ, Georgiou P, Van Epps J, Lord SP, Hirsch T, Imel ZE, Atkins DC, Narayanan S. Automated evaluation of psychotherapy skills using speech and language technologies. Behav Res Methods 2022; 54:690-711. [PMID: 34346043] [PMCID: PMC8810915] [DOI: 10.3758/s13428-021-01623-4]
Abstract
With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is, however, a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called "motivational interviewing", our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.
Affiliation(s)
- Nikolaos Flemotomos: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Victor R Martinez: Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Zhuohao Chen: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Karan Singla: Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Victor Ardulov: Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Raghuveer Peri: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Derek D Caperton: Department of Educational Psychology, University of Utah, Salt Lake City, Utah, USA
- James Gibson: Behavioral Signal Technologies Inc., Los Angeles, CA, USA
- Michael J Tanana: College of Social Work, University of Utah, Salt Lake City, Utah, USA
- Panayiotis Georgiou: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Jake Van Epps: University Counseling Center, University of Utah, Salt Lake City, Utah, USA
- Sarah P Lord: Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington, USA
- Tad Hirsch: Department of Art + Design, Northeastern University, Boston, Massachusetts, USA
- Zac E Imel: Department of Educational Psychology, University of Utah, Salt Lake City, Utah, USA
- David C Atkins: Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington, USA
- Shrikanth Narayanan: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA; Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA; Behavioral Signal Technologies Inc., Los Angeles, CA, USA
11
Pragt L, van Hengel P, Grob D, Wasmann JWA. Preliminary Evaluation of Automated Speech Recognition Apps for the Hearing Impaired and Deaf. Front Digit Health 2022; 4:806076. [PMID: 35252959] [PMCID: PMC8889114] [DOI: 10.3389/fdgth.2022.806076]
Abstract
Objective: Automated speech recognition (ASR) systems have become increasingly sophisticated, accurate, and deployable on many digital devices, including smartphones. This pilot study examines the speech recognition performance of ASR apps using audiological speech tests. In addition, we compare ASR speech recognition performance to that of normal-hearing and hearing-impaired listeners and evaluate whether standard clinical audiological tests are a meaningful and quick measure of the performance of ASR apps. Methods: Four apps were tested on a smartphone: AVA, Earfy, Live Transcribe, and Speechy. The Dutch audiological speech tests performed were speech audiometry in quiet (Dutch CNC-test), the Digits-in-Noise (DIN) test with steady-state speech-shaped noise, and sentences in quiet and in averaged long-term speech-shaped spectrum noise (Plomp-test). For comparison, each app's ability to transcribe a spoken dialogue (Dutch and English) was tested. Results: All apps scored at least 50% phonemes correct on the Dutch CNC-test at a conversational speech intensity level (65 dB SPL) and achieved 90-100% phoneme recognition at higher intensity levels. On the DIN-test, AVA and Live Transcribe had the lowest (best) signal-to-noise ratio, +8 dB. The lowest signal-to-noise ratio measured with the Plomp-test was +8 to +9 dB, for Earfy (Android) and Live Transcribe (Android). Overall, the word error rate for the dialogue in English (19-34%) was lower (better) than for the Dutch dialogue (25-66%). Conclusion: The apps' performance was limited on audiological tests that provide little linguistic context or use low signal-to-noise levels. On Dutch audiological speech tests in quiet, ASR apps performed similarly to a person with moderate hearing loss. In noise, the ASR apps performed more poorly than most profoundly deaf people using a hearing aid or cochlear implant. Adding new performance metrics, including the semantic difference as a function of SNR and reverberation time, could help to monitor and further improve ASR performance.
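The word error rate quoted in this abstract is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the ASR output, divided by the number of reference words. A minimal sketch of that computation, using illustrative transcripts rather than the study's material:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference and hypothesis, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Illustrative example: one substitution and one deletion against a
# four-word reference gives a WER of 0.5.
error_rate = wer("a b c d", "a x c")
```

Because WER is normalized by reference length, insertions can push it above 100%, which is worth keeping in mind when comparing scores across dialogues of different lengths.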
Affiliation(s)
- Leontien Pragt (corresponding author): Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Nijmegen, Netherlands
- Peter van Hengel: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Nijmegen, Netherlands; Pento Audiological Center Twente, Hengelo, Netherlands
- Dagmar Grob: Department of Medical Imaging, Radboud University Medical Center, Nijmegen, Netherlands
- Jan-Willem A. Wasmann: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Nijmegen, Netherlands
12
2-level hierarchical depression recognition method based on task-stimulated and integrated speech features. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103287]
13
Monteith S, Glenn T, Geddes J, Whybrow PC, Achtyes E, Bauer M. Expectations for Artificial Intelligence (AI) in Psychiatry. Curr Psychiatry Rep 2022; 24:709-721. [PMID: 36214931] [PMCID: PMC9549456] [DOI: 10.1007/s11920-022-01378-5]
Abstract
PURPOSE OF REVIEW Artificial intelligence (AI) is often presented as a transformative technology for clinical medicine even though the current technology maturity of AI is low. The purpose of this narrative review is to describe the complex reasons for the low technology maturity and set realistic expectations for the safe, routine use of AI in clinical medicine. RECENT FINDINGS For AI to be productive in clinical medicine, many diverse factors that contribute to the low maturity level need to be addressed. These include technical problems such as data quality, dataset shift, black-box opacity, validation and regulatory challenges, and human factors such as a lack of education in AI, workflow changes, automation bias, and deskilling. There will also be new and unanticipated safety risks with the introduction of AI. The solutions to these issues are complex and will take time to discover, develop, validate, and implement. However, addressing the many problems in a methodical manner will expedite the safe and beneficial use of AI to augment medical decision making in psychiatry.
Affiliation(s)
- Scott Monteith: Michigan State University College of Human Medicine, Traverse City Campus, Traverse City, MI 49684, USA
- John Geddes: Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
- Peter C. Whybrow: Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles (UCLA), Los Angeles, CA, USA
- Eric Achtyes: Michigan State University College of Human Medicine, Grand Rapids, MI, USA; Network180, Grand Rapids, MI, USA
- Michael Bauer: Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus, Medical Faculty, Technische Universität Dresden, Dresden, Germany
14
Bickmore TW, Ólafsson S, O'Leary TK. Mitigating Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: Exploratory Mixed Methods Experiment. J Med Internet Res 2021; 23:e30704. [PMID: 34751661] [PMCID: PMC8663571] [DOI: 10.2196/30704]
Abstract
BACKGROUND: Prior studies have demonstrated the safety risks when patients and consumers use conversational assistants such as Apple's Siri and Amazon's Alexa to obtain medical information. OBJECTIVE: The aim of this study is to evaluate two approaches to reducing the likelihood that patients or consumers will act on the potentially harmful medical information they receive from conversational assistants. METHODS: Participants were given medical problems to pose to conversational assistants that had previously been demonstrated to result in potentially harmful recommendations. Each conversational assistant's response was randomly varied to include a correct or incorrect paraphrase of the query, or a disclaimer message (or none) telling the participants that they should not act on the advice without first talking to a physician. The participants were then asked what actions they would take based on their interaction, along with the likelihood of taking the action. The reported actions were recorded and analyzed, and the participants were interviewed at the end of each interaction. RESULTS: A total of 32 participants completed the study, each interacting with 4 conversational assistants. The participants were on average aged 42.44 (SD 14.08) years, 53% (17/32) were women, and 66% (21/32) were college educated. Participants who heard a correct paraphrase of their query were significantly more likely to state that they would follow the medical advice provided by the conversational assistant (χ²(1)=3.1; P=.04). Participants who heard a disclaimer message were significantly more likely to say that they would contact a physician or health professional before acting on the medical advice received (χ²(1)=43.5; P=.001). CONCLUSIONS: Designers of conversational systems should consider incorporating both disclaimers and feedback on query understanding in responses to user queries for medical advice. Unconstrained natural language input should not be used in systems designed specifically to provide medical advice.
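The chi-square statistics reported above compare proportions of participants across the randomized conditions. For a 2x2 contingency table, Pearson's statistic (df=1, without continuity correction) has a closed form; a minimal sketch with illustrative counts, not the study's data:

```python
def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (df=1) for the 2x2 contingency
    table [[a, b], [c, d]], without Yates continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )

# Hypothetical counts (for illustration only):
# rows = disclaimer heard / not heard,
# cols = would contact a physician first / would not.
stat = chi2_2x2(28, 4, 8, 24)
```

Comparing the statistic against the chi-square distribution with one degree of freedom then yields the P value; in practice a library routine such as SciPy's `chi2_contingency` handles both steps and the optional continuity correction.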
Affiliation(s)
- Timothy W Bickmore: Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
- Stefán Ólafsson: Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
- Teresa K O'Leary: Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
15
Alvarez-Alonso MJ, de-la-Peña C, Ortega Z, Scott R. Boys-Specific Text-Comprehension Enhancement With Dual Visual-Auditory Text Presentation Among 12-14 Years-Old Students. Front Psychol 2021; 12:574685. [PMID: 33897513] [PMCID: PMC8062718] [DOI: 10.3389/fpsyg.2021.574685]
Abstract
Quality of language comprehension determines performance in all kinds of activities, including academics. Processing of words initially develops as auditory and gradually extends to the visual as children learn to read. School failure is highly related to listening and reading comprehension problems. In this study we analyzed sex differences in the comprehension of Spanish texts (standardized reading test PROLEC-R) in three presentation modalities (visual, auditory, and both simultaneously: dual modality) among 12-14-year-old students who were native Spanish speakers. We controlled for relevant cognitive variables such as attention (d2), phonological and semantic fluency (FAS), and speed of processing (WISC Coding subtest). Girls' comprehension was similar across the three presentation modalities; boys, however, benefited substantially from dual-modality presentation compared with boys exposed only to visual or auditory text. With respect to the relation between text comprehension and school performance, students with low grades in Spanish showed low auditory comprehension. Interestingly, the visual and dual modalities preserved comprehension levels in these low-skilled students. Our results suggest that visual-text support during auditory language presentation could benefit students with low school performance, especially boys. They also encourage future research to evaluate the classroom implementation of rapidly developing simultaneous speech-transcription technology, which could additionally benefit non-native students, especially those recently incorporated into school or newly arrived in a country from abroad.
Affiliation(s)
- Maria Jose Alvarez-Alonso: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain
- Cristina de-la-Peña: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain
- Zaira Ortega: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain
- Ricardo Scott: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain; Departamento de Psicología Evolutiva y Didáctica, Universidad de Alicante, Alicante, Spain
16
Di Matteo D, Wang W, Fotinos K, Lokuge S, Yu J, Sternat T, Katzman MA, Rose J. Smartphone-Detected Ambient Speech and Self-Reported Measures of Anxiety and Depression: Exploratory Observational Study. JMIR Form Res 2021; 5:e22723. [PMID: 33512325] [PMCID: PMC7880807] [DOI: 10.2196/22723]
Abstract
Background: The ability to objectively measure the severity of depression and anxiety disorders in a passive manner could have a profound impact on the way in which these disorders are diagnosed, assessed, and treated. Existing studies have demonstrated links between both depression and anxiety and the linguistic properties of the words people use to communicate. Smartphones offer the ability to passively and continuously detect spoken words, making it possible to monitor and analyze the linguistic properties of speech produced by the speaker and by other sources of ambient speech in their environment. The linguistic properties of automatically detected and recognized speech may be used to build objective severity measures of depression and anxiety. Objective: The aim of this study was to determine whether the linguistic properties of words passively detected from environmental audio recorded with a participant's smartphone can serve as correlates of symptom severity of social anxiety disorder, generalized anxiety disorder, depression, and general impairment. Methods: An Android app was designed to collect periodic audio recordings of participants' environments and to detect English words using automatic speech recognition. Participants were recruited into a 2-week observational study. The app was installed on the participants' personal smartphones to record and analyze audio. The participants also completed self-report severity measures of social anxiety disorder, generalized anxiety disorder, depression, and functional impairment. Words detected from the audio recordings were categorized, and correlations were measured between word counts in each category and the 4 self-report measures to determine whether any categories could serve as correlates of social anxiety disorder, generalized anxiety disorder, depression, or general impairment. Results: The participants were 112 adults residing in Canada, drawn from a nonclinical population; 86 participants yielded sufficient data for analysis. Correlations between word counts in 67 word categories and each of the 4 self-report measures revealed a strong relationship between the usage rate of death-related words and depressive symptoms (r=0.41, P<.001). There were also notable correlations between usage rates of reward-related words and depression (r=-0.22, P=.04) and generalized anxiety (r=-0.29, P=.007), and of vision-related words and social anxiety (r=0.31, P=.003). Conclusions: In this study, words automatically recognized from environmental audio were shown to contain a number of potential associations with severity of depression and anxiety. This work suggests that sparsely sampled audio could provide relevant insight into individuals' mental health.
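The reported r values are Pearson correlation coefficients between per-participant word-category usage rates and self-report scores. A minimal sketch of that computation, with illustrative values rather than the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length
    numeric sequences (covariance over the product of spreads)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-participant data (illustrative, not study data):
# usage rate of death-related words vs. a depression severity score.
death_word_rate = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9]
depression_score = [3, 10, 5, 18, 11, 20]
r = pearson_r(death_word_rate, depression_score)
```

With 67 categories tested against 4 measures, the P values accompanying such coefficients would normally also need correction for multiple comparisons, which the exploratory framing of the study acknowledges.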
Affiliation(s)
- Daniel Di Matteo: The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- Wendy Wang: The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- Kathryn Fotinos: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Julia Yu: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Tia Sternat: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada; Department of Psychology, Adler Graduate Professional School, Toronto, ON, Canada
- Martin A Katzman: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada; Department of Psychology, Adler Graduate Professional School, Toronto, ON, Canada; Department of Psychology, Lakehead University, Thunder Bay, ON, Canada; The Northern Ontario School of Medicine, Thunder Bay, ON, Canada
- Jonathan Rose: The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
17
Sezgin E, Huang Y, Ramtekkar U, Lin S. Readiness for voice assistants to support healthcare delivery during a health crisis and pandemic. NPJ Digit Med 2020; 3:122. [PMID: 33015374] [PMCID: PMC7494948] [DOI: 10.1038/s41746-020-00332-0]
Abstract
To prevent the spread of COVID-19 and to continue responding to healthcare needs, hospitals are rapidly adopting telehealth and other digital health tools to deliver care remotely. Intelligent conversational agents and virtual assistants, such as chatbots and voice assistants, have been utilized to augment health service capacity to screen symptoms, deliver healthcare information, and reduce exposure. In this commentary, we examined the state of voice assistants (e.g., Google Assistant, Apple Siri, Amazon Alexa) as an emerging tool for remote healthcare delivery and discussed the readiness of the health system and technology providers to adopt voice assistants as an alternative healthcare delivery modality during a health crisis and pandemic.
Affiliation(s)
- Emre Sezgin: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
- Yungui Huang: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
- Ujjwal Ramtekkar: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
- Simon Lin: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
18
Haque A, Milstein A, Fei-Fei L. Illuminating the dark spaces of healthcare with ambient intelligence. Nature 2020; 585:193-202. [PMID: 32908264] [DOI: 10.1038/s41586-020-2669-y]
Abstract
Advances in machine learning and contactless sensors have given rise to ambient intelligence: physical spaces that are sensitive and responsive to the presence of humans. Here we review how this technology could improve our understanding of the metaphorically dark, unobserved spaces of healthcare. In hospital spaces, early applications could soon enable more efficient clinical workflows and improved patient safety in intensive care units and operating rooms. In daily living spaces, ambient intelligence could prolong the independence of older individuals and improve the management of individuals with a chronic disease by understanding everyday behaviour. As with other technologies, transformation into clinical applications at scale must overcome challenges such as rigorous clinical validation, appropriate data privacy and model transparency. Thoughtful use of this technology would enable us to understand the complex interplay between the physical environment and health-critical human behaviours.
Affiliation(s)
- Albert Haque: Department of Computer Science, Stanford University, Stanford, CA, USA
- Arnold Milstein: Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
- Li Fei-Fei: Department of Computer Science, Stanford University, Stanford, CA, USA; Stanford Institute for Human-Centered Artificial Intelligence, Stanford University, Stanford, CA, USA