1. Shen FX, Baum ML, Martinez-Martin N, Miner AS, Abraham M, Brownstein CA, Cortez N, Evans BJ, Germine LT, Glahn DC, Grady C, Holm IA, Hurley EA, Kimble S, Lázaro-Muñoz G, Leary K, Marks M, Monette PJ, Onnela JP, O’Rourke PP, Rauch SL, Shachar C, Sen S, Vahia I, Vassy JL, Baker JT, Bierer BE, Silverman BC. Returning Individual Research Results from Digital Phenotyping in Psychiatry. Am J Bioeth 2024; 24:69-90. [PMID: 37155651] [PMCID: PMC10630534] [DOI: 10.1080/15265161.2023.2180109]
Abstract
Psychiatry is rapidly adopting digital phenotyping and artificial intelligence/machine learning tools to study mental illness based on tracking participants' locations, online activity, phone and text message usage, heart rate, sleep, physical activity, and more. Existing ethical frameworks for return of individual research results (IRRs) are inadequate to guide researchers on whether, when, and how to return this unprecedented number of potentially sensitive results about each participant's real-world behavior. To address this gap, we convened an interdisciplinary expert working group, supported by a National Institute of Mental Health grant. Building on established guidelines and the emerging norm of returning results in participant-centered research, we present a novel framework specific to the ethical, legal, and social implications of returning IRRs in digital phenotyping research. Our framework offers researchers, clinicians, and Institutional Review Boards (IRBs) urgently needed guidance, and the principles developed here in the context of psychiatry will be readily adaptable to other therapeutic areas.
Affiliation(s)
- Francis X. Shen
- Harvard Medical School
- Massachusetts General Hospital
- Harvard Law School
- Mason Marks
- Harvard Law School
- Florida State University College of Law
- Yale Law School
- Scott L. Rauch
- Harvard Medical School
- McLean Hospital
- Mass General Brigham
- Jason L. Vassy
- Harvard Medical School
- Brigham and Women’s Hospital
- VA Boston Healthcare System
- Barbara E. Bierer
- Harvard Medical School
- Brigham and Women’s Hospital
- Multi-Regional Clinical Trials Center of Brigham and Women’s Hospital and Harvard
2. Seyedi S, Griner E, Corbin L, Jiang Z, Roberts K, Iacobelli L, Milloy A, Boazak M, Bahrami Rad A, Abbasi A, Cotes RO, Clifford GD. Using HIPAA (Health Insurance Portability and Accountability Act)-Compliant Transcription Services for Virtual Psychiatric Interviews: Pilot Comparison Study. JMIR Ment Health 2023; 10:e48517. [PMID: 37906217] [PMCID: PMC10646674] [DOI: 10.2196/48517]
Abstract
BACKGROUND Automatic speech recognition (ASR) technology is increasingly being used for transcription in clinical contexts. Although there are numerous transcription services using ASR, few studies have compared the word error rate (WER) of different transcription services across diagnostic groups in a mental health setting. There has also been little research into the types of words ASR transcriptions mistakenly generate or omit. OBJECTIVE This study compared the WER of 3 ASR transcription services (Amazon Transcribe [Amazon.com, Inc], Zoom-Otter AI [Zoom Video Communications, Inc], and Whisper [OpenAI Inc]) in interviews across 2 clinical categories (controls and participants experiencing a variety of mental health conditions). These ASR transcription services were also compared with a commercial human transcription service, Rev (Rev.Com, Inc). Words erroneously inserted or omitted in the transcripts were systematically analyzed by their Linguistic Inquiry and Word Count categories. METHODS Participants completed a 1-time research psychiatric interview, which was recorded on a secure server. Transcriptions created by the research team were used as the gold standard from which WER was calculated. The interviewees were categorized into either the control group (n=18) or the mental health condition group (n=47) using the Mini-International Neuropsychiatric Interview. The total sample included 65 participants. Brunner-Munzel tests were used for comparing independent sets, such as the diagnostic groupings, and Wilcoxon signed rank tests were used for correlated samples when comparing the total sample between different transcription services. RESULTS There were significant differences between each ASR transcription service's WER (P<.001). Amazon Transcribe's output exhibited significantly lower WERs than Zoom-Otter AI's and Whisper's. ASR performance did not significantly differ across the 2 clinical categories within each service (P>.05). A comparison between the human transcription service output from Rev and the best-performing ASR (Amazon Transcribe) demonstrated a significant difference (P<.001), with Rev having a slightly lower median WER (7.6%, IQR 5.4%-11.35% vs 8.9%, IQR 6.9%-11.6%). Heat maps and spider plots were used to visualize the most common errors in Linguistic Inquiry and Word Count categories, which fell within 3 overarching categories: Conversation, Cognition, and Function. CONCLUSIONS Overall, consistent with previous literature, our results suggest that the WER gap between manual and automated transcription services may be narrowing as ASR services advance. These advances, coupled with reduced cost and turnaround time, may make ASR transcription a more viable option within health care settings. However, more research is required to determine whether errors in specific types of words affect the analysis and usability of these transcriptions, particularly for specific applications and across populations varying in clinical diagnosis, literacy level, accent, and cultural origin.
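The WER compared throughout this study is conventionally computed as the word-level edit distance between a gold-standard reference transcript and an ASR hypothesis, divided by the reference word count. A minimal sketch of that calculation (not the authors' code; the function name and sample strings are illustrative, and the reference is assumed non-empty):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word in a six-word reference yields a WER of 1/6.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

In practice a library such as jiwer is typically used for this, with text normalization (casing, punctuation, filler handling) applied before scoring, since those choices materially affect the reported WER.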
Affiliation(s)
- Salman Seyedi
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Emily Griner
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Lisette Corbin
- Department of Psychiatry, Duke University Health, Durham, NC, United States
- Zifan Jiang
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Kailey Roberts
- Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, United States
- Luca Iacobelli
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Aaron Milloy
- Infection Prevention Department, Emory Healthcare, Atlanta, GA, United States
- Mina Boazak
- Animo Sano Psychiatry, Durham, NC, United States
- Ali Bahrami Rad
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Ahmed Abbasi
- Department of Information Technology, Analytics, and Operations, University of Notre Dame, Notre Dame, IN, United States
- Robert O Cotes
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, United States
- Gari D Clifford
- Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States
3. Malgaroli M, Hull TD, Zech JM, Althoff T. Natural language processing for mental health interventions: a systematic review and research framework. Transl Psychiatry 2023; 13:309. [PMID: 37798296] [PMCID: PMC10556019] [DOI: 10.1038/s41398-023-02592-2]
Abstract
Neuropsychiatric disorders pose a high societal cost, but their treatment is hindered by a lack of objective outcome and fidelity metrics. AI technologies, and specifically Natural Language Processing (NLP), have emerged as tools to study mental health interventions (MHI) at the level of their constituent conversations. However, NLP's potential to address clinical and research challenges remains unclear. We therefore conducted a preregistered systematic review of NLP-MHI studies using PRISMA guidelines (osf.io/s52jh) to evaluate their models and clinical applications and to identify biases and gaps. Candidate studies (n = 19,756), including peer-reviewed AI conference manuscripts, were collected up to January 2023 through PubMed, PsycINFO, Scopus, Google Scholar, and ArXiv. A total of 102 articles were included to investigate their computational characteristics (NLP algorithms, audio features, machine learning pipelines, outcome metrics), clinical characteristics (clinical ground truths, study samples, clinical focus), and limitations. Results indicate a rapid growth of NLP-MHI studies since 2019, characterized by increased sample sizes and use of large language models. Digital health platforms were the largest providers of MHI data. Ground truth for supervised learning models was based on clinician ratings (n = 31), patient self-report (n = 29), and annotations by raters (n = 26). Text-based features contributed more to model accuracy than audio markers. Patients' clinical presentation (n = 34), response to intervention (n = 11), intervention monitoring (n = 20), providers' characteristics (n = 12), relational dynamics (n = 14), and data preparation (n = 4) were the most commonly investigated clinical categories. Limitations of reviewed studies included lack of linguistic diversity, limited reproducibility, and population bias. A research framework (NLPxMHI) is developed and validated to assist computational and clinical researchers in addressing the remaining gaps in applying NLP to MHI, with the goal of improving clinical utility, data access, and fairness.
Affiliation(s)
- Matteo Malgaroli
- Department of Psychiatry, New York University, Grossman School of Medicine, New York, NY, 10016, USA.
- James M Zech
- Talkspace, New York, NY, 10025, USA
- Department of Psychology, Florida State University, Tallahassee, FL, 32306, USA
- Tim Althoff
- Department of Computer Science, University of Washington, Seattle, WA, 98195, USA
4. Lee TY, Li CC, Chou KR, Chung MH, Hsiao ST, Guo SL, Hung LY, Wu HT. Machine learning-based speech recognition system for nursing documentation - A pilot study. Int J Med Inform 2023; 178:105213. [PMID: 37690224] [DOI: 10.1016/j.ijmedinf.2023.105213]
Abstract
PURPOSE Considering the significant workload of nursing tasks, enhancing the efficiency of nursing documentation is imperative. This study aimed to evaluate the effectiveness of a machine learning-based speech recognition (SR) system, implemented in a psychiatry ward, in reducing the clinical workload associated with typing nursing records. METHODS The study was conducted between July 15, 2020, and June 30, 2021, at Cheng Hsin General Hospital in Taiwan. The language corpus was based on the existing records from the hospital nursing information system. The participating ward's nursing activities, clinical conversation, and accent data were also collected for deep learning-based SR-engine training. A total of 21 nurses participated in the evaluation of the SR system. Documentation time and recognition error rate were evaluated in parallel between SR-generated records and keyboard entry over 4 sessions. Any differences between SR and keyboard transcriptions were regarded as SR errors. FINDINGS A total of 200 records were obtained from the four evaluation sessions; at each session, 10 participants used SR and keyboard entry in parallel, and 5 entries were collected from each participant. Overall, the SR system processed 30,112 words in 32,456 s (0.928 words per second). The mean accuracy of the SR system improved after each session, from 87.06% in the first session to 95.07% in the fourth. CONCLUSION This pilot study demonstrated that our machine learning-based SR system has acceptable recognition accuracy and may reduce the documentation burden for nurses. However, potential errors in SR transcription should continually be recognized and corrected. Further studies are needed to improve the integration of SR into digital documentation of nursing records, in terms of both productivity and accuracy, across different clinical specialties.
Affiliation(s)
- Tso-Ying Lee
- Director of Nursing Research Center, Nursing Department, Taipei Medical University Hospital, Taipei, Taiwan; Associate Professor, School of Nursing, College of Nursing, Taipei Medical University, Taipei, Taiwan.
- Chin-Ching Li
- Assistant Professor, Department of Nursing, Mackay Medical College, New Taipei City, Taiwan
- Kuei-Ru Chou
- Professor, College of Nursing, Taipei Medical University, Taipei, Taiwan
- Min-Huey Chung
- Professor, College of Nursing, Taipei Medical University, Taipei, Taiwan
- Shu-Tai Hsiao
- Vice President, Taipei Medical University Hospital, Taipei, Taiwan
- Shu-Liu Guo
- Director of Nursing Department, Taipei Medical University Hospital, Taipei, Taiwan
- Lung-Yun Hung
- Head Nurse, Nursing Department, Cheng Hsin General Hospital, Taipei, Taiwan
- Hao-Ting Wu
- Head Nurse, Nursing Department, Cheng Hsin General Hospital, Taipei, Taiwan
5. Triantafyllopoulos A, Kathan A, Baird A, Christ L, Gebhard A, Gerczuk M, Karas V, Hübner T, Jing X, Liu S, Mallol-Ragolta A, Milling M, Ottl S, Semertzidou A, Rajamani ST, Yan T, Yang Z, Dineley J, Amiriparian S, Bartl-Pokorny KD, Batliner A, Pokorny FB, Schuller BW. HEAR4Health: a blueprint for making computer audition a staple of modern healthcare. Front Digit Health 2023; 5:1196079. [PMID: 37767523] [PMCID: PMC10520966] [DOI: 10.3389/fdgth.2023.1196079]
Abstract
Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems into their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in the field of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance with the ethical standards accorded to the field of medicine. Thus, we provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in the drive for improved healthcare systems.
Affiliation(s)
- Andreas Triantafyllopoulos
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Kathan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alice Baird
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Lukas Christ
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Gebhard
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Maurice Gerczuk
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Vincent Karas
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tobias Hübner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Xin Jing
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shuo Liu
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Adria Mallol-Ragolta
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Manuel Milling
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Sandra Ottl
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Anastasia Semertzidou
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tianhao Yan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Zijiang Yang
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Judith Dineley
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shahin Amiriparian
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Katrin D. Bartl-Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Anton Batliner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Florian B. Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Björn W. Schuller
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- GLAM – Group on Language, Audio, & Music, Imperial College London, London, United Kingdom
6. Miner AS, Fleming SL, Haque A, Fries JA, Althoff T, Wilfley DE, Agras WS, Milstein A, Hancock J, Asch SM, Stirman SW, Arnow BA, Shah NH. A computational approach to measure the linguistic characteristics of psychotherapy timing, responsiveness, and consistency. npj Mental Health Research 2022; 1:19. [PMID: 38609510] [PMCID: PMC10956022] [DOI: 10.1038/s44184-022-00020-9]
Abstract
Although individual psychotherapy is generally effective for a range of mental health conditions, little is known about the moment-to-moment language use of effective therapists. Increased access to computational power, coupled with a rise in computer-mediated communication (telehealth), makes feasible the large-scale analyses of language use during psychotherapy. Transparent methodological approaches are lacking, however. Here we present novel methods to increase the efficiency of efforts to examine language use in psychotherapy. We evaluate three important aspects of therapist language use - timing, responsiveness, and consistency - across five clinically relevant language domains: pronouns, time orientation, emotional polarity, therapist tactics, and paralinguistic style. We find therapist language is dynamic within sessions, responds to patient language, and relates to patient symptom diagnosis but not symptom severity. Our results demonstrate that analyzing therapist language at scale is feasible and may help answer longstanding questions about specific behaviors of effective therapists.
Affiliation(s)
- Adam S Miner
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA.
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.
- Scott L Fleming
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Albert Haque
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Jason A Fries
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Tim Althoff
- Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Denise E Wilfley
- Departments of Psychiatry, Medicine, Pediatrics, and Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
- W Stewart Agras
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Arnold Milstein
- Clinical Excellence Research Center, Stanford University, Stanford, CA, USA
- Jeff Hancock
- Department of Communication, Stanford University, Stanford, CA, USA
- Steven M Asch
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Division of Primary Care and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Shannon Wiltsey Stirman
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- National Center for Posttraumatic Stress Disorders, Dissemination and Training Division, VA Palo Alto Healthcare System, Menlo Park, CA, USA
- Bruce A Arnow
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Nigam H Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Clinical Excellence Research Center, Stanford University, Stanford, CA, USA
- Technology and Digital Solutions, Stanford Healthcare, Stanford, CA, USA
7. Krüger J, Siegert I, Junne F. Künstliche Intelligenz für die Sprachanalyse in der Psychotherapie – Chancen und Risiken [Artificial intelligence for speech analysis in psychotherapy – opportunities and risks]. Psychother Psychosom Med Psychol 2022; 72:395-396. [DOI: 10.1055/a-1915-2589]
Affiliation(s)
- Julia Krüger
- Universitätsklinik für Psychosomatische Medizin und Psychotherapie, Medizinische Fakultät, Otto-von-Guericke-Universität Magdeburg
- Ingo Siegert
- Fachgebiet Mobile Dialogsysteme, Institut für Informations- und Kommunikationstechnik, Fakultät für Elektrotechnik, Otto-von-Guericke-Universität Magdeburg
- Florian Junne
- Universitätsklinik für Psychosomatische Medizin und Psychotherapie, Medizinische Fakultät, Otto-von-Guericke-Universität Magdeburg
8. Soroski T, da Cunha Vasco T, Newton-Mason S, Granby S, Lewis C, Harisinghani A, Rizzo M, Conati C, Murray G, Carenini G, Field TS, Jang H. Evaluating Web-Based Automatic Transcription for Alzheimer Speech Data: Transcript Comparison and Machine Learning Analysis. JMIR Aging 2022; 5:e33460. [PMID: 36129754] [PMCID: PMC9536526] [DOI: 10.2196/33460]
Abstract
Background Speech data for medical research can be collected noninvasively and in large volumes. Speech analysis has shown promise in diagnosing neurodegenerative disease. To effectively leverage speech data, transcription is important, as there is valuable information contained in lexical content. Manual transcription, while highly accurate, limits the potential scalability and cost savings associated with language-based screening. Objective To better understand the use of automatic transcription for classification of neurodegenerative disease, namely, Alzheimer disease (AD), mild cognitive impairment (MCI), or subjective memory complaints (SMC) versus healthy controls, we compared automatically generated transcripts against transcripts that went through manual correction. Methods We recruited individuals from a memory clinic (“patients”) with a diagnosis of mild-to-moderate AD (n=44, 30%), MCI (n=20, 13%), or SMC (n=8, 5%), as well as healthy controls (n=77, 52%) living in the community. Participants were asked to describe a standardized picture, read a paragraph, and recall a pleasant life experience. We compared transcripts generated using Google speech-to-text software to manually verified transcripts by examining transcription confidence scores, transcription error rates, and machine learning classification accuracy. For the classification tasks, logistic regression, Gaussian naive Bayes, and random forests were used. Results The transcription software showed higher confidence scores (P<.001) and lower error rates (P>.05) for speech from healthy controls compared with patients. Classification models using human-verified transcripts significantly (P<.001) outperformed automatically generated transcript models for both spontaneous speech tasks, whereas this comparison showed no difference in the reading task. Manually adding pauses to transcripts had no impact on classification performance, whereas manually correcting both spontaneous speech tasks led to significantly higher performance in the machine learning models. Conclusions We found that automatically transcribed speech data could be used to distinguish patients with a diagnosis of AD, MCI, or SMC from controls. We recommend a human verification step to improve the performance of automatic transcripts, especially for spontaneous tasks. Moreover, human verification can focus on correcting errors and adding punctuation to transcripts. However, manual addition of pauses is not needed, which can simplify the human verification step and allow more efficient processing of large volumes of speech data.
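The classification step described above (lexical features extracted from transcripts and fed to models such as logistic regression) can be illustrated with a self-contained toy sketch. This is not the study's pipeline: the feature set, training texts, and hyperparameters are invented for illustration, and the study used richer features and standard library implementations of its classifiers.

```python
import math

# Toy transcripts (invented, NOT the study's data): label 1 = patient, 0 = control.
PATIENTS = [
    "um the uh boy um is er taking um cookies",
    "uh the um girl is um reaching er for the um jar",
]
CONTROLS = [
    "the boy is taking cookies from the jar while the stool tips",
    "the mother is washing dishes and water spills onto the floor",
]

def features(text):
    """Simple lexical features: filler rate, pronoun rate, mean word length."""
    words = text.lower().split()
    n = max(len(words), 1)
    fillers = {"um", "uh", "er"}
    pronouns = {"i", "we", "he", "she", "they", "it"}
    return [
        sum(w in fillers for w in words) / n,
        sum(w in pronouns for w in words) / n,
        sum(len(w) for w in words) / n,
    ]

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of the logistic loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b))) >= 0.5

X = [features(t) for t in PATIENTS + CONTROLS]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

Because the toy classes are linearly separable on the filler-rate feature, the model fits the four training examples; real transcript data is far noisier, which is why the study compares several classifiers under cross-validation on both automatic and human-verified transcripts.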
Affiliation(s)
- Thomas Soroski
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Thiago da Cunha Vasco
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Sally Newton-Mason
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Saffrin Granby
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Caitlin Lewis
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Anuj Harisinghani
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Matteo Rizzo
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Cristina Conati
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Gabriel Murray
- School of Computing, University of the Fraser Valley, Abbotsford, BC, Canada
- Giuseppe Carenini
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
- Thalia S Field
- Vancouver Stroke Program and Division of Neurology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Hyeju Jang
- Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, BC, Canada
9. Maharjan R, Doherty K, Rohani DA, Bækgaard P, Bardram JE. Experiences of a Speech-Enabled Conversational Agent for the Self-Report of Wellbeing Among People Living with Affective Disorders: An In-The-Wild Study. ACM Trans Interact Intell Syst 2022. [DOI: 10.1145/3484508]
Abstract
The growing commercial success of smart speaker devices following recent advancements in speech recognition technology has surfaced new opportunities for collecting self-reported health and wellbeing data. Speech-enabled conversational agents (CAs) in particular, deployed in home environments using just such systems, may offer increasingly intuitive and engaging means of self-report. To date, however, few real-world studies have examined users’ experiences of engaging in the self-report of mental health using such devices, nor the challenges of deploying these systems in the home context. With these aims in mind, this paper recounts findings from a four-week ‘in-the-wild’ study during which 20 individuals with depression or bipolar disorder used a speech-enabled CA named ‘Sofia’ to maintain a daily diary log, responding also to the WHO-5 wellbeing scale every two weeks. Thematic analysis of post-study interviews highlights actions taken by participants to overcome CAs’ limitations, diverse personifications of a speech-enabled agent, and unique forms of valuing of this system among users’ personal and social circles. These findings serve as initial evidence for the potential of CAs to support the self-report of mental health and wellbeing, while highlighting the need to address outstanding technical limitations in addition to design challenges of conversational pattern matching, filling unmet interpersonal gaps, and the use of self-report CAs in the at-home social context. Based on these insights, we discuss implications for the future design of CAs to support the self-report of mental health and wellbeing.
Affiliation(s)
- Raju Maharjan
- Department of Health Technology, Technical University of Denmark, Denmark
| | - Kevin Doherty
- Department of Health Technology, Technical University of Denmark, Denmark
| | - Darius Adam Rohani
- Department of Health Technology, Technical University of Denmark, Denmark
| | - Per Bækgaard
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Denmark
| | - Jakob E. Bardram
- Department of Health Technology, Technical University of Denmark, Denmark
| |
10
Flemotomos N, Martinez VR, Chen Z, Singla K, Ardulov V, Peri R, Caperton DD, Gibson J, Tanana MJ, Georgiou P, Van Epps J, Lord SP, Hirsch T, Imel ZE, Atkins DC, Narayanan S. Automated evaluation of psychotherapy skills using speech and language technologies. Behav Res Methods 2022; 54:690-711. [PMID: 34346043] [PMCID: PMC8810915] [DOI: 10.3758/s13428-021-01623-4]
Abstract
With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is, however, a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called "motivational interviewing", our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.
Affiliation(s)
- Nikolaos Flemotomos: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Victor R Martinez: Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Zhuohao Chen: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Karan Singla: Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Victor Ardulov: Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Raghuveer Peri: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Derek D Caperton: Department of Educational Psychology, University of Utah, Salt Lake City, Utah, USA
- James Gibson: Behavioral Signal Technologies Inc., Los Angeles, CA, USA
- Michael J Tanana: College of Social Work, University of Utah, Salt Lake City, Utah, USA
- Panayiotis Georgiou: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Jake Van Epps: University Counseling Center, University of Utah, Salt Lake City, Utah, USA
- Sarah P Lord: Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington, USA
- Tad Hirsch: Department of Art + Design, Northeastern University, Boston, Massachusetts, USA
- Zac E Imel: Department of Educational Psychology, University of Utah, Salt Lake City, Utah, USA
- David C Atkins: Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington, USA
- Shrikanth Narayanan: Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA; Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA; Behavioral Signal Technologies Inc., Los Angeles, CA, USA
11
Pragt L, van Hengel P, Grob D, Wasmann JWA. Preliminary Evaluation of Automated Speech Recognition Apps for the Hearing Impaired and Deaf. Front Digit Health 2022; 4:806076. [PMID: 35252959] [PMCID: PMC8889114] [DOI: 10.3389/fdgth.2022.806076]
Abstract
Objective: Automated speech recognition (ASR) systems have become increasingly sophisticated, accurate, and deployable on many digital devices, including smartphones. This pilot study examines the speech recognition performance of ASR apps using audiological speech tests. In addition, we compare ASR speech recognition performance to that of normal-hearing and hearing-impaired listeners and evaluate whether standard clinical audiological tests are a meaningful and quick measure of the performance of ASR apps. Methods: Four apps were tested on a smartphone: AVA, Earfy, Live Transcribe, and Speechy. The Dutch audiological speech tests performed were speech audiometry in quiet (Dutch CNC-test), the Digits-in-Noise (DIN) test with steady-state speech-shaped noise, and sentences in quiet and in averaged long-term speech-shaped spectrum noise (Plomp-test). For comparison, each app's ability to transcribe a spoken dialogue (Dutch and English) was tested. Results: All apps scored at least 50% phonemes correct on the Dutch CNC-test at a conversational speech intensity level (65 dB SPL) and achieved 90-100% phoneme recognition at higher intensity levels. On the DIN-test, AVA and Live Transcribe had the lowest (best) signal-to-noise ratio, +8 dB. The lowest signal-to-noise ratio measured with the Plomp-test was +8 to +9 dB, for Earfy (Android) and Live Transcribe (Android). Overall, the word error rate for the dialogue in English (19-34%) was lower (better) than for the Dutch dialogue (25-66%). Conclusion: The apps' performance was limited on audiological tests that provide little linguistic context or use low signal-to-noise levels. On Dutch audiological speech tests in quiet, ASR apps performed similarly to a person with moderate hearing loss. In noise, the ASR apps performed more poorly than most profoundly deaf people using a hearing aid or cochlear implant. Adding new performance metrics, including the semantic difference as a function of SNR and reverberation time, could help to monitor and further improve ASR performance.
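The word error rate quoted in this abstract is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the ASR output, divided by the number of reference words. A minimal sketch of that computation, using illustrative transcripts rather than the study's material:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference and hypothesis, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Illustrative example: one substitution and one deletion against a
# four-word reference gives a WER of 0.5.
error_rate = wer("a b c d", "a x c")
```

Because WER is normalized by reference length, insertions can push it above 100%, which is worth keeping in mind when comparing scores across dialogues of different lengths.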
Affiliation(s)
- Leontien Pragt (corresponding author): Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Nijmegen, Netherlands
- Peter van Hengel: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Nijmegen, Netherlands; Pento Audiological Center Twente, Hengelo, Netherlands
- Dagmar Grob: Department of Medical Imaging, Radboud University Medical Center, Nijmegen, Netherlands
- Jan-Willem A. Wasmann: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, Nijmegen, Netherlands
12
2-level hierarchical depression recognition method based on task-stimulated and integrated speech features. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103287]
13
Monteith S, Glenn T, Geddes J, Whybrow PC, Achtyes E, Bauer M. Expectations for Artificial Intelligence (AI) in Psychiatry. Curr Psychiatry Rep 2022; 24:709-721. [PMID: 36214931] [PMCID: PMC9549456] [DOI: 10.1007/s11920-022-01378-5]
Abstract
PURPOSE OF REVIEW Artificial intelligence (AI) is often presented as a transformative technology for clinical medicine even though the current technology maturity of AI is low. The purpose of this narrative review is to describe the complex reasons for the low technology maturity and set realistic expectations for the safe, routine use of AI in clinical medicine. RECENT FINDINGS For AI to be productive in clinical medicine, many diverse factors that contribute to the low maturity level need to be addressed. These include technical problems such as data quality, dataset shift, black-box opacity, validation and regulatory challenges, and human factors such as a lack of education in AI, workflow changes, automation bias, and deskilling. There will also be new and unanticipated safety risks with the introduction of AI. The solutions to these issues are complex and will take time to discover, develop, validate, and implement. However, addressing the many problems in a methodical manner will expedite the safe and beneficial use of AI to augment medical decision making in psychiatry.
Affiliation(s)
- Scott Monteith: Michigan State University College of Human Medicine, Traverse City Campus, Traverse City, MI 49684, USA
- John Geddes: Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
- Peter C. Whybrow: Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles (UCLA), Los Angeles, CA, USA
- Eric Achtyes: Michigan State University College of Human Medicine, Grand Rapids, MI, USA; Network180, Grand Rapids, MI, USA
- Michael Bauer: Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus, Medical Faculty, Technische Universität Dresden, Dresden, Germany
14
Bickmore TW, Ólafsson S, O'Leary TK. Mitigating Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: Exploratory Mixed Methods Experiment. J Med Internet Res 2021; 23:e30704. [PMID: 34751661] [PMCID: PMC8663571] [DOI: 10.2196/30704]
Abstract
BACKGROUND: Prior studies have demonstrated the safety risks when patients and consumers use conversational assistants such as Apple's Siri and Amazon's Alexa to obtain medical information. OBJECTIVE: The aim of this study is to evaluate two approaches to reducing the likelihood that patients or consumers will act on the potentially harmful medical information they receive from conversational assistants. METHODS: Participants were given medical problems to pose to conversational assistants that had previously been demonstrated to result in potentially harmful recommendations. Each conversational assistant's response was randomly varied to include a correct or incorrect paraphrase of the query, or a disclaimer message (or none) telling the participants that they should not act on the advice without first talking to a physician. The participants were then asked what actions they would take based on their interaction, along with the likelihood of taking the action. The reported actions were recorded and analyzed, and the participants were interviewed at the end of each interaction. RESULTS: A total of 32 participants completed the study, each interacting with 4 conversational assistants. The participants were on average aged 42.44 (SD 14.08) years, 53% (17/32) were women, and 66% (21/32) were college educated. Participants who heard a correct paraphrase of their query were significantly more likely to state that they would follow the medical advice provided by the conversational assistant (χ²(1)=3.1; P=.04). Participants who heard a disclaimer message were significantly more likely to say that they would contact a physician or health professional before acting on the medical advice received (χ²(1)=43.5; P=.001). CONCLUSIONS: Designers of conversational systems should consider incorporating both disclaimers and feedback on query understanding in responses to user queries for medical advice. Unconstrained natural language input should not be used in systems designed specifically to provide medical advice.
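The chi-square statistics reported above compare proportions of participants across the randomized conditions. For a 2x2 contingency table, Pearson's statistic (df=1, without continuity correction) has a closed form; a minimal sketch with illustrative counts, not the study's data:

```python
def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (df=1) for the 2x2 contingency
    table [[a, b], [c, d]], without Yates continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )

# Hypothetical counts (for illustration only):
# rows = disclaimer heard / not heard,
# cols = would contact a physician first / would not.
stat = chi2_2x2(28, 4, 8, 24)
```

Comparing the statistic against the chi-square distribution with one degree of freedom then yields the P value; in practice a library routine such as SciPy's `chi2_contingency` handles both steps and the optional continuity correction.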
Affiliation(s)
- Timothy W Bickmore: Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
- Stefán Ólafsson: Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
- Teresa K O'Leary: Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
15
Alvarez-Alonso MJ, de-la-Peña C, Ortega Z, Scott R. Boys-Specific Text-Comprehension Enhancement With Dual Visual-Auditory Text Presentation Among 12-14 Years-Old Students. Front Psychol 2021; 12:574685. [PMID: 33897513] [PMCID: PMC8062718] [DOI: 10.3389/fpsyg.2021.574685]
Abstract
Quality of language comprehension determines performance in all kinds of activities, including academics. Processing of words initially develops as auditory and gradually extends to the visual as children learn to read. School failure is highly related to listening and reading comprehension problems. In this study we analyzed sex differences in the comprehension of Spanish texts (standardized reading test PROLEC-R) in three presentation modalities (visual, auditory, and both simultaneously: dual modality) among 12-14-year-old students who were native Spanish speakers. We controlled for relevant cognitive variables such as attention (d2), phonological and semantic fluency (FAS), and speed of processing (WISC Coding subtest). Girls' comprehension was similar across the three presentation modalities; boys, however, benefited substantially from dual-modality presentation compared with boys exposed only to visual or auditory text. With respect to the relation between text comprehension and school performance, students with low grades in Spanish showed low auditory comprehension. Interestingly, the visual and dual modalities preserved comprehension levels in these low-skilled students. Our results suggest that visual-text support during auditory language presentation could benefit students with low school performance, especially boys. They also encourage future research to evaluate the classroom implementation of rapidly developing simultaneous speech-transcription technology, which could additionally benefit non-native students, especially those recently incorporated into school or newly arrived in a country from abroad.
Affiliation(s)
- Maria Jose Alvarez-Alonso: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain
- Cristina de-la-Peña: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain
- Zaira Ortega: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain
- Ricardo Scott: Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain; Departamento de Psicología Evolutiva y Didáctica, Universidad de Alicante, Alicante, Spain
16
Di Matteo D, Wang W, Fotinos K, Lokuge S, Yu J, Sternat T, Katzman MA, Rose J. Smartphone-Detected Ambient Speech and Self-Reported Measures of Anxiety and Depression: Exploratory Observational Study. JMIR Form Res 2021; 5:e22723. [PMID: 33512325] [PMCID: PMC7880807] [DOI: 10.2196/22723]
Abstract
Background: The ability to objectively measure the severity of depression and anxiety disorders in a passive manner could have a profound impact on the way in which these disorders are diagnosed, assessed, and treated. Existing studies have demonstrated links between both depression and anxiety and the linguistic properties of the words people use to communicate. Smartphones offer the ability to passively and continuously detect spoken words, making it possible to monitor and analyze the linguistic properties of speech produced by the speaker and by other sources of ambient speech in their environment. The linguistic properties of automatically detected and recognized speech may be used to build objective severity measures of depression and anxiety. Objective: The aim of this study was to determine whether the linguistic properties of words passively detected from environmental audio recorded with a participant's smartphone can serve as correlates of symptom severity of social anxiety disorder, generalized anxiety disorder, depression, and general impairment. Methods: An Android app was designed to collect periodic audio recordings of participants' environments and to detect English words using automatic speech recognition. Participants were recruited into a 2-week observational study. The app was installed on the participants' personal smartphones to record and analyze audio. The participants also completed self-report severity measures of social anxiety disorder, generalized anxiety disorder, depression, and functional impairment. Words detected from the audio recordings were categorized, and correlations were measured between word counts in each category and the 4 self-report measures to determine whether any categories could serve as correlates of social anxiety disorder, generalized anxiety disorder, depression, or general impairment. Results: The participants were 112 adults residing in Canada, drawn from a nonclinical population; 86 participants yielded sufficient data for analysis. Correlations between word counts in 67 word categories and each of the 4 self-report measures revealed a strong relationship between the usage rate of death-related words and depressive symptoms (r=0.41, P<.001). There were also notable correlations between usage rates of reward-related words and depression (r=-0.22, P=.04) and generalized anxiety (r=-0.29, P=.007), and of vision-related words and social anxiety (r=0.31, P=.003). Conclusions: In this study, words automatically recognized from environmental audio were shown to contain a number of potential associations with severity of depression and anxiety. This work suggests that sparsely sampled audio could provide relevant insight into individuals' mental health.
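The reported r values are Pearson correlation coefficients between per-participant word-category usage rates and self-report scores. A minimal sketch of that computation, with illustrative values rather than the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length
    numeric sequences (covariance over the product of spreads)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-participant data (illustrative, not study data):
# usage rate of death-related words vs. a depression severity score.
death_word_rate = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9]
depression_score = [3, 10, 5, 18, 11, 20]
r = pearson_r(death_word_rate, depression_score)
```

With 67 categories tested against 4 measures, the P values accompanying such coefficients would normally also need correction for multiple comparisons, which the exploratory framing of the study acknowledges.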
Affiliation(s)
- Daniel Di Matteo: The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- Wendy Wang: The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- Kathryn Fotinos: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Julia Yu: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Tia Sternat: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada; Department of Psychology, Adler Graduate Professional School, Toronto, ON, Canada
- Martin A Katzman: START Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada; Department of Psychology, Adler Graduate Professional School, Toronto, ON, Canada; Department of Psychology, Lakehead University, Thunder Bay, ON, Canada; The Northern Ontario School of Medicine, Thunder Bay, ON, Canada
- Jonathan Rose: The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
17
Sezgin E, Huang Y, Ramtekkar U, Lin S. Readiness for voice assistants to support healthcare delivery during a health crisis and pandemic. NPJ Digit Med 2020; 3:122. [PMID: 33015374] [PMCID: PMC7494948] [DOI: 10.1038/s41746-020-00332-0]
Abstract
To prevent the spread of COVID-19 and to continue responding to healthcare needs, hospitals are rapidly adopting telehealth and other digital health tools to deliver care remotely. Intelligent conversational agents and virtual assistants, such as chatbots and voice assistants, have been utilized to augment health service capacity to screen symptoms, deliver healthcare information, and reduce exposure. In this commentary, we examined the state of voice assistants (e.g., Google Assistant, Apple Siri, Amazon Alexa) as an emerging tool for remote healthcare delivery and discussed the readiness of the health system and technology providers to adopt voice assistants as an alternative healthcare delivery modality during a health crisis and pandemic.
Affiliation(s)
- Emre Sezgin: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
- Yungui Huang: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
- Ujjwal Ramtekkar: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
- Simon Lin: Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, OH 43205, USA
18
Haque A, Milstein A, Fei-Fei L. Illuminating the dark spaces of healthcare with ambient intelligence. Nature 2020; 585:193-202. [PMID: 32908264] [DOI: 10.1038/s41586-020-2669-y]
Abstract
Advances in machine learning and contactless sensors have given rise to ambient intelligence: physical spaces that are sensitive and responsive to the presence of humans. Here we review how this technology could improve our understanding of the metaphorically dark, unobserved spaces of healthcare. In hospital spaces, early applications could soon enable more efficient clinical workflows and improved patient safety in intensive care units and operating rooms. In daily living spaces, ambient intelligence could prolong the independence of older individuals and improve the management of individuals with a chronic disease by understanding everyday behaviour. As with other technologies, transformation into clinical applications at scale must overcome challenges such as rigorous clinical validation, appropriate data privacy and model transparency. Thoughtful use of this technology would enable us to understand the complex interplay between the physical environment and health-critical human behaviours.
Affiliation(s)
- Albert Haque: Department of Computer Science, Stanford University, Stanford, CA, USA
- Arnold Milstein: Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
- Li Fei-Fei: Department of Computer Science, Stanford University, Stanford, CA, USA; Stanford Institute for Human-Centered Artificial Intelligence, Stanford University, Stanford, CA, USA