1. Frank AC, Li R, Peterson BS, Narayanan SS. Wearable and Mobile Technologies for the Evaluation and Treatment of Obsessive-Compulsive Disorder: Scoping Review. JMIR Ment Health 2023;10:e45572. PMID: 37463010; PMCID: PMC10394606; DOI: 10.2196/45572.
Abstract
BACKGROUND Smartphones and wearable biosensors can continuously and passively measure aspects of behavior and physiology while also collecting data that require user input. These devices can potentially be used to monitor symptom burden; estimate diagnosis and risk for relapse; predict treatment response; and deliver digital interventions in patients with obsessive-compulsive disorder (OCD), a prevalent and disabling psychiatric condition that often follows a chronic and fluctuating course and may uniquely benefit from these technologies. OBJECTIVE Given the speed at which mobile and wearable technologies are being developed and implemented in clinical settings, a continual reappraisal of this field is needed. In this scoping review, we map the literature on the use of wearable devices and smartphone-based devices or apps in the assessment, monitoring, or treatment of OCD. METHODS In July 2022 and April 2023, we conducted an initial search and an updated search, respectively, of multiple databases, including PubMed, Embase, APA PsycINFO, and Web of Science, with no restriction on publication period, using the following search strategy: ("OCD" OR "obsessive" OR "obsessive-compulsive") AND ("smartphone" OR "phone" OR "wearable" OR "sensing" OR "biofeedback" OR "neurofeedback" OR "neuro feedback" OR "digital" OR "phenotyping" OR "mobile" OR "heart rate variability" OR "actigraphy" OR "actimetry" OR "biosignals" OR "biomarker" OR "signals" OR "mobile health"). RESULTS We analyzed 2748 articles, reviewed the full text of 77 articles, and extracted data from the 25 articles included in this review. We divided our review into the following three parts: studies without digital or mobile intervention and with passive data collection, studies without digital or mobile intervention and with active or mixed data collection, and studies with a digital or mobile intervention. 
CONCLUSIONS Use of mobile and wearable technologies for OCD has developed primarily in the past 15 years, with an increasing pace of related publications. Passive measures from actigraphy generally match subjective reports. Ecological momentary assessment is well tolerated for the naturalistic assessment of symptoms, may capture novel OCD symptoms, and may also document lower symptom burden than retrospective recall. Digital or mobile treatments are diverse; however, they generally provide some improvement in OCD symptom burden. Finally, ongoing work is needed for a safe and trusted uptake of technology by patients and providers.
Affiliation(s)
- Adam C Frank: Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States
- Ruibei Li: Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States
- Bradley S Peterson: Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States; Division of Child and Adolescent Psychiatry, Children's Hospital Los Angeles, Los Angeles, CA, United States
- Shrikanth S Narayanan: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States
2. Paromita P, Mundnich K, Nadarajan A, Booth BM, Narayanan SS, Chaspari T. Modeling inter-individual differences in ambulatory-based multimodal signals via metric learning: a case study of personalized well-being estimation of healthcare workers. Front Digit Health 2023;5:1195795. PMID: 37363272; PMCID: PMC10289192; DOI: 10.3389/fdgth.2023.1195795.
Abstract
Introduction Intelligent ambulatory tracking can assist in the automatic detection of psychological and emotional states relevant to the mental health changes of professionals with high-stakes job responsibilities, such as healthcare workers. However, well-known differences in the variability of ambulatory data across individuals challenge many existing automated approaches seeking to learn a generalizable means of well-being estimation. This paper proposes a novel metric learning technique that improves the accuracy and generalizability of automated well-being estimation by reducing inter-individual variability while preserving the variability pertaining to the behavioral construct. Methods The metric learning technique implemented in this paper entails learning a transformed multimodal feature space from pairwise similarity information between (dis)similar samples per participant via a Siamese neural network. Improved accuracy via personalization is further achieved by considering the trait characteristics of each individual as additional input to the metric learning models, as well as by using trait-based clustering criteria to group participants and then training a metric learning model for each group. Results The proposed models demonstrate significant improvement over the other inter-individual variability reduction and deep neural network baseline methods for stress, anxiety, positive affect, and negative affect. Discussion This study lays the foundation for accurate estimation of psychological and emotional states in realistic and ambulatory environments, leading to early diagnosis of mental health changes and enabling just-in-time adaptive interventions.
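The core idea of this metric learning step, learning a transform that pulls similar samples together while enforcing a margin between dissimilar ones, can be sketched with a linear embedding standing in for the Siamese network; the data generator, dimensions, margin, and learning rate below are hypothetical stand-ins, not the study's features or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for multimodal features: dims 0-1 carry the
# well-being construct, dims 2-3 carry inter-individual nuisance variation.
def make_pair(similar):
    a = rng.normal(size=4)
    b = a.copy()
    b[2:] = rng.normal(size=2)                    # nuisance always differs
    if not similar:
        b[:2] += rng.choice([-3.0, 3.0], size=2)  # construct differs too
    return a, b

def pair_grad(W, a, b, similar, margin=2.0):
    """Contrastive-loss gradient for a linear embedding x -> W @ x."""
    diff = a - b
    d = W @ diff
    dist = np.linalg.norm(d) + 1e-12
    if similar:                                   # pull similar pairs in
        return 2.0 * np.outer(d, diff)
    if dist < margin:                             # push dissimilar pairs apart
        return -2.0 * (margin - dist) / dist * np.outer(d, diff)
    return np.zeros_like(W)

W = 0.1 * rng.normal(size=(2, 4))
for step in range(4000):
    a, b = make_pair(step % 2 == 0)
    W -= 0.01 * pair_grad(W, a, b, step % 2 == 0)

def mean_dist(similar, n=200):
    return float(np.mean([np.linalg.norm(W @ np.subtract(*make_pair(similar)))
                          for _ in range(n)]))

# After training, similar pairs should sit closer than dissimilar ones
print(mean_dist(True), mean_dist(False))
```

In the learned space, the nuisance dimensions are suppressed and the construct dimensions preserved, which is the variability trade-off the abstract describes.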
Affiliation(s)
- Projna Paromita: HUman Bio-Behavioral Signals Lab, Texas A&M University, College Station, TX, United States
- Karel Mundnich: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, United States
- Amrutha Nadarajan: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, United States
- Brandon M. Booth: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, United States
- Shrikanth S. Narayanan: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, United States
- Theodora Chaspari: HUman Bio-Behavioral Signals Lab, Texas A&M University, College Station, TX, United States
3. Berkel C, Knox DC, Flemotomos N, Martinez VR, Atkins DC, Narayanan SS, Rodriguez LA, Gallo CG, Smith JD. A machine learning approach to improve implementation monitoring of family-based preventive interventions in primary care. Implement Res Pract 2023;4:26334895231187906. PMID: 37790171; PMCID: PMC10375039; DOI: 10.1177/26334895231187906.
Abstract
Background Evidence-based parenting programs effectively prevent the onset and escalation of child and adolescent behavioral health problems. When programs have been taken to scale, declines in the quality of implementation diminish intervention effects. Gold-standard methods of implementation monitoring are cost-prohibitive and impractical in resource-scarce delivery systems. Technological developments using computational linguistics and machine learning offer an opportunity to assess fidelity in a low-burden, timely, and comprehensive manner. Methods In this study, we test two natural language processing (NLP) methods [i.e., Term Frequency-Inverse Document Frequency (TF-IDF) and Bidirectional Encoder Representations from Transformers (BERT)] to assess the delivery of the Family Check-Up 4 Health (FCU4Health) program in a type 2 hybrid effectiveness-implementation trial conducted in primary care settings that serve primarily Latino families. We trained and evaluated models using 116 English and 81 Spanish-language transcripts from the 113 families who initiated FCU4Health services. We evaluated the concurrent validity of the TF-IDF and BERT models against observer ratings of program sessions on the COACH measure of competent adherence. Following the Implementation Cascade model, we assessed predictive validity using multiple indicators of parent engagement, which have been demonstrated to predict improvements in parenting and child outcomes. Results Both TF-IDF and BERT ratings were significantly associated with observer ratings and engagement outcomes. Using mean squared error, results demonstrated improvement over baseline for observer ratings from a range of 0.83-1.02 to 0.62-0.76, an average improvement of 24%. Similarly, results demonstrated improvement over baseline for parent engagement indicators from a range of 0.81-27.3 to 0.62-19.50, an approximate average improvement of 18%.
Conclusions These results demonstrate the potential for NLP methods to assess implementation in evidence-based parenting programs delivered at scale. Future directions are presented. Trial registration: ClinicalTrials.gov NCT03013309.
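TF-IDF, the simpler of the two feature sets named above, can be computed in a few lines. A minimal sketch with smoothed inverse document frequency, using invented session snippets rather than any FCU4Health transcripts, and cosine similarity in place of the study's trained regression models:

```python
import math
from collections import Counter

def tfidf(docs):
    """Term frequency x smoothed inverse document frequency per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: (tf[t] / len(toks)) *
                        (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse TF-IDF vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented session snippets (not actual FCU4Health transcripts)
docs = ["let us review your family routines and set goals",
        "today we review family routines together",
        "the clinic parking lot was crowded today"]
v = tfidf(docs)
print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```

Sessions sharing fidelity-relevant vocabulary score as more similar than unrelated text, which is what lets a downstream regression model map these features to COACH-style ratings.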
Affiliation(s)
- Cady Berkel: College of Health Solutions, Arizona State University, Phoenix, AZ, USA; Ming Hsieh Department of Electrical Engineering, USC Viterbi School of Engineering; REACH Institute, Arizona State University, Tempe, AZ, USA
- Dillon C. Knox: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA
- Nikolaos Flemotomos: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA
- Victor R. Martinez: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA
- David C. Atkins: Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Shrikanth S. Narayanan: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, USA
- Lizeth Alonso Rodriguez: Ming Hsieh Department of Electrical Engineering, USC Viterbi School of Engineering; REACH Institute, Arizona State University, Tempe, AZ, USA
- Carlos G. Gallo: Department of Psychiatry and Behavioral Sciences, Northwestern University, Chicago, IL, USA
- Justin D. Smith: Department of Population Health Sciences, Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, UT, USA
4. Smith JD, Berkel C, Carroll AJ, Fu E, Grimm KJ, Mauricio AM, Rudo-Stern J, Winslow E, Dishion TJ, Jordan N, Atkins DC, Narayanan SS, Gallo C, Bruening MM, Wilson C, Lokey F, Samaddar K. Health behaviour outcomes of a family based intervention for paediatric obesity in primary care: A randomized type II hybrid effectiveness-implementation trial. Pediatr Obes 2021;16:e12780. PMID: 33783104; DOI: 10.1111/ijpo.12780.
Abstract
BACKGROUND Paediatric obesity is a multifaceted public health problem. Family based behavioural interventions are the recommended approach for the prevention of excess weight gain in children and adolescents, yet few have been tested under "real-world" conditions. OBJECTIVES To evaluate the effectiveness of a family based intervention, delivered in coordination with paediatric primary care, on child and family health outcomes. METHODS A sample of 240 families with racially and ethnically diverse (86% non-White) and predominantly low-income children (49% female) ages 6 to 12 years (M = 9.5 years) with body mass index (BMI) ≥85th percentile for age and gender were identified in paediatric primary care. Participants were randomized to either the Family Check-Up 4 Health (FCU4Health) program (N = 141) or usual care plus information (N = 99). FCU4Health, an assessment-driven, individually tailored intervention designed to preempt excess weight gain by improving parenting skills, was delivered for 6 months in clinic, at home and in the community. Child BMI and body fat were assessed using a bioelectrical impedance scale, and caregiver-reported health behaviours (eg, diet, physical activity and family health routines) were obtained at baseline, 3, 6 and 12 months. RESULTS Change in child BMI and percent body fat did not differ by group assignment. Path analysis indicated significant group differences in child health behaviours at 12 months, mediated by improved family health routines at 6 months. CONCLUSION The FCU4Health, delivered in coordination with paediatric primary care, significantly impacted child and family health behaviours that are associated with the development and maintenance of paediatric obesity. BMI did not differ significantly between groups.
Affiliation(s)
- Justin D Smith: Department of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, Utah, USA; Department of Psychiatry and Behavioral Sciences and Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Cady Berkel: Integrated Behavioral Health Program, College of Health Solutions, Arizona State University, Tempe, Arizona, USA
- Allison J Carroll: Department of Psychiatry and Behavioral Sciences and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Emily Fu: Department of Psychiatry and Behavioral Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Kevin J Grimm: Department of Psychology, Arizona State University, Tempe, Arizona, USA
- Anne M Mauricio: Prevention Science Institute, University of Oregon, Eugene, Oregon, USA
- Emily Winslow: REACH Institute, Department of Psychology, Arizona State University, Tempe, Arizona, USA
- Thomas J Dishion: REACH Institute, Department of Psychology, Arizona State University, Tempe, Arizona, USA
- Neil Jordan: Department of Psychiatry and Behavioral Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- David C Atkins: Department of Psychiatry and Behavioral Sciences, University of Washington School of Medicine, Seattle, Washington, USA
- Shrikanth S Narayanan: Department of Electrical Engineering and Computer Science, University of Southern California, Los Angeles, California, USA
- Carlos Gallo: Department of Psychiatry and Behavioral Sciences, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Meg M Bruening: Department of Nutrition, College of Health Solutions, Arizona State University, Tempe, Arizona, USA
- Farah Lokey: Palo Verde Pediatrics, Phoenix Children's Hospital, Phoenix, Arizona, USA
5. Lim Y, Toutios A, Bliesener Y, Tian Y, Lingala SG, Vaz C, Sorensen T, Oh M, Harper S, Chen W, Lee Y, Töger J, Monteserin ML, Smith C, Godinez B, Goldstein L, Byrd D, Nayak KS, Narayanan SS. A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images. Sci Data 2021;8:187. PMID: 34285240; PMCID: PMC8292336; DOI: 10.1038/s41597-021-00976-x.
Abstract
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public-domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.
Affiliation(s)
- Yongwan Lim: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Asterios Toutios: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yannick Bliesener: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Ye Tian: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Sajan Goud Lingala: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Colin Vaz: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Tanner Sorensen: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Miran Oh: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Sarah Harper: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Weiyi Chen: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yoonjeong Lee: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Johannes Töger: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Mairym Lloréns Monteserin: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Caitlin Smith: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Bianca Godinez: Department of Linguistics, California State University Long Beach, Long Beach, California, USA
- Louis Goldstein: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Dani Byrd: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Shrikanth S Narayanan: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA; Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
6. Lynn E, Narayanan SS, Lammert AC. Dark tone quality and vocal tract shaping in soprano song production: Insights from real-time MRI. JASA Express Lett 2021;1:075202. PMID: 34291230; PMCID: PMC8273971; DOI: 10.1121/10.0005109.
Abstract
Tone quality termed "dark" is an aesthetically important property of Western classical voice performance and has been associated with lowered formant frequencies, lowered larynx, and widened pharynx. The present study uses real-time magnetic resonance imaging with synchronous audio recordings to investigate dark tone quality in four professionally trained sopranos with enhanced ecological validity and a relatively complete view of the vocal tract. Findings differ from traditional accounts, indicating that labial narrowing may be the primary driver of dark tone quality across performers, while many other aspects of vocal tract shaping are shown to differ significantly in a performer-specific way.
Affiliation(s)
- Elisabeth Lynn: Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01690, USA
- Shrikanth S Narayanan: Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, California, USA
- Adam C Lammert: Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01690, USA
7. Hagedorn C, Kim J, Sinha U, Goldstein L, Narayanan SS. Complexity of vocal tract shaping in glossectomy patients and typical speakers: A principal component analysis. J Acoust Soc Am 2021;149:4437. PMID: 34241468; PMCID: PMC8221817; DOI: 10.1121/10.0004789.
Abstract
The glossectomy procedure, involving surgical resection of cancerous lingual tissue, has long been observed to affect speech production. This study aims to quantitatively index and compare complexity of vocal tract shaping due to lingual movement in individuals who have undergone glossectomy and typical speakers using real-time magnetic resonance imaging data and Principal Component Analysis. The data reveal that (i) the type of glossectomy undergone largely predicts the patterns in vocal tract shaping observed, (ii) gross forward and backward motion of the tongue body accounts for more change in vocal tract shaping than do subtler movements of the tongue (e.g., tongue tip constrictions) in patient data, and (iii) fewer vocal tract shaping components are required to account for the patients' speech data than typical speech data, suggesting that the patient data at hand exhibit less complex vocal tract shaping in the midsagittal plane than do the data from the typical speakers observed.
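The complexity index used here, counting how many principal components are needed to account for the observed vocal tract shaping, can be sketched on synthetic contours; the frame count, point count, noise level, and 95% threshold below are arbitrary illustrations, not the study's imaging data or criterion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for midsagittal vocal tract outlines: 100 frames of a
# 20-point contour, driven by 2 underlying movement components plus noise.
scores = rng.normal(size=(100, 2))
components = rng.normal(size=(2, 20))
frames = scores @ components + 0.05 * rng.normal(size=(100, 20))

X = frames - frames.mean(axis=0)            # center each coordinate
_, s, _ = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)             # variance ratio per component

# Number of components needed to reach 95% of the shaping variance
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
print(k, float(np.cumsum(explained)[1]))
```

Because the toy data are generated from two movement components, the cumulative explained variance saturates after the first two, mirroring how fewer components for patients would indicate less complex shaping.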
Affiliation(s)
- Christina Hagedorn: Linguistics, College of Staten Island-City University of New York, 2800 Victory Boulevard, Staten Island, New York 10314, USA
- Jangwon Kim: Amazon Care, 410 Terry Avenue North, Seattle, Washington 98109, USA
- Uttam Sinha: Keck School of Medicine, University of Southern California, 1975 Zonal Avenue, Los Angeles, California 90033, USA
- Louis Goldstein: Linguistics, University of Southern California, 3601 Watt Way, Los Angeles, California 90089, USA
- Shrikanth S Narayanan: Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California 90089, USA
8. Ravuri V, Paromita P, Mundnich K, Nadarajan A, Booth BM, Narayanan SS, Chaspari T. Investigating Group-Specific Models of Hospital Workers' Well-Being: Implications for Algorithmic Bias. Int J Semantic Computing 2021. DOI: 10.1142/s1793351x20500075.
Abstract
Hospital workers often experience burnout due to demanding job responsibilities and long work hours. Data yielded by ambulatory monitoring, combined with machine learning algorithms, can afford us a better understanding of the naturalistic processes that contribute to this burnout. Motivated by the challenges related to the accurate tracking of well-being in real life, prior work has investigated group-specific machine learning (GS-ML) models that are tailored to groups of participants. We examine a novel GS-ML approach for estimating well-being from real-life multimodal measures collected in situ from hospital workers. In contrast to the majority of prior work, which uses pre-determined clustering criteria, we propose an iterative procedure that refines participant clusters based on the representations learned by the GS-ML models. Motivated by prior work that highlights the differential impact of job demands on well-being, we further explore the participant clusters in terms of demography and job-related attributes. Results indicate that the GS-ML models mostly outperform general models in estimating well-being constructs. The GS-ML models further exhibit different degrees of predictive power for each participant cluster, as distinguished by age, education, occupational role, and number of supervisees. The observed discrepancies in the GS-ML model decisions are discussed in association with algorithmic bias.
9. Goldberg SB, Flemotomos N, Martinez VR, Tanana MJ, Kuo PB, Pace BT, Villatte JL, Georgiou PG, Van Epps J, Imel ZE, Narayanan SS, Atkins DC. Machine learning and natural language processing in psychotherapy research: Alliance as example use case. J Couns Psychol 2020;67:438-448. PMID: 32614225; PMCID: PMC7393999; DOI: 10.1037/cou0000382.
Abstract
Artificial intelligence generally and machine learning specifically have become deeply woven into the lives and technologies of modern life. Machine learning is dramatically changing scientific research and industry and may also hold promise for addressing limitations encountered in mental health care and psychotherapy. The current paper introduces machine learning and natural language processing as related methodologies that may prove valuable for automating the assessment of meaningful aspects of treatment. Prediction of therapeutic alliance from session recordings is used as a case in point. Recordings from 1,235 sessions of 386 clients seen by 40 therapists at a university counseling center were processed using automatic speech recognition software. Machine learning algorithms learned associations between client ratings of therapeutic alliance and session linguistic content alone. Using a portion of the data to train the model, machine learning algorithms modestly predicted alliance ratings from session content in an independent test set (Spearman's ρ = .15, p < .001). These results highlight the potential to harness natural language processing and machine learning to predict a key psychotherapy process variable that is relatively distal from linguistic content. Six practical suggestions for conducting psychotherapy research using machine learning are presented along with several directions for future research. Questions of dissemination and implementation may be particularly important to explore as machine learning improves in its ability to automate assessment of psychotherapy process and outcome.
10. Booth BM, Seamans TJ, Narayanan SS. An Evaluation of EEG-based Metrics for Engagement Assessment of Distance Learners. Annu Int Conf IEEE Eng Med Biol Soc 2018;2018:307-310. PMID: 30440399; DOI: 10.1109/embc.2018.8512302.
Abstract
Maintaining students' cognitive engagement in educational settings is crucial to their performance, though quantifying this mental state in real time for distance learners has not been studied extensively in natural distance learning environments. We record electroencephalographic (EEG) data from students watching online lecture videos and use it to predict engagement rated by human annotators. An evaluation of prior EEG-based engagement metrics that utilize power spectral density (PSD) features is presented. We examine the predictive power of various supervised machine learning approaches with both subject-independent and individualized models when using simple PSD feature functions. Our results show that engagement metrics with few power band variables, including those proposed in prior research, do not produce predictions consistent with human observations. We quantify the performance disparity between cross-subject and per-subject models and demonstrate that individual differences in EEG patterns necessitate a more complex metric for educational engagement assessment in natural distance learning environments.
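A widely cited example of the PSD-based band-ratio indices evaluated in this line of work is beta / (alpha + theta). A self-contained sketch on synthetic single-channel data; the sampling rate, amplitudes, and band edges are illustrative assumptions, not the study's recording setup:

```python
import numpy as np

fs = 128.0                                   # Hz, hypothetical sampling rate
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
# Synthetic one-channel "EEG": strong 10 Hz alpha, weak 20 Hz beta, noise
x = (2.0 * np.sin(2 * np.pi * 10 * t) +
     0.5 * np.sin(2 * np.pi * 20 * t) +
     0.2 * rng.normal(size=t.size))

def band_power(x, fs, lo, hi):
    """Mean periodogram power in the [lo, hi) Hz band."""
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / x.size
    band = (freqs >= lo) & (freqs < hi)
    return float(psd[band].mean())

theta = band_power(x, fs, 4, 8)
alpha = band_power(x, fs, 8, 13)
beta = band_power(x, fs, 13, 30)
engagement = beta / (alpha + theta)          # small here: alpha dominates
print(engagement)
```

Collapsing the spectrum to a handful of band powers like this is exactly the simplicity the paper finds insufficient for individual learners.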
11. Lim Y, Lingala SG, Narayanan SS, Nayak KS. Dynamic off-resonance correction for spiral real-time MRI of speech. Magn Reson Med 2018;81:234-246. PMID: 30058147; DOI: 10.1002/mrm.27373.
Abstract
PURPOSE To improve the depiction and tracking of vocal tract articulators in spiral real-time MRI (RT-MRI) of speech production by estimating and correcting for dynamic changes in off-resonance. METHODS The proposed method computes a dynamic field map from the phase of single-TE dynamic images after a coil phase compensation, where complex coil sensitivity maps are estimated from the single-TE dynamic scan itself. This method is tested using simulations and in vivo data. The depiction of air-tissue boundaries is evaluated quantitatively using a sharpness metric and visual inspection. RESULTS Simulations demonstrate that the proposed method provides robust off-resonance correction for spiral readout durations up to 5 ms at 1.5 T. In vivo experiments during human speech production demonstrate that image sharpness is improved in a majority of data sets at air-tissue boundaries, including the upper lip, hard palate, soft palate, and tongue, whereas the lower lip shows little improvement in edge sharpness after correction. CONCLUSION Dynamic off-resonance correction is feasible from single-TE spiral RT-MRI data and provides a practical performance improvement in articulator sharpness when applied to speech production imaging.
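The field-map estimation step described above, reading off-resonance from the phase of a single-TE image once coil phase has been removed, reduces to f = angle(img) / (2 pi TE). A toy sketch with an invented echo time and a synthetic off-resonance ramp (the phase is unambiguous only while |f| < 1 / (2 TE)):

```python
import numpy as np

TE = 0.003                                   # s, hypothetical echo time
# Synthetic off-resonance map: a left-right ramp from -50 to +50 Hz
true_fmap = np.tile(np.linspace(-50.0, 50.0, 64), (64, 1))
# Complex image after coil-phase compensation: unit magnitude, phase
# accrued from off-resonance over one echo time
img = np.exp(1j * 2 * np.pi * true_fmap * TE)

# Field map estimate from the single-TE phase
est_fmap = np.angle(img) / (2 * np.pi * TE)
print(float(np.max(np.abs(est_fmap - true_fmap))))
```

In this wrap-free regime the estimate recovers the ramp essentially exactly; the paper's contribution lies in obtaining the coil-phase-compensated images and applying such maps dynamically during spiral reconstruction.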
Affiliation(s)
- Yongwan Lim: Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Sajan Goud Lingala: Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Shrikanth S Narayanan: Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Krishna S Nayak: Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California

12
Smith JD, Berkel C, Jordan N, Atkins DC, Narayanan SS, Gallo C, Grimm KJ, Dishion TJ, Mauricio AM, Rudo-Stern J, Meachum MK, Winslow E, Bruening MM. An individually tailored family-centered intervention for pediatric obesity in primary care: study protocol of a randomized type II hybrid effectiveness-implementation trial (Raising Healthy Children study). Implement Sci 2018; 13:11. [PMID: 29334983 PMCID: PMC5769381 DOI: 10.1186/s13012-017-0697-2] [Received: 11/13/2017] [Accepted: 12/07/2017]
Abstract
BACKGROUND Pediatric obesity is a multi-faceted public health concern that can lead to cardiovascular diseases, cancers, and early mortality. Small changes in diet, physical activity, or BMI can significantly reduce the possibility of developing cardiometabolic risk factors. Family-based behavioral interventions are an underutilized, evidence-based approach that have been found to significantly prevent excess weight gain and obesity in children and adolescents. Poor program availability, low participation rates, and non-adherence are noted barriers to positive outcomes. Effective interventions for pediatric obesity in primary care are hampered by low family functioning, motivation, and adherence to recommendations. METHODS This (type II) hybrid effectiveness-implementation randomized trial tests the Family Check-Up 4 Health (FCU4Health) program, which was designed to target health behavior change in children by improving family management practices and parenting skills, with the goal of preventing obesity and excess weight gain. The FCU4Health is assessment driven to tailor services and increase parent motivation. A sample of 350 families with children aged 6 to 12 years who are identified as overweight or obese (BMI ≥ 85th percentile for age and gender) will be enrolled at three primary care clinics [two Federally Qualified Healthcare Centers (FQHCs) and a children's hospital]. All clinics serve predominantly Medicaid patients and a large ethnic minority population, including Latinos, African Americans, and American Indians who face disparities in obesity, cardiometabolic risk, and access to care. The FCU4Health will be coordinated with usual care, using two different delivery strategies: an embedded approach for the two FQHCs and a referral model for the hospital-based clinic. 
To assess program effectiveness (BMI, body composition, child health behaviors, parenting, and utilization of support services) and implementation outcomes (acceptability, adoption, feasibility, appropriateness, fidelity, and cost), we use a multi-method and multi-informant assessment strategy including electronic health record data, behavioral observation, questionnaires, interviews, and cost capture methods. DISCUSSION This study has the potential to prevent excess weight gain, obesity, and health disparities in children by establishing the effectiveness of the FCU4Health and collecting information critical for healthcare decision makers to support sustainable implementation of family-based programs in primary care. TRIAL REGISTRATION NCT03013309 ClinicalTrials.gov.
Affiliation(s)
- Justin D. Smith: Departments of Psychiatry and Behavioral Sciences, Preventive Medicine, and Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Cady Berkel: REACH Institute, Department of Psychology, Arizona State University, Tempe, AZ, USA
- Neil Jordan: Department of Psychiatry and Behavioral Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- David C. Atkins: Department of Psychiatry and Behavioral Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Shrikanth S. Narayanan: Department of Electrical Engineering and Computer Science, University of Southern California, Los Angeles, CA, USA
- Carlos Gallo: Department of Psychiatry and Behavioral Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Kevin J. Grimm: REACH Institute, Department of Psychology, Arizona State University, Tempe, AZ, USA
- Thomas J. Dishion: REACH Institute, Department of Psychology, Arizona State University, Tempe, AZ, USA
- Anne M. Mauricio: REACH Institute, Department of Psychology, Arizona State University, Tempe, AZ, USA
- Jenna Rudo-Stern: REACH Institute, Department of Psychology, Arizona State University, Tempe, AZ, USA
- Mariah K. Meachum: Department of Psychiatry and Behavioral Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Emily Winslow: REACH Institute, Department of Psychology, Arizona State University, Tempe, AZ, USA
- Meg M. Bruening: Department of Nutrition, Arizona State University, Tempe, AZ, USA

13
Abstract
Several studies have established that facial expressions of children with autism are often perceived as atypical, awkward, or less engaging by typical adult observers. Despite this clear deficit in the quality of facial expression production, very little is understood about its underlying mechanisms and characteristics. This paper takes a computational approach to studying details of facial expressions of children with high functioning autism (HFA). The objective is to uncover characteristics of facial expressions that are distinct from those of typically developing children and that are otherwise difficult to detect by visual inspection. We use motion capture data obtained from subjects with HFA and typically developing subjects while they produced various facial expressions. These data are analyzed to investigate how the overall and local facial dynamics of children with HFA differ from those of their typically developing peers. Our major observations include reduced complexity in the dynamic facial behavior of the HFA group, arising primarily from the eye region.
Affiliation(s)
- Tanaya Guha: Department of Electrical Engineering, Indian Institute of Technology Kanpur, India
- Zhaojun Yang: Signal Analysis and Interpretation Lab (SAIL), University of Southern California, Los Angeles
- Shrikanth S Narayanan: Signal Analysis and Interpretation Lab (SAIL), University of Southern California, Los Angeles

14
Chaspari T, Tsiartas A, Stein Duker LI, Cermak SA, Narayanan SS. EDA-gram: designing electrodermal activity fingerprints for visualization and feature extraction. Annu Int Conf IEEE Eng Med Biol Soc 2017; 2016:403-406. [PMID: 28268358 DOI: 10.1109/embc.2016.7590725]
Abstract
Wearable technology permeates every aspect of our daily life, increasing the need for reliable and interpretable models for processing the large amounts of biomedical data. We propose the EDA-Gram, a multidimensional fingerprint of the electrodermal activity (EDA) signal, inspired by the widely used notion of the spectrogram. The EDA-Gram is based on the sparse decomposition of EDA over a knowledge-driven set of dictionary atoms. The time axis reflects the analysis frames, the spectral dimension depicts the width of the selected dictionary atoms, and the intensity values are computed from the atom coefficients. In this way, the EDA-Gram incorporates the amplitude and shape of Skin Conductance Responses (SCRs), which comprise an essential part of the signal. The EDA-Gram is further used as a foundation for signal-specific feature design. Our results indicate that the proposed representation can accentuate fine-grained signal fluctuations, which might not always be apparent through simple visual inspection. Statistical analysis and classification/regression experiments further suggest that the derived features can differentiate between multiple arousal levels and stress-eliciting environments for two datasets.
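A toy version of the EDA-Gram idea can be sketched as follows, assuming unit-norm bell-shaped atoms as stand-ins for SCR-like dictionary elements: each analysis frame is projected onto atoms of several widths, and the coefficient magnitudes form a width-by-time image analogous to a spectrogram. The atom shapes and widths here are illustrative, not the published dictionary.

```python
import numpy as np

def edagram(signal, widths, frame_len):
    """Toy EDA-Gram: rows index atom width, columns index analysis frames,
    intensities are magnitudes of projections onto unit-norm atoms."""
    t = np.arange(frame_len)
    atoms = []
    for w in widths:
        a = np.exp(-0.5 * ((t - frame_len / 2) / w) ** 2)  # bell-shaped atom
        atoms.append(a / np.linalg.norm(a))
    n_frames = len(signal) // frame_len
    gram = np.zeros((len(widths), n_frames))
    for j in range(n_frames):
        frame = signal[j * frame_len:(j + 1) * frame_len]
        gram[:, j] = [abs(np.dot(a, frame)) for a in atoms]
    return gram

# Two frames: a narrow SCR-like bump, then a wide one. Each frame should
# light up the row of the matching atom width.
t = np.arange(64)
narrow = np.exp(-0.5 * ((t - 32) / 2.0) ** 2)
wide = np.exp(-0.5 * ((t - 32) / 8.0) ** 2)
gram = edagram(np.concatenate([narrow, wide]), widths=[2.0, 8.0], frame_len=64)
```

The paper's actual method uses a sparse decomposition rather than a dense projection, but the resulting time-by-width intensity image is read in the same way.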
15
Timmons AC, Baucom BR, Han SC, Perrone L, Chaspari T, Narayanan SS, Margolin G. New Frontiers in Ambulatory Assessment. Social Psychological and Personality Science 2017. [DOI: 10.1177/1948550617709115]
Affiliation(s)
- Adela C. Timmons: Department of Psychology, University of Southern California, Los Angeles, CA, USA
- Sohyun C. Han: Department of Psychology, University of Southern California, Los Angeles, CA, USA
- Laura Perrone: Department of Psychology, Stony Brook University, New York, NY, USA
- Theodora Chaspari: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
- Shrikanth S. Narayanan: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
- Gayla Margolin: Department of Psychology, University of Southern California, Los Angeles, CA, USA

16
Hagedorn C, Proctor M, Goldstein L, Wilson SM, Miller B, Gorno-Tempini ML, Narayanan SS. Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging. J Speech Lang Hear Res 2017; 60:877-891. [PMID: 28314241 PMCID: PMC5548083 DOI: 10.1044/2016_jslhr-s-15-0112] [Received: 03/23/2015] [Revised: 12/19/2015] [Accepted: 07/15/2016]
Abstract
Purpose Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and acoustic and kinematic data. An analysis of apraxic speech errors within a dynamic systems framework is provided, and the nature of the pathomechanisms of apraxic speech is discussed. Method One adult male speaker with apraxia of speech was imaged using real-time MRI while producing spontaneous speech, repeated naming tasks, and self-paced repetition of word pairs designed to elicit speech errors. Articulatory data were analyzed, and speech errors were detected using time series reflecting articulatory activity in regions of interest. Results Real-time MRI captured two types of apraxic gestural intrusion errors in a word pair repetition task. Gestural intrusion errors in nonrepetitive speech, multiple silent initiation gestures at the onset of speech, and covert (unphonated) articulation of entire monosyllabic words were also captured. Conclusion Real-time MRI and accompanying analytical methods capture and quantify many features of apraxic speech that have been previously observed using other modalities, while offering high spatial resolution. This patient's apraxia of speech affected the ability to select only the appropriate vocal tract gestures for a target utterance, suppressing others, and to coordinate them in time.
Affiliation(s)
- Michael Proctor: Macquarie University, North Ryde, New South Wales, Australia

17
Baucom BRW, Georgiou P, Bryan CJ, Garland EL, Leifker F, May A, Wong A, Narayanan SS. The Promise and the Challenge of Technology-Facilitated Methods for Assessing Behavioral and Cognitive Markers of Risk for Suicide among U.S. Army National Guard Personnel. Int J Environ Res Public Health 2017; 14:E361. [PMID: 28362333 PMCID: PMC5409562 DOI: 10.3390/ijerph14040361] [Received: 01/31/2017] [Revised: 03/20/2017] [Accepted: 03/25/2017]
Abstract
Suicide was the 10th leading cause of death for Americans in 2015 and rates have been steadily climbing over the last 25 years. Rates are particularly high amongst U.S. military personnel. Suicide prevention efforts in the military are significantly hampered by the lack of: (1) assessment tools for measuring baseline risk and (2) methods to detect periods of particularly heightened risk. Two specific barriers to assessing suicide risk in military personnel that call for innovation are: (1) the geographic dispersion of military personnel from healthcare settings, particularly amongst components like the Reserves; and (2) professional and social disincentives to acknowledging psychological distress. The primary aim of this paper is to describe recent technological developments that could contribute to risk assessment tools that are not subject to the limitations mentioned above. More specifically, Behavioral Signal Processing can be used to assess behaviors during interaction and conversation that likely indicate increased risk for suicide, and computer-administered, cognitive performance tasks can be used to assess activation of the suicidal mode. These novel methods can be used remotely and do not require direct disclosure or endorsement of psychological distress, solving two challenges to suicide risk assessment in military and other sensitive settings. We present an introduction to these technologies, describe how they can specifically be applied to assessing behavioral and cognitive risk for suicide, and close with recommendations for future research.
Affiliation(s)
- Brian R W Baucom: Department of Psychology, University of Utah, Salt Lake City, UT 84108, USA
- Panayiotis Georgiou: Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA
- Craig J Bryan: Department of Psychology and National Center for Veterans Studies, University of Utah, Salt Lake City, UT 84108, USA
- Eric L Garland: Department of Social Work, University of Utah, Salt Lake City, UT 84108, USA
- Feea Leifker: Department of Psychology, University of Utah, Salt Lake City, UT 84108, USA
- Alexis May: Department of Psychology, University of Utah, Salt Lake City, UT 84108, USA
- Alexander Wong: Department of Psychology, University of Utah, Salt Lake City, UT 84108, USA
- Shrikanth S Narayanan: Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA

18
Bone D, Bishop S, Black MP, Goodwin MS, Lord C, Narayanan SS. Use of machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and multi-instrument fusion. J Child Psychol Psychiatry 2016; 57:927-37. [PMID: 27090613 PMCID: PMC4958551 DOI: 10.1111/jcpp.12559] [Accepted: 01/29/2016]
Abstract
BACKGROUND Machine learning (ML) provides novel opportunities for human behavior research and clinical translation, yet its application can have noted pitfalls (Bone et al., 2015). In this work, we fastidiously utilize ML to derive autism spectrum disorder (ASD) instrument algorithms in an attempt to improve upon widely used ASD screening and diagnostic tools. METHODS The data consisted of Autism Diagnostic Interview-Revised (ADI-R) and Social Responsiveness Scale (SRS) scores for 1,264 verbal individuals with ASD and 462 verbal individuals with non-ASD developmental or psychiatric disorders, split at age 10. Algorithms were created via a robust ML classifier, the support vector machine, while targeting best-estimate clinical diagnosis of ASD versus non-ASD. Parameter settings were tuned in multiple levels of cross-validation. RESULTS The created algorithms were more effective (higher performing) than the current algorithms, were tunable (sensitivity and specificity can be differentially weighted), and were more efficient (achieving near-peak performance with five or fewer codes). Results from ML-based fusion of ADI-R and SRS are reported. We present a screener algorithm for below (above) age 10 that reached 89.2% (86.7%) sensitivity and 59.0% (53.4%) specificity with only five behavioral codes. CONCLUSIONS ML is useful for creating robust, customizable instrument algorithms. In a unique dataset composed of controls with other difficulties, our findings highlight the limitations of current caregiver-report instruments and indicate possible avenues for improving ASD screening and diagnostic tools.
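As a rough illustration of the classifier family used in this study, the following is a minimal Pegasos-style linear support vector machine trained on synthetic two-class feature vectors. It is a stand-in sketch under stated assumptions (fabricated data, no bias term, no cross-validated tuning), not the study's model or data.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Minimal Pegasos-style sub-gradient training of a linear SVM
    (labels in {-1, +1}; no bias term, so data should be roughly centered)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            w *= 1.0 - 1.0 / t  # regularization shrink, equals (1 - eta*lam)
            if y[i] * (X[i] @ w) < 1.0:  # hinge-loss margin violation
                w += (1.0 / (lam * t)) * y[i] * X[i]
    return w

# Two synthetic clusters standing in for instrument-code feature vectors
# of the two diagnostic groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.4, (40, 2)), rng.normal(-2.0, 0.4, (40, 2))])
y = np.concatenate([np.ones(40), -np.ones(40)])
w = train_linear_svm(X, y)
accuracy = float((np.sign(X @ w) == y).mean())
```

In the study itself, sensitivity and specificity were traded off by reweighting, and performance was estimated with nested cross-validation rather than training-set accuracy as here.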
Affiliation(s)
- Daniel Bone: Department of Electrical Engineering, University of Southern California, Los Angeles, CA
- Somer Bishop: San Francisco School of Medicine, University of California, San Francisco, CA
- Matthew P. Black: Information Sciences Institute, University of Southern California, Los Angeles, CA
- Catherine Lord: Center for Autism and the Developing Brain, Weill Cornell Medical College, New York, NY, USA
- Shrikanth S. Narayanan: Department of Electrical Engineering, University of Southern California, Los Angeles, CA

19
Xiao B, Huang C, Imel ZE, Atkins DC, Georgiou P, Narayanan SS. A technology prototype system for rating therapist empathy from audio recordings in addiction counseling. PeerJ Comput Sci 2016; 2:e59. [PMID: 28286867 PMCID: PMC5344199 DOI: 10.7717/peerj-cs.59]
Abstract
Scaling up psychotherapy services such as addiction counseling is a critical societal need. One challenge is ensuring the quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy, a key therapy quality index, from audio recordings of psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session-level empathy codes using utterance-level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert-annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality assurance and therapist training.
Affiliation(s)
- Bo Xiao: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, United States
- Chewei Huang: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, United States
- Zac E. Imel: Department of Educational Psychology, University of Utah, Salt Lake City, UT, United States
- David C. Atkins: Department of Psychiatry & Behavioral Sciences, University of Washington, Seattle, WA, United States
- Panayiotis Georgiou: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, United States
- Shrikanth S. Narayanan: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, United States

20
Toutios A, Narayanan SS. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research. APSIPA Trans Signal Inf Process 2016; 5:e6. [PMID: 27833745 PMCID: PMC5100697 DOI: 10.1017/atsip.2016.5]
Abstract
Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development.
Affiliation(s)
- Asterios Toutios: Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
- Shrikanth S Narayanan: Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA

21
Ramanarayanan V, Van Segbroeck M, Narayanan SS. Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories. COMPUT SPEECH LANG 2016; 36:330-346. [PMID: 26688612 DOI: 10.1016/j.csl.2015.03.004]
Abstract
How the speech production and perception systems evolved in humans still remains a mystery today. Previous research suggests that human auditory systems are able, and have possibly evolved, to preserve maximal information about the speaker's articulatory gestures. This paper attempts an initial step towards answering the complementary question of whether speakers' articulatory mechanisms have also evolved to produce sounds that can be optimally discriminated by the listener's auditory system. To this end we explicitly model, using computational methods, the extent to which derived representations of "primitive movements" of speech articulation can be used to discriminate between broad phone categories. We extract interpretable spatio-temporal primitive movements as recurring patterns in a data matrix of human speech articulation, i.e. representing the trajectories of vocal tract articulators over time. To this end, we propose a weakly-supervised learning method that attempts to find a part-based representation of the data in terms of recurring basis trajectory units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases over different time-lags. We show that this feature, derived entirely from activations of these primitive movements, is able to achieve a greater discrimination relative to using conventional features on an interval-based phone classification task. We discuss the implications of these findings in furthering our understanding of speech signal representations and the links between speech production and perception systems.
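The basis-plus-activation factorization described above can be illustrated, in a simplified unsupervised form, with multiplicative-update nonnegative matrix factorization (V is approximated by W @ H), where columns of W play the role of primitive movements and rows of H their activations over time. This is a generic sketch on fabricated data, not the paper's weakly supervised, convolutive method.

```python
import numpy as np

def nmf(V, r, iters=300, seed=0):
    """Multiplicative-update NMF: nonnegative V is factored as W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 1e-3
    H = rng.random((r, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)  # update basis "primitives"
    return W, H

# Recover an exactly rank-2 nonnegative matrix standing in for articulator
# trajectories (rows: articulator coordinates, columns: time frames).
rng = np.random.default_rng(0)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, r=2)
rel_err = float(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

In the paper, the analogous activations (not the raw trajectories) supply the co-occurrence features used for phone classification.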
Affiliation(s)
- Vikram Ramanarayanan: Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA 90089
- Maarten Van Segbroeck: Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA 90089
- Shrikanth S Narayanan: Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA 90089

22
Can D, Marín RA, Georgiou PG, Imel ZE, Atkins DC, Narayanan SS. "It sounds like...": A natural language processing approach to detecting counselor reflections in motivational interviewing. J Couns Psychol 2016; 63:343-350. [PMID: 26784286 DOI: 10.1037/cou0000111]
Abstract
The dissemination and evaluation of evidence-based behavioral treatments for substance abuse problems rely on the evaluation of counselor interventions. In Motivational Interviewing (MI), a treatment that directs the therapist to utilize a particular linguistic style, proficiency is assessed via behavioral coding, a time-consuming, nontechnological approach. Natural language processing techniques have the potential to scale up the evaluation of behavioral treatments such as MI. We present a novel computational approach to assessing components of MI, focusing on one specific counselor behavior, reflections, which are believed to be a critical MI ingredient. Using 57 sessions from 3 MI clinical trials, we automatically detected counselor reflections in a maximum entropy Markov modeling framework using the raw linguistic data derived from session transcripts. We achieved 93% recall, 90% specificity, and 73% precision. Results provide insight into the linguistic information used by coders to make ratings and demonstrate the feasibility of new computational approaches to scaling up the evaluation of behavioral treatments.
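A maximum entropy Markov model conditions each label on local features plus the previous label. The toy sketch below trains a logistic (maximum-entropy) model over a few invented cue-word features and decodes greedily left to right; the cue words, utterances, and labels are fabricated illustrations, not the trial data or the published feature set.

```python
import numpy as np

CUES = ["sounds", "seems", "hear"]  # invented reflection cue words

def featurize(utterance, prev_label):
    """MEMM-style local features: bias, cue-word indicators, previous label."""
    words = utterance.lower().split()
    return np.array([1.0] + [float(c in words) for c in CUES] + [float(prev_label)])

def train(utts, labels, lr=0.5, epochs=300):
    """Logistic (maximum-entropy) training, conditioning each utterance on the
    gold previous label, as an MEMM does."""
    w = np.zeros(2 + len(CUES))
    for _ in range(epochs):
        prev = 0
        for u, y in zip(utts, labels):
            x = featurize(u, prev)
            p = 1.0 / (1.0 + np.exp(-w @ x))
            w += lr * (y - p) * x  # gradient ascent on log-likelihood
            prev = y
    return w

def decode(utts, w):
    """Greedy left-to-right decoding, feeding back predicted labels."""
    prev, out = 0, []
    for u in utts:
        p = 1.0 / (1.0 + np.exp(-w @ featurize(u, prev)))
        prev = int(p > 0.5)
        out.append(prev)
    return out

# Fabricated mini-session: 1 = reflection, 0 = other counselor talk.
utts = ["it sounds like you feel stuck", "tell me more about that",
        "i hear that this week was hard", "what happened next",
        "it seems like you want to change", "when did you start drinking"]
labels = [1, 0, 1, 0, 1, 0]
w = train(utts, labels)
```

The published system uses far richer features derived from full transcripts; the point here is only the MEMM structure of local features plus label history.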
Affiliation(s)
- Doğan Can: Department of Computer Science, University of Southern California
- Rebeca A Marín: Department of Psychiatry and Behavioral Sciences, University of Washington
- Zac E Imel: Department of Educational Psychology, University of Utah
- David C Atkins: Department of Psychiatry and Behavioral Sciences, University of Washington

23
Xiao B, Imel ZE, Georgiou PG, Atkins DC, Narayanan SS. "Rate My Therapist": Automated Detection of Empathy in Drug and Alcohol Counseling via Speech and Language Processing. PLoS One 2015; 10:e0143055. [PMID: 26630392 PMCID: PMC4668058 DOI: 10.1371/journal.pone.0143055] [Received: 02/04/2015] [Accepted: 10/30/2015]
Abstract
The technology for evaluating patient-provider interactions in psychotherapy, observational coding, has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks, including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally derived empathy ratings was evaluated against human ratings for each provider. Computationally derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and an F-score (the harmonic mean of precision and recall) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
Affiliation(s)
- Bo Xiao: Department of Electrical Engineering, University of Southern California, Los Angeles, United States of America
- Zac E. Imel: Department of Educational Psychology, University of Utah, Salt Lake City, United States of America
- Panayiotis G. Georgiou: Department of Electrical Engineering, University of Southern California, Los Angeles, United States of America
- David C. Atkins: Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, United States of America
- Shrikanth S. Narayanan: Department of Electrical Engineering, University of Southern California, Los Angeles, United States of America

24
Abstract
Vocal tract length is highly variable across speakers and determines many aspects of the acoustic speech signal, making it an essential parameter to consider for explaining behavioral variability. A method for accurate estimation of vocal tract length from formant frequencies would afford normalization of interspeaker variability and facilitate acoustic comparisons across speakers. A framework for considering estimation methods is developed from the basic principles of vocal tract acoustics, and an estimation method is proposed that follows naturally from this framework. The proposed method is evaluated using acoustic characteristics of simulated vocal tracts ranging from 14 to 19 cm in length, as well as real-time magnetic resonance imaging data with synchronous audio from five speakers whose vocal tracts range from 14.5 to 18.0 cm in length. Evaluations show improvements in accuracy over previously proposed methods, with 0.631 and 1.277 cm root mean square errors on simulated and human speech data, respectively. Empirical results show that the effectiveness of the proposed method rests on emphasizing higher formant frequencies, which seem less affected by speech articulation. Theoretical predictions of formant sensitivity reinforce this empirical finding. Moreover, theoretical insights are offered regarding the reasons for differences in formant sensitivity.
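The physical starting point for such estimation frameworks is the uniform tube closed at the glottis, whose resonances satisfy F_n = (2n - 1) * c / (4 * L), so each formant yields a length estimate L = (2n - 1) * c / (4 * F_n). The sketch below inverts this relation and, echoing the paper's finding, weights higher formants more heavily; the specific weighting scheme is an illustrative assumption, not the published estimator.

```python
import numpy as np

C = 35000.0  # speed of sound in warm, moist air, cm/s (approximate)

def vtl_from_formant(f_n, n):
    """Tube length (cm) implied by the nth formant of a uniform tube closed
    at one end: F_n = (2n - 1) * c / (4 * L)  =>  L = (2n - 1) * c / (4 * F_n)."""
    return (2 * n - 1) * C / (4.0 * f_n)

def vtl_estimate(formants_hz):
    """Combine per-formant length estimates, weighting higher formants more
    heavily (the linear weights here are illustrative only)."""
    ests = [vtl_from_formant(f, n + 1) for n, f in enumerate(formants_hz)]
    weights = np.arange(1, len(formants_hz) + 1, dtype=float)
    return float(np.average(ests, weights=weights))

# An ideal 17.5 cm closed-open tube has formants at 500, 1500, 2500, 3500 Hz.
length = vtl_estimate([500.0, 1500.0, 2500.0, 3500.0])
```

For real speech the per-formant estimates disagree because articulation perturbs each formant differently, which is exactly why the relative weighting of formants matters in the paper's analysis.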
Affiliation(s)
- Adam C. Lammert
- Computer Science Department, Swarthmore College, Swarthmore, PA, United States of America
- Shrikanth S. Narayanan
- Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, United States of America
25
Abstract
This paper presents a computational study of head motion in human interaction, notably of its role in conveying interlocutors' behavioral characteristics. Head motion is physically complex and carries rich information; current modeling approaches based on visual signals, however, are still limited in their ability to adequately capture these important properties. Guided by the methodology of kinesics, we propose a data driven approach to identify typical head motion patterns. The approach follows the steps of first segmenting motion events, then parametrically representing the motion by linear predictive features, and finally generalizing the motion types using Gaussian mixture models. The proposed approach is experimentally validated using video recordings of communication sessions from real couples involved in a couples therapy study. In particular we use the head motion model to classify binarized expert judgments of the interactants' specific behavioral characteristics where entrainment in head motion is hypothesized to play a role: Acceptance, Blame, Positive, and Negative behavior. We achieve accuracies in the range of 60% to 70% for the various experimental settings and conditions. In addition, we describe a measure of motion similarity between the interaction partners based on the proposed model. We show that the relative change of head motion similarity during the interaction significantly correlates with the expert judgments of the interactants' behavioral characteristics. These findings demonstrate the effectiveness of the proposed head motion model, and underscore the promise of analyzing human behavioral characteristics through signal processing methods.
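The "linear predictive features" step of the pipeline above can be illustrated with a standard autocorrelation-method LPC via the Levinson-Durbin recursion; this is generic LPC on a 1-D motion signal, not the paper's exact parameterization.

```python
# LPC coefficients of a 1-D head-motion signal: autocorrelation followed by
# the Levinson-Durbin recursion.

def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Return coefficients a[1..order] such that x[n] ~ sum_k a[k]*x[n-k]."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]
```

An AR(1) signal x[n] = 0.9 x[n-1] recovers its coefficient with a first-order model, which is the sanity check for any LPC implementation.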
Affiliation(s)
- Bo Xiao
- Signal and Image Processing Institute, Department of Electrical Engineering, University of Southern California, Los Angeles, CA, 90089 USA
- Panayiotis Georgiou
- Signal and Image Processing Institute, Department of Electrical Engineering, University of Southern California, Los Angeles, CA, 90089 USA
- Brian Baucom
- Department of Psychology, University of Utah, Salt Lake City, UT, 84112 USA
- Shrikanth S Narayanan
- Signal and Image Processing Institute, Department of Electrical Engineering, University of Southern California, Los Angeles, CA, 90089 USA
26
Baucom BR, Sheng E, Christensen A, Georgiou PG, Narayanan SS, Atkins DC. Behaviorally-based couple therapies reduce emotional arousal during couple conflict. Behav Res Ther 2015; 72:49-55. [PMID: 26183021 DOI: 10.1016/j.brat.2015.06.015] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2014] [Revised: 06/28/2015] [Accepted: 06/30/2015] [Indexed: 11/18/2022]
Abstract
Emotional arousal during relationship conflict is a major target for intervention in couple therapies. The current study examines changes in conflict-related emotional arousal in 104 couples that participated in a randomized clinical trial of two behaviorally-based couple therapies. Emotional arousal is measured using the mean fundamental frequency of spouses' speech, and changes in emotional arousal from pre- to post-therapy are examined using multilevel models. Overall emotional arousal, the rate of increase in emotional arousal at the beginning of conflict, and the duration of emotional arousal declined for all couples. Reductions in overall arousal were stronger for TBCT wives than for IBCT wives but not significantly different for IBCT and TBCT husbands. Reductions in the rate of initial arousal were larger for TBCT couples than IBCT couples. Reductions in duration were larger for IBCT couples than TBCT couples. These findings suggest that both therapies can reduce emotional arousal, but that the two therapies create different kinds of change in emotional arousal.
Affiliation(s)
- Brian R Baucom
- Department of Psychology, University of Utah, Salt Lake City, UT 84112, USA
- Elisa Sheng
- Department of Psychiatry and Behavioral Science, University of Washington, USA
- David C Atkins
- Department of Psychiatry and Behavioral Science, University of Washington, USA
27
Guha T, Yang Z, Ramakrishna A, Grossman RB, Darren H, Lee S, Narayanan SS. On Quantifying Facial Expression-Related Atypicality of Children with Autism Spectrum Disorder. Proc IEEE Int Conf Acoust Speech Signal Process 2015; 2015:803-807. [PMID: 26705397 DOI: 10.1109/icassp.2015.7178080] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Children with Autism Spectrum Disorder (ASD) are known to have difficulty in producing and perceiving emotional facial expressions. Their expressions are often perceived as atypical by adult observers. This paper focuses on data driven ways to analyze and quantify atypicality in facial expressions of children with ASD. Our objective is to uncover those characteristics of facial gestures that induce the sense of perceived atypicality in observers. Using a carefully collected motion capture database, facial expressions of children with and without ASD are compared within six basic emotion categories employing methods from information theory, time-series modeling and statistical analysis. Our experiments show that children with ASD usually have less complex expression producing mechanisms; the differences in facial dynamics between children with and without ASD primarily come from the eye region. Our study also notes that children with ASD exhibit lower symmetry between left and right regions, and lower variation in motion intensity across facial regions.
Affiliation(s)
- Tanaya Guha
- Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA
- Zhaojun Yang
- Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA
- Anil Ramakrishna
- Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA
- Ruth B Grossman
- Emerson College, Boston, MA; University of Massachusetts Medical School, Boston, MA
- Sungbok Lee
- Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA
- Shrikanth S Narayanan
- Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA
28
Kim J, Toutios A, Lee S, Narayanan SS. A kinematic study of critical and non-critical articulators in emotional speech production. J Acoust Soc Am 2015; 137:1411-1429. [PMID: 25786953 DOI: 10.1121/1.4908284] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
This study explores one aspect of the articulatory mechanism that underlies emotional speech production, namely, the behavior of linguistically critical and non-critical articulators in the encoding of emotional information. The hypothesis is that the larger kinematic variability possible in the behavior of non-critical articulators reveals the underlying emotional expression goals more explicitly than that of the critical articulators, which are strictly controlled in service of achieving linguistic goals and exhibit smaller kinematic variability. This hypothesis is examined by kinematic analysis of the movements of critical and non-critical speech articulators gathered using electromagnetic articulography during spoken expressions of five categorical emotions. Analysis results at the level of consonant-vowel-consonant segments reveal that critical articulators for the consonants show more (less) peripheral articulations during production of the consonant-vowel-consonant syllables for high (low) arousal emotions, while non-critical articulators show emotional variation of articulatory position that is less sensitive to the linguistic gestures. Analysis results at the individual phonetic targets show that overall, between- and within-emotion variability in articulatory positions is larger for non-critical cases than for critical cases. Finally, the results of simulation experiments suggest that the postural variation of non-critical articulators depending on emotion is significantly associated with the controls of critical articulators.
Affiliation(s)
- Jangwon Kim
- Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
- Asterios Toutios
- Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
- Sungbok Lee
- Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
- Shrikanth S Narayanan
- Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
29
Abstract
Biometric sensors and portable devices are being increasingly embedded into our everyday life, creating the need for robust physiological models that efficiently represent, analyze, and interpret the acquired signals. We propose a knowledge-driven method to represent electrodermal activity (EDA), a psychophysiological signal linked to stress, affect, and cognitive processing. We build EDA-specific dictionaries that accurately model both the slow varying tonic part and the signal fluctuations, called skin conductance responses (SCR), and use greedy sparse representation techniques to decompose the signal into a small number of atoms from the dictionary. Quantitative evaluation of our method considers signal reconstruction, compression rate, and information retrieval measures, that capture the ability of the model to incorporate the main signal characteristics, such as SCR occurrences. Compared to previous studies fitting a predetermined structure to the signal, results indicate that our approach provides benefits across all aforementioned criteria. This paper demonstrates the ability of appropriate dictionaries along with sparse decomposition methods to reliably represent EDA signals and provides a foundation for automatic measurement of SCR characteristics and the extraction of meaningful EDA features.
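Greedy sparse decomposition of this kind can be sketched with plain matching pursuit. The toy dictionary below, a constant "tonic" atom plus unit-norm Gaussian bumps standing in for SCR-like fluctuations, is only an illustrative substitute for the EDA-specific dictionaries the paper builds.

```python
import math

def make_dictionary(n):
    """Toy EDA-style dictionary: one constant atom + shifted Gaussian bumps."""
    atoms = []
    atoms.append([1.0 / math.sqrt(n)] * n)  # tonic level atom
    for center in range(n):
        bump = [math.exp(-((i - center) ** 2) / 8.0) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in bump))
        atoms.append([v / norm for v in bump])  # unit-norm SCR-like atom
    return atoms

def matching_pursuit(signal, atoms, n_atoms):
    """Greedily decompose signal into n_atoms (index, coefficient) pairs."""
    residual = list(signal)
    decomposition = []
    for _ in range(n_atoms):
        # pick the atom most correlated with the current residual
        scores = [sum(a_i * r_i for a_i, r_i in zip(a, residual))
                  for a in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        coef = scores[best]
        decomposition.append((best, coef))
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
    return decomposition, residual
```

A signal that is exactly a scaled atom is recovered in one iteration with zero residual, which mirrors the reconstruction criterion the abstract evaluates.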
Affiliation(s)
- Theodora Chaspari
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089 USA
- Leah I. Stein
- Division of Occupational Science and Occupational Therapy, Herman Ostrow School of Dentistry, University of Southern California
- Sharon A. Cermak
- Division of Occupational Science and Occupational Therapy, Herman Ostrow School of Dentistry, University of Southern California
- Shrikanth S. Narayanan
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California
30
Abstract
Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could assist experts in diagnosis and treatment design in these scenarios, the many sources and types of variability often make it a very challenging computational problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects of pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusion and subsystem decision fusion to arrive at a final intelligibility decision. The performance is tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects' data between training and test partitions. Results show that the feature sets of each of the voice quality, prosodic, and pronunciation subsystems offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% for unweighted, and 72.8% for weighted, average recalls of the binary classes).
Affiliation(s)
- Jangwon Kim
- Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Naveen Kumar
- Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Andreas Tsiartas
- Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Ming Li
- Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Shrikanth S Narayanan
- Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA; Department of Electrical Engineering, Computer Science, Linguistics and Psychology, University of Southern California (USC), 3620 McClintock Ave., Los Angeles, CA 90089, USA
31
Can D, Gibson J, Vaz C, Georgiou PG, Narayanan SS. Barista: A Framework for Concurrent Speech Processing by USC-SAIL. Proc IEEE Int Conf Acoust Speech Signal Process 2014; 2014:3306-3310. [PMID: 27610047 DOI: 10.1109/icassp.2014.6854212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista allows demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0.
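The actor pattern the abstract describes can be illustrated in miniature. Barista itself is C++ built on libcppa and Kaldi, so the following Python threads-and-queues pipeline is only an analogy, with invented stage names.

```python
# Minimal actor-style pipeline: each actor consumes messages from its inbox,
# transforms them, and forwards results to the next actor's inbox.
# A None message signals shutdown and is propagated downstream.

import queue
import threading

class Actor(threading.Thread):
    def __init__(self, fn, outbox=None):
        super().__init__()
        self.inbox = queue.Queue()
        self.fn, self.outbox = fn, outbox

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:
                if self.outbox is not None:
                    self.outbox.put(None)
                break
            result = self.fn(msg)
            if self.outbox is not None:
                self.outbox.put(result)

# Toy pipeline: a "frontend" scales samples, a "decoder" thresholds them.
sink = queue.Queue()
decoder = Actor(lambda x: "speech" if x > 0.5 else "silence", outbox=sink)
frontend = Actor(lambda x: x / 100.0, outbox=decoder.inbox)
decoder.start()
frontend.start()
for sample in [90, 10]:
    frontend.inbox.put(sample)
frontend.inbox.put(None)  # shut the pipeline down
frontend.join()
decoder.join()
```

Each stage runs on its own thread and communicates only by message passing, which is the property that lets such networks be distributed across hardware.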
Affiliation(s)
- Doğan Can
- Signal Analysis and Interpretation Lab, University of Southern California, CA 90089
- James Gibson
- Signal Analysis and Interpretation Lab, University of Southern California, CA 90089
- Colin Vaz
- Signal Analysis and Interpretation Lab, University of Southern California, CA 90089
32
Lee CC, Katsamanis A, Black MP, Baucom BR, Christensen A, Georgiou PG, Narayanan SS. Computing vocal entrainment: A signal-derived PCA-based quantification scheme with application to affect analysis in married couple interactions. COMPUT SPEECH LANG 2014. [DOI: 10.1016/j.csl.2012.06.006] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
33
Bone D, Li M, Black MP, Narayanan SS. Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors. COMPUT SPEECH LANG 2014; 28. [PMID: 24376305 DOI: 10.1016/j.csl.2012.09.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Segmental and suprasegmental speech signal modulations offer information about paralinguistic content such as affect, age and gender, pathology, and speaker state. Speaker state encompasses medium-term, temporary physiological phenomena influenced by internal or external biochemical actions (e.g., sleepiness, alcohol intoxication). Perceptual and computational research indicates that detecting speaker state from speech is a challenging task. In this paper, we present a system constructed with multiple representations of prosodic and spectral features that provided the best result at the Intoxication Subchallenge of Interspeech 2011 on the Alcohol Language Corpus. We discuss the details of each classifier and show that fusion improves performance. We additionally address the question of how best to construct a speaker state detection system in terms of robust and practical marginalization of associated variability such as through modeling speakers, utterance type, gender, and utterance length. As is the case in human perception, speaker normalization provides significant improvements to our system. We show that a held-out set of baseline (sober) data can be used to achieve comparable gains to other speaker normalization techniques. Our fused frame-level statistic-functional systems, fused GMM systems, and final combined system achieve unweighted average recalls (UARs) of 69.7%, 65.1%, and 68.8%, respectively, on the test set. More consistent numbers compared to development set results occur with matched-prompt training, where the UARs are 70.4%, 66.2%, and 71.4%, respectively. The combined system improves over the Challenge baseline by 5.5% absolute (8.4% relative), also improving upon our previously best result.
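One common form of speaker normalization consistent with the abstract's held-out baseline idea is per-speaker z-scoring of features against sober recordings. The recipe below is illustrative, not the paper's exact method.

```python
# Per-speaker z-normalization: standardize each speaker's feature values
# using the mean and standard deviation of that speaker's baseline (sober)
# data, so models see deviation-from-baseline rather than raw values.

import statistics

def speaker_zscore(features_by_speaker, baseline_by_speaker):
    normalized = {}
    for spk, values in features_by_speaker.items():
        base = baseline_by_speaker[spk]
        mu = statistics.fmean(base)
        sd = statistics.stdev(base) or 1.0  # guard against zero variance
        normalized[spk] = [(v - mu) / sd for v in values]
    return normalized
```

Because the statistics come from held-out sober data only, the normalization never leaks information about the intoxicated test recordings.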
Affiliation(s)
- Daniel Bone
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Ming Li
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Matthew P Black
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Shrikanth S Narayanan
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA; Department of Linguistics, University of Southern California (USC), 3620 McClintock Ave., Los Angeles, CA 90089, USA
34
Zu Y, Narayanan SS, Kim YC, Nayak K, Bronson-Lowe C, Villegas B, Ouyoung M, Sinha UK. Evaluation of swallow function after tongue cancer treatment using real-time magnetic resonance imaging: a pilot study. JAMA Otolaryngol Head Neck Surg 2014; 139:1312-9. [PMID: 24177574 DOI: 10.1001/jamaoto.2013.5444] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
IMPORTANCE Magnetic resonance imaging (MRI) has the advantage of imaging swallow function at any anatomical level without changing the position of the patient, which can provide more detailed information than the modified barium swallow, by far the gold standard of swallow evaluation. OBJECTIVE To investigate the use of real-time MRI in the evaluation of swallow function of patients with tongue cancer. DESIGN, SETTING, AND PARTICIPANTS Real-time MRI experiments were performed on a Signa Excite HD 1.5-T scanner (GE Healthcare), with gradients capable of 40-mT/m (milli-Tesla per meter) amplitudes and 150-mT/m/ms (mT/m per millisecond) slew rates. The sequence used was a spiral fast gradient echo sequence. Four men with base of tongue or oral tongue squamous cell carcinoma and 3 age-matched healthy men with normal swallowing participated in the experiment. INTERVENTIONS Real-time MRI of the midsagittal plane was collected during swallowing. Coronal planes between the oral tongue and base of tongue and through the middle of the larynx were collected from 1 of the patients. MAIN OUTCOMES AND MEASURES Oral transit time, pharyngeal transit time, submental muscle length change, and the distance change between the hyoid bone and anterior boundary of the thyroid cartilage were measured frame by frame during swallowing. RESULTS All the measurable oral transit and pharyngeal transit times of the patients with cancer were significantly longer than those of the healthy participants. The changes in submental muscle length and the distance between the hyoid bone and thyroid cartilage happened in concert for all 60 normal swallows; however, the pattern differed for each patient with cancer. To our knowledge, the coronal view of the tongue and larynx revealed information that has not been previously reported. CONCLUSIONS AND RELEVANCE This study has demonstrated the potential of real-time MRI to reveal critical information beyond the capacity of traditional videofluoroscopy. Further investigation is needed to fully establish the technique, procedure, and standard scope of applying MRI to evaluate swallow function of patients with cancer in research and clinical practice.
Affiliation(s)
- Yihe Zu
- Department of Otolaryngology-Head and Neck Surgery, University of Southern California, Los Angeles
- Yoon-Chul Kim
- Viterbi School of Engineering, University of Southern California, Los Angeles
- Krishna Nayak
- Viterbi School of Engineering, University of Southern California, Los Angeles
- Brenda Villegas
- Department of Otolaryngology-Head and Neck Surgery, University of Southern California, Los Angeles
- Melody Ouyoung
- Department of Speech Pathology, Keck Hospital of University of Southern California, Los Angeles
- Uttam K Sinha
- Department of Otolaryngology-Head and Neck Surgery, University of Southern California, Los Angeles
35
Kim J, Lammert AC, Ghosh PK, Narayanan SS. Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging. J Acoust Soc Am 2014; 135:EL115-21. [PMID: 25234914 PMCID: PMC3985906 DOI: 10.1121/1.4862880] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics.
Affiliation(s)
- Jangwon Kim
- Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
- Adam C Lammert
- Department of Computer Science, University of Southern California, Los Angeles, California 90089
- Prasanta Kumar Ghosh
- Department of Electrical Engineering, Indian Institute of Science, Bangalore, Karnataka, India
- Shrikanth S Narayanan
- Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
36
Ramanarayanan V, Goldstein L, Narayanan SS. Spatio-temporal articulatory movement primitives during speech production: extraction, interpretation, and validation. J Acoust Soc Am 2013; 134:1378-1394. [PMID: 23927134 PMCID: PMC3745549 DOI: 10.1121/1.4812765] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/16/2012] [Revised: 04/12/2013] [Accepted: 06/12/2013] [Indexed: 05/28/2023]
Abstract
This paper presents a computational approach to derive interpretable movement primitives from speech articulation data. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatiotemporal basis sequences and an activation matrix. The algorithm optimizes a cost function that trades off the mismatch between the proposed model and the input data against the number of primitives that are active at any given instant. The method is applied both to measured articulatory data obtained through electromagnetic articulography and to synthetic data generated using an articulatory synthesizer. The paper then describes how to evaluate the algorithm's performance quantitatively and further performs a qualitative assessment of its ability to recover compositional structure from data. This is done using pseudo ground-truth primitives generated by the articulatory synthesizer based on an Articulatory Phonology framework [Browman and Goldstein (1995). "Dynamics and articulatory phonology," in Mind as Motion: Explorations in the Dynamics of Cognition, edited by R. F. Port and T. van Gelder (MIT Press, Cambridge, MA), pp. 175-194]. The results suggest that the proposed algorithm extracts movement primitives from human speech production data that are linguistically interpretable. Such a framework might aid the understanding of longstanding issues in speech production such as motor control and coarticulation.
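Leaving aside the convolutive extension and the sparseness constraints, the core factorization can be sketched with plain multiplicative-update NMF (Lee-Seung); this simplified pure-Python version is for illustration only and is not the cNMFsc algorithm itself.

```python
# Plain NMF via Lee-Seung multiplicative updates: factor a nonnegative
# matrix V (list of rows) into W @ H with nonnegative factors.

def nmf(V, rank, iters=200):
    import random
    random.seed(0)
    n, m = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[random.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    eps = 1e-9
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WH = matmul(W, H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        WH = matmul(W, H)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H
```

In the articulatory setting the columns of W would play the role of movement primitives and H their activations; the convolutive variant additionally makes each primitive a short sequence in time.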
Affiliation(s)
- Vikram Ramanarayanan
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
37
Ghosh PK, Narayanan SS. On smoothing articulatory trajectories obtained from Gaussian mixture model based acoustic-to-articulatory inversion. J Acoust Soc Am 2013; 134:EL258-EL264. [PMID: 23927234 PMCID: PMC4109078 DOI: 10.1121/1.4813590] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2013] [Accepted: 06/27/2013] [Indexed: 06/02/2023]
Abstract
It is well-known that the performance of acoustic-to-articulatory inversion improves by smoothing the articulatory trajectories estimated using Gaussian mixture model (GMM) mapping (denoted by GMM + Smoothing). GMM + Smoothing also provides similar performance with GMM mapping using dynamic features, which integrates smoothing directly in the mapping criterion. Due to the separation between smoothing and mapping, what objective criterion GMM + Smoothing optimizes remains unclear. In this work a new integrated smoothness criterion, the smoothed-GMM (SGMM), is proposed. GMM + Smoothing is shown, both analytically and experimentally, to be identical to the asymptotic solution of SGMM suggesting GMM + Smoothing to be a near optimal solution of SGMM.
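The post-hoc smoothing being analyzed can be any low-pass operation on the estimated trajectories; a centered moving average is one minimal example (the paper's exact smoother may differ).

```python
# Centered moving average as a simple low-pass smoother for an estimated
# articulatory trajectory; windows are truncated at the sequence edges so
# the output has the same length as the input.

def moving_average(x, window=5):
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out
```

Such a smoother leaves slowly varying trajectories untouched while attenuating the frame-to-frame jitter typical of memoryless GMM mapping.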
Affiliation(s)
- Prasanta K Ghosh
- Electrical Engineering, Indian Institute of Science (IISc), Bangalore, Karnataka 560012, India
38
Ramanarayanan V, Goldstein L, Byrd D, Narayanan SS. An investigation of articulatory setting using real-time magnetic resonance imaging. J Acoust Soc Am 2013; 134:510-9. [PMID: 23862826 PMCID: PMC3724797 DOI: 10.1121/1.4807639] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
This paper presents an automatic procedure to analyze articulatory setting in speech production using real-time magnetic resonance imaging of the moving human vocal tract. The procedure extracts frames corresponding to inter-speech pauses, speech-ready intervals and absolute rest intervals from magnetic resonance imaging sequences of read and spontaneous speech elicited from five healthy speakers of American English and uses automatically extracted image features to quantify vocal tract posture during these intervals. Statistical analyses show significant differences between vocal tract postures adopted during inter-speech pauses and those at absolute rest before speech; the latter also exhibits a greater variability in the adopted postures. In addition, the articulatory settings adopted during inter-speech pauses in read and spontaneous speech are distinct. The results suggest that adopted vocal tract postures differ on average during rest positions, ready positions and inter-speech pauses, and might, in that order, involve an increasing degree of active control by the cognitive speech planning mechanism.
Affiliation(s)
- Vikram Ramanarayanan
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
39
Abstract
Noninvasive imaging is widely used in speech research as a means to investigate the shaping and dynamics of the vocal tract during speech production. 3-D dynamic MRI would be a major advance, as it would provide 3-D dynamic visualization of the entire vocal tract. We present a novel method for the creation of 3-D dynamic movies of vocal tract shaping based on the acquisition of 2-D dynamic data from parallel slices and temporal alignment of the image sequences using audio information. Multiple sagittal 2-D real-time movies with synchronized audio recordings are acquired for English vowel-consonant-vowel stimuli /ala/, /aɹa/, /asa/, and /aʃa/. Audio data are aligned using mel-frequency cepstral coefficients (MFCC) extracted from windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using dynamic time warping (DTW). The aligned image sequences enable dynamic 3-D visualization by creating synthesized movies of the moving airway in the coronal planes, visualizing desired tissue surfaces and the tube-shaped vocal tract airway after manual segmentation of targeted articulators and smoothing. The resulting volumes allow for dynamic 3-D visualization of salient aspects of lingual articulation, including the formation of tongue grooves and sublingual cavities, with a temporal resolution of 78 ms.
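The DTW alignment step can be sketched as follows; sequences here are 1-D scalars for brevity, whereas the paper aligns MFCC feature vectors extracted from the synchronized audio.

```python
# Classic dynamic time warping: fill a cumulative-cost table where each cell
# extends the cheapest of the three admissible predecessor paths
# (insertion, deletion, match).

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return the DTW alignment cost between sequences a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because DTW may stretch or compress either sequence, two recordings of the same utterance at different speaking rates align with near-zero cost, which is exactly what makes it suitable for synchronizing repeated stimuli across slices.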
Collapse
Affiliation(s)
- Yinghua Zhu
- Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA.
40
Ettelaie E, Georgiou PG, Narayanan SS. Unsupervised data processing for classifier-based speech translator. COMPUT SPEECH LANG 2013. [DOI: 10.1016/j.csl.2012.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
41
Xiao B, Can D, Georgiou PG, Atkins D, Narayanan SS. Analyzing the Language of Therapist Empathy in Motivational Interview based Psychotherapy. Signal Inf Process Assoc Annu Summit Conf APSIPA Asia Pac 2012; 2012:6411762. [PMID: 27602411 PMCID: PMC5010859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Empathy is an important aspect of social communication, especially in medical and psychotherapy applications. Measures of empathy can offer insights into the quality of therapy. We use an N-gram language model based maximum-likelihood strategy to classify empathic versus non-empathic utterances and report the precision and recall of classification for various parameters. High recall is obtained with unigram features, while bigram features achieve the highest F1-score. Based on the utterance-level models, a group of lexical features is extracted at the therapy-session level. The effectiveness of these features in modeling session-level annotator perceptions of empathy is evaluated through correlation with expert-coded session-level empathy scores. Our combined feature set achieved a correlation of 0.558 between predicted and expert-coded empathy scores. The results also suggest that the longer-term empathy perception process may be more closely related to isolated, empathically salient events.
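The utterance-level maximum-likelihood decision can be illustrated with a toy unigram model. The add-one smoothing and the toy training utterances below are assumptions for illustration only, not the paper's data or its exact smoothing scheme:

```python
import math
from collections import Counter

def train_unigram(utterances):
    """Unigram counts and total token count for one class."""
    counts = Counter(w for u in utterances for w in u.split())
    return counts, sum(counts.values())

def log_likelihood(model, utterance, vocab_size):
    """Add-one-smoothed unigram log-likelihood of an utterance."""
    counts, total = model
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in utterance.split())

def classify(utterance, emp_model, non_model, vocab_size):
    """Maximum-likelihood decision between the two class models."""
    emp = log_likelihood(emp_model, utterance, vocab_size)
    non = log_likelihood(non_model, utterance, vocab_size)
    return "empathic" if emp > non else "non-empathic"
```

Replacing the unigram counts with bigram counts gives the bigram variant the abstract reports as achieving the highest F1-score.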
Affiliation(s)
- Bo Xiao
- Department of Electrical Engineering, University of Southern California, Los Angeles, CA, U.S.A
- Dogan Can
- Department of Computer Science, University of Southern California, Los Angeles, CA, U.S.A
- Panayiotis G. Georgiou
- Department of Electrical Engineering, University of Southern California, Los Angeles, CA, U.S.A
- David Atkins
- Department of Psychiatry & Behavioral Sciences, University of Washington, Seattle, WA, U.S.A
- Shrikanth S. Narayanan
- Department of Electrical Engineering, University of Southern California, Los Angeles, CA, U.S.A
- Department of Computer Science, University of Southern California, Los Angeles, CA, U.S.A
42
Kim YC, Proctor MI, Narayanan SS, Nayak KS. Improved imaging of lingual articulation using real-time multislice MRI. J Magn Reson Imaging 2011; 35:943-8. [PMID: 22127935 DOI: 10.1002/jmri.23510] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2011] [Accepted: 10/24/2011] [Indexed: 11/09/2022] Open
Abstract
PURPOSE To develop a real-time imaging technique that allows for simultaneous visualization of vocal tract shaping in multiple scan planes and provides dynamic visualization of complex articulatory features. MATERIALS AND METHODS Simultaneous imaging of multiple slices was implemented using a custom real-time imaging platform. Midsagittal, coronal, and axial scan planes of the human upper airway were prescribed and imaged in real time using a fast spiral gradient-echo pulse sequence. Two native speakers of English produced the voiceless and voiced fricatives /f/-/v/, /θ/-/ð/, /s/-/z/, and /ʃ/-/ʒ/ in the symmetrical, maximally contrastive vocalic contexts /a_a/, /i_i/, and /u_u/. Vocal tract videos were synchronized with noise-cancelled audio recordings, facilitating the selection of frames associated with production of English fricatives. RESULTS Coronal slices intersecting the postalveolar region of the vocal tract revealed tongue grooving to be most pronounced during fricative production in back vowel contexts, and more pronounced for the sibilants /s/-/z/ than for /ʃ/-/ʒ/. The axial slice best revealed differences in dorsal and pharyngeal articulation; voiced fricatives were observed to be produced with a larger cross-sectional area in the pharyngeal airway. Partial saturation of spins provided accurate location of imaging planes with respect to each other. CONCLUSION Real-time MRI of multiple intersecting slices can provide valuable spatial and temporal information about vocal tract shaping, including details not observable from a single slice.
Affiliation(s)
- Yoon-Chul Kim
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA.
43
Ghosh PK, Goldstein LM, Narayanan SS. Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures. J Acoust Soc Am 2011; 129:4014-4022. [PMID: 21682422 PMCID: PMC3135153 DOI: 10.1121/1.3573987] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/17/2010] [Revised: 02/19/2011] [Accepted: 03/13/2011] [Indexed: 05/30/2023]
Abstract
Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed.
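The information-theoretic criterion used above — how much a given acoustic representation reduces uncertainty about articulation — can be illustrated with a simple histogram estimate of mutual information. This is a generic sketch; the paper's actual estimator and its cochlear-filterbank parameterization are not reproduced here:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) in bits between two 1-D signals,
    e.g. one filterbank channel output vs one articulator trajectory."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()                      # joint distribution
    px = p.sum(axis=1, keepdims=True)            # marginal of x
    py = p.sum(axis=0, keepdims=True)            # marginal of y
    nz = p > 0                                   # avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())
```

Sweeping the filterbank parameter and recording which setting maximizes this quantity (equivalently, minimizes the residual uncertainty about the gestures) mirrors the comparison the abstract describes.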
Affiliation(s)
- Prasanta Kumar Ghosh
- Signal Analysis and Interpretation Laboratory, Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA.
44
Lee CC, Katsamanis A, Black MP, Baucom BR, Georgiou PG, Narayanan SS. Affective State Recognition in Married Couples’ Interactions Using PCA-Based Vocal Entrainment Measures with Multiple Instance Learning. Affective Computing and Intelligent Interaction 2011. [DOI: 10.1007/978-3-642-24571-8_4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
45
Kim YC, Hayes CE, Narayanan SS, Nayak KS. Novel 16-channel receive coil array for accelerated upper airway MRI at 3 Tesla. Magn Reson Med 2010; 65:1711-7. [PMID: 21590804 DOI: 10.1002/mrm.22742] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2010] [Revised: 11/01/2010] [Accepted: 11/07/2010] [Indexed: 11/09/2022]
Abstract
Upper airway MRI can provide a noninvasive assessment of speech and swallowing disorders and sleep apnea. Recent work has demonstrated the value of high-resolution three-dimensional imaging and dynamic two-dimensional imaging and the importance of further improvements in spatio-temporal resolution. The purpose of the study was to describe a novel 16-channel 3 Tesla receive coil that is highly sensitive to the human upper airway and investigate the performance of accelerated upper airway MRI with the coil. In three-dimensional imaging of the upper airway during static posture, 6-fold acceleration is demonstrated using parallel imaging, potentially leading to capturing a whole three-dimensional vocal tract with 1.25 mm isotropic resolution within 9 sec of sustained sound production. Midsagittal spiral parallel imaging of vocal tract dynamics during natural speech production is demonstrated with 2 × 2 mm(2) in-plane spatial and 84 ms temporal resolution.
Affiliation(s)
- Yoon-Chul Kim
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089-2564, USA.
46
Kim YC, Narayanan SS, Nayak KS. Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order. Magn Reson Med 2010; 65:1365-71. [PMID: 21500262 DOI: 10.1002/mrm.22714] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2010] [Revised: 10/08/2010] [Accepted: 10/12/2010] [Indexed: 11/09/2022]
Abstract
In speech production research using real-time magnetic resonance imaging (MRI), the analysis of articulatory dynamics is performed retrospectively. A flexible selection of temporal resolution is highly desirable because of natural variations in speech rate and variations in the speed of different articulators. The purpose of the study is to demonstrate a first application of golden-ratio spiral temporal view order to real-time speech MRI and investigate its performance by comparison with conventional bit-reversed temporal view order. Golden-ratio view order proved to be more effective at capturing the dynamics of rapid tongue tip motion. A method for automated blockwise selection of temporal resolution is presented that enables the synthesis of a single video from multiple temporal resolution videos and potentially facilitates subsequent vocal tract shape analysis.
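The view-order idea can be sketched directly: each spiral interleaf is rotated by the golden angle (2π/φ² ≈ 137.51°), so any retrospectively chosen contiguous block of views covers k-space nearly uniformly regardless of its width, which is what permits flexible retrospective selection of temporal resolution. The angle convention and the window-selection helper below are illustrative assumptions, not the published pulse-sequence code:

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # φ ≈ 1.618

def golden_angles(num_views):
    """Spiral interleaf rotation angles (radians) under a golden-ratio
    view order: each view advances by 2π/φ² ≈ 137.51 degrees."""
    inc = 2 * math.pi / GOLDEN_RATIO ** 2
    return [(n * inc) % (2 * math.pi) for n in range(num_views)]

def reconstruct_window(angles, start, width):
    """Retrospectively select any contiguous block of `width` views
    starting at `start`; angular coverage stays near-uniform for any
    window width, unlike a fixed bit-reversed ordering."""
    return sorted(angles[start:start + width])
```

Reconstructing the same acquisition with different `width` values then yields videos at different temporal resolutions from a single scan, as the abstract describes.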
Affiliation(s)
- Yoon-Chul Kim
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089-2564, USA.
47
48
Abstract
We propose an arbitrary-order stable allpass filter structure for frequency transformation from the Hertz to the Bark scale. In the proposed structure, the first-order allpass filter is causal, but the second- and higher-order allpass filters are non-causal. We find that the accuracy of the transformation improves significantly when a second- or higher-order allpass filter is used instead of a first-order one, and that the RMS error of the transformation decreases monotonically as the order of the allpass filter increases.
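For the first-order case, the warping is simply the phase response of a single allpass section, and comparing it against a standard Hz-to-Bark formula shows how well one coefficient can approximate the scale. Both the Bark formula chosen (a common Zwicker-style approximation) and the sample coefficient in the test are assumptions for illustration, not the paper's design procedure:

```python
import math

def allpass_warp(omega, lam):
    """Warped frequency (radians): phase of the first-order allpass
    A(z) = (z^-1 - lam) / (1 - lam * z^-1) evaluated at e^{j*omega}."""
    return omega + 2 * math.atan2(lam * math.sin(omega),
                                  1 - lam * math.cos(omega))

def hz_to_bark(f_hz):
    """A common Zwicker-style Hz-to-Bark approximation (an assumption
    here; the paper may use a different reference formula)."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)
```

With a positive coefficient, low frequencies are stretched and high frequencies compressed, mimicking the Bark scale's finer low-frequency resolution; higher-order allpass sections give the warping curve more degrees of freedom, which is the source of the accuracy gain the abstract reports.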
49
Ramanarayanan V, Bresch E, Byrd D, Goldstein L, Narayanan SS. Analysis of pausing behavior in spontaneous speech using real-time magnetic resonance imaging of articulation. J Acoust Soc Am 2009; 126:EL160-5. [PMID: 19894792 PMCID: PMC2776778 DOI: 10.1121/1.3213452] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/17/2009] [Accepted: 07/30/2009] [Indexed: 05/22/2023]
Abstract
It is hypothesized that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these junctures. Real-time magnetic resonance imaging is used to analyze articulation at and around grammatical and ungrammatical pauses in spontaneous speech. Measures quantifying the speed of articulators were developed and applied during these pauses as well as during their immediate neighborhoods. Grammatical pauses were found to have an appreciable drop in speed at the pause itself as compared to ungrammatical pauses, which is consistent with our hypothesis that grammatical pauses are indeed choreographed by a central cognitive planner.
50
Abstract
We propose a dynamic programming (DP) based piecewise polynomial approximation of discrete data such that the L2 norm of the approximation error is minimized. We apply this technique to the stylization of speech pitch contours. Objective evaluation verifies that the DP-based technique indeed yields the minimum mean square error (MSE) compared with other approximation methods. Subjective evaluation reveals that the quality of speech synthesized using the stylized pitch contour obtained by the DP method is almost identical to that of the original speech.
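The DP formulation can be sketched directly: the minimum total squared error over all placements of segment boundaries is found by a standard optimal-segmentation recursion, with each candidate segment fitted by least squares. The segment count, polynomial degree, and API below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def stylize(y, num_segments, degree=2):
    """DP piecewise-polynomial fit of a contour y (e.g. a pitch track)
    minimizing total squared error; returns (cost, segment end indices)."""
    n = len(y)
    x = np.arange(n)

    def seg_cost(i, j):
        """Squared error of one least-squares polynomial over samples i..j-1."""
        if j - i <= degree:            # underdetermined: exact fit
            return 0.0
        coef = np.polyfit(x[i:j], y[i:j], degree)
        return float(((np.polyval(coef, x[i:j]) - y[i:j]) ** 2).sum())

    INF = float("inf")
    # best[k][j]: minimum error covering samples 0..j-1 with k segments
    best = [[INF] * (n + 1) for _ in range(num_segments + 1)]
    prev = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + seg_cost(i, j)
                if c < best[k][j]:
                    best[k][j], prev[k][j] = c, i
    # backtrack the optimal segment boundaries
    cuts, j = [], n
    for k in range(num_segments, 0, -1):
        cuts.append(j)
        j = prev[k][j]
    return best[num_segments][n], sorted(cuts)
```

Because the recursion exhaustively considers every boundary placement, the returned fit attains the global minimum of the L2 error for the given segment count, which is the optimality property the abstract's objective evaluation verifies.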