1
Benway NR, Preston JL, Salekin A, Hitchcock E, McAllister T. Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders. JASA Express Letters 2024; 4:025201. [PMID: 38299984 DOI: 10.1121/10.0024632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 01/10/2024] [Indexed: 02/02/2024]
Abstract
The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.
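The two z-standardization schemes compared in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and the reference mean and standard deviation in the usage note are made-up placeholders, not values from the study or from any published norms.

```python
from statistics import mean, pstdev


def z_within_utterance(values):
    """Z-standardize feature values relative to the same utterance."""
    mu = mean(values)
    sd = pstdev(values)  # population SD of this utterance's own values
    if sd == 0:
        # A constant track carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def z_against_norms(values, norm_mean, norm_sd):
    """Z-standardize feature values relative to external reference data.

    norm_mean and norm_sd stand in for age-and-sex reference statistics
    for typical /r/; the caller must supply them (hypothetical inputs).
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return [(v - norm_mean) / norm_sd for v in values]
```

For example, `z_against_norms([2000.0, 2400.0], 2000.0, 200.0)` returns `[0.0, 2.0]`: the second value lies two reference standard deviations above the placeholder reference mean, whereas the within-utterance version only describes each value relative to its neighbours in the same recording.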
Affiliation(s)
- Nina R Benway
- Communication Sciences & Disorders, Syracuse University, Syracuse, New York 13244, USA
- Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA
- Jonathan L Preston
- Communication Sciences & Disorders, Syracuse University, Syracuse, New York 13244, USA
- Asif Salekin
- Electrical Engineering and Computer Science, Syracuse University, Syracuse, New York 13244, USA
- Elaine Hitchcock
- Communication Sciences & Disorders, Montclair State University, Montclair, New Jersey 07043, USA
- Tara McAllister
- Communicative Sciences & Disorders, New York University, New York, New York 10007
2
Li Y, Wohlan BJ, Pham DS, Chan KY, Ward R, Hennessey N, Tan T. Improving Text-Independent Forced Alignment to Support Speech-Language Pathologists with Phonetic Transcription. Sensors (Basel, Switzerland) 2023; 23:9650. [PMID: 38139496 PMCID: PMC10747711 DOI: 10.3390/s23249650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 11/27/2023] [Accepted: 11/28/2023] [Indexed: 12/24/2023]
Abstract
Problem: Phonetic transcription is crucial in diagnosing speech sound disorders (SSDs) but is susceptible to transcriber experience and perceptual bias. Current forced alignment (FA) tools, which annotate audio files to determine spoken content and its placement, often require manual transcription, limiting their effectiveness. Method: We introduce a novel, text-independent forced alignment model that autonomously recognises individual phonemes and their boundaries, addressing these limitations. Our approach leverages an advanced, pre-trained wav2vec 2.0 model to segment speech into tokens and recognise them automatically. To accurately identify phoneme boundaries, we utilise an unsupervised segmentation tool, UnsupSeg. Labelling of segments employs nearest-neighbour classification with wav2vec 2.0 labels, before connectionist temporal classification (CTC) collapse, determining class labels based on maximum overlap. Additional post-processing, including overfitting cleaning and voice activity detection, is implemented to enhance segmentation. Results: We benchmarked our model against existing methods using the TIMIT dataset for normal speakers and, for the first time, evaluated its performance on the TORGO dataset containing SSD speakers. Our model demonstrated competitive performance, achieving a harmonic mean score of 76.88% on TIMIT and 70.31% on TORGO. Implications: This research presents a significant advancement in the assessment and diagnosis of SSDs, offering a more objective and less biased approach than traditional methods. Our model's effectiveness, particularly with SSD speakers, opens new avenues for research and clinical application in speech pathology.
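The maximum-overlap labelling step mentioned in this abstract can be sketched as follows. This is an illustrative assumption about how such a step might work, not the authors' implementation: the function name, the tuple layouts, and the example times are all hypothetical, and no wav2vec 2.0 or UnsupSeg call is involved.

```python
from collections import defaultdict


def label_by_max_overlap(segments, labelled_spans):
    """Assign each segment the label that overlaps it for the longest time.

    segments: list of (start, end) boundaries in seconds.
    labelled_spans: list of (start, end, label) spans in seconds, e.g.
        runs of frame-level predictions before any collapsing step.
    Returns one label per segment, or None when nothing overlaps.
    """
    labels = []
    for seg_start, seg_end in segments:
        overlap = defaultdict(float)
        for span_start, span_end, label in labelled_spans:
            shared = min(seg_end, span_end) - max(seg_start, span_start)
            if shared > 0:
                overlap[label] += shared  # accumulate time per label
        labels.append(max(overlap, key=overlap.get) if overlap else None)
    return labels
```

With segments `[(0.0, 0.1), (0.1, 0.3)]` and spans `[(0.0, 0.08, "s"), (0.08, 0.25, "iy"), (0.25, 0.3, "t")]`, the first segment is labelled `"s"` and the second `"iy"`, because those labels cover most of each segment.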
Affiliation(s)
- Ying Li
- School of EECMS, Curtin University, Bentley, WA 6102, Australia
- Duc-Son Pham
- School of EECMS, Curtin University, Bentley, WA 6102, Australia
- Kit Yan Chan
- School of EECMS, Curtin University, Bentley, WA 6102, Australia
- Roslyn Ward
- School of Allied Health, Curtin University, Bentley, WA 6102, Australia
- Neville Hennessey
- School of Allied Health, Curtin University, Bentley, WA 6102, Australia
- Tele Tan
- School of EECMS, Curtin University, Bentley, WA 6102, Australia
3
Benway NR, Preston JL, Hitchcock E, Rose Y, Salekin A, Liang W, McAllister T. Reproducible Speech Research With the Artificial Intelligence-Ready PERCEPT Corpora. Journal of Speech, Language, and Hearing Research: JSLHR 2023; 66:1986-2009. [PMID: 37319018 DOI: 10.1044/2023_jslhr-22-00343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
BACKGROUND Publicly available speech corpora facilitate reproducible research by providing open-access data for participants who have consented/assented to data sharing among different research teams. Such corpora can also support clinical education, including perceptual training and training in the use of speech analysis tools. PURPOSE In this research note, we introduce the PERCEPT (Perceptual Error Rating for the Clinical Evaluation of Phonetic Targets) corpora, PERCEPT-R (Rhotics) and PERCEPT-GFTA (Goldman-Fristoe Test of Articulation), which together contain over 36 hr of speech audio (> 125,000 syllable, word, and phrase utterances) from children, adolescents, and young adults aged 6-24 years with speech sound disorder (primarily residual speech sound disorders impacting /ɹ/) and age-matched peers. We highlight PhonBank as the repository for the corpora and demonstrate use of the associated speech analysis software, Phon, to query PERCEPT-R. A worked example of research with PERCEPT-R, suitable for clinical education and research training, is included as an appendix. Support for end users and information/descriptive statistics for future releases of the PERCEPT corpora can be found in a dedicated Slack channel. Finally, we discuss the potential for PERCEPT corpora to support the training of artificial intelligence clinical speech technology appropriate for use with children with speech sound disorders, the development of which has historically been constrained by the limited representation of either children or individuals with speech impairments in publicly available training corpora. CONCLUSIONS We demonstrate the use of PERCEPT corpora, PhonBank, and Phon for clinical training and research questions appropriate to child citation speech. Increased use of these tools has the potential to enhance reproducibility in the study of speech development and disorders.
Affiliation(s)
- Nina R Benway
- Department of Communication Sciences & Disorders, Syracuse University, NY
- Jonathan L Preston
- Department of Communication Sciences & Disorders, Syracuse University, NY
- Haskins Laboratories, New Haven, CT
- Elaine Hitchcock
- Department of Communication Sciences and Disorders, Montclair State University, NJ
- Yvan Rose
- Department of Linguistics, Memorial University, St. John's, Newfoundland and Labrador, Canada
- Asif Salekin
- Department of Electrical Engineering and Computer Science, Syracuse University, NY
- Wendy Liang
- Department of Communicative Sciences and Disorders, New York University, NY
- Tara McAllister
- Department of Communicative Sciences and Disorders, New York University, NY
4
Costanzo F, Fucà E, Caciolo C, Ruà D, Smolley S, Weissberg D, Vicari S. Talkitt: toward a new instrument based on artificial intelligence for augmentative and alternative communication in children with down syndrome. Front Psychol 2023; 14:1176683. [PMID: 37346421 PMCID: PMC10279874 DOI: 10.3389/fpsyg.2023.1176683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 06/23/2023] Open
Abstract
Introduction: Individuals with Down syndrome (DS) often exhibit a severe speech impairment, with important consequences for language intelligibility. For these cases, the use of Augmentative and Alternative Communication instruments, which increase an individual's communication abilities, becomes crucial. Talkitt is a mobile application created by the Voiceitt company that uses speech recognition technology and artificial intelligence models to translate unintelligible sounds into clear words in real time, allowing individuals with language production impairment to communicate verbally. Methods: The study evaluated the usability of and satisfaction with the Talkitt application, as well as its effects on adaptive behavior and communication, in participants with DS. A final number of 23 individuals with DS, aged 5.54 to 28.9 years, participated in this study and completed 6 months of training. The application was trained to consistently recognize at least 20 different unintelligible words (e.g., nouns and/or short phrases) per person. Results: Results revealed good usability and high levels of satisfaction with the application. Moreover, we registered improvement in linguistic abilities, particularly naming. Discussion: These results pave the way for a potential role of the Talkitt application as a supportive and rehabilitative tool for DS.
Affiliation(s)
- Floriana Costanzo
- Child and Adolescent Neuropsychiatry Unit, Bambino Gesù Children’s Hospital, IRCCS, Rome, Italy
- Elisa Fucà
- Child and Adolescent Neuropsychiatry Unit, Bambino Gesù Children’s Hospital, IRCCS, Rome, Italy
- Cristina Caciolo
- Child and Adolescent Neuropsychiatry Unit, Bambino Gesù Children’s Hospital, IRCCS, Rome, Italy
- Deborah Ruà
- Child and Adolescent Neuropsychiatry Unit, Bambino Gesù Children’s Hospital, IRCCS, Rome, Italy
- Stefano Vicari
- Child and Adolescent Neuropsychiatry Unit, Bambino Gesù Children’s Hospital, IRCCS, Rome, Italy
- Department of Life Science and Public Health, Catholic University of the Sacred Heart, Rome, Italy
5
Usha GP, Alex JSR. Speech assessment tool methods for speech impaired children: a systematic literature review on the state-of-the-art in Speech impairment analysis. Multimedia Tools and Applications 2023; 82:1-38. [PMID: 37362682 PMCID: PMC9986674 DOI: 10.1007/s11042-023-14913-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 12/23/2022] [Accepted: 02/12/2023] [Indexed: 06/28/2023]
Abstract
Speech is a powerful, natural mode of communication that facilitates effective interactions in human societies. However, when fluency or flow of speech is affected or interrupted, it leads to speech impairment. There are several types of speech impairment, which depend on the speech pattern and range from mild to severe. Childhood apraxia of speech (CAS) is the most common speech disorder in children, with 1 out of 12 children diagnosed globally. Significant advancements in speech assessment tools have been reported to assist speech-language pathologists in diagnosing speech impairment. In recent years, speech assessment tools have also gained popularity among pediatricians and teachers who work with preschoolers. Automatic speech tools can be more accurate for detecting speech sound disorders (SSD) than human-based speech assessment methods. This systematic literature review covers 88 studies, including more than 500 infants, toddlers, children, and a few adolescents (both male and female, aged 0-17 years) with speech impairment from more than 10 countries. It discusses the state-of-the-art speech assessment methods, including tools, techniques, and protocols for speech-impaired children. Additionally, this review summarizes notable outcomes in detecting speech impairments using these assessment methods and discusses various limitations such as universality, reliability, and validity. Finally, we consider the challenges and future directions for speech impairment assessment tool research.
Affiliation(s)
- Gowri Prasood Usha
- School of Electronics Engineering, Vellore Institute of Technology, Chennai, 600127 India
- John Sahaya Rani Alex
- School of Electronics Engineering, Vellore Institute of Technology, Chennai, 600127 India
6
Novotny M, Cmejla R, Tykalova T. Automated prediction of children's age from voice acoustics. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
7
Application of Digital Games for Speech Therapy in Children: A Systematic Review of Features and Challenges. Journal of Healthcare Engineering 2022; 2022:4814945. [PMID: 35509705 PMCID: PMC9061057 DOI: 10.1155/2022/4814945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Revised: 02/22/2022] [Accepted: 03/02/2022] [Indexed: 11/17/2022]
Abstract
Introduction Treatment of speech disorders during childhood is essential. Many technologies can help speech and language pathologists (SLPs) to practice speech skills, one of which is digital games. This study aimed to systematically investigate the games developed to treat speech disorders and their challenges in children. Methods A comprehensive search was conducted in four databases, including Medline (through PubMed), Scopus, Web of Science, and IEEE Xplore, to retrieve English articles published by July 14, 2021. The articles in which a digital game was developed to treat speech disorders in children were included in the study. Then, the features of the designed games and their challenges were extracted from the studies. Results After reviewing the full texts of 69 articles and assessing them in terms of inclusion and exclusion criteria, 27 articles were included in the systematic review. In these articles, 59.25% of the games had been developed in English language and children with hearing impairments had received much attention from researchers compared to other patients. Also, the Mel-Frequency Cepstral Coefficients (MFCC) algorithm and the PocketSphinx speech recognition engine had been used more than any other speech recognition algorithm and tool. In terms of the games, 48.15% had been designed in a way that children could practice with the help of their parents. The evaluation of games showed a positive effect on children's satisfaction, motivation, and attention during speech therapy exercises. The biggest barriers and challenges mentioned in the studies included sense of frustration, low self-esteem after several failures in playing games, environmental noise, contradiction between games levels and the target group's needs, and problems related to speech recognition. Conclusion The results of this study showed that the games positively affect children's motivation to continue speech therapy, and they can also be used as the SLPs' aids. 
Before designing these tools, the obstacles and challenges should be considered and solutions proposed.
8
Peterson L, Savarese C, Campbell T, Ma Z, Simpson KO, McAllister T. Telepractice Treatment of Residual Rhotic Errors Using App-Based Biofeedback: A Pilot Study. Lang Speech Hear Serv Sch 2022; 53:256-274. [PMID: 35050705 DOI: 10.1044/2021_lshss-21-00084] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022] Open
Abstract
PURPOSE Although mobile apps are used extensively by speech-language pathologists, evidence for app-based treatments remains limited in quantity and quality. This study investigated the efficacy of app-based visual-acoustic biofeedback relative to nonbiofeedback treatment using a single-case randomization design. Because of COVID-19, all intervention was delivered via telepractice. METHOD Participants were four children aged 9-10 years with residual errors affecting American English /ɹ/. Using a randomization design, individual sessions were randomly assigned to feature practice with or without biofeedback, all delivered using the speech app Speech Therapist's App for /r/ Treatment. Progress was assessed using blinded listener ratings of word probes administered at baseline, posttreatment, and immediately before and after each treatment session. RESULTS All participants showed a clinically significant response to the overall treatment package, with effect sizes ranging from moderate to very large. One participant showed a significant advantage for biofeedback over nonbiofeedback treatment, although the order of treatment delivery poses a potential confound for interpretation in this case. CONCLUSIONS While larger scale studies are needed, these results suggest that app-based treatment for residual errors can be effective when delivered via telepractice. These results are compatible with previous findings in the motor learning literature regarding the importance of treatment dose and the timing of feedback conditions. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.18461576.
Affiliation(s)
- Laura Peterson
- Department of Speech-Language Pathology, Rocky Mountain University of Health Professions, Provo, UT
- Twylah Campbell
- Department of Communicative Sciences and Disorders, New York University, NY
- Zhigong Ma
- Department of Communicative Sciences and Disorders, New York University, NY
- Kenneth O Simpson
- Department of Speech-Language Pathology, Rocky Mountain University of Health Professions, Provo, UT
- Tara McAllister
- Department of Communicative Sciences and Disorders, New York University, NY
9
An Automated Lexical Stress Classification Tool for Assessing Dysprosody in Childhood Apraxia of Speech. Brain Sci 2021; 11:1408. [PMID: 34827407 PMCID: PMC8615988 DOI: 10.3390/brainsci11111408] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/16/2021] [Revised: 10/15/2021] [Accepted: 10/21/2021] [Indexed: 11/17/2022] Open
Abstract
Childhood apraxia of speech (CAS) commonly affects the production of lexical stress contrast in polysyllabic words. Automated classification tools have the potential to increase reliability and efficiency in measuring lexical stress. Here, factors affecting the accuracy of a custom-built deep neural network (DNN)-based classification tool are evaluated. Sixteen children with typical development (TD) and 26 with CAS produced 50 polysyllabic words. Words with strong-weak (SW, e.g., dinosaur) or WS (e.g., banana) stress were fed to the classification tool, and the accuracy measured (a) against expert judgment, (b) for speaker group, and (c) with/without prior knowledge of phonemic errors in the sample. The influence of segmental features and participant factors on tool accuracy was analysed. Linear mixed modelling showed significant interaction between group and stress type, surviving adjustment for age and CAS severity. For TD, agreement for SW and WS words was >80%, but CAS speech was higher for SW (>80%) than WS (~60%). Prior knowledge of segmental errors conferred no clear advantage. Automatic lexical stress classification shows promise for identifying errors in children's speech at diagnosis or with treatment-related change, but accuracy for WS words in apraxic speech needs improvement. Further training of algorithms using larger sets of labelled data containing impaired speech and WS words may increase accuracy.
10
Hair A, Ballard KJ, Markoulli C, Monroe P, Mckechnie J, Ahmed B, Gutierrez-Osuna R. A Longitudinal Evaluation of Tablet-Based Child Speech Therapy with Apraxia World. ACM Transactions on Accessible Computing 2021. [DOI: 10.1145/3433607] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 10/21/2022]
Abstract
Digital games can make speech therapy exercises more enjoyable for children and increase their motivation during therapy. However, many such games developed to date have not been designed for long-term use. To address this issue, we developed Apraxia World, a speech therapy game specifically intended to be played over extended periods. In this study, we examined pronunciation improvements, child engagement over time, and caregiver and automated pronunciation evaluation accuracy while using our game over a multi-month period. Ten children played Apraxia World at home during two counterbalanced 4-week treatment blocks separated by a 2-week break. In one treatment phase, children received pronunciation feedback from caregivers and in the other treatment phase, utterances were evaluated with an automated framework built into the game. We found that children made therapeutically significant speech improvements while using Apraxia World, and that the game successfully increased engagement during speech therapy practice. Additionally, in offline mispronunciation detection tests, our automated pronunciation evaluation framework outperformed a traditional method based on goodness of pronunciation scoring. Our results suggest that this type of speech therapy game is a valid complement to traditional home practice.
Affiliation(s)
- Adam Hair
- Texas A&M University, College Station, Texas, USA
11
Kuschmann A, Nayar R, Lowit A, Dunlop M. The use of technology in the management of children with phonological delay and adults with acquired dysarthria: A UK survey of current speech-language pathology practice. International Journal of Speech-Language Pathology 2021; 23:145-154. [PMID: 32408766 DOI: 10.1080/17549507.2020.1750700] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
PURPOSE Technology is increasingly important for the speech-language pathology profession, but little is currently known about its use by clinicians. This study aimed to determine (i) the types of technology that speech-language pathologists (SLPs) in the UK have access to and use in practice and (ii) the barriers they encounter when assessing and treating adults with acquired dysarthria and children with phonological delay. METHOD UK SLPs were invited to complete two online surveys covering device availability, the use of technology for the assessment and treatment of acquired dysarthria and phonological delay, and barriers to using technology. Results were analysed using descriptive statistics. RESULT 126 SLPs completed the surveys. Most respondents had a range of devices available in clinic, including computer and touchscreen devices. Technology was primarily used for treatment to engage clients, provide direct feedback in sessions and encourage home practice. Reported key barriers include lack of knowledge and training, and technical support issues. CONCLUSION The use of technology in UK clinical practice varies widely, and technology adoption is hampered by various barriers. Findings indicate a need for more collaborative work between SLPs, technologists and policy-makers to develop the evidence-base for technology use in the management of acquired dysarthria and phonological delay.
Affiliation(s)
- Anja Kuschmann
- School of Psychological Sciences and Health, University of Strathclyde, Glasgow, UK
- Revathy Nayar
- Department of Computer and Information Science, University of Strathclyde, Glasgow, UK
- Anja Lowit
- School of Psychological Sciences and Health, University of Strathclyde, Glasgow, UK
- Mark Dunlop
- Department of Computer and Information Science, University of Strathclyde, Glasgow, UK
12
McLeod S, Ballard KJ, Ahmed B, McGill N, Brown MI. Supporting Children With Speech Sound Disorders During COVID-19 Restrictions: Technological Solutions. Perspectives of the ASHA Special Interest Groups 2020. [DOI: 10.1044/2020_persp-20-00128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Purpose: “Children are the hidden victims of the COVID-19 pandemic” (United Nations Children's Fund, 2020). Timely and effective speech intervention is important to reduce the impact on children's school achievement, ability to make friends, mental health, future life opportunities, and government resources. Prior to the coronavirus disease (COVID-19) pandemic, many Australian children did not receive sufficient speech-language pathology (SLP) services due to long waiting lists in the public health system. COVID-19 restrictions exacerbated this issue, as even children who were at the top of lengthy SLP waiting lists often received limited services, particularly in rural areas. To facilitate children receiving speech intervention remotely during the COVID-19 pandemic, evidence from randomized controlled trials regarding three technological solutions is examined: (a) Phoneme Factory Sound Sorter (Sound Start Study), (b) Waiting for Speech Pathology website, and (c) Apraxia World. Conclusions: For the first two technological solutions, there were similar gains in speech production between the intervention and control groups, whereas, for the third solution, the average magnitude of treatment effect was comparable to face-to-face SLP therapy. Automated therapy management systems may be able to accelerate speech development and support communication resilience to counteract the effects of the COVID-19 restrictions on children with speech sound disorders. Technology-based strategies may also provide a potential solution to the chronic shortage of SLP services in rural areas into the future.
Affiliation(s)
- Sharynne McLeod
- Charles Sturt University, Bathurst, New South Wales, Australia
- Beena Ahmed
- University of New South Wales, Sydney, Australia
- Nicole McGill
- Charles Sturt University, Bathurst, New South Wales, Australia
13
Moreno-Torres I, Nava E. Consonant and vowel articulation accuracy in younger and middle-aged Spanish healthy adults. PLoS One 2020; 15:e0242018. [PMID: 33166341 PMCID: PMC7652263 DOI: 10.1371/journal.pone.0242018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Accepted: 10/23/2020] [Indexed: 11/20/2022] Open
Abstract
Children acquire vowels earlier than consonants, and the former are less vulnerable to speech disorders than the latter. This study explores the hypothesis that a similar contrast exists later in life and that consonants are more vulnerable to ageing than vowels. Data was obtained with two experiments comparing the speech of Younger Adults (YAs) and Middle–aged Adults (MAs). In the first experiment an Automatic Speech Recognition (ASR) system was trained with a balanced corpus of 29 YAs and 27 MAs. The productions of each speaker were obtained in a Spanish language word (W) and non–word (NW) repetition task. The performance of the system was evaluated with the same corpus used for training using a cross validation approach. The ASR system recognized to a similar extent the Ws of both groups of speakers, but it was more successful with the NWs of the YAs than with those of the MAs. Detailed error analysis revealed that the MA speakers scored below the YA speakers for consonants and also for the place and manner of articulation features; the results were almost identical in both groups of speakers for vowels and for the voicing feature. In the second experiment a group of healthy native listeners was asked to recognize isolated syllables presented with background noise. The target speakers were one YA and one MA that had taken part in the first experiment. The results were consistent with those of the ASR experiment: the manner and place of articulation were better recognized, and vowels and voicing were worse recognized, in the YA speaker than in the MA speaker. We conclude that consonant articulation is more vulnerable to ageing than vowel articulation. Future studies should explore whether or not these early and selective changes in articulation accuracy might be caused by changes in speech perception skills (e.g., in auditory temporal processing).
Affiliation(s)
- Enrique Nava
- Department of Communications Engineering, University of Málaga, Málaga, Spain
14
Ballard KJ, Etter NM, Shen S, Monroe P, Tien Tan C. Feasibility of Automatic Speech Recognition for Providing Feedback During Tablet-Based Treatment for Apraxia of Speech Plus Aphasia. American Journal of Speech-Language Pathology 2019; 28:818-834. [PMID: 31306595 DOI: 10.1044/2018_ajslp-msc18-18-0109] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 06/10/2023]
Abstract
Purpose Individuals with neurogenic speech disorders require ongoing therapeutic support to achieve functional communication goals. Alternative methods for service delivery, such as tablet-based speech therapy applications, may help bridge the gap and bring therapeutic interventions to the patient in an engaging way. The purpose of this study was to evaluate an iPad-based speech therapy app that uses automatic speech recognition (ASR) software to provide feedback on speech accuracy to determine the ASR's accuracy against human judgment and whether participants' speech improved with this ASR-based feedback. Method Five participants with apraxia of speech plus aphasia secondary to stroke completed an intensive 4-week at-home therapy program using a novel word training app with built-in ASR. Multiple baselines across participants and behaviors designs were employed, with weekly probes and follow-up at 1 month posttreatment. Four sessions a week of 100 practice trials each were prescribed, with 1 being clinician-run and the remainder done independently. Dependent variables of interest were ASR-human agreement on accuracy during practice trials and human-judged word production accuracy over time in probes. Also, user experience surveys were completed immediately posttreatment. Results ASR-human agreement on accuracy averaged ~80%, which is a common threshold applied for interrater agreement. All participants demonstrated improved word production accuracy over time with the ASR-based feedback and maintenance of gains after 1 month. All participants reported enjoying using the app with support of a speech pathologist. Conclusion For these participants with apraxia of speech plus aphasia due to stroke, satisfactory gains were made in word production accuracy with an app-based therapy program providing ASR-based feedback on accuracy. 
Findings support further testing of this ASR-based approach as a supplement to clinician-run sessions to assist clients with similar profiles in achieving higher amount and intensity of practice as well as empowering them to manage their own therapy program. Supplemental Material https://doi.org/10.23641/asha.8206628.
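The ASR-human agreement figure reported in this abstract is a percentage of trials on which the two judgments match. A minimal sketch of that arithmetic follows; the function name and the example judgments are hypothetical and are not taken from the study's data.

```python
def percent_agreement(asr_judgments, human_judgments):
    """Return the percentage of trials where ASR and human judgments match.

    Each argument is a sequence of per-trial accuracy judgments
    (for example True for "correct" and False for "incorrect").
    """
    if len(asr_judgments) != len(human_judgments):
        raise ValueError("both raters must judge the same trials")
    if not asr_judgments:
        raise ValueError("at least one trial is required")
    matches = sum(a == h for a, h in zip(asr_judgments, human_judgments))
    return 100.0 * matches / len(asr_judgments)
```

With five trials on which the two judgments match four times, the function returns `80.0`. Simple percent agreement does not correct for chance agreement, so it is only a first-pass summary.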
Affiliation(s)
- Kirrie J Ballard
- Faculty of Health Sciences, University of Sydney, New South Wales, Australia
- Nicole M Etter
- Department of Communication Sciences and Disorders, Pennsylvania State University, University Park
- Songjia Shen
- Games Studio, University of Technology Sydney, New South Wales, Australia
- Penelope Monroe
- Faculty of Health Sciences, University of Sydney, New South Wales, Australia
- Chek Tien Tan
- InfoComm Technology Cluster, Singapore Institute of Technology, Singapore
15
McAllister T, Ballard KJ. Bringing advanced speech processing technology to the clinical management of speech disorders. International Journal of Speech-Language Pathology 2018; 20:581-582. [PMID: 31274356 DOI: 10.1080/17549507.2018.1510034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/01/2018] [Accepted: 08/06/2018] [Indexed: 06/09/2023]
Affiliation(s)
- Tara McAllister
- Department of Communicative Sciences and Disorders, New York University, New York, NY, USA
- Kirrie J Ballard
- Faculty of Health Sciences, The University of Sydney, Sydney, Australia