1. Tremblay P, Sato M. Movement-related cortical potential and speech-induced suppression during speech production in younger and older adults. Brain and Language 2024; 253:105415. [PMID: 38692095] [DOI: 10.1016/j.bandl.2024.105415]
Abstract
With age, the speech system undergoes important changes that render speech production more laborious, slower, and often less intelligible. Yet the neural mechanisms that underlie these age-related changes remain unclear. In this EEG study, we examined two important mechanisms in speech motor control in 20 healthy younger and 20 healthy older adults: the pre-speech movement-related cortical potential (MRCP), which reflects speech motor planning, and speaking-induced suppression (SIS), which indexes auditory predictions of speech motor commands. Participants undertook a vowel production task followed by passive listening to their own recorded vowels. Our results revealed extensive differences in MRCP in older compared with younger adults. Further, although older adults showed longer N1 and P2 latencies, SIS was preserved. The reduced MRCP is a potential explanatory mechanism for the known age-related slowing of speech production, while the preserved SIS suggests intact motor-to-auditory integration.
Affiliation(s)
- Pascale Tremblay
- Université Laval, Faculté de Médecine, Département de Réadaptation, Quebec City G1V 0A6, Canada; CERVO Brain Research Center, Quebec City G1J 2G3, Canada
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France
2. Neuhaus TJ, Scherer RC, Whitfield JA. Gender Perception of Speech: Dependence on Fundamental Frequency, Implied Vocal Tract Length, and Source Spectral Tilt. J Voice 2024:S0892-1997(24)00016-X. [PMID: 38789366] [DOI: 10.1016/j.jvoice.2024.01.014]
Abstract
OBJECTIVE To investigate how listeners use fundamental frequency, implied vocal tract length, and source spectral tilt to infer speaker gender. METHODS Sound files each containing the vowels /i, æ, ɑ, u/ interspersed with brief silences were synthesized. Each of the 210 stimuli was a combination of 10 values of fundamental frequency and 7 values of implied vocal tract length (and the associated formant frequencies), ranging from male-typical to female-typical, and 3 values of source spectral tilt approximating breathy, normal, and pressed voice qualities. Twenty-three listeners judged each synthesized "speaker" as "female" or "male." Generalized linear mixed model analysis was used to determine the extent to which fundamental frequency, implied vocal tract length, and spectral tilt influenced listener judgment. RESULTS Increasing fundamental frequency and decreasing implied vocal tract length increased the probability of a female judgment. Two interactions were identified: both an increase in fundamental frequency and a decrease (more negative) in source spectral tilt produced a greater increase in the probability of a female judgment when the implied vocal tract length was relatively short. CONCLUSIONS The relationships among fundamental frequency, implied vocal tract length, source spectral tilt, and the probability of a female judgment changed across the range of normal values, suggesting that the relative contributions of fundamental frequency and implied vocal tract length to gender perception varied over the ranges studied. No threshold of fundamental frequency or implied vocal tract length dramatically shifted the perception between male and female.
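The direction of the reported effects can be illustrated with a toy logistic model. The study's actual analysis was a fitted generalized linear mixed model with listener-level random effects; the coefficients, reference values, and function name below are invented purely for illustration:

```python
import math

def p_female(f0_hz, vtl_cm, b0=-4.0, b_f0=0.045, b_vtl=-1.2):
    # Log-odds of a "female" judgment rise with fundamental frequency and
    # fall with implied vocal tract length, matching the direction of
    # effects reported in the abstract.  All coefficients are made up.
    z = b0 + b_f0 * (f0_hz - 120.0) + b_vtl * (vtl_cm - 17.0)
    return 1.0 / (1.0 + math.exp(-z))

female_typical = p_female(220, 14.5)   # high fo, short implied VTL
male_typical = p_female(110, 17.5)     # low fo, long implied VTL
```

With these invented coefficients, the female-typical stimulus lands near the top of the probability scale and the male-typical one near the bottom, mirroring the main effects described above.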
Affiliation(s)
- Ronald C Scherer
- Department of Communication Sciences and Disorders, Bowling Green State University, Bowling Green, Ohio
- Jason A Whitfield
- Department of Communication Sciences and Disorders, Bowling Green State University, Bowling Green, Ohio
3. Clarke H, Leav S, Zestic J, Mohamed I, Salisbury I, Sanderson P. Enhanced Neonatal Pulse Oximetry Sounds for the First Minutes of Life: A Laboratory Trial. Human Factors 2024; 66:1017-1036. [PMID: 35993422] [DOI: 10.1177/00187208221118472]
Abstract
OBJECTIVE Auditory enhancements to the pulse oximetry tone may help clinicians detect deviations from target ranges for oxygen saturation (SpO2) and heart rate (HR). BACKGROUND Clinical guidelines recommend target ranges for SpO2 and HR during neonatal resuscitation in the first 10 minutes after birth. The pulse oximeter currently maps HR to tone rate, and SpO2 to tone pitch. However, deviations from target ranges for SpO2 and HR are not easy to detect. METHOD Forty-one participants were presented with 30-second simulated scenarios of an infant's SpO2 and HR levels in the first minutes after birth. Tremolo marked distinct HR ranges and formants marked distinct SpO2 ranges. Participants were randomly allocated to conditions: (a) No Enhancement control, (b) Enhanced HR Only, (c) Enhanced SpO2 Only, and (d) Enhanced Both. RESULTS Participants in the Enhanced HR Only and Enhanced SpO2 Only conditions identified HR and SpO2 ranges, respectively, more accurately than participants in the No Enhancement condition, ps < 0.001. In the Enhanced Both condition, the tremolo enhancement of HR did not affect participants' ability to identify SpO2 range, but the formants enhancement of SpO2 may have attenuated participants' ability to identify tremolo-enhanced HR range. CONCLUSION Tremolo and formant enhancements improve range identification for HR and SpO2, respectively, and could improve clinicians' ability to identify SpO2 and HR ranges in the first minutes after birth. APPLICATION Enhancements to the pulse oximeter tone to indicate clinically important ranges could improve the management of oxygen delivery to the neonate during resuscitation in the first 10 minutes after birth.
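The sonification idea — pitch carrying SpO2 and an amplitude tremolo marking an out-of-range HR — can be sketched in a few lines. The pitch, tremolo rate, and depth below are illustrative stand-ins, not the study's actual enhancement parameters, and the formant enhancement of SpO2 is omitted:

```python
import numpy as np

def oximetry_beep(f_pitch, sr=8000, dur=0.15, tremolo_hz=None, depth=0.8):
    # One pulse-oximeter beep: tone pitch encodes SpO2; an optional
    # amplitude tremolo marks that HR has left the target range.
    t = np.arange(int(sr * dur)) / sr
    tone = np.sin(2 * np.pi * f_pitch * t)
    if tremolo_hz:
        # Envelope dips periodically between full level and (1 - depth)
        tone = tone * (1 - depth * 0.5 * (1 - np.cos(2 * np.pi * tremolo_hz * t)))
    return tone

plain = oximetry_beep(880)                    # HR within the target range
flagged = oximetry_beep(880, tremolo_hz=30)   # tremolo flags an HR deviation
```

The tremolo-marked beep is acoustically distinct (a periodically dipping envelope) while leaving the pitch-to-SpO2 mapping untouched, which is the design property the study relies on.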
Affiliation(s)
- Hugh Clarke
- School of Psychology, The University of Queensland, St Lucia, QLD, Australia
- Samnang Leav
- School of Psychology, The University of Queensland, St Lucia, QLD, Australia
- Jelena Zestic
- School of Psychology, The University of Queensland, St Lucia, QLD, Australia
- Ismail Mohamed
- School of Psychology, The University of Queensland, St Lucia, QLD, Australia
- Isaac Salisbury
- School of Psychology, The University of Queensland, St Lucia, QLD, Australia
- Penelope Sanderson
- School of Psychology, School of Information Technology and Electrical Engineering, and School of Clinical Medicine, The University of Queensland, St Lucia, QLD, Australia
4. Isaev DY, Vlasova RM, Di Martino JM, Stephen CD, Schmahmann JD, Sapiro G, Gupta AS. Uncertainty of Vowel Predictions as a Digital Biomarker for Ataxic Dysarthria. Cerebellum 2024; 23:459-470. [PMID: 37039956] [PMCID: PMC10826261] [DOI: 10.1007/s12311-023-01539-z]
Abstract
Dysarthria is a common manifestation across cerebellar ataxias, leading to impaired communication, reduced social connection, and decreased quality of life. While dysarthria symptoms may be present in other neurological conditions, ataxic dysarthria is a perceptually distinct motor speech disorder, whose most prominent characteristics are articulation and prosody abnormalities along with distorted vowels. We hypothesized that the uncertainty of vowel predictions made by an automatic speech recognition system can capture the speech changes present in cerebellar ataxia. Speech of participants with ataxia (N=61) and healthy controls (N=25) was recorded during a "picture description" task. Additionally, participants' dysarthric speech and ataxia severity were assessed on the Brief Ataxia Rating Scale (BARS). Eight participants with ataxia had speech and BARS data at two timepoints. A neural network trained for phoneme prediction was applied to the speech recordings. Average entropy of vowel token predictions (AVE) was computed for each participant's recording, together with mean pitch and intensity standard deviations (MPSD and MISD) in the vowel segments. AVE and MISD were associated with the BARS speech score (Spearman's rho=0.45 and 0.51), and AVE with the BARS total (rho=0.39). In the longitudinal cohort, Wilcoxon pairwise signed-rank tests demonstrated an increase in BARS total and AVE, while BARS speech and the acoustic measures did not significantly increase. The relationship of AVE to both BARS speech and BARS total, together with its ability to capture disease progression even in the absence of measured speech decline, indicates the potential of AVE as a digital biomarker for cerebellar ataxia.
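The AVE measure can be sketched as the mean per-frame Shannon entropy of the recognizer's posterior distributions, restricted to vowel frames. This is a minimal illustration of the idea, not the authors' implementation:

```python
import numpy as np

def average_vowel_entropy(posteriors, vowel_mask):
    # Mean Shannon entropy (nats) of per-frame phoneme posteriors,
    # restricted to frames labelled as vowels.  High values mean the
    # recognizer was uncertain which sound it heard.
    p = np.asarray(posteriors, dtype=float)[np.asarray(vowel_mask, dtype=bool)]
    safe = np.where(p > 0, p, 1.0)   # log(1) = 0, so zero probs drop out
    return float(-(p * np.log(safe)).sum(axis=1).mean())

# One confident frame, one maximally uncertain frame, one excluded frame
post = [[0.97, 0.01, 0.01, 0.01],
        [0.25, 0.25, 0.25, 0.25],
        [0.50, 0.50, 0.00, 0.00]]
ave = average_vowel_entropy(post, vowel_mask=[True, True, False])
```

A confident frame contributes near-zero entropy and a uniform frame contributes the maximum (log of the number of classes), so distorted vowels that smear the posteriors push AVE upward.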
Affiliation(s)
- Dmitry Yu Isaev
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Roza M Vlasova
- Department of Psychiatry, UNC School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- J Matias Di Martino
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
- Christopher D Stephen
- Ataxia Center & Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Jeremy D Schmahmann
- Ataxia Center & Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Guillermo Sapiro
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
- Departments of Mathematics & Computer Science, Duke University, Durham, NC, USA
- Anoopum S Gupta
- Ataxia Center & Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
5. Heller Murray E. Conducting high-quality and reliable acoustic analysis: A tutorial focused on training research assistants. The Journal of the Acoustical Society of America 2024; 155:2603-2611. [PMID: 38629881] [PMCID: PMC11026110] [DOI: 10.1121/10.0025536]
Abstract
Open science practices have increased the number of speech datasets available to researchers interested in acoustic analysis. Accurate evaluation of these databases frequently requires manual or semi-automated analysis. The time-intensive nature of these analyses makes them well suited to research assistants in laboratories focused on speech and voice production. However, completing high-quality, consistent, and reliable analyses requires clear rules and guidelines for all research assistants to follow. This tutorial provides information on training and mentoring research assistants to complete these analyses, covering research assistant training, ongoing monitoring of data analysis, and the documentation needed for reliable and reproducible findings.
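As one concrete example of the reliability checks such a tutorial covers, duplicate measurements of the same tokens by two research assistants can be compared with a correlation and a mean absolute difference. The function, data, and any agreement thresholds below are illustrative only, not the tutorial's prescribed procedure:

```python
import numpy as np

def rater_agreement(r1, r2):
    # Two quick reliability checks for duplicate acoustic measurements:
    # Pearson correlation (unitless) and mean absolute difference (same
    # units as the measure).  What counts as "acceptable" is lab policy.
    r1, r2 = np.asarray(r1, dtype=float), np.asarray(r2, dtype=float)
    return float(np.corrcoef(r1, r2)[0, 1]), float(np.abs(r1 - r2).mean())

# e.g. two assistants' F2 measurements (Hz) of the same ten vowel tokens
a = [2210, 1850, 950, 2300, 1200, 1750, 990, 2105, 1430, 1600]
b = [2198, 1862, 955, 2310, 1190, 1762, 1001, 2090, 1441, 1588]
r, mad = rater_agreement(a, b)
```

Reporting the mean absolute difference alongside the correlation is useful because a high correlation can mask a constant measurement offset between assistants.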
Affiliation(s)
- Elizabeth Heller Murray
- Department of Communication Sciences and Disorders, Temple University, Philadelphia, Pennsylvania 19122, USA
6. Södersten M, Oates J, Sand A, Granqvist S, Quinn S, Dacakis G, Nygren U. Gender-Affirming Voice Training for Trans Women: Acoustic Outcomes and Their Associations With Listener Perceptions Related to Gender. J Voice 2024:S0892-1997(24)00023-7. [PMID: 38503674] [DOI: 10.1016/j.jvoice.2024.02.003]
Abstract
OBJECTIVES To investigate acoustic outcomes of gender-affirming voice training for trans women wanting to develop a female-sounding voice, and to describe what happens acoustically when male-sounding voices become more female sounding. STUDY DESIGN Prospective treatment study with repeated measures. METHODS N = 74 trans women completed a voice training program of 8-12 sessions and had their voices audio recorded twice before and twice after training. Reference data were obtained from N = 40 cisgender speakers. Fundamental frequency (fo), formant frequencies (F1-F4), sound pressure level (Leq), and the level difference between the first and second harmonics (L1-L2) were extracted from a reading passage and spontaneous speech. N = 79 naive listeners provided gender-related ratings of participants' audio recordings. A linear mixed-effects model was used to estimate average training effects, and individual-level analyses determined how changes in the acoustic data related to listeners' ratings. RESULTS Group data showed substantial training effects on fo (average, minimum, and maximum) and formant frequencies. Individual data demonstrated that many participants also increased Leq and some increased L1-L2. The measures that most strongly predicted listener ratings of a female-sounding voice were fo, average formant frequency, and Leq. CONCLUSIONS This is the largest prospective study reporting on acoustic outcomes of gender-affirming voice training for trans women. We confirm findings from previous smaller-scale studies by demonstrating that listener perceptions of male- and female-sounding voices are related to acoustic voice features, and that voice training for trans women wanting to sound female is associated with desirable acoustic changes, indicating training effectiveness. Although acoustic measures can be a valuable indicator of training effectiveness, particularly from the perspective of clinicians and researchers, we contend that a combination of outcome measures, including client perspectives, is needed to provide a comprehensive evaluation of gender-affirming voice training that is relevant to all stakeholders.
Affiliation(s)
- Maria Södersten
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden; Speech and Language Pathology, Medical Unit, Karolinska University Hospital, Stockholm, Sweden
- Jennifer Oates
- Discipline of Speech Pathology, School of Allied Health, Human Services and Sport, La Trobe University, Melbourne, Australia
- Anders Sand
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Svante Granqvist
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Sterling Quinn
- Discipline of Speech Pathology, School of Allied Health, Human Services and Sport, La Trobe University, Melbourne, Australia
- Georgia Dacakis
- Discipline of Speech Pathology, School of Allied Health, Human Services and Sport, La Trobe University, Melbourne, Australia
- Ulrika Nygren
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden; Speech and Language Pathology, Medical Unit, Karolinska University Hospital, Stockholm, Sweden
7. Singh VP, Sahidullah M, Kinnunen T. ChildAugment: Data augmentation methods for zero-resource children's speaker verification. The Journal of the Acoustical Society of America 2024; 155:2221-2232. [PMID: 38530014] [DOI: 10.1121/10.0025178]
Abstract
The accuracy of modern automatic speaker verification (ASV) systems, when trained exclusively on adult data, drops substantially when they are applied to children's speech. The scarcity of children's speech corpora hinders fine-tuning ASV systems for children's speech, so there is a timely need to explore more effective ways of reusing adults' speech data. One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment. Specifically, we modify the formant frequencies and formant bandwidths of adult speech to emulate children's speech. The modified spectra are used to train an emphasized channel attention, propagation and aggregation in time-delay neural network (ECAPA-TDNN) recognizer for children. We compare ChildAugment against various state-of-the-art data augmentation techniques for children's ASV. We also extensively compare different scoring methods, including cosine scoring, probabilistic linear discriminant analysis (PLDA), and neural PLDA, and we propose a low-complexity weighted cosine score for extremely low-resource children's ASV. Our findings on the CSLU Kids corpus indicate that ChildAugment holds promise as a simple, acoustics-motivated approach for improving state-of-the-art deep-learning-based ASV for children. We achieve up to 12.45% (boys) and 11.96% (girls) relative improvement over the baseline. For reproducibility, we provide the evaluation protocols and codes here.
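The core idea of formant-raising augmentation can be sketched as a warp of each frame's magnitude spectrum. The paper modifies formant frequencies and bandwidths explicitly; the simpler linear frequency warp below, with its invented function name, is only a stand-in for that operation:

```python
import numpy as np

def warp_spectrum(frame, alpha):
    # Scale the frequency axis of the magnitude spectrum by `alpha`;
    # alpha > 1 pushes spectral peaks (formants) upward, emulating the
    # shorter vocal tract of a child.  Phase is reused unmodified, which
    # is adequate for a sketch but not for production-quality audio.
    spec = np.fft.rfft(frame)
    bins = np.arange(len(spec))
    mag = np.interp(bins / alpha, bins, np.abs(spec))
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(frame))

n = 512
tone = np.sin(2 * np.pi * 50 * np.arange(n) / n)   # spectral peak at bin 50
shifted = warp_spectrum(tone, alpha=1.2)
peak = int(np.argmax(np.abs(np.fft.rfft(shifted))))  # peak moves to bin 60
```

Applying such a warp frame by frame with randomly sampled `alpha` values is the general shape of vocal-tract-oriented augmentation, though ChildAugment's actual modifications operate on estimated formant parameters rather than raw spectra.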
Affiliation(s)
- Md Sahidullah
- Institute for Advancing Intelligence, TCG CREST, Kolkata, West Bengal 700091, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Tomi Kinnunen
- School of Computing, University of Eastern Finland, Joensuu 80130, Finland
8. Simeone PJ, Green JR, Tager-Flusberg H, Chenausky KV. Vowel distinctiveness as a concurrent predictor of expressive language function in autistic children. Autism Res 2024; 17:419-431. [PMID: 38348589] [DOI: 10.1002/aur.3102]
Abstract
Speech ability may limit spoken language development in some minimally verbal autistic children. In this study, we aimed to determine whether an acoustic measure of speech production, vowel distinctiveness, is concurrently related to expressive language (EL) in autistic children. Syllables containing the vowels [i] and [a] were recorded remotely from 27 autistic children (ages 4;1-7;11) with a range of spoken language abilities. Vowel distinctiveness was calculated using automatic formant-tracking software. Robust hierarchical regressions were conducted with receptive language (RL) and vowel distinctiveness as predictors of EL, both for the entire group and within High EL and Low EL subgroups. Vowel distinctiveness accounted for 29% of the variance in EL for the entire group, and RL for 38%. In the Low EL group, only vowel distinctiveness was significant, accounting for 38% of the variance in EL. Conversely, in the High EL group, only RL was significant, accounting for 26% of the variance in EL. Replicating previous results, speech production and RL significantly predicted concurrent EL in autistic children, with speech production the sole significant predictor in the Low EL group and RL the sole significant predictor in the High EL group. Further work is needed to determine whether vowel distinctiveness predicts EL longitudinally as well as concurrently. The findings have important implications for the early identification of language impairment and for developing language interventions for autistic children.
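One plausible operationalization of vowel distinctiveness is the Euclidean distance between the mean [i] and [a] tokens in F1-F2 space; the study's exact metric may differ, and the formant values below are invented for illustration:

```python
import math

def vowel_distinctiveness(mean_i, mean_a):
    # Euclidean distance (Hz) between mean [i] and [a] in (F1, F2) space;
    # larger values indicate more clearly separated corner vowels.
    return math.dist(mean_i, mean_a)

# Illustrative (F1, F2) means in Hz, not data from the study
clear = vowel_distinctiveness((320, 2400), (850, 1400))
centralized = vowel_distinctiveness((500, 1900), (700, 1500))
```

Centralized productions collapse toward the middle of the formant space, so the distance shrinks, which is the property that makes the measure a candidate index of speech ability.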
Affiliation(s)
- Paul J Simeone
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
- Division of Allied Health and Supportive Technology, May Institute, Randolph, Massachusetts, USA
- Jordan R Green
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
- Department of Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard University, Cambridge, Massachusetts, USA
- Helen Tager-Flusberg
- Department of Psychological & Brain Sciences, College of Arts and Sciences, Boston University, Boston, Massachusetts, USA
- Karen V Chenausky
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
- Department of Neurology, Harvard Medical School, Boston, Massachusetts, USA
9. van Zelst AL, Earle FS. A Matter of Time: A Web-Based Investigation of Rest and Sleep Effects on Speech Motor Learning. Journal of Speech, Language, and Hearing Research 2024; 67:59-71. [PMID: 38056482] [PMCID: PMC11000790] [DOI: 10.1044/2023_jslhr-22-00309]
Abstract
PURPOSE Here, we examine the possibility that memory consolidation during a period of postpractice rest or nocturnal sleep can bolster speech motor learning in the absence of additional practice or effort. METHOD Using web-administered experiments, 74 typical American English talkers trained on a nonnative vowel contrast and then either had a 12-hr delay with (SLEEP) or without (REST) nocturnal sleep or proceeded immediately (IMMEDIATE) to a posttraining production assessment. For ecological validity, 51 native Danish talkers perceptually identified the American English talkers' productions. RESULTS We observed that practice resulted in productions that were more acoustically similar to the Danish target. In addition, we found that rest in the absence of further practice reduced the token-to-token variability of the productions. Last, for vowels produced immediately following training, listeners more accurately identified vowels in the trained context, whereas in the untrained context, listener accuracy improved only for vowels produced by talkers who slept. CONCLUSIONS A single session of speech motor training promotes observable change in speech production behavior. Specifically, practice facilitates acoustic similarity to the target. Moreover, although a 12-hr postpractice period of rest appears to promote productions that are less variable, only the productions of those who slept were perceived as more accurate by listeners. This may point to sleep's role in contextualizing the acoustic goal of the production to the learner's own vocal tract and to its role as a protective mechanism during learning. These results are unaccounted for under existing models and offer potential for future educational and clinical applications to maximize speech motor learning. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.24707442.
Affiliation(s)
- Anne L. van Zelst
- Department of Communication Sciences & Disorders, University of Delaware, Newark
- F. Sayako Earle
- Department of Communication Sciences & Disorders, University of Delaware, Newark
10. Lester-Smith RA, Derrick E, Larson CR. Characterization of Source-Filter Interactions in Vocal Vibrato Using a Neck-Surface Vibration Sensor: A Pilot Study. J Voice 2024; 38:1-9. [PMID: 34649740] [PMCID: PMC8995401] [DOI: 10.1016/j.jvoice.2021.08.004]
Abstract
PURPOSE Vocal vibrato is a singing technique that involves periodic modulation of fundamental frequency (fo) and intensity. The physiological sources of modulation within the speech mechanism and the interactions between the laryngeal source and the vocal tract filter in vibrato are not fully understood. Therefore, the purpose of this study was to determine whether differences in the rate and extent of fo and intensity modulation could be captured using simultaneously recorded signals from a neck-surface vibration sensor and a microphone, which represent features of the source before and after supraglottal vocal tract filtering. METHOD Nine classically trained singers produced sustained vowels with vibrato while simultaneous signals were recorded using a vibration sensor and a microphone. Acoustical analyses measured the rate and extent of fo and intensity modulation for each trial. Paired-samples sign tests were used to analyze differences between the rate and extent of fo and intensity modulation in the vibration sensor and microphone signals. RESULTS The rate and extent of fo modulation and the extent of intensity modulation were equivalent in the vibration sensor and microphone signals, but the rate of intensity modulation was significantly higher in the microphone signal than in the vibration sensor signal. Larger differences in the rate of intensity modulation were seen for vowels that typically have smaller differences between the first and second formant frequencies. CONCLUSIONS This study demonstrated that the rate of intensity modulation at the source, prior to supraglottal vocal tract filtering, as measured in neck-surface vibration sensor signals, was lower than the rate of intensity modulation after supraglottal vocal tract filtering, as measured in microphone signals. The difference in rate varied with the vowel. These findings provide further support for a resonance-harmonics interaction in vocal vibrato. Further investigation is warranted to determine whether differences in the physiological source(s) of vibrato account for the inconsistent relationships between the extent of intensity modulation in neck-surface vibration sensor and microphone signals.
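The rate and extent measures used in such analyses can be approximated from a sustained fo (or intensity) contour via its dominant FFT component. This is a minimal sketch under the assumption of a steady, near-sinusoidal modulation, not the authors' analysis code:

```python
import numpy as np

def modulation_rate_extent(contour, frame_rate):
    # The dominant non-DC FFT component of the contour gives the modulation
    # rate (Hz); its sinusoid amplitude gives the modulation extent (same
    # units as the contour).
    x = np.asarray(contour, dtype=float)
    x = x - x.mean()                       # remove the carrier/mean level
    spec = np.abs(np.fft.rfft(x))
    k = int(np.argmax(spec[1:])) + 1       # skip the DC bin
    return k * frame_rate / len(x), 2.0 * spec[k] / len(x)

# Synthetic vibrato: fo oscillating +/- 6 Hz around 220 Hz at 5 Hz,
# sampled at 100 contour frames per second for 3 s
frames = np.arange(300)
fo = 220 + 6 * np.sin(2 * np.pi * 5.0 * frames / 100)
rate, extent = modulation_rate_extent(fo, frame_rate=100)
```

Running the same function on an intensity contour extracted from the vibration sensor and from the microphone is the kind of paired comparison the sign tests above operate on.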
Affiliation(s)
- Rosemary A Lester-Smith
- Department of Physical Medicine & Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Elaina Derrick
- Department of Speech, Language and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas
- Charles R Larson
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois
11. Baker CP, Brockmann-Bauser M, Purdy SC, Rakena TO. High and Wide: An In Silico Investigation of Frequency, Intensity, and Vibrato Effects on Widely Applied Acoustic Voice Perturbation and Noise Measures. J Voice 2023:S0892-1997(23)00316-8. [PMID: 37925330] [DOI: 10.1016/j.jvoice.2023.10.007]
Abstract
OBJECTIVES This in silico study explored the effects of a wide range of fundamental frequency (fo), source-spectrum tilt (SST), and vibrato extent (VE) values on commonly used frequency and amplitude perturbation and noise measures. METHOD Using 53 synthesized tones produced in Madde, the effects of stepwise increases in fo, intensity (modeled by decreasing SST), and VE on the Praat parameters jitter % (local), relative average perturbation (RAP) %, shimmer % (local), amplitude perturbation quotient 3 (APQ3) %, and harmonics-to-noise ratio (HNR) dB were investigated. A secondary experiment determined whether the fo effects on jitter, RAP, shimmer, APQ3, and HNR were stable: 10 sine waves were synthesized in Sopran from 100 to 1000 Hz using formant frequencies for /a/-, /i/-, and /u/-like vowels, respectively. All effects were statistically assessed with Kendall's tau-b and partial correlation. RESULTS Increasing fo resulted in an overall increase in jitter, RAP, shimmer, and APQ3 values (P < 0.01), with oscillations of the data across the explored fo range observed in all measurement outputs. In the Sopran tests, the oscillatory pattern seen in the Madde fo condition remained and differed between vowel conditions. Increasing intensity (decreasing SST) led to reduced pitch and amplitude perturbation and reduced HNR (P < 0.05). Increasing VE led to lower HNR and an almost linear increase in all other measures (P < 0.05). CONCLUSION These novel data offer a controlled demonstration of the behavior of jitter (local) %, RAP %, shimmer (local) %, APQ3 %, and HNR (dB) when fo, SST, and VE are varied in synthesized tones. Since humans vary in all of these aspects in spoken language and vowel phonation, researchers should take potential resonance-harmonics-type effects into account when comparing intersubject or pre- and postintervention data using these measures.
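At their core, jitter (local) % and shimmer (local) % are mean absolute differences of consecutive glottal periods or peak amplitudes, expressed as a percentage of the mean. The minimal versions below ignore Praat's voicing detection and period-search details:

```python
import numpy as np

def jitter_local_percent(periods):
    # Mean absolute difference of consecutive periods, as a percentage
    # of the mean period (the definition behind Praat's jitter local).
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.abs(np.diff(p)).mean() / p.mean()

def shimmer_local_percent(amplitudes):
    # The same ratio computed on consecutive cycle peak amplitudes.
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.abs(np.diff(a)).mean() / a.mean()

periods = [0.005, 0.0051, 0.0049, 0.005]   # ~200 Hz cycle lengths (s)
amps = [1.0, 0.95, 1.05, 1.0]              # relative cycle peak amplitudes
```

Because both measures depend on how cycle boundaries are located, systematic variation in fo, spectral tilt, or vibrato can shift the extracted periods and amplitudes even for a perfectly regular source, which is the effect the study quantifies.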
Affiliation(s)
- Calvin Peter Baker
- Speech Science, School of Psychology, University of Auckland, Auckland, New Zealand; School of Music, University of Auckland, Auckland, New Zealand
- Meike Brockmann-Bauser
- Department of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Suzanne C Purdy
- Speech Science, School of Psychology, University of Auckland, Auckland, New Zealand
- Te Oti Rakena
- School of Music, University of Auckland, Auckland, New Zealand
12. Kim JA, Jang H, Choi Y, Min YG, Hong YH, Sung JJ, Choi SJ. Subclinical articulatory changes of vowel parameters in Korean amyotrophic lateral sclerosis patients with perceptually normal voices. PLoS One 2023; 18:e0292460. [PMID: 37831677] [PMCID: PMC10575489] [DOI: 10.1371/journal.pone.0292460]
Abstract
Quantitative methods for evaluating bulbar dysfunction in patients with amyotrophic lateral sclerosis (ALS) are limited. We aimed to characterize vowel properties in Korean ALS patients, investigate associations between vowel parameters and clinical features of ALS, and analyze subclinical articulatory changes in vowel parameters in those with perceptually normal voices. Forty-three patients with ALS (27 with dysarthria and 16 without dysarthria) and 20 healthy controls were prospectively enrolled. Dysarthria was assessed using the ALS Functional Rating Scale-Revised (ALSFRS-R) speech subscore, with any loss from the full 4 points indicating the presence of dysarthria. Structured speech samples were recorded and analyzed using Praat software. For the three corner vowels (/a/, /i/, and /u/), vowel duration, fundamental frequency, the frequencies of the first two formants (F1 and F2), harmonics-to-noise ratio, vowel space area (VSA), and vowel articulation index (VAI) were extracted from the speech samples. Corner vowel durations were significantly longer in ALS patients with dysarthria than in healthy controls. The F1 frequency of /a/, the F2 frequencies of /i/ and /u/, the VSA, and the VAI differed significantly between ALS patients with dysarthria and healthy controls, with an area under the curve (AUC) of 0.912 for this discrimination. The F1 frequency of /a/ and the VSA were the major determinants for differentiating ALS patients who had not yet developed apparent dysarthria from healthy controls (AUC 0.887). In linear regression analyses, as the ALSFRS-R speech subscore decreased, both the VSA and VAI were reduced, whereas vowel durations were prolonged. Analysis of vowel parameters thus provides a useful metric, correlated with disease severity, for detecting subclinical bulbar dysfunction in ALS patients.
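The VSA and VAI for three corner vowels follow standard formulas: the area of the /i/-/a/-/u/ triangle in the F1-F2 plane, and a ratio of "peripheral" to "centralized" formant values. The formant numbers below are illustrative, not patient data:

```python
def vowel_space_area(f):
    # Triangle area (Hz^2) spanned by /i/, /a/, /u/ in the F1-F2 plane.
    (a1, a2), (i1, i2), (u1, u2) = f["a"], f["i"], f["u"]
    return 0.5 * abs(i1 * (a2 - u2) + a1 * (u2 - i2) + u1 * (i2 - a2))

def vowel_articulation_index(f):
    # VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/);
    # the value falls as the vowel space centralizes.
    (a1, a2), (i1, i2), (u1, u2) = f["a"], f["i"], f["u"]
    return (i2 + a1) / (i1 + u1 + u2 + a2)

# Illustrative corner-vowel formants (F1, F2) in Hz
healthy = {"a": (800, 1300), "i": (300, 2300), "u": (350, 800)}
centralized = {"a": (650, 1350), "i": (450, 1900), "u": (450, 1100)}
```

A centralized (articulatorily reduced) vowel set shrinks both measures, which is why decreasing VSA and VAI track the falling ALSFRS-R speech subscore reported above.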
Collapse
Affiliation(s)
- Jin-Ah Kim
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Republic of Korea
| | - Hayeun Jang
- Division of English, Busan University of Foreign Studies, Busan, Republic of Korea
| | - Yoonji Choi
- Department of Korean Language and Literature, Seoul National University, Seoul, Republic of Korea
| | - Young Gi Min
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Yoon-Ho Hong
- Department of Neurology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea
| | - Jung-Joon Sung
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Neuroscience Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Seok-Jin Choi
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Center for Hospital Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| |
Collapse
|
13
|
Feng Y, Chen F, Ma J, Wang L, Peng G. Production of Mandarin consonant aspiration and monophthongs in children with Autism Spectrum Disorder. CLINICAL LINGUISTICS & PHONETICS 2023; 37:899-918. [PMID: 35848409 DOI: 10.1080/02699206.2022.2099302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 06/29/2022] [Accepted: 07/04/2022] [Indexed: 06/15/2023]
Abstract
Impaired speech sound production adds difficulties to social communication in children with Autism Spectrum Disorder (ASD), yet few attempts have been made to characterize speech sound production among Mandarin-speaking children with ASD. The current study conducted both auditory-perceptual scoring and quantitative acoustic analysis of speech sounds imitated by 27 Mandarin-speaking children with ASD (3.33-7.00 years) and 30 chronological-age-matched typically developing (TD) children. Auditory-perceptual scoring showed significantly lower scores for aspirated/unaspirated consonants and monophthongs in children with ASD. Moreover, the correlation between the developmental age of language and production accuracy in children with ASD emphasised the importance of language assessment. The quantitative acoustic analysis further indicated that the ASD group produced much shorter voice onset times for aspirated consonants and a more reduced vowel space than the TD group. Early interventions focusing on these production patterns should be introduced to improve speech sound production in Mandarin-speaking children with ASD.
Collapse
Affiliation(s)
- Yan Feng
- School of Foreign Studies, Nanjing University of Science and Technology, Nanjing, Jiangsu province, China
- Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR
| | - Fei Chen
- School of Foreign Languages, Hunan University, Changsha, Hunan, China
| | - Junzhou Ma
- School of Foreign Languages, Taizhou University, Taizhou, Zhejiang, China
| | - Lan Wang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
| | - Gang Peng
- Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
| |
Collapse
|
14
|
Roland V, Huet K, Harmegnies B, Piccaluga M, Verhaegen C, Delvaux V. Vowel production: a potential speech biomarker for early detection of dysarthria in Parkinson's disease. Front Psychol 2023; 14:1129830. [PMID: 37701868 PMCID: PMC10493417 DOI: 10.3389/fpsyg.2023.1129830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2022] [Accepted: 07/26/2023] [Indexed: 09/14/2023] Open
Abstract
Objectives Our aim is to detect early, subclinical speech biomarkers of dysarthria in Parkinson's disease (PD), i.e., systematic atypicalities in speech that remain subtle and are not easily detectable by the clinician, so that the patient is labeled "non-dysarthric." Based on promising exploratory work, we examine here whether vowel articulation, as assessed by three acoustic metrics, can be used as an early indicator of speech difficulties associated with Parkinson's disease. Study design This is a prospective case-control study. Methods Sixty-three individuals with PD and 35 without PD (healthy controls-HC) participated in this study. Of the 63 PD patients, 43 had been diagnosed with dysarthria (DPD) and 20 had not (NDPD). Sustained vowels were recorded for each speaker and formant frequencies were measured. The analyses focus on three acoustic metrics: individual vowel triangle areas (tVSA), vowel articulation index (VAI) and the Phi index. Results tVSAs were found to be significantly smaller for DPD speakers than for HC. The VAI showed significant differences between these two groups, indicating greater centralization and lower vowel contrasts in the DPD speakers. In addition, DPD and NDPD speakers had lower Phi values, indicating a lower organization of their vowel system compared to the HC. Results also showed that the VAI was the most efficient at distinguishing between DPD and NDPD, whereas the Phi index was the best acoustic metric for discriminating between NDPD and HC. Conclusion This acoustic study identified potential subclinical vowel-related speech biomarkers of dysarthria in speakers with Parkinson's disease who have not been diagnosed with dysarthria.
Collapse
Affiliation(s)
- Virginie Roland
- Metrology and Language Sciences Unit, Mons, Belgium
- Research Institute for Language Science and Technology, University of Mons, Mons, Belgium
| | - Kathy Huet
- Metrology and Language Sciences Unit, Mons, Belgium
- Research Institute for Language Science and Technology, University of Mons, Mons, Belgium
| | - Bernard Harmegnies
- Research Institute for Language Science and Technology, University of Mons, Mons, Belgium
| | - Myriam Piccaluga
- Metrology and Language Sciences Unit, Mons, Belgium
- Research Institute for Language Science and Technology, University of Mons, Mons, Belgium
| | - Clémence Verhaegen
- Metrology and Language Sciences Unit, Mons, Belgium
- Research Institute for Language Science and Technology, University of Mons, Mons, Belgium
| | - Véronique Delvaux
- Metrology and Language Sciences Unit, Mons, Belgium
- Research Institute for Language Science and Technology, University of Mons, Mons, Belgium
- National Fund for Scientific Research, Brussels, Belgium
| |
Collapse
|
15
|
Kuo C, Berry J. The Relationship Between Acoustic and Kinematic Vowel Space Areas With and Without Normalization for Speakers With and Without Dysarthria. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2023; 32:1923-1937. [PMID: 37105919 PMCID: PMC10561967 DOI: 10.1044/2023_ajslp-22-00158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Revised: 09/09/2022] [Accepted: 01/17/2023] [Indexed: 06/19/2023]
Abstract
PURPOSE Few studies have reported on the vowel space area (VSA) in both acoustic and kinematic domains. This study examined acoustic and kinematic VSAs for speakers with and without dysarthria and evaluated effects of normalization on acoustic and kinematic VSAs and the relationship between these measures. METHOD Vowel data from 12 speakers with and without dysarthria, presenting with a range of speech abilities, were examined. The speakers included four speakers with Parkinson's disease (PD), four speakers with brain injury (BI), and four neurotypical (NT) speakers. Speech acoustic and kinematic data were acquired simultaneously using electromagnetic articulography during a passage reading task. Raw and normalized VSAs calculated from corner vowels /i/, /æ/, /ɑ/, and /u/ were evaluated. Normalization was achieved through z-score transformations of the acoustic and kinematic data. The effect of normalization on variability within and across groups was evaluated. Regression analysis was used across speakers to assess the association between acoustic and kinematic VSAs for both raw and normalized data. RESULTS When evaluating the speakers as three different groups (i.e., PD, BI, and NT), normalization reduced the standard deviations within each group and changed the relative differences in average magnitude between groups. Regression analysis revealed a significant relationship between normalized, but not raw, acoustic and kinematic VSAs, after the exclusion of an outlier speaker. CONCLUSIONS Normalization reduces the variability across speakers within groups and changes average magnitudes, affecting speaker group comparisons. Normalization also influences the correlation between acoustic and kinematic measures. Further investigation of the impact of normalization techniques upon acoustic and kinematic measures is warranted. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.22669747.
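As a rough illustration of the procedure described above, the sketch below applies a per-speaker z-score transform and computes a quadrilateral vowel space area over the four corner vowels with the shoelace formula. It is a generic reconstruction under stated assumptions, not the authors' code.

```python
import statistics

def z_scores(values):
    """Standardize one speaker's measurements to zero mean, unit sample SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def quad_vsa(points):
    """Shoelace area of the quadrilateral spanned by the corner vowels,
    given (F1, F2) vertices in order, e.g. /i/, /ae/, /a/, /u/."""
    area = 0.0
    n = len(points)
    for k in range(n):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0
```

After z-scoring, the VSA is in standard-deviation units rather than Hz² (or mm² for the kinematic data), which is what makes magnitudes comparable across speakers and domains.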
Collapse
Affiliation(s)
- Christina Kuo
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA
| | - Jeffrey Berry
- Department of Speech Pathology and Audiology, Marquette University, Milwaukee, WI
| |
Collapse
|
16
|
Baker CP, Purdy SC, Rakena TO, Bonnini S. It Sounds like It Feels: Preliminary Exploration of an Aeroacoustic Diagnostic Protocol for Singers. J Clin Med 2023; 12:5130. [PMID: 37568532 PMCID: PMC10420037 DOI: 10.3390/jcm12155130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2023] [Revised: 07/26/2023] [Accepted: 07/31/2023] [Indexed: 08/13/2023] Open
Abstract
To date, no established protocol exists for measuring functional voice changes in singers with subclinical singing-voice complaints. Hence, these may go undiagnosed until they progress into greater severity. This exploratory study sought to (1) determine which scale items in the self-perceptual Evaluation of Ability to Sing Easily (EASE) are associated with instrumental voice measures, and (2) construct, as a proof of concept, an instrumental index related to singers' perceptions of their vocal function and health status. Eighteen classical singers were acoustically recorded in a controlled environment singing an /a/ vowel using soft phonation. Aerodynamic data were collected during a softly sung /papapapapapapa/ task with the KayPENTAX Phonatory Aerodynamic System. Using multi- and univariate linear regression techniques, CPPS, vibrato jitter, vibrato shimmer, and an efficiency ratio (SPL/PSub) were included in a significant model (p < 0.001) explaining 62.4% of variance in participants' composite scores of three scale items related to vocal fatigue. The instrumental index showed a significant association (p = 0.001) with the EASE vocal fatigue subscale overall. Findings illustrate that an aeroacoustic instrumental index may be useful for monitoring functional changes in the singing voice as part of a multidimensional diagnostic approach to preventative and rehabilitative voice healthcare for professional singing-voice users.
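The variance-explained figure reported above comes from ordinary least-squares regression. A generic sketch: fit the instrumental predictors to the composite score and read off R². The predictor names and data here are hypothetical stand-ins, not the study's measurements.

```python
import numpy as np

def r_squared(X, y):
    """Fit y on the columns of X (plus an intercept) by least squares and
    return the coefficient of determination R^2."""
    A = np.column_stack([X, np.ones(len(y))])  # append intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Hypothetical predictors standing in for CPPS, vibrato jitter, vibrato
# shimmer, and the SPL/PSub efficiency ratio; none of this is study data.
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 4))                          # 18 singers, 4 predictors
y = X @ np.array([0.8, -0.3, 0.2, 0.5]) + rng.normal(scale=0.5, size=18)
```

An R² of 0.624, as reported, means the four instrumental measures jointly account for about five-eighths of the between-singer variance in the fatigue composite.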
Collapse
Affiliation(s)
- Calvin Peter Baker
- Speech Science, School of Psychology, University of Auckland, Auckland 1023, New Zealand;
- School of Music, University of Auckland, Auckland 1010, New Zealand;
| | - Suzanne C. Purdy
- Speech Science, School of Psychology, University of Auckland, Auckland 1023, New Zealand;
| | - Te Oti Rakena
- School of Music, University of Auckland, Auckland 1010, New Zealand;
| | - Stefano Bonnini
- Department of Economics & Management, University of Ferrara, 44121 Ferrara, Italy;
| |
Collapse
|
17
|
Birkholz P, Blandin R, Kürbis S. Bandwidths of vocal tract resonances in physical models compared to transmission-line simulations. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 153:3281. [PMID: 37307363 DOI: 10.1121/10.0019682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Accepted: 05/25/2023] [Indexed: 06/14/2023]
Abstract
This study investigated how the bandwidths of resonances simulated by transmission-line models of the vocal tract compare to bandwidths measured from physical three-dimensional printed vowel resonators. Three types of physical resonators were examined: models with realistic vocal tract shapes based on Magnetic Resonance Imaging (MRI) data, straight axisymmetric tubes with varying cross-sectional areas, and two-tube approximations of the vocal tract with notched lips. All physical models had hard walls and a closed glottis, so the main loss mechanisms contributing to the bandwidths were sound radiation, viscosity, and heat conduction. These losses were accordingly included in the simulations, in two variants: a coarse approximation of the losses with frequency-independent lumped elements, and a detailed, theoretically more precise loss model. Across the examined frequency range from 0 to 5 kHz, the resonance bandwidths increased systematically from the simulations with the coarse loss model to the simulations with the detailed loss model, to the tube-shaped physical resonators, and to the MRI-based resonators. This indicates that the simulated losses, especially the commonly used approximations, underestimate the real losses in physical resonators. Hence, more realistic acoustic simulations of the vocal tract require improved models for viscous and radiation losses.
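The bandwidth-loss relation underlying this comparison can be illustrated with a toy model: for an idealized second-order band-pass resonance, the half-power (-3 dB) bandwidth equals fR/Q, so greater losses (lower Q) widen the resonance. The sketch below measures the bandwidth numerically; it is a textbook resonator, not the transmission-line model used in the paper.

```python
import numpy as np

def half_power_bandwidth(f0=500.0, Q=10.0, fmax=2000.0, n=200001):
    """Numerically measure the -3 dB bandwidth of an idealized second-order
    band-pass resonance; theory predicts exactly f0 / Q."""
    f = np.linspace(1.0, fmax, n)
    h = (1j * f / (f0 * Q)) / (1.0 - (f / f0) ** 2 + 1j * f / (f0 * Q))
    power = np.abs(h) ** 2                 # |H|^2, peaks at 1.0 when f = f0
    above = f[power >= 0.5]                # half-power (-3 dB) region
    return above[-1] - above[0]
```

For f0 = 500 Hz and Q = 10 this returns about 50 Hz; halving Q (i.e., doubling the losses) doubles the measured bandwidth, which is why underestimated losses show up as too-narrow simulated resonances.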
Collapse
Affiliation(s)
- Peter Birkholz
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, 01062, Germany
| | - Rémi Blandin
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, 01062, Germany
| | - Steffen Kürbis
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, 01062, Germany
| |
Collapse
|
18
|
Maffei MF, Chenausky KV, Gill SV, Tager-Flusberg H, Green JR. Oromotor skills in autism spectrum disorder: A scoping review. Autism Res 2023; 16:879-917. [PMID: 37010327 PMCID: PMC10365059 DOI: 10.1002/aur.2923] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2023] [Accepted: 03/15/2023] [Indexed: 04/04/2023]
Abstract
Oromotor functioning plays a foundational role in spoken communication and feeding, two areas of significant difficulty for many autistic individuals. However, despite years of research and established differences in gross and fine motor skills in this population, there is currently no clear consensus regarding the presence or nature of oral motor control deficits in autistic individuals. In this scoping review, we summarize research published between 1994 and 2022 to answer the following research questions: (1) What methods have been used to investigate oromotor functioning in autistic individuals? (2) Which oromotor behaviors have been investigated in this population? and (3) What conclusions can be drawn regarding oromotor skills in this population? Seven online databases were searched resulting in 107 studies meeting our inclusion criteria. Included studies varied widely in sample characteristics, behaviors analyzed, and research methodology. The large majority (81%) of included studies report a significant oromotor abnormality related to speech production, nonspeech oromotor skills, or feeding within a sample of autistic individuals based on age norms or in comparison to a control group. We examine these findings to identify trends, address methodological aspects hindering cross-study synthesis and generalization, and provide suggestions for future research.
Collapse
Affiliation(s)
- Marc F. Maffei
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
| | - Karen V. Chenausky
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
- Neurology Department, Harvard Medical School, Boston, Massachusetts, USA
| | - Simone V. Gill
- College of Health and Rehabilitation Sciences, Sargent College, Boston University, Boston, Massachusetts, USA
| | - Helen Tager-Flusberg
- Department of Psychological and Brain Sciences, Boston University, Boston, Massachusetts, USA
| | - Jordan R. Green
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
- Speech and Hearing Biosciences and Technology Program, Harvard University, Cambridge, Massachusetts, USA
| |
Collapse
|
19
|
Herbst CT, Story BH, Meyer D. Acoustical Theory of Vowel Modification Strategies in Belting. J Voice 2023:S0892-1997(23)00004-8. [PMID: 37080890 DOI: 10.1016/j.jvoice.2023.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2022] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 04/22/2023]
Abstract
Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12,987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled.
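The "musical fifth" finding reduces to simple interval arithmetic: if belting requires fR1 ≈ 2fo, the usable fo band is centered at fR1/2 and, per the abstract, spans about seven semitones. A sketch of that arithmetic, using an illustrative first-resonance value for an open vowel rather than one of the paper's simulated values:

```python
def belting_fo_range(fr1_hz, span_semitones=7):
    """fo interval over which the second harmonic can sit near fR1
    (fR1 ~ 2*fo), centered at fR1/2 and spanning `span_semitones`."""
    center = fr1_hz / 2.0
    half_span = 2.0 ** (span_semitones / 24.0)  # half the span, as a ratio
    return center / half_span, center * half_span

# Illustrative fR1 (Hz) for an open vowel; not a value from the study.
lo, hi = belting_fo_range(850.0)
```

With fR1 = 850 Hz this gives roughly 347-520 Hz, i.e., about F4 to C5; a close vowel like [i], with fR1 near 300 Hz, would center the band around 150 Hz, far below typical treble belting pitches, which is the modification problem the abstract describes.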
Collapse
Affiliation(s)
- Christian T Herbst
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia
- Department of Vocal Studies, Mozarteum University, Salzburg, Austria
| | - Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
| | - David Meyer
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia
| |
Collapse
|
20
|
Vorperian HK, Kent RD, Lee Y, Buhr KA. Vowel Production in Children and Adults With Down Syndrome: Fundamental and Formant Frequencies of the Corner Vowels. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:1208-1239. [PMID: 37015000 PMCID: PMC10187968 DOI: 10.1044/2022_jslhr-22-00510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/03/2022] [Revised: 12/01/2022] [Accepted: 12/21/2022] [Indexed: 05/18/2023]
Abstract
PURPOSE Atypical vowel production contributes to reduced speech intelligibility in children and adults with Down syndrome (DS). This study compares the acoustic data of the corner vowels /i/, /u/, /æ/, and /ɑ/ from speakers with DS against typically developing/developed (TD) speakers. METHOD Measurements of the fundamental frequency (fo) and first four formant frequencies (F1-F4) were obtained from single word recordings containing the target vowels from 81 participants with DS (ages 3-54 years) and 293 TD speakers (ages 4-92 years), all native speakers of English. The data were used to construct developmental trajectories and to determine interspeaker and intraspeaker variability. RESULTS Trajectories for DS differed from TD based on age and sex, but the groups were similar in showing a striking change in fo and F1-F4 frequencies around age 10 years. Findings confirm higher fo in DS, and vowel-specific differences between DS and TD in F1 and F2 frequencies, but not F3 and F4. The measure of F2 differences between front and back vowels was more sensitive to compression than reduced vowel space area/centralization across age and sex. Low vowels had more pronounced F2 compression, which was related to reduced speech intelligibility. Intraspeaker variability was significantly greater for DS than TD for nearly all frequency values across age. DISCUSSION Vowel production differences between DS and TD are age- and sex-specific, which helps explain contradictory results in previous studies. Increased intraspeaker variability across age in DS confirms the presence of a persisting motor speech disorder. Atypical vowel production in DS is common and related to dysmorphology, delayed development, and disordered motor control.
Collapse
Affiliation(s)
- Houri K. Vorperian
- Vocal Tract Development Lab, Waisman Center, University of Wisconsin–Madison
| | - Raymond D. Kent
- Vocal Tract Development Lab, Waisman Center, University of Wisconsin–Madison
| | - Yen Lee
- Department of Educational Leadership, Edgewood College, Madison, Wisconsin
| | - Kevin A. Buhr
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison
| |
Collapse
|
21
|
Novotny M, Cmejla R, Tykalova T. Automated prediction of children's age from voice acoustics. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
|
22
|
The Formant Bandwidth as a Measure of Vowel Intelligibility in Dysphonic Speech. J Voice 2023; 37:173-177. [PMID: 33143999 DOI: 10.1016/j.jvoice.2020.10.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2020] [Revised: 10/13/2020] [Accepted: 10/15/2020] [Indexed: 11/21/2022]
Abstract
OBJECTIVE The current paper examined the impact of dysphonia on the bandwidths of the first two formants of vowels, and the relationship between formant bandwidth and vowel intelligibility. METHODS The speaker participants were 10 adult females with healthy voices and 10 adult females with dysphonic voices. Eleven vowels in American English were recorded in /h/-vowel-/d/ format. The vowels were presented to 10 native speakers of American English with normal hearing, who were asked to select the vowel they heard from a list of /h/-vowel-/d/ words. The vowels were acoustically analyzed to measure the bandwidths of the first and second formants (B1 and B2). Separate Wilcoxon rank sum tests were conducted for each vowel for normal and dysphonic speech because the differences in B1 and B2 were found not to be normally distributed. Spearman correlation tests were conducted to evaluate the association between the difference in formant bandwidths and vowel intelligibility between the healthy and dysphonic speakers. RESULTS B1 was significantly greater in dysphonic vowels for seven of the eleven vowels, and smaller for only one of the vowels. There was no statistically significant difference in B2 between the normal and dysphonic vowels, except for the vowel /i/. The difference in B1 between normal and dysphonic vowels strongly predicted the intelligibility difference. CONCLUSION Dysphonia significantly affects B1, and the difference in B1 may serve as an acoustic marker for the intelligibility reduction in dysphonic vowels. This acoustic-perceptual relationship should be confirmed by a larger-scale study in the future.
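Both nonparametric tests named above are available off the shelf. The sketch below, assuming SciPy, runs a Wilcoxon rank-sum comparison on hypothetical B1 values (invented for illustration, not the study's data) and a Spearman correlation on a small monotone example.

```python
from scipy.stats import ranksums, spearmanr

# Hypothetical first-formant bandwidths (Hz) for one vowel; these numbers
# are invented for illustration and are not the study's data.
b1_healthy   = [55, 60, 52, 58, 63, 57, 61, 54, 59, 56]
b1_dysphonic = [78, 85, 90, 72, 88, 95, 80, 76, 84, 91]

stat, p = ranksums(b1_dysphonic, b1_healthy)   # rank-sum test, no normality assumed
rho, p_rho = spearmanr([1, 2, 3, 4, 5], [2, 4, 5, 8, 9])  # monotone association
```

The rank-sum test compares the two groups via rank ordering rather than means, which is exactly why the authors chose it for non-normally distributed bandwidth differences.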
Collapse
|
23
|
Pravitharangul N, Miyamoto JJ, Yoshizawa H, Matsumoto T, Suzuki S, Chantarawaratit PO, Moriyama K. Vowel sound production and its association with cephalometric characteristics in skeletal Class III subjects. Eur J Orthod 2023; 45:20-28. [PMID: 35731636 DOI: 10.1093/ejo/cjac031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
BACKGROUND This study aimed to evaluate differences in vowel production using acoustic analysis in skeletal Class III and Class I Japanese participants and to identify the correlation between vowel sounds and cephalometric variables in skeletal Class III subjects. MATERIALS AND METHODS Japanese males with skeletal Class III (ANB < 0°) and Class I skeletal anatomy (0.62° < ANB < 5.94°) were recruited (n = 18/group). Acoustic analysis of vowel sounds and cephalometric analysis of lateral cephalograms were performed. For sound analysis, isolated Japanese vowel (/a/, /i/, /u/, /e/, /o/) patterns were recorded. Praat software was used to extract acoustic parameters such as fundamental frequency (F0) and the first four formants (F1, F2, F3, and F4). The formant graph area was calculated. Cephalometric values were obtained using ImageJ. Correlations between acoustic and cephalometric variables in skeletal Class III subjects were then investigated. RESULTS Skeletal Class III subjects exhibited significantly higher /o/ F2 and lower /o/ F4 values. Mandibular length, SNB, and overjet of Class III subjects were moderately negatively correlated with acoustic variables. LIMITATIONS This study did not take into account vertical skeletal patterns and tissue movements during sound production. CONCLUSION Skeletal Class III males produced a different /o/ (back and rounded vowel), possibly owing to their anatomical positions or adaptive changes. Vowel production was moderately associated with cephalometric characteristics of Class III subjects. Thus, changes in speech after orthognathic surgery may be expected. A multidisciplinary team approach that included the input of a speech pathologist would be useful.
Collapse
Affiliation(s)
- Natthaporn Pravitharangul
- Department of Maxillofacial Orthognathics, Division of Maxillofacial and Neck Reconstruction, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), Japan
- Department of Orthodontics, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
- Tokyo Medical and Dental University and Chulalongkorn University International Joint Degree Doctor of Philosophy Program in Orthodontics
| | - Jun J Miyamoto
- Department of Maxillofacial Orthognathics, Division of Maxillofacial and Neck Reconstruction, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), Japan
| | - Hideyuki Yoshizawa
- Department of Maxillofacial Orthognathics, Division of Maxillofacial and Neck Reconstruction, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), Japan
| | - Tsutomu Matsumoto
- Department of Maxillofacial Orthognathics, Division of Maxillofacial and Neck Reconstruction, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), Japan
| | - Shoichi Suzuki
- Department of Maxillofacial Orthognathics, Division of Maxillofacial and Neck Reconstruction, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), Japan
| | | | - Keiji Moriyama
- Department of Maxillofacial Orthognathics, Division of Maxillofacial and Neck Reconstruction, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), Japan
| |
Collapse
|
24
|
Ali IE, Sumita Y, Wakabayashi N. Comparison of Praat and Computerized Speech Lab for formant analysis of five Japanese vowels in maxillectomy patients. Front Neurosci 2023; 17:1098197. [PMID: 36816122 PMCID: PMC9928875 DOI: 10.3389/fnins.2023.1098197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Accepted: 01/16/2023] [Indexed: 02/04/2023] Open
Abstract
Introduction Speech impairment is a common complication after surgical resection of maxillary tumors. Maxillofacial prosthodontists play a critical role in restoring this function so that affected patients can enjoy better lives. For that purpose, several acoustic software packages have been used for speech evaluation, among which Computerized Speech Lab (CSL) and Praat are widely used in clinical and research contexts. Although CSL is a commercial product, Praat is freely available on the internet and can be used by patients and clinicians to practice several therapy goals. Therefore, this study aimed to determine whether both programs produced comparable results for the first two formant frequencies (F1 and F2) and their respective formant ranges obtained from the same voice samples from Japanese participants with maxillectomy defects. Methods CSL was used as a reference to evaluate the accuracy of Praat with both the default and newly proposed adjusted settings. Thirty-seven participants were enrolled in this study for formant analysis of the five Japanese vowels (a/i/u/e/o) using CSL and Praat. Spearman's rank correlation coefficient was used to judge the correlation between the analysis results of both programs regarding F1 and F2 and their respective formant ranges. Results Highly positive correlations between the two programs were found for all acoustic features and all Praat settings. Discussion The strong correlations between the results of CSL and Praat suggest that both programs may have similar decision strategies for atypical speech and for both sexes. This study highlights that the default settings in Praat can be used for formant analysis in maxillectomy patients with predictable accuracy. The proposed adjusted settings in Praat can yield more accurate results for formant analysis of atypical speech in maxillectomy cases when the examiner cannot precisely locate the formant frequencies using the default settings or confirm analysis results obtained using CSL.
Collapse
Affiliation(s)
- Islam E. Ali
- Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Department of Prosthodontics, Faculty of Dentistry, Mansoura University, Mansoura, Egypt
| | - Yuka Sumita
- Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan; Correspondence: Yuka Sumita
| | - Noriyuki Wakabayashi
- Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| |
Collapse
|
25
|
Herbst CT, Elemans CPH, Tokuda IT, Chatziioannou V, Švec JG. Dynamic System Coupling in Voice Production. J Voice 2023:S0892-1997(22)00310-1. [PMID: 36737267 DOI: 10.1016/j.jvoice.2022.10.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Revised: 10/07/2022] [Accepted: 10/07/2022] [Indexed: 02/04/2023]
Abstract
Voice is a major means of communication for humans, non-human mammals and many other vertebrates like birds and anurans. The physical and physiological principles of voice production are described by two theories: the MyoElastic-AeroDynamic (MEAD) theory and the Source-Filter Theory (SFT). While MEAD employs a multiphysics approach to understand the motor control and dynamics of self-sustained vibration of vocal folds or analogous tissues, SFT predominantly uses acoustics to understand spectral changes of the source via linear propagation through the vocal tract. Because the two theories focus on different aspects of voice production, they are often applied distinctly in specific areas of science and engineering. Here, we argue that the MEAD and the SFT are linked integral aspects of a holistic theory of voice production, describing a dynamically coupled system. The aim of this manuscript is to provide a comprehensive review of both the MEAD and the source-filter theory with its nonlinear extension, the latter of which suggests a number of conceptual similarities to sound production in brass instruments. We discuss the application of both theories to voice production of humans as well as of animals. An appraisal of voice production in the light of non-linear dynamics supports the notion that voice production can best be described with a systems view, considering coupled systems rather than isolated contributions of individual sub-systems.
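The source-filter half of the picture described above is, in its linear form, easy to illustrate: a glottal pulse train (source) passed through a cascade of resonators (vocal tract filter). The sketch below is a toy synthesis, not the authors' model; the formant frequencies and bandwidths are assumed, roughly /a/-like values:

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000   # sample rate (Hz)
f0 = 120     # fundamental frequency of the glottal source (Hz)

# Source: an impulse train approximating the glottal pulse sequence.
source = np.zeros(int(fs * 0.5))       # 0.5 s of signal
source[::fs // f0] = 1.0

def resonator(x, freq, bw, fs):
    """Second-order IIR resonance at `freq` Hz with bandwidth `bw` Hz."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    return lfilter([1 - r], [1, -2 * r * np.cos(theta), r ** 2], x)

# Filter: cascade of formant resonators (illustrative /a/-like values).
vowel = source
for freq, bw in [(700, 80), (1200, 90), (2600, 120)]:
    vowel = resonator(vowel, freq, bw, fs)

print(vowel.shape)  # → (8000,)
```

The nonlinear, coupled-system view the authors argue for arises precisely where this sketch breaks down: when the filter's acoustic load feeds back on the self-sustained source oscillation instead of being driven by it passively.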
Collapse
Affiliation(s)
- Christian T Herbst
- Department of Vocal Studies, Mozarteum University, Salzburg, Austria; Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia. http://www.christian-herbst.org
| | - Coen P H Elemans
- Vocal Neuromechanics Lab, Department of Biology, University of Southern Denmark, Odense M, Denmark
| | - Isao T Tokuda
- Department of Mechanical Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan
| | | | - Jan G Švec
- Voice Research Laboratory, Department of Experimental Physics, Faculty of Science, Palacky University Olomouc, Olomouc, Czech Republic
| |
Collapse
|
26
|
Zhang LM, Li Y, Zhang YT, Ng GW, Leau YB, Yan H. A Deep Learning Method Using Gender-Specific Features for Emotion Recognition. SENSORS (BASEL, SWITZERLAND) 2023; 23:1355. [PMID: 36772395 PMCID: PMC9921859 DOI: 10.3390/s23031355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/31/2022] [Revised: 01/20/2023] [Accepted: 01/22/2023] [Indexed: 06/18/2023]
Abstract
Speech reflects people's mental state, and capturing it with a microphone sensor is a potential method for human-computer interaction; speech recognition using this sensor is conducive to the diagnosis of mental illnesses. Gender differences between speakers affect speech emotion recognition based on specific acoustic features, resulting in a decline in recognition accuracy. We therefore believe that the accuracy of speech emotion recognition can be effectively improved by selecting different speech features for emotion recognition based on the speech representations of the two genders. In this paper, we propose a speech emotion recognition method based on gender classification. First, we use an MLP to classify the original speech by gender. Second, based on the different acoustic features of male and female speech, we analyze the influence weights of multiple speech emotion features in male and female speech, and establish optimal feature sets for male and female emotion recognition, respectively. Finally, we train and test a CNN and a BiLSTM, respectively, using the male and female speech emotion feature sets. The results show that the proposed emotion recognition models have an advantage in terms of average recognition accuracy over gender-mixed recognition models.
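The two-stage routing idea (classify gender first, then apply a gender-specific emotion model trained on its own feature subset) can be sketched as follows. Everything here is a placeholder: the random data, the feature-index sets, and the small scikit-learn MLPs stand in for the paper's acoustic features, selected feature sets, and CNN/BiLSTM models.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 utterances x 10 acoustic features; the first
# feature crudely separates the two genders (think of it as mean pitch).
X = rng.normal(size=(200, 10))
gender = (X[:, 0] > 0).astype(int)
emotion = rng.integers(0, 4, size=200)   # 4 emotion classes

# Stage 1: a gender classifier over the full feature vector.
gender_clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                           random_state=0).fit(X, gender)

# Stage 2: one emotion model per gender, each trained only on that
# gender's utterances and its own feature subset (indices illustrative).
feature_sets = {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
emotion_clfs = {
    g: MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
        .fit(X[gender == g][:, idx], emotion[gender == g])
    for g, idx in feature_sets.items()
}

def predict_emotion(x):
    """Route an utterance through stage 1, then its gender's model."""
    g = gender_clf.predict(x.reshape(1, -1))[0]
    return emotion_clfs[g].predict(x[feature_sets[g]].reshape(1, -1))[0]

print(predict_emotion(X[0]) in [0, 1, 2, 3])  # → True
```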
Collapse
Affiliation(s)
- Li-Min Zhang
- Key Laboratory for Artificial Intelligence and Cognitive Neuroscience of Language, Xi’an International Studies University, Xi’an 610116, China
- Faculty of Computing and Informatics, Universiti Malaysia Sabah, Sabah 88400, Malaysia
| | - Yang Li
- Key Laboratory for Artificial Intelligence and Cognitive Neuroscience of Language, Xi’an International Studies University, Xi’an 610116, China
| | - Yue-Ting Zhang
- Key Laboratory for Artificial Intelligence and Cognitive Neuroscience of Language, Xi’an International Studies University, Xi’an 610116, China
| | - Giap Weng Ng
- Faculty of Computing and Informatics, Universiti Malaysia Sabah, Sabah 88400, Malaysia
| | - Yu-Beng Leau
- Faculty of Computing and Informatics, Universiti Malaysia Sabah, Sabah 88400, Malaysia
| | - Hao Yan
- Key Laboratory for Artificial Intelligence and Cognitive Neuroscience of Language, Xi’an International Studies University, Xi’an 610116, China
| |
Collapse
|
27
|
Albuquerque L, Oliveira C, Teixeira A, Sa-Couto P, Figueiredo D. A Comprehensive Analysis of Age and Gender Effects in European Portuguese Oral Vowels. J Voice 2023; 37:143.e13-143.e29. [PMID: 33293174 DOI: 10.1016/j.jvoice.2020.10.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Revised: 10/30/2020] [Accepted: 10/30/2020] [Indexed: 01/11/2023]
Abstract
Knowledge about age effects on speech acoustics is still dispersed and incomplete. This study extends the analyses of the effects of age and gender on the acoustics of European Portuguese (EP) oral vowels, in order to complement initial studies with limited sets of acoustic parameters and to further investigate unclear or inconsistent results. A database of EP vowels produced by a group of 113 adults, aged between 35 and 97, was used. Duration, fundamental frequency (f0), formant frequencies (F1 to F3), and a selection of vowel space metrics (F1 and F2 range ratios, vowel articulation index [VAI] and formant centralization ratio [FCR]) were analyzed. To avoid the arguable division into age groups, the analyses considered age as a continuous variable. The most relevant age-related results included: an increase in vowel duration in both genders; a general tendency for formant frequencies to decrease in females; changes consistent with vowel centralization in males, confirmed by the vowel space acoustic indexes; and no evidence of an F3 decrease with age in either gender. This study contributes to knowledge on aging speech, providing new information for an additional language. The results corroborate that the acoustic characteristics of speech change with age and show different patterns between genders.
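The VAI and FCR mentioned here have widely used closed-form definitions over the corner-vowel formants, with the FCR being the reciprocal of the VAI. A sketch assuming those common formulations (the corner-vowel values below are hypothetical, not the study's data):

```python
def vai(f1_i, f1_u, f1_a, f2_i, f2_u, f2_a):
    """Vowel Articulation Index: higher = more peripheral (less centralized)."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)

def fcr(f1_i, f1_u, f1_a, f2_i, f2_u, f2_a):
    """Formant Centralization Ratio: the reciprocal of the VAI."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

# Illustrative corner-vowel formants (Hz) for one hypothetical speaker.
f = dict(f1_i=300, f1_u=320, f1_a=800, f2_i=2300, f2_u=900, f2_a=1300)
print(round(vai(**f), 3), round(fcr(**f), 3))  # → 1.099 0.91
```

Values near 1 indicate a centralized vowel space; vowel centralization with age would push the VAI down and the FCR up.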
Collapse
Affiliation(s)
- Luciana Albuquerque
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal; Center for Health Technology and Services Research, University of Aveiro, Aveiro, Portugal; Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal; Department of Education and Psychology, University of Aveiro, Aveiro, Portugal.
| | - Catarina Oliveira
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal; School of Health Science, University of Aveiro, Aveiro, Portugal
| | - António Teixeira
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal; Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
| | - Pedro Sa-Couto
- Center for Research and Development in Mathematics and Applications, University of Aveiro, Aveiro, Portugal; Department of Mathematics, University of Aveiro, Aveiro, Portugal
| | - Daniela Figueiredo
- Center for Health Technology and Services Research, University of Aveiro, Aveiro, Portugal; School of Health Science, University of Aveiro, Aveiro, Portugal
| |
Collapse
|
28
|
Skrabal D, Rusz J, Novotny M, Sonka K, Ruzicka E, Dusek P, Tykalova T. Articulatory undershoot of vowels in isolated REM sleep behavior disorder and early Parkinson's disease. NPJ Parkinsons Dis 2022; 8:137. [PMID: 36266347 PMCID: PMC9584921 DOI: 10.1038/s41531-022-00407-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Accepted: 10/04/2022] [Indexed: 11/09/2022] Open
Abstract
Imprecise vowels represent a common deficit associated with hypokinetic dysarthria resulting from a reduced articulatory range of motion in Parkinson's disease (PD). It is not yet known whether vowel articulation impairment is already evident in the prodromal stages of synucleinopathy. We aimed to assess whether vowel articulation abnormalities are present in isolated rapid eye movement sleep behaviour disorder (iRBD) and early-stage PD. A total of 180 male participants, including 60 with iRBD, 60 with de-novo PD and 60 age-matched healthy controls, read a standardized passage. The first and second formant frequencies of the corner vowels /a/, /i/, and /u/, extracted from predefined words, were used to construct the articulatory-acoustic measures Vowel Space Area (VSA) and Vowel Articulation Index (VAI). Compared to controls, VSA was smaller in both iRBD (p = 0.01) and PD (p = 0.001), while VAI was lower only in PD (p = 0.002). The iRBD subgroup with abnormal olfactory function had a smaller VSA than the iRBD subgroup with preserved olfactory function (p = 0.02). In PD patients, the extent of bradykinesia and rigidity correlated with VSA (r = -0.33, p = 0.01), while no correlation between axial gait symptoms or tremor and vowel articulation was detected. Vowel articulation impairment thus represents an early prodromal symptom in the disease process of synucleinopathy. Acoustic assessment of vowel articulation may provide a surrogate marker of synucleinopathy in scenarios where a single robust feature to monitor dysarthria progression is needed.
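The VSA used here is the area of the polygon spanned by the corner vowels in the (F1, F2) plane, which for three corners reduces to a triangle area and is conveniently computed with the shoelace formula. A sketch with hypothetical corner-vowel formants (not the study's data):

```python
def vowel_space_area(corners):
    """Area (Hz^2) of the polygon spanned by (F1, F2) corner-vowel
    points, computed with the shoelace formula."""
    area = 0.0
    for i in range(len(corners)):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % len(corners)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2

# Illustrative (F1, F2) pairs in Hz for /a/, /i/, /u/.
triangle = [(800, 1300), (300, 2300), (320, 900)]
print(vowel_space_area(triangle))  # → 340000.0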
Collapse
Affiliation(s)
- Dominik Skrabal
- Department of Neurology and Centre of Clinical Neuroscience, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
| | - Jan Rusz
- Department of Neurology and Centre of Clinical Neuroscience, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic; Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic; Department of Neurology & ARTORG Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Michal Novotny
- Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
| | - Karel Sonka
- Department of Neurology and Centre of Clinical Neuroscience, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
| | - Evzen Ruzicka
- Department of Neurology and Centre of Clinical Neuroscience, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
| | - Petr Dusek
- Department of Neurology and Centre of Clinical Neuroscience, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
| | - Tereza Tykalova
- Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
| |
Collapse
|
29
|
Öhlund Wistbacka G, Shen W, Brunskog J. Virtual reality head-mounted displays affect sidetone perception. JASA EXPRESS LETTERS 2022; 2:105202. [PMID: 36319214 DOI: 10.1121/10.0014605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
The purpose of this study was to investigate whether head-mounted displays (HMDs) change the sidetone to an auditorily perceivable extent. Impulse responses (IRs) were recorded using a dummy head wearing an HMD (IRtest) and compared to IRs measured without the HMD (IRref). Ten naive listeners were tested on their ability to discriminate between the IRtest and IRref using convolved speech signals. The spectral analysis showed that the HMDs decreased the spectral energy of the sidetone around 2000-4500 Hz. Most listeners were able to discriminate between the IRs. It is concluded that HMDs change the sidetone to a small but perceivable extent.
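The core comparison (convolve a speech signal with each IR, then compare spectral energy in the affected band) can be sketched with NumPy. The two impulse responses below are toy stand-ins, not the measured IRref/IRtest: an identity IR versus a crude two-tap low-pass that dampens the 2-4.5 kHz region.

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
speechlike = np.sin(2 * np.pi * 3000 * t)   # test tone inside the 2-4.5 kHz band

# Toy stand-ins for the measured IRs: identity vs. a two-tap average
# that attenuates the upper part of the spectrum.
ir_ref = np.array([1.0])
ir_test = np.array([0.5, 0.5])

def band_energy(x, lo, hi):
    """Spectral energy of x between lo and hi Hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return spec[(freqs >= lo) & (freqs <= hi)].sum()

e_ref = band_energy(np.convolve(speechlike, ir_ref), 2000, 4500)
e_test = band_energy(np.convolve(speechlike, ir_test), 2000, 4500)
print(e_test < e_ref)  # → True
```

The listening test then asks whether that energy difference, applied to real speech, is discriminable, which the study answered in the affirmative for most listeners.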
Collapse
Affiliation(s)
- Greta Öhlund Wistbacka
- Acoustic Technology, Department of Electrical and Photonics Engineering, Technical University of Denmark, Kongens Lyngby DK-2800, Denmark
| | - Weihan Shen
- Acoustic Technology, Department of Electrical and Photonics Engineering, Technical University of Denmark, Kongens Lyngby DK-2800, Denmark
| | - Jonas Brunskog
- Acoustic Technology, Department of Electrical and Photonics Engineering, Technical University of Denmark, Kongens Lyngby DK-2800, Denmark
| |
Collapse
|
30
|
Myers BR, Mathy P, Roy N. Behavioral Treatment Approaches to Lowering Pitch in the Female Voice. J Voice 2022:S0892-1997(22)00241-7. [PMID: 36096897 DOI: 10.1016/j.jvoice.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2022] [Revised: 08/03/2022] [Accepted: 08/04/2022] [Indexed: 11/27/2022]
Abstract
PURPOSE To assess the outcomes of three voice therapy treatment approaches with an emphasis on lowering speaking pitch. Transmasculine and cisgender individuals may desire to lower their speaking pitch, yet no method for doing so effectively using only behavioral techniques has been described in the literature. METHOD To investigate these approaches, we enrolled 32 adult cisgender females and randomly assigned them to one of four treatment groups: vocal function exercises (VFE), resonant voice therapy (RVT), lip-rounding therapy (LRT), and a control group. Participants received individual instruction and feedback on the given exercise program, and they continued to practice daily for 4 weeks. RESULTS Acoustic recordings were collected before treatment, immediately after the first session, and after 4 weeks of treatment. Results showed a lower minimum pitch in the physiological range, lower speaking fundamental frequency (SFF) in reading, and lower SFF in spontaneous speech, with treatment groups performing better than the control group. Additionally, participants' self-rating of the vocal effort expended to speak in a low pitch decreased over the treatment period. CONCLUSIONS Each treatment approach (VFE, RVT, and LRT) was successful in lowering the speaking pitch of cisgender females. These methods would likely be useful for clients seeking to speak in a lower pitch. Future research may expand results to include clinical populations, such as transmasculine individuals.
Collapse
Affiliation(s)
- Brett R Myers
- Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, UT.
| | - Pamela Mathy
- Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, UT
| | - Nelson Roy
- Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, UT
| |
Collapse
|
31
|
Martínez-Cifuentes R, Soto-Barba J. Desempeño fonético-acústico de vocales en hablantes del español chileno con enfermedad de Parkinson en estadios iniciales. REVISTA DE INVESTIGACIÓN EN LOGOPEDIA 2022. [DOI: 10.5209/rlog.79132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
The articulation of consonant and vowel speech sounds is affected in Parkinson's disease (PD). For vowels, this alteration manifests acoustically in the formant structure and in the vowel space area. Because this topic has not been explored in Chile, the study aimed to contrast the acoustic-phonetic performance of vowels between Chilean Spanish speakers with early-stage PD and speakers without the disease. A quantitative, quasi-experimental, correlational study was conducted. Fifteen speakers with PD (M = 69.6 years, SD = 7.46) and 15 without PD (M = 70.07 years, SD = 7.75) read 30 sentences containing the five vowels of Chilean Spanish. The center frequencies (F1 and F2) and bandwidths (B1 and B2) of the vowel formants, as well as five vowel space area indices, were analyzed. Differences were found in the B2 of /i/ and /u/ between people with and without PD; in the F1 of /e/ and /u/, the F2 of /u/, the B1 of /e/ and the B2 of /o/ between men with and without PD; and in the B2 of /i/ between women with and without PD (p < .05). The study thus reports the acoustic performance of vowels in Chilean Spanish speakers with Parkinson's disease.
Collapse
|
32
|
Modern Responses to Traditional Pitfalls in Gender Affirming Behavioral Voice Modification. Otolaryngol Clin North Am 2022; 55:727-738. [PMID: 35752493 DOI: 10.1016/j.otc.2022.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Gender-affirming behavioral voice modification has primarily been directed by cisgender clinicians who do not actively live or master the process of voice modification themselves but instead observe it from the outside looking in. The lack of a "lived experience" by cisgender instructors naturally leaves gaps and oversights that may reduce the effective potential of voice training. Input from transgender people who have learned voice modifications techniques is key to providing the best possible care. Ear training, direct vocal modeling, and mastery of gender-modification techniques are crucial elements that are less emphasized in the current system.
Collapse
|
33
|
Jibson J. Formant detail needed for identifying, rating, and discriminating vowels in Wisconsin English. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:4004. [PMID: 35778208 DOI: 10.1121/10.0011539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Accepted: 05/12/2022] [Indexed: 06/15/2023]
Abstract
Neel [(2004). Acoust. Res. Lett. Online 5, 125-131] asked how much time-varying formant detail is needed for vowel identification. In that study, multiple stimuli were synthesized for each vowel: 1-point (monophthongal with midpoint frequencies), 2-point (linear from onset to offset), 3-point, 5-point, and 11-point. Results suggested that a 3-point model was optimal. This conflicted with the dual-target hypothesis of vowel inherent spectral change research, which has found that two targets are sufficient to model vowel identification. The present study replicates and expands upon the work of Neel. Ten English monophthongs were chosen for synthesis. One-, two-, three-, and five-point vowels were created as described above, and another 1-point stimulus was created with onset frequencies rather than midpoint frequencies. Three experiments were administered (n = 18 for each): vowel identification, goodness rating, and discrimination. The results ultimately align with the dual-target hypothesis, consistent with most vowel inherent spectral change studies.
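The n-point stimuli described here amount to piecewise-linear formant trajectories through n control points: a 1-point stimulus is flat, a 2-point stimulus interpolates linearly from onset to offset, and so on. A sketch of that construction with `np.interp` (the F2 target values are illustrative, not the study's synthesis parameters):

```python
import numpy as np

def formant_track(points, n_frames=50):
    """Piecewise-linear formant trajectory through the given
    (time_fraction, frequency) control points."""
    times = [p[0] for p in points]
    freqs = [p[1] for p in points]
    return np.interp(np.linspace(0, 1, n_frames), times, freqs)

# Illustrative F2 targets (Hz) for a vowel with spectral change.
one_point = formant_track([(0.0, 1800), (1.0, 1800)])    # flat midpoint model
two_point = formant_track([(0.0, 1600), (1.0, 2000)])    # linear onset-to-offset
three_point = formant_track([(0.0, 1600), (0.5, 1750), (1.0, 2000)])

print(two_point[0], two_point[-1])  # → 1600.0 2000.0
```

Under the dual-target hypothesis the 2-point trajectory already carries the information listeners use, which is what the identification, rating, and discrimination results ultimately supported.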
Collapse
Affiliation(s)
- Jonathan Jibson
- English Department, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
| |
Collapse
|
34
|
Carl M, Levy ES, Icht M. Speech treatment for Hebrew-speaking adolescents and young adults with developmental dysarthria: A comparison of mSIT and Beatalk. INTERNATIONAL JOURNAL OF LANGUAGE & COMMUNICATION DISORDERS 2022; 57:660-679. [PMID: 35363414 DOI: 10.1111/1460-6984.12715] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Accepted: 02/16/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Individuals with developmental dysarthria typically demonstrate reduced functioning of one or more of the speech subsystems, which negatively impacts speech intelligibility and communication within social contexts. A few treatment approaches are available for improving speech production and intelligibility among individuals with developmental dysarthria. However, these approaches have only limited application and research findings among adolescents and young adults. AIMS To determine and compare the effectiveness of two treatment approaches, the modified Speech Intelligibility Treatment (mSIT) and the Beatalk technique, on speech production and intelligibility among Hebrew-speaking adolescents and young adults with developmental dysarthria. METHODS & PROCEDURES Two matched groups of adolescents and young adults with developmental dysarthria participated in the study. Each received one of the two treatments, mSIT or Beatalk, over the course of 9 weeks. Measures of speech intelligibility, articulatory accuracy, voice and vowel acoustics were assessed both pre- and post-treatment. OUTCOMES & RESULTS Both the mSIT and Beatalk groups demonstrated gains in at least some of the outcome measures. Participants in the mSIT group exhibited improvement in speech intelligibility and voice measures, while participants in the Beatalk group demonstrated increased articulatory accuracy and gains in voice measures from pre- to post-treatment. Significant increases were noted post-treatment for first formant values for select vowels. CONCLUSIONS & IMPLICATIONS Results of this preliminary study are promising for both treatment approaches. The differentiated results indicate their distinct application to speech intelligibility deficits. The current findings also hold clinical significance for treatment among adolescents and young adults with motor speech disorders and application for a language other than English. 
WHAT THIS PAPER ADDS What is already known on the subject Developmental dysarthria (e.g., secondary to cerebral palsy) is a motor speech disorder that negatively impacts speech intelligibility, and thus communication participation. Select treatment approaches are available with the aim of improving speech intelligibility in individuals with developmental dysarthria; however, these approaches are limited in number and have only seldom been applied specifically to adolescents and young adults. What this paper adds to existing knowledge The current study presents preliminary data regarding two treatment approaches, the mSIT and Beatalk technique, administered to Hebrew-speaking adolescents and young adults with developmental dysarthria in a group setting. Results demonstrate the initial effectiveness of the treatment approaches, with different gains noted for each approach across speech and voice domains. What are the potential or actual clinical implications of this work? The findings add to the existing literature on potential treatment approaches aiming to improve speech production and intelligibility among individuals with developmental dysarthria. The presented approaches also show promise for group-based treatments as well as the potential for improvement among adolescents and young adults with motor speech disorders.
Collapse
Affiliation(s)
- Micalle Carl
- Department of Communication Disorders, Ariel University, Ariel, Israel
| | - Erika S Levy
- Teachers College, Columbia University, New York, NY, USA
| | - Michal Icht
- Department of Communication Disorders, Ariel University, Ariel, Israel
| |
Collapse
|
35
|
Sato M. Motor and visual influences on auditory neural processing during speaking and listening. Cortex 2022; 152:21-35. [DOI: 10.1016/j.cortex.2022.03.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2021] [Revised: 02/02/2022] [Accepted: 03/15/2022] [Indexed: 11/03/2022]
|
36
|
Exploring the Age Effects on European Portuguese Vowel Production: An Ultrasound Study. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12031396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
For aging speech, there is limited knowledge of the articulatory adjustments underlying the acoustic findings reported in previous studies. To investigate age-related articulatory differences in European Portuguese (EP) vowels, the present study analyzes the tongue configuration of the nine EP oral vowels (in isolation and in pseudowords) produced by 10 female speakers from two age groups (young and old). From the tongue contours, automatically segmented from the ultrasound images and manually revised, two parameters (tongue height and tongue advancement) were extracted. The results suggest that the tongue tends to be higher and more advanced in the older females than in the younger ones for almost all vowels, so the articulatory vowel space tends to become higher, more advanced, and bigger with age. Unlike the younger females, who showed a sharp reduction of the articulatory vowel space in disyllabic sequences, the older females tend to show a more advanced vowel space for isolated vowels than for vowels produced in disyllabic sequences. This study extends our pilot research by reporting articulatory data from more speakers, based on an improved automatic method of tracing tongue contours, and performs an inter-speaker comparison through the application of a novel normalization procedure.
Collapse
|
37
|
Sanchez-Alonso S, Aslin RN. Towards a model of language neurobiology in early development. BRAIN AND LANGUAGE 2022; 224:105047. [PMID: 34894429 DOI: 10.1016/j.bandl.2021.105047] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/11/2021] [Revised: 10/24/2021] [Accepted: 10/27/2021] [Indexed: 06/14/2023]
Abstract
Understanding language neurobiology in early childhood is essential for characterizing the developmental structural and functional changes that lead to the mature adult language network. In the last two decades, the field of language neurodevelopment has received increasing attention, particularly given the rapid advances in the implementation of neuroimaging techniques and analytic approaches that allow detailed investigations into the developing brain across a variety of cognitive domains. These methodological and analytical advances hold the promise of developing early markers of language outcomes that allow diagnosis and clinical interventions at the earliest stages of development. Here, we argue that findings in language neurobiology need to be integrated within an approach that captures the dynamic nature and inherent variability that characterizes the developing brain and the interplay between behavior and (structural and functional) neural patterns. Accordingly, we describe a framework for understanding language neurobiology in early development, which minimally requires an explicit characterization of the following core domains: i) computations underlying language learning mechanisms, ii) developmental patterns of change across neural and behavioral measures, iii) environmental variables that reinforce language learning (e.g., the social context), and iv) brain maturational constraints for optimal neural plasticity, which determine the infant's sensitivity to learning from the environment. We discuss each of these domains in the context of recent behavioral and neuroimaging findings and consider the need for quantitatively modeling two main sources of variation: individual differences or trait-like patterns of variation and within-subject differences or state-like patterns of variation. The goal is to enable models that allow prediction of language outcomes from neural measures that take into account these two types of variation. 
Finally, we examine how future methodological approaches would benefit from the inclusion of more ecologically valid paradigms that complement and allow generalization of traditional controlled laboratory methods.
Collapse
Affiliation(s)
| | - Richard N Aslin
- Haskins Laboratories, New Haven, CT, USA; Department of Psychology, Yale University, New Haven, CT, USA; Child Study Center, Yale University, New Haven, CT, USA.
| |
Collapse
|
38
|
Asghari SZ, Farashi S, Bashirian S, Jenabi E. Distinctive prosodic features of people with autism spectrum disorder: a systematic review and meta-analysis study. Sci Rep 2021; 11:23093. [PMID: 34845298 PMCID: PMC8630064 DOI: 10.1038/s41598-021-02487-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2021] [Accepted: 11/16/2021] [Indexed: 12/26/2022] Open
Abstract
In this systematic review, we analyzed and evaluated the findings of studies on prosodic features of vocal productions of people with autism spectrum disorder (ASD) in order to identify the statistically significant, most confirmed and reliable prosodic differences distinguishing people with ASD from typically developing individuals. Using suitable keywords, three major databases, Web of Science, PubMed and Scopus, were searched. The results for prosodic features such as mean pitch, pitch range and variability, speech rate, intensity and voice duration were extracted from eligible studies. The pooled standardized mean difference (SMD) between ASD and control groups was extracted or calculated. Using the I2 statistic and the Cochrane Q-test, between-study heterogeneity was evaluated. Furthermore, publication bias was assessed using a funnel plot, and its significance was evaluated using Egger's and Begg's tests. Thirty-nine eligible studies were retrieved (including 910 and 850 participants for the ASD and control groups, respectively). This systematic review and meta-analysis showed that ASD group members had a significantly larger mean pitch (SMD = -0.4, 95% CI [-0.70, -0.10]), larger pitch range (SMD = -0.78, 95% CI [-1.34, -0.21]), longer voice duration (SMD = -0.43, 95% CI [-0.72, -0.15]), and larger pitch variability (SMD = -0.46, 95% CI [-0.84, -0.08]), compared with the typically developing control group. However, no significant differences in pitch standard deviation, voice intensity and speech rate were found between groups. Chronological age of participants and voice elicitation tasks were two sources of between-study heterogeneity. Furthermore, no publication bias was observed during analyses (p > 0.05). Mean pitch, pitch range, pitch variability and voice duration were recognized as the prosodic features reliably distinguishing people with ASD from typically developing individuals.
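The pooling and heterogeneity statistics used here follow standard meta-analytic formulas: an inverse-variance weighted mean of the per-study SMDs, Cochran's Q, and I² = max(0, (Q − df)/Q) × 100. The sketch below uses a fixed-effect weighting for simplicity (the review may well have used a random-effects model), and the per-study effects and variances are illustrative, not data from the paper:

```python
def pooled_smd(effects, variances):
    """Inverse-variance (fixed-effect) pooled SMD, Cochran's Q, and I^2."""
    weights = [1 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Illustrative per-study SMDs and variances (not the review's data).
smd, q, i2 = pooled_smd([-0.2, -0.5, -0.4], [0.04, 0.09, 0.05])
print(round(smd, 3))
```

Under a random-effects model each weight would instead be 1/(vᵢ + τ²), with τ² estimated from Q (e.g., DerSimonian-Laird); the pooling step itself is unchanged.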
Collapse
Affiliation(s)
| | - Sajjad Farashi
- Autism Spectrum Disorders Research Center, Hamadan University of Medical Sciences, Hamadan, Iran.
| | - Saeid Bashirian
- Department of Public Health, School of Health, Hamadan University of Medical Sciences, Hamadan, Iran.
| | - Ensiyeh Jenabi
- Autism Spectrum Disorders Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
| |
Collapse
|
39
|
Xia M, Cao S, Zhou R, Wang JY, Xu TY, Zhou ZK, Qian YM, Jiang H. Acoustic features as novel predictors of difficult laryngoscopy in orthognathic surgery: an observational study. ANNALS OF TRANSLATIONAL MEDICINE 2021; 9:1466. [PMID: 34734018 PMCID: PMC8506731 DOI: 10.21037/atm-21-4359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Accepted: 09/07/2021] [Indexed: 01/19/2023]
Abstract
Background The evaluation of difficult intubation is an important step before anaesthesia, as unanticipated difficult intubation is associated with morbidity and mortality. This study aimed to determine whether acoustic features are valuable as an alternative method to predict difficult laryngoscopy (DL) in patients scheduled to undergo orthognathic surgery. Methods This study included 225 adult patients undergoing elective orthognathic surgery under general anaesthesia with tracheal intubation. Preoperatively, clinical airway evaluation was performed and acoustic data were collected. Twelve phonemes {[a], [o], [e], [i], [u], [ü], [ci], [qi], [chi], [le], [ke], and [en]} were recorded, and their formants (f1-f4) and bandwidths (bw1-bw4) were extracted. Difficult laryngoscopy was defined as direct laryngoscopy with a Cormack-Lehane grade of 3 or 4. Univariate and multivariate logistic regression analyses were used to examine the associations between acoustic features and DL. Results Difficult laryngoscopy was reported in 59/225 (26.2%) patients. The area under the curve (AUC) of the backward stepwise model including en_f2 [odds ratio (OR), 0.996; 95% confidence interval (CI), 0.994–0.999; P=0.006], ci_bw4 (OR, 0.997; 95% CI, 0.993–1.000; P=0.057), qi_bw4 (OR, 0.996; 95% CI, 0.993–0.999; P=0.017), le_f3 (OR, 0.998; 95% CI, 0.996–1.000; P=0.079), o_bw4 (OR, 1.001; 95% CI, 1.000–1.003; P=0.014), chi_f4 (OR, 1.003; 95% CI, 1.000–1.005; P=0.041), and a_bw4 (OR, 0.999; 95% CI, 0.998–1.000; P=0.078) reached 0.761 in the training set but only 0.709 in the testing set. The sensitivity and specificity of the model in the testing set were 86.7% and 63.0%, respectively. Conclusions Acoustic features may be useful predictors of DL in patients undergoing orthognathic surgery.
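The reported sensitivity and specificity follow directly from a binary classifier's confusion counts. A minimal sketch, with assumed example labels rather than the study's data:

```python
# Sketch: sensitivity and specificity from predicted vs. true DL labels.
# 1 = difficult laryngoscopy (DL), 0 = easy; data below are invented.

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical test-set labels
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]   # hypothetical model predictions
sens, spec = sensitivity_specificity(y_true, y_pred)
```

The study's backward stepwise logistic model outputs a DL probability per patient; thresholding that probability yields the predicted labels from which these two rates are computed.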
Collapse
Affiliation(s)
- Ming Xia
- Department of Anaesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Shuang Cao
- Department of Anaesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Ren Zhou
- Department of Anaesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Jia-Yi Wang
- Department of Anaesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Tian-Yi Xu
- Department of Anaesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zhi-Kai Zhou
- X-LANCE Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Yan-Min Qian
- X-LANCE Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Hong Jiang
- Department of Anaesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| |
Collapse
|
40
|
Domestic dogs (Canis lupus familiaris) are sensitive to the correlation between pitch and timbre in human speech. Anim Cogn 2021; 25:545-554. [PMID: 34714438 PMCID: PMC9107418 DOI: 10.1007/s10071-021-01567-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2021] [Revised: 09/14/2021] [Accepted: 10/15/2021] [Indexed: 12/01/2022]
Abstract
The perceived pitch of human voices is highly correlated with the fundamental frequency (f0) of the laryngeal source, which is determined largely by the length and mass of the vocal folds. The vocal folds are larger in adult males than in adult females, and men’s voices consequently have a lower pitch than women’s. The length of the supralaryngeal vocal tract (vocal-tract length; VTL) affects the resonant frequencies (formants) of speech which characterize the timbre of the voice. Men’s longer vocal tracts produce lower frequency, and less dispersed, formants than women’s shorter vocal tracts. Pitch and timbre combine to influence the perception of speaker characteristics such as size and age. Together, they can be used to categorize speaker sex with almost perfect accuracy. While it is known that domestic dogs can match a voice to a person of the same sex, there has been no investigation into whether dogs are sensitive to the correlation between pitch and timbre. We recorded a female voice giving three commands (‘Sit’, ‘Lay down’, ‘Come here’), and manipulated the recordings to lower the fundamental frequency (thus lowering pitch), increase simulated VTL (hence affecting timbre), or both (synthesized adult male voice). Dogs responded to the original adult female and synthesized adult male voices equivalently. Their tendency to obey the commands was, however, reduced when either pitch or timbre was manipulated alone. These results suggest that dogs are sensitive to both the pitch and timbre of human voices, and that they learn about the natural covariation of these perceptual attributes.
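The resynthesis manipulations described above amount to lowering f0 (pitch) and dividing the formant frequencies by a simulated vocal-tract-length ratio (timbre). A hedged sketch of that arithmetic, with illustrative scale factors and formant values that are not the study's actual parameters:

```python
# Sketch (not the study's code): simulate a male-like voice from female
# source values by lowering f0 and scaling formants down by a VTL ratio.
# A longer vocal tract lowers all formants by roughly the same factor.

def male_like_voice(f0, formants, f0_ratio=0.55, vtl_ratio=1.2):
    """Return (new_f0, new_formants); both ratios are illustrative."""
    return f0 * f0_ratio, [f / vtl_ratio for f in formants]

f0 = 210.0                            # typical adult-female f0 (Hz)
formants = [730.0, 1090.0, 2440.0]    # rough adult /ɑ/-like F1-F3 values (Hz)
new_f0, new_formants = male_like_voice(f0, formants)
```

Applying only one of the two operations reproduces the "pitch alone" or "timbre alone" conditions to which the dogs responded less reliably.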
Collapse
|
41
|
Ge S, Wan Q, Yin M, Wang Y, Huang Z. Quantitative acoustic metrics of vowel production in mandarin-speakers with post-stroke spastic dysarthria. CLINICAL LINGUISTICS & PHONETICS 2021; 35:779-792. [PMID: 32985269 DOI: 10.1080/02699206.2020.1827295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/15/2020] [Revised: 09/16/2020] [Accepted: 09/19/2020] [Indexed: 06/11/2023]
Abstract
Impaired vowel production in dysarthria has received considerable attention. This study aimed to explore the vowel production of Mandarin speakers with post-stroke spastic dysarthria in connected speech and the influence of gender and tone on that production. Multiple vowel acoustic metrics, including F1 range, F2 range, vowel space area (VSA), vowel articulation index (VAI) and formant centralization ratio (FCR), were analyzed from vowel tokens embedded in connected speech. The participants included 25 clients with spastic dysarthria secondary to stroke (15 males, 10 females) and 25 speakers with no history of neurological disease (15 males, 10 females). Variance analyses showed that the main effects of population, gender, and tone on F2 range, VSA, VAI, and FCR were all significant. Vowel production was centralized in the clients with post-stroke spastic dysarthria, and more centralized in males than in females. Vowels in the neutral tone (T0) were the most centralized of the tones. The quantitative acoustic metrics of F2 range, VSA, VAI, and FCR were effective in predicting vowel production in Mandarin-speaking clients with post-stroke spastic dysarthria, and hence may serve as powerful tools for assessing speech performance in this population.
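The metrics named here have standard definitions (FCR and VAI as commonly attributed to Sapir and colleagues, and the shoelace-formula triangular vowel space area from the corner vowels /i a u/). A sketch with hypothetical corner-vowel formants; the formant values are illustrative, not from the study:

```python
# Sketch of FCR, VAI, and triangular VSA from corner-vowel formants (Hz).
# FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); VAI is its reciprocal.
# Higher FCR (lower VAI, smaller VSA) indicates more centralized vowels.

def fcr(f1i, f2i, f1a, f2a, f1u, f2u):
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

def vai(f1i, f2i, f1a, f2a, f1u, f2u):
    return 1.0 / fcr(f1i, f2i, f1a, f2a, f1u, f2u)

def triangular_vsa(f1i, f2i, f1a, f2a, f1u, f2u):
    """Shoelace area of the /i a u/ triangle in (F1, F2) space, in Hz^2."""
    return 0.5 * abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a))

# hypothetical adult-male corner-vowel formants (Hz)
vals = dict(f1i=300, f2i=2200, f1a=750, f2a=1200, f1u=320, f2u=850)
print(round(fcr(**vals), 3), round(triangular_vsa(**vals)))
```

Centralization raises F1 of /i/ and /u/ and lowers F2 of /i/ while raising F2 of /u/, which pushes FCR up and shrinks VSA, matching the group differences the abstract reports.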
Collapse
Affiliation(s)
- Shengnan Ge
- Department of Education and Rehabilitation, Faculty of Education, East China Normal University, Shanghai, China
| | - Qin Wan
- Department of Education and Rehabilitation, Faculty of Education, East China Normal University, Shanghai, China
| | - Minmin Yin
- Department of Education and Rehabilitation, Faculty of Education, East China Normal University, Shanghai, China
| | - Yongli Wang
- Department of Education and Rehabilitation, Faculty of Education, East China Normal University, Shanghai, China
| | - Zhaoming Huang
- Department of Education and Rehabilitation, Faculty of Education, East China Normal University, Shanghai, China
| |
Collapse
|
42
|
Stehr DA, Hickok G, Ferguson SH, Grossman ED. Examining vocal attractiveness through articulatory working space. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:1548. [PMID: 34470280 DOI: 10.1121/10.0005730] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/17/2020] [Accepted: 07/04/2021] [Indexed: 06/13/2023]
Abstract
Robust gender differences exist in the acoustic correlates of clearly articulated speech, with females, on average, producing speech that is acoustically and phonetically more distinct than that of males. This study investigates the relationship between several acoustic correlates of clear speech and subjective ratings of vocal attractiveness. Talkers were recorded producing vowels in /bVd/ context and sentences containing the four corner vowels. Multiple measures of working vowel space were computed from continuously sampled formant trajectories and were combined with measures of speech timing known to co-vary with clear articulation. Partial least squares regression (PLS-R) modeling was used to predict ratings of vocal attractiveness for male and female talkers based on the acoustic measures. PLS components that loaded on size and shape measures of working vowel space (including the quadrilateral vowel space area, convex hull area, and bivariate spread of formants), along with measures of speech timing, were highly successful at predicting attractiveness in female talkers producing /bVd/ words. These findings are consistent with a number of hypotheses regarding human attractiveness judgments, including the role of sexual dimorphism in mate selection, the significance of traits signalling underlying health, and perceptual fluency accounts of preferences.
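One of the working-vowel-space measures above, the convex hull area of continuously sampled (F1, F2) points, can be computed with the monotone-chain algorithm plus the shoelace formula. A self-contained sketch (toy formant samples, not the study's data or code):

```python
# Sketch: convex hull area of sampled (F1, F2) formant points, in Hz^2.

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Shoelace area of the convex hull of the points."""
    h = convex_hull(points)
    return 0.5 * abs(sum(h[i][0] * h[(i + 1) % len(h)][1]
                         - h[(i + 1) % len(h)][0] * h[i][1]
                         for i in range(len(h))))

# toy (F1, F2) track: the fourth point lies inside the others' triangle
samples = [(300, 2200), (750, 1200), (320, 850), (500, 1500)]
print(hull_area(samples))
```

Unlike the quadrilateral vowel space area, the hull area uses every sampled formant frame, so interior points (like the fourth sample here) do not change the result.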
Collapse
Affiliation(s)
- Daniel A Stehr
- Department of Cognitive Sciences, University of California Irvine, 3151 Social Sciences Plaza, Irvine, California 92697, USA
| | - Gregory Hickok
- Department of Cognitive Sciences, University of California Irvine, 3151 Social Sciences Plaza, Irvine, California 92697, USA
| | - Sarah Hargus Ferguson
- Department of Communication Sciences and Disorders, University of Utah, 390 South 1530 East, Room 1201, Salt Lake City, Utah 84112, USA
| | - Emily D Grossman
- Department of Cognitive Sciences, University of California Irvine, 3151 Social Sciences Plaza, Irvine, California 92697, USA
| |
Collapse
|
43
|
Leung Y, Oates J, Chan SP, Papp V. Associations Between Speaking Fundamental Frequency, Vowel Formant Frequencies, and Listener Perceptions of Speaker Gender and Vocal Femininity-Masculinity. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:2600-2622. [PMID: 34232704 DOI: 10.1044/2021_jslhr-20-00747] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Purpose The aim of the study was to examine associations between speaking fundamental frequency (fo), vowel formant frequencies (F), listener perceptions of speaker gender, and vocal femininity-masculinity. Method An exploratory study was undertaken to examine associations between fo, F1-F3, listener perceptions of speaker gender (nominal scale), and vocal femininity-masculinity (visual analog scale). For 379 speakers of Australian English aged 18-60 years, fo mode and F1-F3 (12 monophthongs; total of 36 Fs) were analyzed on a standard reading passage. Seventeen listeners rated speaker gender and vocal femininity-masculinity on randomized audio recordings of these speakers. Results Model building using principal component analysis suggested the 36 Fs could be succinctly reduced to seven principal components (PCs). Generalized structural equation modeling (with the seven PCs of F and fo as predictors) suggested that only F2 and fo predicted listener perceptions of speaker gender (male, female, unable to decide). However, listener perceptions of vocal femininity-masculinity behaved differently and were predicted by F1, F3, and the contrast between monophthongs at the extremities of the F1 acoustic vowel space, in addition to F2 and fo. Furthermore, listeners' perceptions of speaker gender also influenced ratings of vocal femininity-masculinity substantially. Conclusion Adjusted odds ratios highlighted the substantially larger contribution of F to listener perceptions of speaker gender and vocal femininity-masculinity relative to fo than has previously been reported.
Collapse
Affiliation(s)
- Yeptain Leung
- Discipline of Speech Pathology, Department of Speech Pathology, Orthoptics and Audiology, School of Allied Health, Human Services and Sport, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia
| | - Jennifer Oates
- Discipline of Speech Pathology, Department of Speech Pathology, Orthoptics and Audiology, School of Allied Health, Human Services and Sport, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia
| | - Siew-Pang Chan
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore
- Cardiovascular Research Institute, National University Heart Centre Singapore, National University Health System, Singapore
| | | |
Collapse
|
44
|
Levy ES, Chang YM, Hwang K, McAuliffe MJ. Perceptual and Acoustic Effects of Dual-Focus Speech Treatment in Children With Dysarthria. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:2301-2316. [PMID: 33656916 DOI: 10.1044/2020_jslhr-20-00301] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Purpose Children with dysarthria secondary to cerebral palsy may experience reduced speech intelligibility and diminished communicative participation. However, minimal research has been conducted examining the outcomes of behavioral speech treatments in this population. This study examined the effect of Speech Intelligibility Treatment (SIT), a dual-focus speech treatment targeting increased articulatory excursion and vocal intensity, on intelligibility of narrative speech, speech acoustics, and communicative participation in children with dysarthria. Method American English-speaking children with dysarthria (n = 17) received SIT in a 3-week summer camp-like setting at Columbia University. SIT follows motor-learning principles to train the child-friendly, dual-focus strategy, "Speak with your big mouth and strong voice." Children produced a story narrative at baseline, immediate posttreatment (POST), and at 6-week follow-up (FUP). Outcomes were examined via blinded listener ratings of ease of understanding (n = 108 adult listeners), acoustic analyses, and questionnaires focused on communicative participation. Results SIT resulted in significant increases in ease of understanding at POST that were maintained at FUP. There were no significant changes to vocal intensity, speech rate, or vowel spectral characteristics, with the exception of an increase in second formant difference between vowels following SIT. Significantly enhanced communicative participation was evident at POST and FUP. Considerable variability in response to SIT was observed between children. Conclusions Dual-focus treatment shows promise for improving intelligibility and communicative participation in children with dysarthria, although responses to treatment vary considerably across children. Possible mechanisms underlying the intelligibility gains, enhanced communicative participation, and variability in treatment effects are discussed.
Collapse
Affiliation(s)
- Erika S Levy
- Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY
| | - Younghwa M Chang
- Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY
| | - KyungHae Hwang
- Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY
| | - Megan J McAuliffe
- School of Psychology, Speech and Hearing and New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
| |
Collapse
|
45
|
Hidalgo-De la Guía I, Garayzábal-Heinze E, Gómez-Vilda P, Martínez-Olalla R, Palacios-Alonso D. Acoustic Analysis of Phonation in Children With Smith-Magenis Syndrome. Front Hum Neurosci 2021; 15:661392. [PMID: 34149380 PMCID: PMC8209519 DOI: 10.3389/fnhum.2021.661392] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022] Open
Abstract
Complex simultaneous neuropsychophysiological mechanisms are responsible for the processing of the information to be transmitted and for the neuromotor planning of the articulatory organs involved in speech. The nature of this set of mechanisms is closely linked to the clinical state of the subject. In populations with neurodevelopmental deficits, for example, these underlying neuropsychophysiological processes are deficient and determine their phonation. Most such neurodevelopmental deficits are due to a genetic abnormality, as is the case in the population with Smith–Magenis syndrome (SMS). SMS is associated with neurodevelopmental deficits, intellectual disability, and a cohort of characteristic phenotypic features, including a voice quality that does not seem to be in line with the gender, age, and complexion of the diagnosed subject. The phonatory profile and speech features in this syndrome are dysphonia, high f0, excess vocal muscle stiffness, fluency alterations, numerous syllabic simplifications, phoneme omissions, and unintelligibility of speech. This exploratory study investigates whether the neuromotor deficits in children with SMS adversely affect phonation as compared to typically developing children without neuromotor deficits, which has not been previously determined. The authors compare the phonatory performance of a group of children with SMS (N = 12) with a healthy control group of children (N = 12) matched in age and gender and grouped into two age ranges: the first from 5 to 7 years old and the second from 8 to 12 years old. Group differences were determined for two forms of acoustic analysis performed on repeated recordings of the sustained vowel /a/: F1 and F2 extraction and cepstral peak prominence (CPP). It is expected that the results will shed light on the underlying neuromotor aspects of phonation in the SMS population. These findings could provide evidence of the susceptibility of phonation to neuromotor disturbances, regardless of their origin.
Collapse
Affiliation(s)
| | | | - Pedro Gómez-Vilda
- Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
| | | | - Daniel Palacios-Alonso
- Escuela Técnica Superior de Ingeniería Informática, Universidad Rey Juan Carlos, Madrid, Spain
| |
Collapse
|
46
|
Xiao Y, Wang T, Deng W, Yang L, Zeng B, Lao X, Zhang S, Liu X, Ouyang D, Liao G, Liang Y. Data mining of an acoustic biomarker in tongue cancers and its clinical validation. Cancer Med 2021; 10:3822-3835. [PMID: 33938165 PMCID: PMC8178493 DOI: 10.1002/cam4.3872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2020] [Revised: 01/30/2021] [Accepted: 03/14/2021] [Indexed: 11/08/2022] Open
Abstract
The promise of speech disorders as biomarkers in clinical examination has been identified in a broad spectrum of neurodegenerative diseases. However, to the best of our knowledge, a validated acoustic marker with established discriminative and evaluative properties has not yet been developed for oral tongue cancers. Here we cross-sectionally collected a screening dataset that included acoustic parameters extracted from 3 sustained vowels /ɑ/, /i/, /u/ and binary perceptual outcomes from 12 consonant-vowel syllables. We used a support vector machine with a linear kernel function within this dataset to identify the formant centralization ratio (FCR) as a dominant predictor of different perceptual outcomes across gender and syllable. The Acoustic analysis, Perceptual evaluation and Quality of Life assessment (APeQoL) was used to validate the FCR in 33 patients with primary resectable oral tongue cancers. Measurements were taken before (pre-op) and four to six weeks after (post-op) surgery. The speech handicap index (SHI), a speech-specific questionnaire, was also administered at these time points. Pre-op correlation analysis within the APeQoL revealed overall consistency and a strong correlation between FCR and SHI scores. FCRs also increased significantly with increasing T classification pre-operatively, especially for women. Longitudinally, the main effects of T classification, the extent of resection, and their interaction effects with time (pre-op vs. post-op) on FCRs were all significant. For pre-operative FCR, after merging the two datasets, a cut-off value of 0.970 produced an AUC of 0.861 (95% confidence interval: 0.785-0.938) for T3-4 patients. In sum, this study determined that FCR is an acoustic marker with the potential to detect disease and related speech function in oral tongue cancers. These are preliminary findings that need to be replicated in longitudinal studies and/or larger cohorts.
Collapse
Affiliation(s)
- Yudong Xiao
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Tao Wang
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Wei Deng
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Le Yang
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Bin Zeng
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Xiaomei Lao
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Sien Zhang
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Xiangqi Liu
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Daiqiao Ouyang
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Guiqing Liao
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Yujie Liang
- Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
47
|
Nguyen DD, McCabe P, Thomas D, Purcell A, Doble M, Novakovic D, Chacon A, Madill C. Acoustic voice characteristics with and without wearing a facemask. Sci Rep 2021; 11:5651. [PMID: 33707509 PMCID: PMC7970997 DOI: 10.1038/s41598-021-85130-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Accepted: 02/19/2021] [Indexed: 01/31/2023] Open
Abstract
Facemasks are essential for healthcare workers, but the characteristics of the voice whilst wearing this personal protective equipment are not well understood. In the present study, we compared acoustic voice measures in recordings of sixteen adults producing standardised vocal tasks with and without wearing either a surgical mask or a KN95 mask. Data were analysed for mean spectral levels in the 0-1 kHz and 1-8 kHz regions, an energy ratio between 0-1 and 1-8 kHz (LH1000), harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPPS), and vocal intensity. In connected speech there was significant attenuation of the mean spectral level in the 1-8 kHz region, with no significant change at 0-1 kHz. Mean spectral levels of the vowel did not change significantly in the mask-wearing conditions. LH1000 for connected speech significantly increased whilst wearing either a surgical mask or KN95 mask, but no significant change in this measure was found for the vowel. HNR was higher in the mask-wearing conditions than in the no-mask condition. CPPS and vocal intensity did not change in the mask-wearing conditions. These findings imply an attenuation effect of these types of masks on the voice spectrum, with the surgical mask showing less impact than the KN95.
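An LH1000-style measure, the level difference between the 0-1 kHz and 1-8 kHz bands, can be sketched with a plain DFT on a toy signal. The band edges follow the abstract; everything else (the signal, the omission of windowing and calibration) is assumed for illustration:

```python
# Sketch: band spectral levels (dB) and their difference, LH1000-style.
# Uses a naive O(n^2) DFT on a short toy signal; not the study's pipeline.
import math

def band_levels_db(signal, sr):
    """Return (level 0-1 kHz, level 1-8 kHz) in dB from a naive DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        mags.append((k * sr / n, math.hypot(re, im)))
    def level(lo, hi):
        energy = sum(m * m for f, m in mags if lo <= f < hi)
        return 10 * math.log10(energy) if energy > 0 else float("-inf")
    return level(0, 1000), level(1000, 8000)

sr = 16000
# toy voice-like signal: strong 500 Hz component, weak 3 kHz component
sig = [math.sin(2 * math.pi * 500 * i / sr) + 0.1 * math.sin(2 * math.pi * 3000 * i / sr)
       for i in range(256)]
low, high = band_levels_db(sig, sr)
lh1000 = low - high  # positive when the low band dominates
```

A mask that attenuates mostly above 1 kHz lowers `high` while leaving `low` nearly unchanged, which raises this difference, consistent with the increase the abstract reports for connected speech.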
Collapse
Affiliation(s)
- Duy Duong Nguyen
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| | - Patricia McCabe
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| | - Donna Thomas
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| | - Alison Purcell
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| | - Maree Doble
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| | - Daniel Novakovic
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| | - Antonia Chacon
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| | - Catherine Madill
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia
| |
Collapse
|
48
|
Carl M, Icht M. Acoustic vowel analysis and speech intelligibility in young adult Hebrew speakers: Developmental dysarthria versus typical development. INTERNATIONAL JOURNAL OF LANGUAGE & COMMUNICATION DISORDERS 2021; 56:283-298. [PMID: 33522087 DOI: 10.1111/1460-6984.12598] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/23/2020] [Revised: 12/08/2020] [Accepted: 12/31/2020] [Indexed: 06/12/2023]
Abstract
BACKGROUND Developmental dysarthria is a motor speech impairment commonly characterized by varying levels of reduced speech intelligibility. The relationship between intelligibility deficits and acoustic vowel space among these individuals has long been noted in the literature, with evidence of vowel centralization (e.g., in English and Mandarin). However, the degree to which this centralization occurs and the intelligibility-acoustic relationship is maintained in different vowel systems has yet to be studied thoroughly. In comparison with American English, the Hebrew vowel system is significantly smaller, with a potentially smaller vowel space area, a factor that may impact upon the comparisons of the acoustic vowel space and its correlation with speech intelligibility. Data on vowel space and speech intelligibility are particularly limited for Hebrew speakers with motor speech disorders. AIMS To determine the nature and degree of vowel space centralization in Hebrew-speaking adolescents and young adults with dysarthria, in comparison with typically developing (TD) peers, and to correlate these findings with speech intelligibility scores. METHODS & PROCEDURES Adolescents and young adults with developmental dysarthria (secondary to cerebral palsy (CP) and other motor deficits, n = 17) and their TD peers (n = 17) were recorded producing Hebrew corner vowels within single words. For intelligibility assessments, naïve listeners transcribed those words produced by speakers with CP, and intelligibility scores were calculated. OUTCOMES & RESULTS Acoustic analysis of vowel formants (F1, F2) revealed a centralization of vowel space among speakers with CP for all acoustic metrics of vowel formants, and mainly for the formant centralization ratio (FCR), in comparison with TD peers. Intelligibility scores were correlated strongly with the FCR metric for speakers with CP. 
CONCLUSIONS & IMPLICATIONS The main results, vowel space centralization for speakers with CP in comparison with TD peers, echo previous cross-linguistic results. The correlation of acoustic results with speech intelligibility carries clinical implications. Taken together, the results contribute to better characterization of the speech production deficit in Hebrew speakers with motor speech disorders. Furthermore, they may guide clinical decision-making and intervention planning to improve speech intelligibility. What this paper adds What is already known on the subject Speech production and intelligibility deficits among individuals with developmental dysarthria (e.g., secondary to CP) are well documented. These deficits have also been correlated with centralization of the acoustic vowel space, although primarily in English speakers. Little is known about the acoustic characteristics of vowels in Hebrew speakers with motor speech disorders, and whether correlations with speech intelligibility are maintained. What this paper adds to existing knowledge This study is the first to describe the acoustic characteristics of vowel space in Hebrew-speaking adolescents and young adults with developmental dysarthria. The results demonstrate a centralization of the acoustic vowel space in comparison with TD peers for all measures, as found in other languages. Correlation between acoustic measures and speech intelligibility scores were also documented. We discuss these results within the context of cross-linguistic comparisons. What are the potential or actual clinical implications of this work? The results confirm the use of objective acoustic measures in the assessment of individuals with motor speech disorders, providing such data for Hebrew-speaking adolescents and young adults. 
These measures can be used to determine the nature and severity of the speech deficit across languages, guide intervention planning, and measure the effectiveness of intelligibility-based treatment programmes.
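The acoustic metrics named in the abstract above can be sketched computationally. The snippet below illustrates the formant centralization ratio using the widely cited formula from Sapir et al. (2010), alongside a triangular /i a u/ vowel space area; the formant values are hypothetical illustrations, not data from the study, and the study itself may have used additional metrics.

```python
# Two common acoustic vowel-space metrics, computed from corner-vowel
# formant means (F1, F2 in Hz). Higher FCR (> 1) indicates centralization;
# a smaller VSA likewise suggests a compressed vowel space.

def fcr(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """Formant centralization ratio: (F2u + F2a + F1i + F1u) / (F2i + F1a)."""
    return (f2_u + f2_a + f1_i + f1_u) / (f2_i + f1_a)

def triangular_vsa(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """Area (Hz^2) of the /i a u/ triangle via the shoelace formula."""
    return abs(f1_i * (f2_a - f2_u)
               + f1_a * (f2_u - f2_i)
               + f1_u * (f2_i - f2_a)) / 2

# Hypothetical corner-vowel means for one speaker
speaker = dict(f1_i=300, f2_i=2300, f1_a=750, f2_a=1300, f1_u=350, f2_u=800)
print(round(fcr(**speaker), 3))       # 0.902 (below 1: peripheral vowels)
print(triangular_vsa(**speaker))      # 312500.0
```

A centralized speaker would show raised F1 for /i u/ and lowered F2 for /i/, pushing the FCR above 1 and shrinking the VSA.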
Collapse
|
49
|
Cavalcanti JC, Eriksson A, Barbosa PA. Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison. PLoS One 2021; 16:e0246645. [PMID: 33600430 PMCID: PMC7891727 DOI: 10.1371/journal.pone.0246645] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Accepted: 01/22/2021] [Indexed: 11/18/2022] Open
Abstract
The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels' acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction and analysis of the first four speech formants (F1-F4) were carried out on spontaneous speech materials. The recordings comprised telephone conversations between identical twin pairs, captured directly through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. For these comparisons, stressed and unstressed oral vowels of BP were manually segmented and transcribed in Praat. F1-F4 formant estimates were automatically extracted at the midpoint of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be perceptually relevant according to a psychoacoustic criterion. The results revealed consistent patterns in the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 displayed the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, F3 and F4 displayed a similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels.
Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found to be even more explanatory of the observed differences. Although identical twins displayed a higher phonetic similarity, they were not found to be phonetically identical.
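The Hz-to-Bark representation and psychoacoustic criterion mentioned above can be illustrated with a short sketch. The abstract does not specify which Bark approximation or threshold the authors used, so the Traunmüller (1990) formula and a 1-Bark difference criterion are assumptions chosen for illustration; the formant values are hypothetical.

```python
# Convert formant frequencies to the psychoacoustic Bark scale and test
# whether two values differ by at least a given number of Bark.

def hz_to_bark(f_hz):
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def perceptually_distinct(f_a_hz, f_b_hz, threshold_bark=1.0):
    """True if two formant values differ by >= `threshold_bark` Bark."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz)) >= threshold_bark

print(round(hz_to_bark(1000.0), 2))           # 8.53 (1000 Hz is ~8.5 Bark)
# Hypothetical F4 midpoints for the same vowel from each member of a twin pair
print(perceptually_distinct(3500.0, 3900.0))  # False: < 1 Bark apart
```

Because the Bark scale compresses high frequencies, a 400 Hz gap between F4 values can fall under a 1-Bark criterion even though the same gap between low F1 values would not.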
Collapse
Affiliation(s)
- Julio Cesar Cavalcanti
- Department of Linguistics, Stockholm University, Stockholm, Sweden
- Institute of Language Studies, Campinas State University, Campinas, Brazil
| | - Anders Eriksson
- Department of Linguistics, Stockholm University, Stockholm, Sweden
| | - Plinio A. Barbosa
- Institute of Language Studies, Campinas State University, Campinas, Brazil
| |
Collapse
|
50
|
Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system. EVOLUTIONARY INTELLIGENCE 2021. [DOI: 10.1007/s12065-020-00532-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|