1
Heller Murray E. Conducting high-quality and reliable acoustic analysis: A tutorial focused on training research assistants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 155:2603-2611. [PMID: 38629881 PMCID: PMC11026110 DOI: 10.1121/10.0025536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 03/14/2024] [Accepted: 03/15/2024] [Indexed: 04/19/2024]
Abstract
Open science practices have led to an increase in available speech datasets for researchers interested in acoustic analysis. Accurate evaluation of these databases frequently requires manual or semi-automated analysis. The time-intensive nature of these analyses makes them ideally suited for research assistants in laboratories focused on speech and voice production. However, the completion of high-quality, consistent, and reliable analyses requires clear rules and guidelines for all research assistants to follow. This tutorial will provide information on training and mentoring research assistants to complete these analyses, covering areas including RA training, ongoing data analysis monitoring, and documentation needed for reliable and re-creatable findings.
Affiliation(s)
- Elizabeth Heller Murray
- Department of Communication Sciences and Disorders, Temple University, Philadelphia, Pennsylvania 19122, USA
2
Ancel EE, Smith ML, Rao VNV, Munson B. Relating Acoustic Measures to Listener Ratings of Children's Productions of Word-Initial /ɹ/ and /w/. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:3413-3427. [PMID: 37591234 PMCID: PMC10558147 DOI: 10.1044/2023_jslhr-22-00713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 04/06/2023] [Accepted: 05/17/2023] [Indexed: 08/19/2023]
Abstract
PURPOSE The /ɹ/ productions of young children acquiring American English are highly variable and often inaccurate, with [w] as the most common substitution error. One acoustic indicator of the goodness of children's /ɹ/ productions is the difference between the frequency of the second formant (F2) and the third formant (F3), with a smaller F3-F2 difference being associated with a perceptually more adultlike /ɹ/. This study analyzed the effectiveness of automatically extracted F3-F2 differences in characterizing young children's productions of /ɹ/-/w/ in comparison with manually coded measurements. METHOD Automated F3-F2 differences were extracted from productions of a variety of different /ɹ/- and /w/-initial words spoken by 3- to 4-year-old monolingual preschoolers (N = 117; 2,278 tokens in total). These automated measures were compared to ratings of the phoneme goodness of children's productions as rated by untrained adult listeners (n = 132) on a visual analog scale, as well as to narrow transcriptions of the production into four categories: [ɹ], [w], and two intermediate categories. RESULTS Data visualizations show a weak relationship between automated F3-F2 differences and both listener ratings and narrow transcriptions. Mixed-effects models suggest the automated F3-F2 difference only modestly predicts listener ratings (R² = .37) and narrow transcriptions (R² = .32). CONCLUSION The weak relationship between automated F3-F2 difference and both listener ratings and narrow transcriptions suggests that these automated acoustic measures are of questionable reliability and utility in assessing preschool children's mastery of the /ɹ/-/w/ contrast.
Affiliation(s)
- Elizabeth E. Ancel
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Twin Cities, Minneapolis
- Michael L. Smith
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Twin Cities, Minneapolis
- V. N. Vimal Rao
- Department of Educational Psychology, University of Minnesota, Twin Cities, Minneapolis
- Benjamin Munson
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Twin Cities, Minneapolis
3
Alku P, Kadiri SR, Gowda D. Refining a deep learning-based formant tracker using linear prediction methods. COMPUT SPEECH LANG 2023. [DOI: 10.1016/j.csl.2023.101515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
4
Herbst CT, Story BH, Meyer D. Acoustical Theory of Vowel Modification Strategies in Belting. J Voice 2023:S0892-1997(23)00004-8. [PMID: 37080890 DOI: 10.1016/j.jvoice.2023.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 04/22/2023]
Abstract
Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels can not always be fulfilled.
Affiliation(s)
- Christian T Herbst
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia; Department of Vocal Studies, Mozarteum University, Salzburg, Austria.
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
- David Meyer
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia
5
Whalen DH, Chen WR, Shadle CH, Fulop SA. Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:933. [PMID: 36050157 PMCID: PMC9374483 DOI: 10.1121/10.0013410] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/05/2023]
Abstract
Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.
Affiliation(s)
- D H Whalen
- Haskins Laboratories, New Haven, Connecticut 06511, USA
- Wei-Rong Chen
- Haskins Laboratories, New Haven, Connecticut 06511, USA
- Sean A Fulop
- Department of Linguistics, California State University Fresno, Fresno, California 93740, USA
6
Torres Borda L, Jadoul Y, Rasilo H, Salazar Casals A, Ravignani A. Vocal plasticity in harbour seal pups. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200456. [PMID: 34719248 PMCID: PMC8558775 DOI: 10.1098/rstb.2020.0456] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 06/28/2021] [Indexed: 12/22/2022] Open
Abstract
Vocal plasticity can occur in response to environmental and biological factors, including conspecifics' vocalizations and noise. Pinnipeds are one of the few mammalian groups capable of vocal learning, and are therefore relevant to understanding the evolution of vocal plasticity in humans and other animals. Here, we investigate the vocal plasticity of harbour seals (Phoca vitulina), a species with vocal learning abilities observed in adulthood but not puppyhood. To evaluate early mammalian vocal development, we tested 1- to 3-week-old seal pups. We tailored noise playbacks to this species and age to induce seal pups to shift their fundamental frequency (f0), rather than adapt call amplitude or temporal characteristics. We exposed individual pups to low- and high-intensity bandpass-filtered noise, which spanned (and masked) their typical range of f0; simultaneously, we recorded pups' spontaneous calls. Unlike most mammals, pups modified their vocalizations by lowering their f0 in response to increased noise. This modulation was precise and adapted to the particular experimental manipulation of the noise condition. In addition, higher levels of noise induced less dispersion around the mean f0, suggesting that pups may have actively focused their phonatory efforts to target lower frequencies. Noise did not seem to affect call amplitude. However, one seal showed two characteristics of the Lombard effect known for human speech in noise: significant increase in call amplitude and flattening of spectral tilt. Our relatively low noise levels may have favoured f0 modulation while inhibiting amplitude adjustments. This lowering of f0 is unusual, as most animals commonly display no such f0 shift. Our data represent a relatively rare case in mammalian neonates, and have implications for the evolution of vocal plasticity and vocal learning across species, including humans. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Affiliation(s)
- Laura Torres Borda
- Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, Wundtlaan 1, 6525 XD Nijmegen, The Netherlands
- Research Department, Sealcentre Pieterburen, Hoofdstraat 94-A, 9968 AG Pieterburen, The Netherlands
- Yannick Jadoul
- Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, Wundtlaan 1, 6525 XD Nijmegen, The Netherlands
- Artificial Intelligence Lab, Vrije Universiteit Brussel, 1050 Elsene/Ixelles, Belgium
- Heikki Rasilo
- Artificial Intelligence Lab, Vrije Universiteit Brussel, 1050 Elsene/Ixelles, Belgium
- Anna Salazar Casals
- Research Department, Sealcentre Pieterburen, Hoofdstraat 94-A, 9968 AG Pieterburen, The Netherlands
- Andrea Ravignani
- Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, Wundtlaan 1, 6525 XD Nijmegen, The Netherlands
- Research Department, Sealcentre Pieterburen, Hoofdstraat 94-A, 9968 AG Pieterburen, The Netherlands
7
Munson B, Logerquist MK, Kim H, Martell A, Edwards J. Does Early Phonetic Differentiation Predict Later Phonetic Development? Evidence From a Longitudinal Study of /ɹ/ Development in Preschool Children. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:2417-2437. [PMID: 34057848 PMCID: PMC8632502 DOI: 10.1044/2021_jslhr-20-00555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2020] [Revised: 01/06/2021] [Accepted: 02/24/2021] [Indexed: 06/12/2023]
Abstract
Purpose We evaluated whether children whose inaccurate /ɹ/ productions showed evidence of phonetic differentiation with /w/ at 3.5-4.5 years of age improved in /ɹ/ production over the next year more than children whose inaccurate productions did not show evidence of such differentiation. We also examined whether speech perception, inhibitory control, and vocabulary size predicted growth in /ɹ/. Method A set of typically developing, monolingual English-speaking preschool children (n = 136) produced tokens of /ɹ/- and /w/-initial words at two time points (TPs), at which they were 39-52 and 51-65 months old. Children's productions of /ɹ/ and /w/ were narrowly phonetically transcribed. Children's productions at the earlier time point were rated by naïve listeners using a visual analog scale measure of phoneme goodness; these ratings were used to assess the degree of phonetic differentiation between /ɹ/ and /w/. Results Accuracy for both phonemes varied considerably at both TPs. The growth in accuracy of /ɹ/ between the two TPs was not predicted by any individual-differences measures, nor by the degree of differentiation between /ɹ/ and /w/ at the earlier time point. Conclusion Low vocabulary size, low inhibitory control, poor speech perception, and the absence of early phonetic differentiation are not necessarily limiting factors in predicting /ɹ/ growth in individual children in the age range we studied.
Affiliation(s)
- Benjamin Munson
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Twin Cities, Minneapolis
- Mara K. Logerquist
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Twin Cities, Minneapolis
- Hyuna Kim
- Department of Communication Sciences and Disorders, University of Wisconsin–Madison
- Alisha Martell
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Twin Cities, Minneapolis
- Jan Edwards
- Department of Communication Sciences and Disorders, University of Wisconsin–Madison
- Department of Hearing and Speech Sciences, University of Maryland, College Park
8
Lynn E, Narayanan SS, Lammert AC. Dark tone quality and vocal tract shaping in soprano song production: Insights from real-time MRI. JASA EXPRESS LETTERS 2021; 1:075202. [PMID: 34291230 PMCID: PMC8273971 DOI: 10.1121/10.0005109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2021] [Accepted: 05/10/2021] [Indexed: 06/13/2023]
Abstract
Tone quality termed "dark" is an aesthetically important property of Western classical voice performance and has been associated with lowered formant frequencies, lowered larynx, and widened pharynx. The present study uses real-time magnetic resonance imaging with synchronous audio recordings to investigate dark tone quality in four professionally trained sopranos with enhanced ecological validity and a relatively complete view of the vocal tract. Findings differ from traditional accounts, indicating that labial narrowing may be the primary driver of dark tone quality across performers, while many other aspects of vocal tract shaping are shown to differ significantly in a performer-specific way.
Affiliation(s)
- Elisabeth Lynn
- Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01690, USA
- Shrikanth S Narayanan
- Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, California 95616, USA
- Adam C Lammert
- Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01690, USA
9
Echternach M, Herbst CT, Köberlein M, Story B, Döllinger M, Gellrich D. Are source-filter interactions detectable in classical singing during vowel glides? THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:4565. [PMID: 34241428 DOI: 10.1121/10.0005432] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/04/2020] [Accepted: 06/03/2021] [Indexed: 06/13/2023]
Abstract
In recent studies, it has been assumed that vocal tract formants (Fn) and the voice source could interact. However, there are only a few studies analyzing this assumption in vivo. Here, the vowel transition /i/-/a/-/u/-/i/ of 12 professional classical singers (6 females, 6 males) when phonating on the pitch D4 [fundamental frequency (ƒo) ca. 294 Hz] were analyzed using transnasal high speed videoendoscopy (20,000 fps), electroglottography (EGG), and audio recordings. Fn data were calculated using a cepstral method. Source-filter interaction candidates (SFICs) were determined by (a) algorithmic detection of major intersections of Fn/nƒo and (b) perceptual assessment of the EGG signal. Although the open quotient showed some increase for the /i-a/ and /u-i/ transitions, there were no clear effects at the expected Fn/nƒo intersections. In contrast, ƒo adjustments and changes in the phonovibrogram occurred at perceptually derived SFICs, suggesting level-two interactions. In some cases, these were constituted by intersections between higher nƒo and Fn. The presented data partially corroborates that vowel transitions may result in level-two interactions also in professional singers. However, the lack of systematically detectable effects suggests either the absence of a strong interaction or existence of confounding factors, which may potentially counterbalance the level-two-interactions.
Affiliation(s)
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
- Christian T Herbst
- Antonio Salieri Department of Vocal Studies and Vocal Research in Music Education, University of Music and Performing Arts Vienna, Vienna, Austria
- Marie Köberlein
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
- Brad Story
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School Waldstrasse 1, Erlangen, 91054, Germany
- Donata Gellrich
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
10
Kent RD, Rountrey C. What Acoustic Studies Tell Us About Vowels in Developing and Disordered Speech. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2020; 29:1749-1778. [PMID: 32631070 PMCID: PMC7893529 DOI: 10.1044/2020_ajslp-19-00178] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/25/2019] [Revised: 11/04/2019] [Accepted: 04/19/2020] [Indexed: 05/05/2023]
Abstract
Purpose Literature was reviewed on the development of vowels in children's speech and on vowel disorders in children and adults, with an emphasis on studies using acoustic methods. Method Searches were conducted with PubMed/MEDLINE, Google Scholar, CINAHL, HighWire Press, and legacy sources in retrieved articles. The primary search items included, but were not limited to, vowels, vowel development, vowel disorders, vowel formants, vowel therapy, vowel inherent spectral change, speech rhythm, and prosody. Results/Discussion The main conclusions reached in this review are that vowels are (a) important to speech intelligibility; (b) intrinsically dynamic; (c) refined in both perceptual and productive aspects beyond the age typically given for their phonetic mastery; (d) produced to compensate for articulatory and auditory perturbations; (e) influenced by language and dialect even in early childhood; (f) affected by a variety of speech, language, and hearing disorders in children and adults; (g) inadequately assessed by standardized articulation tests; and (h) characterized by at least three factors: articulatory configuration, extrinsic and intrinsic regulation of duration, and role in speech rhythm and prosody. Also discussed are stages in typical vowel ontogeny, acoustic characterization of rhotic vowels, a sensory-motor perspective on vowel production, and implications for clinical assessment of vowels.
Affiliation(s)
- Ray D. Kent
- Waisman Center, University of Wisconsin–Madison
- Carrie Rountrey
- Department of Communication Sciences and Disorders, University of Cincinnati, OH
11
Terband H, Namasivayam A, Maas E, van Brenk F, Mailend ML, Diepeveen S, van Lieshout P, Maassen B. Assessment of Childhood Apraxia of Speech: A Review/Tutorial of Objective Measurement Techniques. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:2999-3032. [PMID: 31465704 DOI: 10.1044/2019_jslhr-s-csmc7-19-0214] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 06/10/2023]
Abstract
Background With respect to the clinical criteria for diagnosing childhood apraxia of speech (commonly defined as a disorder of speech motor planning and/or programming), research has made important progress in recent years. Three segmental and suprasegmental speech characteristics (error inconsistency, lengthened and disrupted coarticulation, and inappropriate prosody) have gained wide acceptance in the literature for purposes of participant selection. However, little research has sought to empirically test the diagnostic validity of these features. One major obstacle to such empirical study is the fact that none of these features is stated in operationalized terms. Purpose This tutorial provides a structured overview of perceptual, acoustic, and articulatory measurement procedures that have been used or could be used to operationalize and assess these 3 core characteristics. Methodological details are reviewed for each procedure, along with a short overview of research results reported in the literature. Conclusion The 3 types of measurement procedures should be seen as complementary. Some characteristics are better suited to be described at the perceptual level (especially phonemic errors and prosody), others at the acoustic level (especially phonetic distortions, coarticulation, and prosody), and still others at the kinematic level (especially coarticulation, stability, and gestural coordination). The type of data collected determines, to a large extent, the interpretation that can be given regarding the underlying deficit. Comprehensive studies are needed that include more than 1 diagnostic feature and more than 1 type of measurement procedure.
Affiliation(s)
- Hayo Terband
- Utrecht Institute of Linguistics-OTS, Utrecht University, the Netherlands
- Aravind Namasivayam
- Oral Dynamics Laboratory, Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
- Edwin Maas
- Department of Communication Sciences and Disorders, Temple University, Philadelphia, PA
- Frits van Brenk
- Department of Communicative Disorders and Sciences, University at Buffalo, NY
- Marja-Liisa Mailend
- Moss Rehabilitation Research Institute, Moss Rehabilitation Hospital, Elkins Park, PA
- Sanne Diepeveen
- HAN University of Applied Sciences, Nijmegen, the Netherlands
- Department of Rehabilitation, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Pascal van Lieshout
- Oral Dynamics Laboratory, Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
- Ben Maassen
- Center for Language and Cognition, Research School of Behavioral and Cognitive Neurosciences, University of Groningen, The Netherlands
12
Patel RR, Lulich SM, Verdi A. Vocal tract shape and acoustic adjustments of children during phonation into narrow flow-resistant tubes. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:352. [PMID: 31370566 DOI: 10.1121/1.5116681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2018] [Accepted: 06/25/2019] [Indexed: 06/10/2023]
Abstract
The goal of the study is to quantify the salient vocal tract acoustic, subglottal acoustic, and vocal tract physiological characteristics during phonation into a narrow flow-resistant tube with 2.53 mm inner diameter and 124 mm length in typically developing vocally healthy children using simultaneous microphone, accelerometer, and 3D/4D ultrasound recordings. Acoustic measurements included fundamental frequency (fo), first formant frequency (F1), second formant frequency (F2), first subglottal resonance (FSg1), and peak-to-peak amplitude ratio (Pvt:Psg). Physiological measurements included posterior tongue height (D1), tongue dorsum height (D2), tongue tip height (D3), tongue length (D4), oral cavity width (D5), hyoid elevation (D6), and pharynx width (D7). All measurements were made on eight boys and ten girls (6-9 years) during sustained /o:/ production at typical pitch and loudness, with and without the flow-resistant tube. Phonation with the flow-resistant tube resulted in a significant decrease in F1, F2, and Pvt:Psg and a significant increase in D2, D3, and FSg1. A statistically significant gender effect was observed for D1, with D1 higher in boys. These findings agree well with reported findings from adults, suggesting common acoustic and articulatory mechanisms for narrow flow-resistant tube phonation. Theoretical implications of the findings are discussed.
Affiliation(s)
- Rita R Patel
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405-7002, USA
- Steven M Lulich
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405-7002, USA
- Alessandra Verdi
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405-7002, USA
13
Plummer AR, Reidy PF. Computing low-dimensional representations of speech from socio-auditory structures for phonetic analyses. JOURNAL OF PHONETICS 2018; 71:355-375. [PMID: 31439969 PMCID: PMC6706093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Low-dimensional representations of speech data, such as formant values extracted by linear predictive coding analysis or spectral moments computed from whole spectra viewed as probability distributions, have been instrumental in both phonetic and phonological analyses over the last few decades. In this paper, we present a framework for computing low-dimensional representations of speech data based on two assumptions: that speech data represented in high-dimensional data spaces lie on shapes called manifolds that can be used to map speech data to low-dimensional coordinate spaces, and that manifolds underlying speech data are generated from a combination of language-specific lexical, phonological, and phonetic information as well as culture-specific socio-indexical information that is expressed by talkers of a given speech community. We demonstrate the basic mechanics of the framework by carrying out an analysis of children's productions of sibilant fricatives relative to those of adults in their speech community using the phoneigen package - a publicly available implementation of the framework. We focus the demonstration on enumerating the steps for constructing manifolds from data and then using them to map the data to a low-dimensional space, explicating how manifold structure affects the learned low-dimensional representations, and comparing the use of these representations against standard acoustic features in a phonetic analysis. We conclude with a discussion of the framework's underlying assumptions, its broader modeling potential, and its position relative to recent advances in the field of representation learning.
Affiliation(s)
- Patrick F. Reidy
- Callier Center for Communication Disorders, University of Texas at Dallas, Dallas, Texas, USA
14
Kent RD, Vorperian HK. Static measurements of vowel formant frequencies and bandwidths: A review. JOURNAL OF COMMUNICATION DISORDERS 2018; 74:74-97. [PMID: 29891085 PMCID: PMC6002811 DOI: 10.1016/j.jcomdis.2018.05.004] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 11/28/2017] [Revised: 04/23/2018] [Accepted: 05/27/2018] [Indexed: 05/05/2023]
Abstract
PURPOSE Data on vowel formants have been derived primarily from static measures representing an assumed steady state. This review summarizes data on formant frequencies and bandwidths for American English and also addresses (a) sources of variability (focusing on speech sample and time sampling point), and (b) methods of data reduction such as vowel area and dispersion. METHOD Searches were conducted with CINAHL, Google Scholar, MEDLINE/PubMed, SCOPUS, and other online sources including legacy articles and references. The primary search items were vowels, vowel space area, vowel dispersion, formants, formant frequency, and formant bandwidth. RESULTS Data on formant frequencies and bandwidths are available for both sexes over the lifespan, but considerable variability in results across studies affects even features of the basic vowel quadrilateral. Origins of variability likely include differences in speech sample and time sampling point. The data reveal the emergence of sex differences by 4 years of age, maturational reductions in formant bandwidth, and decreased formant frequencies with advancing age in some persons. It appears that a combination of methods of data reduction provides for optimal data interpretation. CONCLUSION The lifespan database on vowel formants shows considerable variability within specific age-sex groups, pointing to the need for standardized procedures.
Affiliation(s)
- Raymond D Kent
- Waisman Center, University of Wisconsin-Madison, United States.
15
Jacewicz E, Fox RA. Regional Variation in Fundamental Frequency of American English Vowels. PHONETICA 2018; 75:273-309. [PMID: 29649804 DOI: 10.1159/000484610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2016] [Accepted: 10/04/2017] [Indexed: 06/08/2023]
Abstract
We examined whether the fundamental frequency (f0) of vowels is influenced by regional variation, aiming to (1) establish how the relationship between vowel height and f0 ("intrinsic f0") is utilized in regional vowel systems and (2) determine whether regional varieties differ in their implementation of the effects of phonetic context on f0 variations. An extended set of acoustic measures explored f0 in vowels in isolated tokens (experiment 1) and in connected speech (experiment 2) from 36 women representing 3 different varieties of American English. Regional differences were found in f0 shape in isolated tokens, in the magnitude of intrinsic f0 difference between high and low vowels, in the nature of f0 contours in stressed vowels, and in the completion of f0 contours in the context of coda voicing. Regional varieties utilize f0 control in vowels in different ways, including regional f0 ranges and variation in f0 shape.
16
Abstract
OBJECTIVE The goal of the Arizona Child Acoustic Database project was to obtain a large set of acoustic recordings, primarily vowels, collected from a cohort of children over a critical period of growth and development. METHOD Data was recorded longitudinally from 63 children between the ages of 2;0 and 7;0 at 3-month intervals. The protocol included individual American English vowels and diphthongs, nonsense multi-vowel transitions, word level multi-vowel sequences (e.g., Hawaii), single-syllable words targeting each American English vowel, short sentences, and conversation. RESULTS Acoustic files are available for download through the University of Arizona Library Repository for use in future research projects. CONCLUSION Longitudinal recordings may be of interest because they allow tracking of acoustic characteristics produced by an individual child during a period of rapid growth and speech development.
Affiliation(s)
- Kate Bunton
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA