1
Schlegel P, Berry DA, Moffatt C, Zhang Z, Chhetri DK. Register transitions in an in vivo canine model as a function of intrinsic laryngeal muscle stimulation, fundamental frequency, and sound pressure level. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 155:2139-2150. [PMID: 38498507 PMCID: PMC10954347 DOI: 10.1121/10.0025135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 01/09/2024] [Accepted: 02/16/2024] [Indexed: 03/20/2024]
Abstract
Phonatory instabilities and involuntary register transitions can occur during singing. However, little is known regarding the mechanisms that govern such transitions. To investigate this phenomenon, we systematically varied laryngeal muscle activation and airflow in an in vivo canine larynx model during phonation. We calculated voice range profiles showing average nerve activations for all combinations of fundamental frequency (F0) and sound pressure level (SPL). Further, we determined closed-quotient (CQ) and minimum-posterior-area (MPA) based on high-speed video recordings. While different combinations of muscle activation favored different combinations of F0 and SPL, in the investigated larynx there was a consistent region of instability at about 400 Hz which essentially precluded phonation. An explanation for this region may be a larynx-specific coupling between sound source and subglottal tract or an effect based purely on larynx morphology. Register transitions crossed this region, with different combinations of cricothyroid and thyroarytenoid muscle (TA) activation stabilizing higher or lower neighboring frequencies. Observed patterns in CQ and MPA dependent on TA activation reproduced patterns found in singers in previous work. Lack of control of TA stimulation may result in phonation instabilities, and enhanced control of TA stimulation may help to avoid involuntary register transitions, especially in the singing voice.
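The abstract does not say how CQ was computed from the high-speed video. As a minimal sketch, assuming the common definition (the fraction of one glottal cycle during which the glottis is closed) and a hypothetical area threshold, the calculation could look like this:

```python
def closed_quotient(area, closed_threshold=0.0):
    """Fraction of samples in one glottal cycle with area at or below the threshold.

    `area` is a glottal area waveform for a single cycle; `closed_threshold`
    is a hypothetical cutoff, not a value taken from the study.
    """
    if not area:
        raise ValueError("area must contain at least one sample")
    closed = sum(1 for a in area if a <= closed_threshold)
    return closed / len(area)


# Made-up glottal area samples for one cycle (arbitrary pixel units).
cycle = [0, 0, 0, 0, 2, 5, 8, 5, 2, 0]
cq = closed_quotient(cycle)
```

In practice the threshold and the cycle segmentation both influence the result, so the value is only comparable across recordings processed the same way.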
Affiliation(s)
- Patrick Schlegel
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
- David A Berry
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
- Clare Moffatt
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
- Zhaoyan Zhang
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
- Dinesh K Chhetri
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
2
Sol J, Aaen M, Sadolin C, Ten Bosch L. Towards Automated Vocal Mode Classification in Healthy Singing Voice-An XGBoost Decision Tree-Based Machine Learning Classifier. J Voice 2023:S0892-1997(23)00281-3. [PMID: 37953088 DOI: 10.1016/j.jvoice.2023.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 09/07/2023] [Indexed: 11/14/2023]
Abstract
Auditory-perceptual assessment is widely used in clinical and pedagogical practice for speech and singing voice, yet several studies have shown poor intra- and inter-rater reliability in both clinical and singing voice contexts. Recent advances in artificial intelligence and machine learning offer models for automated classification and have demonstrated discriminatory power in both pathological and healthy voice. This study develops and tests an XGBoost decision-tree-based machine learning classifier for automated vocal mode classification in healthy singing voice. Classification models trained on mel-frequency cepstrum coefficients, MFCC-Zero-Time Windowing, glottal features, voice quality features, and α-ratios demonstrated a 92% average F1-score in distinguishing metallic and non-metallic singing for male singers and an 87% average F1-score for female singers. The model distinguished vocal modes with 70% and 69% average F1-score for male and female samples, respectively. Model performance was compared to human auditory-perceptual assessments of 64 corresponding samples performed by 41 professional singers. The model's performance approached or fell below that of human assessors on task-matched problems. The XGBoost gains observed across tested features reveal that the most important attributes for the tested classification problems were MFCCs and α-ratios between high and low frequency energy, with models trained on only these features achieving performance not statistically significantly different from the best tested models. The best automated models in this study do not yet match human auditory-perceptual discrimination but improve on previously reported average F1-scores in automated classification in singing voice.
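For readers unfamiliar with the metric, the sketch below shows how an average F1-score can be computed for a two-class problem such as metallic versus non-metallic. It assumes macro averaging (the unweighted mean of per-class F1), which the abstract does not confirm, and the labels and predictions are invented for illustration.

```python
def f1_for_label(y_true, y_pred, label):
    """F1-score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Invented example data, not results from the study.
y_true = ["metallic"] * 3 + ["non-metallic"] * 3
y_pred = ["metallic", "metallic", "non-metallic",
          "non-metallic", "non-metallic", "metallic"]
score = macro_f1(y_true, y_pred)
```

Here each class has two correct predictions, one false positive, and one false negative, so both per-class F1-scores and their mean equal 2/3.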
Affiliation(s)
- Jeroen Sol
- Institute for Computing and Information Sciences, Radboud University, Nijmegen, the Netherlands
- Mathias Aaen
- Research & Development, Complete Vocal Institute, Copenhagen K, Denmark; Nottingham University Hospitals, NHS Trust, Queen's Medical, ENT Department, Nottingham, United Kingdom.
- Cathrine Sadolin
- Research & Development, Complete Vocal Institute, Copenhagen K, Denmark
- Louis Ten Bosch
- Department of Language and Communication, Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
3
Herbst CT, Story BH, Meyer D. Acoustical Theory of Vowel Modification Strategies in Belting. J Voice 2023:S0892-1997(23)00004-8. [PMID: 37080890 DOI: 10.1016/j.jvoice.2023.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 04/22/2023]
Abstract
Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels can not always be fulfilled.
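The simulation count in the abstract follows from the size of the parameter space: 37 × 27 × 13 = 12987. The sketch below rebuilds that grid under two assumptions the abstract does not state: that the 37 pitches are semitone steps from C3 to C6, and that the 27 slopes are 1 dB/octave steps from -4 to -30 dB/octave. The vowel entries are placeholder indices, and `belting_fo_for_resonance` is a hypothetical helper illustrating fR1≈2fo.

```python
from itertools import product

# Assumption: semitone steps from C3 (MIDI 48) to C6 (MIDI 84), A4 = 440 Hz.
midi_notes = range(48, 85)
fo_hz = [440.0 * 2 ** ((m - 69) / 12) for m in midi_notes]

# Assumption: 1 dB/octave steps from -4 to -30 dB/octave.
slopes_db_per_octave = list(range(-4, -31, -1))

# Placeholder indices; the 13 IPA vowels are not listed in the abstract.
vowels = list(range(13))

# One tuple per simulated condition.
grid = list(product(fo_hz, slopes_db_per_octave, vowels))


def belting_fo_for_resonance(fr1_hz):
    """fo at which the second harmonic coincides with fR1 (fR1 = 2 * fo)."""
    return fr1_hz / 2.0
```

Under these assumptions `len(grid)` matches the reported 12987 simulations, and a first resonance of 800 Hz would place the centre of the belting range near 400 Hz.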
Affiliation(s)
- Christian T Herbst
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia; Department of Vocal Studies, Mozarteum University, Salzburg, Austria.
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
- David Meyer
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia
4
Herbst CT, Story BH. Computer simulation of vocal tract resonance tuning strategies with respect to fundamental frequency and voice source spectral slope in singing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:3548. [PMID: 36586864 DOI: 10.1121/10.0014421] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/06/2022] [Accepted: 09/13/2022] [Indexed: 06/17/2023]
Abstract
A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances (fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of -6, -12, and -18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.
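To make the spectral slope figures concrete: a constant slope in dB/octave lowers the level by that amount for every doubling of frequency, so the n-th harmonic sits at slope × log2(n) relative to the first. The sketch below applies this to the three slopes named in the abstract. It is an idealized illustration only and does not reproduce the study's voice source model.

```python
import math


def harmonic_level_db(n, slope_db_per_octave):
    """Level of the n-th harmonic relative to the first, for a constant slope."""
    if n < 1:
        raise ValueError("harmonic number must be >= 1")
    return slope_db_per_octave * math.log2(n)


# Relative levels of harmonics 1, 2, and 4 for each slope in the abstract.
levels = {
    slope: [harmonic_level_db(n, slope) for n in (1, 2, 4)]
    for slope in (-6, -12, -18)
}
```

With a slope of -18 dB/octave the fourth harmonic is already 36 dB below the first, whereas at -6 dB/octave it is only 12 dB down, which shows how much the source spectrum alone can change the energy available near a resonance.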
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
5
Schlegel P, Berry DA, Chhetri DK. Analysis of vibratory mode changes in symmetric and asymmetric activation of the canine larynx. PLoS One 2022; 17:e0266910. [PMID: 35421159 PMCID: PMC9009716 DOI: 10.1371/journal.pone.0266910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 03/29/2022] [Indexed: 12/02/2022]
Abstract
Investigations of neuromuscular control of voice production have primarily focused on the roles of muscle activation levels, posture, and stiffness at phonation onset. However, little work has been done investigating the stability of the phonation process with regard to spontaneous changes in vibratory mode of vocal fold oscillation as a function of neuromuscular activation. We evaluated 320 phonatory conditions representing combinations of superior and recurrent laryngeal nerve (SLN and RLN) activations in an in vivo canine model of phonation. At each combination of neuromuscular input, airflow was increased linearly to reach phonation onset and beyond from 300 to 1400 mL/s. High-speed video and acoustic data were recorded during phonation, and spectrograms and glottal-area-based parameters were calculated. Vibratory mode changes were detected based on sudden increases or drops of local fundamental frequency. Mode changes occurred only when SLNs were concurrently stimulated and were more frequent for higher, less asymmetric RLN stimulation. A slight increase in amplitude and cycle length perturbation usually preceded the changes in the vibratory mode. However, no inherent differences between signals with mode changes and signals without were found.
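The abstract describes mode changes as sudden increases or drops of local fundamental frequency but gives no detection criterion. The following is a rough sketch of one way to flag such jumps in a frame-wise F0 track. The 20% relative threshold and the example track are assumptions chosen for illustration, not values from the study.

```python
def detect_mode_changes(f0_track_hz, rel_jump=0.2):
    """Return frame indices where F0 jumps by more than `rel_jump`
    relative to the previous frame (hypothetical criterion)."""
    changes = []
    for i in range(1, len(f0_track_hz)):
        prev, cur = f0_track_hz[i - 1], f0_track_hz[i]
        # Skip unvoiced frames (F0 reported as 0) to avoid division by zero.
        if prev > 0 and abs(cur - prev) / prev > rel_jump:
            changes.append(i)
    return changes


# Invented F0 track in Hz: gradual rise, upward jump, then a drop back.
track = [200, 204, 209, 213, 310, 315, 318, 210, 207]
jumps = detect_mode_changes(track)
```

A frame-to-frame comparison like this reacts to single outliers in the F0 estimate, so a real analysis would likely smooth the track or require the new frequency to persist for several frames.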
Affiliation(s)
- Patrick Schlegel
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States of America
- David A. Berry
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States of America
- Dinesh K. Chhetri
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States of America
6
de Souza GVS, Duarte JMT, Viegas F, Simões-Zenari M, Nemr K. An Acoustic Examination of Pitch Variation in Soprano Singing. J Voice 2019; 34:648.e41-648.e49. [PMID: 30717888 DOI: 10.1016/j.jvoice.2018.12.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/31/2018] [Revised: 12/10/2018] [Accepted: 12/10/2018] [Indexed: 11/30/2022]
Abstract
INTRODUCTION The ability to perform acoustic inspection of data and to correlate the results with perceptual and physiological aspects facilitates vocal behavior analysis. The singing voice has specific characteristics and parameters that are involved during the phonation mechanism, which may be analyzed acoustically. OBJECTIVE To describe and analyze the fundamental frequency and formants in pitch variation in the /a/ vowel in sopranos. METHODS The sample consisted of 30 female participants between the ages of 20 and 45 years without vocal complaints. All sustained vowel sounds were recorded with the /a/ vowel sustained for 5 seconds, with three replications at low (C4-261 Hz), medium (Eb4-622 Hz), and high (Bb4-932 Hz) frequencies that were comfortable for the voice classification. In total, 90 samples were analyzed with digital extraction of the fundamental frequency (f0) and the first five formants (F1, F2, F3, F4, and F5) and manual confirmation. The middle segment was considered for analysis, whereas the onset and offset segments were not considered. Subsequently, FFT (fast Fourier transform) plots, LPC (linear predictive coding) graphs, and tube diagrams were created. The Shapiro-Wilk test was applied for adherence and the Friedman test was applied for comparison of paired samples. RESULTS For vocalizations at low and medium pitches, higher values were observed for the first five formant frequencies than for the f0 value. Overlaying the LPC and FFT graphs revealed a similarity between F1 and F2 at the two pitches, with clustered harmonics in the F3, F4, and F5 region in the low pitch. At the medium pitch, there was similarity between F3 and F4, an F5 peak, and tuned harmonics. However, in the high-pitch vocalizations, there was an increase in the F2, F3, F4, and F5 values in relation to f0, and there was similarity between them along with synchrony between f0 and F1, H2 and F2, H3 and F3, H4 and F4, and H5 and F5.
CONCLUSIONS Pitch changes indicate differences in the behavior of the fundamental frequency and sound formants in sopranos. The comparison of the sustained vowel sounds in f0 at the three pitches revealed specific vocal tract changes on the LPC curve and FFT harmonics, with an extra gain range at 261 Hz, synchrony between peaks of formants and harmonics at 622 Hz, and equivalence of f0 and F1 at 932 Hz.
Affiliation(s)
- Flávia Viegas
- Universidade Federal Fluminense, Nova Friburgo, RJ, Brazil
- Kátia Nemr
- Faculdade de Medicina FMUSP, Universidade de Sao Paulo, São Paulo, SP, Brazil
7
Samlan RA, Story BH. Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2011; 54:1267-83. [PMID: 21498582 PMCID: PMC3184371 DOI: 10.1044/1092-4388(2011/10-0195)] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/05/2023]
Abstract
PURPOSE To relate vocal fold structure and kinematics to 2 acoustic measures: cepstral peak prominence (CPP) and the amplitude of the first harmonic relative to the second (H1-H2). METHOD The authors used a computational, kinematic model of the medial surfaces of the vocal folds to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: degree of vocal fold adduction, surface bulging, vibratory nodal point, and supraglottal constriction. CPP and H1-H2 were measured from simulated glottal area, glottal flow, and acoustic waveforms and were related to the underlying vocal fold kinematics. RESULTS CPP decreased with increased separation of the vocal processes, whereas the nodal point location had little effect. H1-H2 increased as a function of separation of the vocal processes in the range of 1.0 mm to 1.5 mm and decreased with separation > 1.5 mm. CONCLUSIONS CPP is generally a function of vocal process separation. H1*-H2* (see paragraph 6 of article text for an explanation of the asterisks) will increase or decrease with vocal process separation on the basis of vocal fold shape, pivot point for the rotational mode, and supraglottal vocal tract shape, limiting its utility as an indicator of breathy voice. Future work will relate the perception of breathiness to vocal fold kinematics and acoustic measures.
Affiliation(s)
- Robin A Samlan
- Speech Acoustics Laboratory, University of Arizona, Tucson, USA.
8
Henrich N, Smith J, Wolfe J. Vocal tract resonances in singing: Strategies used by sopranos, altos, tenors, and baritones. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:1024-1035. [PMID: 21361458 DOI: 10.1121/1.3518766] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 05/30/2023]
Abstract
The first two vocal tract resonances (R1 and R2) of 22 classically trained sopranos, altos, tenors, and baritones were measured while they sang four different vowels over their normal pitch range. The resonances of the tract and the harmonics of the voice were measured simultaneously by injecting a broadband acoustic current into the tract at their mouth. Sopranos were found to tune R1 close to the fundamental frequency f(0) (R1:f(0) tuning) over at least part of their upper range for all vowels studied, particularly when f(0) was around or above the value of R1 for speech. Additionally, most sopranos employed R2:2f(0) tuning over some of their range, often simultaneously with R1:f(0) tuning. Altos used R1:f(0) tuning for vowels having lower values of R1 in speech, but can switch to R1:2f(0) tuning in the lower part of their range. Tenors and baritones generally used R1:2f(0) and R1:3f(0) tunings over part of their range and employed a number of different tunings to higher harmonics at lower pitch. These results indicate that singers can repeatedly tune their resonances with precision, and that there can be considerable differences in the resonance strategies used by individual singers, particularly for voices in the lower ranges.
Affiliation(s)
- Nathalie Henrich
- Department of Speech and Cognition, GIPSA-lab, UMR5216: CNRS, INPG, University Stendhal, UJF, Grenoble, France
9
Abstract
This study searched for perceptual, acoustic, and physiological correlates of support in singing. Seven trained professional singers (four women and three men) sang repetitions of the syllable [pa:] at varying pitch and sound levels (1) habitually (with support) and (2) simulating singing without support. Estimate of subglottic pressure was obtained from oral pressure during [p]. Vocal fold vibration was registered with dual-channel electroglottography. Acoustic analyses were made on the recorded samples. All samples were also evaluated by the singers and other listeners, who were trained singers, singing students, and voice specialists without singing education (a total of 63 listeners). They rated both the overall voice quality and the amount of support. According to the results, it seemed impossible to observe any auditory differences between supported singing and good singing voice quality. The acoustic and physiological correlates of good voice quality in absolute values seem to be gender and task dependent, whereas the relative optimum seems to be reached at intermediate parameter values.
Affiliation(s)
- A Sonninen
- Department of Communication, University of Jyväskylä, Jyväskylä, Finland.
10
Bergan CC, Titze IR, Story B. The perception of two vocal qualities in a synthesized vocal utterance: ring and pressed voice. J Voice 2004; 18:305-17. [PMID: 15331103 DOI: 10.1016/j.jvoice.2003.09.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Accepted: 09/04/2003] [Indexed: 11/29/2022]
Abstract
Two vocal qualities, ring quality and pressed quality, were analyzed perceptually. Listeners were asked to rate (on a scale from 0 to 10) the "amount of ring" in one listening and the "amount of pressedness" in another listening. The stimulus was the synthesized utterance /ya-ya-ya-ya-ya/. In the continuum representation of ring, the skewing quotient and the cross section of the epilaryngeal tube area were systematically varied, independently and by a covariation rule. In the continuum representation of pressed, the flow amplitude and open quotient were similarly varied. Results indicated that the crossover point between ring and no ring occurred with an epilaryngeal area of around 1.0 cm², and the crossover point between pressed and not pressed quality occurred at an open quotient of about 0.4. Fundamental frequency also had an effect on the perceptions, with a higher fundamental frequency receiving higher ratings of ring and pressed for otherwise the same parameters. Listeners demonstrated highly variable perceptions in both continua with poor intersubject, intrasubject, and intergroup reliability.
Affiliation(s)
- Christine C Bergan
- Department of Speech Pathology and Audiology, The University of Iowa, Iowa City, IA 52240, USA.
11
Henrich N, Sundin G, Ambroise D, d'Alessandro C, Castellengo M, Doval B. Just noticeable differences of open quotient and asymmetry coefficient in singing voice. J Voice 2004; 17:481-94. [PMID: 14740930 DOI: 10.1067/s0892-1997(03)00005-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
Abstract
This study aims to explore the perceptual relevance of the variations of glottal flow parameters and to what extent a small variation can be detected. Just Noticeable Differences (JNDs) have been measured for three values of open quotient (0.4, 0.6, and 0.8) and two values of asymmetry coefficient (2/3 and 0.8), and the effect of changes of vowel, pitch, vibrato, and amplitude parameters has been tested. Two main groups of subjects have been analyzed: a group of 20 untrained subjects and a group of 10 trained subjects. The results show that the JND for open quotient is highly dependent on the target value: an increase of the JND is noticed when the open quotient target value is increased. The relative JND is constant: ΔOq/Oq = 14% for the untrained and 10% for the trained. In the same way, the JND for asymmetry coefficient is also slightly dependent on the target value: an increase of the asymmetry coefficient value leads to a decrease of the JND. The results show that there is no effect from the selected vowel or frequency (two values have been tested), but that the addition of a vibrato has a small effect on the JND of open quotient. The choice of an amplitude parameter also has a great effect on the JND of open quotient.
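A constant relative JND means the absolute JND scales with the target value. The sketch below works through that arithmetic for the three open quotient targets and the two listener groups reported in the abstract. The function and dictionary names are illustrative only.

```python
# Relative JNDs for open quotient as reported in the abstract.
RELATIVE_JND = {"untrained": 0.14, "trained": 0.10}


def absolute_jnd(oq_target, group):
    """Absolute open-quotient JND implied by a constant relative JND."""
    return RELATIVE_JND[group] * oq_target


# Absolute JND for each target value and listener group, rounded to 3 decimals.
table = {
    (oq, group): round(absolute_jnd(oq, group), 3)
    for oq in (0.4, 0.6, 0.8)
    for group in RELATIVE_JND
}
```

For untrained listeners this gives absolute JNDs of about 0.056, 0.084, and 0.112 at targets of 0.4, 0.6, and 0.8, which matches the abstract's observation that the JND grows as the target open quotient increases.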
12
Titze IR, Bergan CC, Hunter EJ, Story B. Source and filter adjustments affecting the perception of the vocal qualities twang and yawn. LOGOP PHONIATR VOCO 2003; 28:147-55. [PMID: 14686543 DOI: 10.1080/14015430310018874] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022]
Abstract
Two vocal qualities, twang and yawn, were synthesized and rated perceptually. The stimuli consisted of synthesized vocal productions of a sentence-length utterance 'ya ya ya ya ya,' which had speech-like intonation. In a continuum transformation from normal to twang, the area in the pharynx was gradually decreased, along with vocal tract shortening and a decreased open quotient in the glottal airflow. In a continuum transformation toward yawn, the area in the pharynx was gradually increased, along with vocal tract lengthening and an increased open quotient. The normal (untransformed) vocal tract area was pre-determined by earlier studies involving MRI scans of a human subject's vocal tract. Listeners were asked to rate (on a scale from 1-10) the 'amount of twang' in one listening session and the 'amount of yawn' in another listening session. Overall, the perception of twang increased directly with pharyngeal area narrowing, vocal tract shortening, and decreased open quotient. The perception of yawn increased with pharyngeal area widening, vocal tract lengthening, and increased open quotient. Adjustments of one parameter alone yielded less significant perceptual changes than the above combinations, with open quotient showing the greatest effect in isolation. Listeners demonstrated variable perceptions in both continua with poor inter-subject, intra-subject, and inter-group reliability.
Affiliation(s)
- Ingo R Titze
- Department of Speech Pathology and Audiology, National Center for Voice and Speech, The University of Iowa, Iowa City 52242, USA
13
Erickson ML, D'Alfonso AE. A comparison of two methods of formant frequency estimation for high-pitched voices. J Voice 2002; 16:147-71. [PMID: 12150369 DOI: 10.1016/s0892-1997(02)00086-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
This study sought to compare formant frequencies estimated from natural phonation to those estimated using two methods of artificial laryngeal stimulation: (1) stimulation of the vocal tract using an artificial larynx placed on the neck and (2) stimulation of the vocal tract using an artificial larynx with an attached tube placed in the oral cavity. Twenty males between the ages of 18 and 45 performed the following three tasks on the vowels /a/ and /i/: (1) 4 seconds of sustained vowel, (2) 2 seconds of sustained vowel followed by 2 seconds of artificial phonation via a neck placement, and (3) 4 seconds of sustained vowel, the last two of which were accompanied by artificial phonation via an oral placement. Frequencies for formants 1-4 were measured for each task at second 1 and second 3 using linear predictive coding. These measures were compared across second 1 and second 3, as well as across all three tasks. Neither of the methods of artificial laryngeal stimulation tested in this study yielded formant frequency estimates that consistently agreed with those obtained from natural phonation for both vowels and all formants. However, when estimating mean formant frequency data for samples of large N, each of the methods agreed with mean estimations obtained from natural phonation for specific vowels and formants. The greatest agreement was found for a neck placement of the artificial larynx on the vowel /a/.
Affiliation(s)
- Molly L Erickson
- Department of Audiology and Speech Pathology, University of Tennessee, Knoxville 37996, USA.
14
Tom K, Titze IR. Vocal intensity in falsetto phonation of a countertenor: an analysis by synthesis approach. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2001; 110:1667-1676. [PMID: 11572375 DOI: 10.1121/1.1396331] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/23/2023]
Abstract
An analysis by synthesis paradigm was implemented to model glottal airflow and vocal tract acoustics for the falsetto phonation of a trained countertenor. Changes in vocal intensity were measured as a function of subglottal pressure, open quotient of the time-varying glottal airflow pulse, and formant tuning. The contributions of laryngeal adduction (open quotient of the glottal flow pulse) and of formant tuning to intensity change were derived from modeled data. The findings were: (1) Subglottal pressure accounted for almost 90% of the variation in SPL in falsetto phonation. (2) The open quotient of the glottal flow pulse was remarkably constant in these falsetto phonations, and thus did not affect vocal intensity significantly. (3) Formant tuning occurred in two out of nine possibilities for the vowel /a/. These instances did not support the concept of systematic exploitation of formant tuning.
Affiliation(s)
- K Tom
- Department of Speech Communication, California State University Fullerton, 92831, USA
15
Bachorowski JA, Smoski MJ, Owren MJ. The acoustic features of human laughter. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2001; 110:1581-1597. [PMID: 11572368 DOI: 10.1121/1.1391244] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.0] [Indexed: 05/23/2023]
Abstract
Remarkably little is known about the acoustic features of laughter. Here, acoustic outcomes are reported for 1024 naturally produced laugh bouts recorded from 97 young adults as they watched funny video clips. Analyses focused on temporal features, production modes, source- and filter-related effects, and indexical cues to laugher sex and individual identity. Although a number of researchers have previously emphasized stereotypy in laughter, its acoustics were found now to be variable and complex. Among the variety of findings reported, evident diversity in production modes, remarkable variability in fundamental frequency characteristics, and consistent lack of articulation effects in supralaryngeal filtering are of particular interest. In addition, formant-related filtering effects were found to be disproportionately important as acoustic correlates of laugher sex and individual identity. These outcomes are examined in light of existing data concerning laugh acoustics, as well as a number of hypotheses and conjectures previously advanced about this species-typical vocal signal.
Affiliation(s)
- J A Bachorowski
- Department of Psychology, Vanderbilt University, Nashville, Tennessee 37203, USA.
16
Abstract
Pitch and roughness were rated according to the extent of amplitude modulation (AM) and frequency modulation (FM) of a subharmonic [fundamental frequency (F0)/2]. The objective was to determine the identification boundaries for pitch and roughness and to discover how both kinds of modulation affect these boundaries. Another objective was to judge the reliability between subjects when identifying subharmonic-related pitch and roughness. Three procedures were used: ABX comparisons, method of adjustment, and rating of roughness. Results indicated that the crossover point to the lower pitch (associated with the subharmonic) occurred between 10% and 30% modulation, depending on modulation type and F0. Subjects demonstrated highly variable perceptions of pitch and roughness, with poor intersubject reliability.
Affiliation(s)
- C C Bergan
- Department of Speech Pathology and Audiology, National Center for Voice and Speech, The University of Iowa, Iowa City 52242, USA
|
17
|
Tom K, Titze IR, Hoffman EA, Story BH. Three-dimensional vocal tract imaging and formant structure: varying vocal register, pitch, and loudness. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2001; 109:742-747. [PMID: 11248978 DOI: 10.1121/1.1332380] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 05/23/2023]
Abstract
Although advances in techniques for image acquisition and analysis have facilitated the direct measurement of three-dimensional vocal tract air space shapes associated with specific speech phonemes, little information is available with regard to changes in three-dimensional (3-D) vocal tract shape as a function of vocal register, pitch, and loudness. In this study, 3-D images of the vocal tract during falsetto and chest register phonations at various pitch and loudness conditions were obtained using electron beam computed tomography (EBCT). Detailed measurements and differences in vocal tract configuration and formant characteristics derived from the eight measured vocal tract shapes are reported.
Affiliation(s)
- K Tom
- Department of Speech Communication, California State University, Fullerton, CA 92831, USA
|
18
|
Abstract
Questions exist as to the intelligibility of vowels sung at extremely high fundamental frequencies and, especially, when the fundamental frequency (F0) produced is above the region where the first vowel formant (F1) would normally occur. Can such vowels be correctly identified and, if so, does context provide the necessary information or are acoustical elements also operative? To this end, 18 professional singers (5 males and 13 females) were recorded when singing 3 isolated vowels at high and low pitches at both loud and soft levels. Aural-perceptual studies employing four types of auditors were carried out to determine the identity of these vowels, and the nature of the confusions with other vowels. Subsequent acoustical analysis focused on the actual fundamental frequencies sung plus those defining the first 2 vowel formants. It was found that F0 change had a profound effect on vowel perception; one of the more important observations was that the target tended to shift toward vowels with an F1 just above the sung frequency.
Affiliation(s)
- H Hollien
- Institute for Advanced Studies in Communication Processes, University of Florida, Gainesville 32611, USA
|
19
|
Abstract
In singing, F0 sometimes is much higher than the typical frequency value of F1. According to previous investigations, sopranos raise F1 to a frequency near F0 by widening the jaw opening in such cases. In the present study, the jaw opening was measured in 10 professional singers of different categorizations whose task was to sing an ascending two-octave scale on different vowels. Their normal F1 values for these vowels were determined at a low F0. Only for the vowels /a/ and /a/ did the singers widen the jaw opening when F0 approached the F1 value measured at a low pitch. For the other vowels, jaw opening was widened, beginning at a higher F0. It is assumed that for these vowels the singers used other articulatory means to increase F1.
Affiliation(s)
- J Sundberg
- Department of Speech, Music, and Hearing, KTH, Stockholm, Sweden
|
20
|
Fitch WT, Hauser MD. Vocal production in nonhuman primates: Acoustics, physiology, and functional constraints on “honest” advertisement. Am J Primatol 1995; 37:191-219. [DOI: 10.1002/ajp.1350370303] [Citation(s) in RCA: 189] [Impact Index Per Article: 6.5] [Received: 03/29/1994] [Accepted: 02/07/1995] [Indexed: 11/06/2022]
|