1
Fujiki RB, Venkatraman A, Heller Murray ES. The Pediatric Vocal Mechanism: Structure and Function. J Voice 2025:S0892-1997(25)00118-3. [PMID: 40187973 DOI: 10.1016/j.jvoice.2025.03.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2025] [Revised: 03/11/2025] [Accepted: 03/12/2025] [Indexed: 04/07/2025]
Abstract
The pediatric vocal mechanism is dynamic and complex. Effective treatment of voice disorders in children requires accurate knowledge of typical laryngeal structure and function. This is particularly crucial given that the pediatric vocal mechanism differs in structure and function from that found in adults. Yet, it can be difficult to find data specific to pediatric voices. This article describes three aspects of vocal function across childhood: 1) laryngeal anatomical structure, 2) quantitative voice measures across childhood, and 3) voice disorder risk across childhood. The influence of dysphonia on quality of life across childhood is also considered. The goal of this review is to enhance the diagnosis of pediatric voice disorders as well as to highlight areas for future inquiry.
Affiliation(s)
- Robert Brinton Fujiki
- Department of Otolaryngology - Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, IN.
- Elizabeth S Heller Murray
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, PA
2
Littlejohn M, Harvey Woodnorth G, Hseu A, Nuss R, Heller Murray E. Voiced-Voiceless Consonant Distinction in Children With Vocal Fold Nodules: A Preliminary Study. J Voice 2025:S0892-1997(25)00092-X. [PMID: 40157856 DOI: 10.1016/j.jvoice.2025.02.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2025] [Revised: 02/25/2025] [Accepted: 02/26/2025] [Indexed: 04/01/2025]
Abstract
OBJECTIVES/HYPOTHESIS The relationship between articulatory and vocal development is not well-understood in children with vocal fold nodules (VFN). Because these children have differences in their vocal system at a time when vocal-articulatory control is developing, it is important to understand this relationship. This study examined relationships between voiced and voiceless voice onset time (VOT) measures and cepstral peak prominence (CPP) in children with VFN (3-7 years old). STUDY DESIGN Retrospective. METHODS Acoustic data were gathered from a retrospective dataset from children with VFN ages 3-5, 5-6, and 6-7 years old using words with initial /b/ and /p/ consonants. Correlations were completed for each age group and phoneme combination to examine the relationships between CPP and VOT percent overshoot, accuracy, range, variability, and discreteness. Additionally, Wilcoxon Signed-Rank tests were completed to facilitate interpretation of the data. RESULTS No relationship was found between CPP and VOT overshoot, accuracy, range, or variability. Significant findings indicated that children ages 5-6 and 6-7 with more dysphonia had less discreteness between /p/ and /b/. Wilcoxon Signed-Rank tests indicated significantly less discreteness for the 5-6-year-old group. CONCLUSIONS Findings suggest that children with VFN and increased dysphonia may demonstrate decreased motor control, as evidenced by the relationship between CPP and discreteness. Future research can build on these findings by using a sample with more children, prospectively designed tokens, and a control group without VFN.
Affiliation(s)
- Meghan Littlejohn
- Department of Communication Sciences and Disorders, Temple University, Philadelphia, PA
- Anne Hseu
- Otolaryngology and Communication Enhancement, Boston Children's Hospital, Boston, MA
- Roger Nuss
- Otolaryngology and Communication Enhancement, Boston Children's Hospital, Boston, MA
3
Ibarra EJ, Galindo GE, Alzamendi GA, Cortes JP, Castro C, Manriquez R, Testart A, Zanartu M. Empirical Distribution of Glottal Edges (EDGE): A Statistical Assessment of Vocal Fold Kinematics Using High-Speed Videoendoscopy. IEEE J Biomed Health Inform 2025; 29:1087-1100. [PMID: 39288042 DOI: 10.1109/jbhi.2024.3462632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/19/2024]
Abstract
Although laryngeal high-speed videoendoscopy (HSV) is crucial for studying vocal fold vibrations, its translation to clinical practice has been hindered by the large volume of data it produces and the difficulty in interpreting current analysis methods. Although image processing techniques have been developed to map spatial-temporal data into two-dimensional representations, they alter the geometrical construction of the glottis and do not provide standard quantitative features, thus challenging clinical interpretation. In response, we propose a new visualization and analysis framework for assessing the dynamics of vocal folds based on the empirical distribution of the glottal edge using HSV. This procedure analyzes vocal fold oscillations by preserving the shape of the glottis and quantifying the asymmetry between right and left vocal fold displacements along the anterior-posterior axis. This method was evaluated on four groups of participants: ten with normal voices, ten with vocal fold nodules, ten with muscle tension dysphonia, and two with unilateral vocal fold paralysis. The proposed method produces distinct representations for normal and pathological vocal fold vibratory behaviors and derived features based on amplitude and phase asymmetry metrics that show statistically significant differences between normal and pathological groups. Comparative analysis with state-of-the-art techniques indicates that our proposed method can complement the assessment of vocal fold vibration and enhance the clinical translation of HSV.
4
Patel RR, Döllinger M, Semmler M. 3D reconstruction of vocal fold dynamics with laser high-speed videoendoscopy in children. Laryngoscope Investig Otolaryngol 2024; 9:e70024. [PMID: 39445174 PMCID: PMC11497175 DOI: 10.1002/lio2.70024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 09/01/2024] [Accepted: 10/05/2024] [Indexed: 10/25/2024] Open
Abstract
Objective The objective of this study is to evaluate three-dimensional vertical motion of the superior surface of the vocal folds in vivo in (a) typically developing children as a function of vocal frequency variations and (b) a child with vocal nodules. Methods A custom-developed laser endoscope coupled with high-speed videoendoscopy was used to obtain 3D parameters from two healthy children, one child with vocal nodules, and 23 vocally healthy adults (females = 11, males = 12). Parameters of amplitude (mm), maximum opening/closing velocity (mm/s), and mean opening/closing velocity (mm/s) were computed for the lateral and vertical vibratory motion along the anterior, middle, and posterior sections of the vocal folds. Results We provide, for the first time, absolute measurements of vertical amplitude and maximum/mean velocity during the opening and closing phases, in vivo in children. Overall, the vertical motion was larger than the lateral motion in vocally normal children, especially along the visible posterior section of the vocal folds and during low pitch phonation. The opening phase dynamics were consistently large along the posterior section in the child with vocal nodules. Conclusions The study findings establish the feasibility of capturing 3D motion in a clinical setting and provide proof of concept for the application of the proposed 3D laser in the pediatric population. Future large sample size studies are needed to establish the diagnostic potential of examining the closing phase vertical motion to evaluate vibratory development in children with normal voice and investigating the opening phase vertical motion in children with nodules. Level of Evidence N/A.
Affiliation(s)
- Rita R. Patel
- Department of Otolaryngology Head and Neck Surgery, Indiana University, Indianapolis, Indiana, USA
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich‐Alexander‐Universität Erlangen‐Nürnberg, Erlangen, Germany
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich‐Alexander‐Universität Erlangen‐Nürnberg, Erlangen, Germany
5
Yamauchi A, Imagawa H, Yokonishi H, Sakakibara KI, Tayama N. Multivariate Analysis of Vocal Fold Vibrations in Normal Speakers Using High-Speed Digital Imaging. J Voice 2024; 38:10-17. [PMID: 34470706 DOI: 10.1016/j.jvoice.2021.08.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/06/2021] [Revised: 07/30/2021] [Accepted: 08/02/2021] [Indexed: 11/18/2022]
Abstract
INTRODUCTION Little is known about the normal variations in vocal fold vibrations. We conducted a prospective study on normal subjects using high-speed digital imaging (HSDI) to elucidate key parameters regarding age/gender-related normal variations. METHODS Forty-six healthy adult volunteers were divided into young (aged ≤35 years) male, young female, elderly (aged ≥65 years) male, and elderly female subgroups. HSDI data of sustained phonation of /i/ at a comfortable pitch and loudness were obtained, and vibratory parameters were calculated using the visual-perceptual rating, laryngotopography, digital kymography, and glottal area waveform. Multivariate analysis was then performed on these parameters to clarify the subgroup-specific key parameters. RESULTS Four key parameters were identified from a total of 83: one from visual perceptual rating and three from laryngotopography. Subgroup analyses showed that posterior-to-anterior longitudinal phase difference (PD) and high fundamental frequency (F0) were specific to young female participants. A low F0 was specific to young male participants. Large anterior-to-posterior longitudinal PD and its left-right difference were specific to elderly male participants. There were no key parameters for elderly female participants. CONCLUSIONS Methods that can assess F0 and longitudinal PD, such as visual-perceptual rating and laryngotopography, were effective in the evaluation of normal vocal fold vibrations and their variations.
Affiliation(s)
- Akihito Yamauchi
- Department of Otolaryngology, The University of Tokyo Hospital, Bunkyo-Ku, Tokyo, Japan.
- Hiroshi Imagawa
- Department of Otolaryngology, The University of Tokyo Hospital, Bunkyo-Ku, Tokyo, Japan
- Hisayuki Yokonishi
- Department of Otolaryngology, Tokyo Metropolitan Bokutoh Hospital, Sumida-Ku, Tokyo, Japan
- Ken-Ichi Sakakibara
- Department of Communication Disorders, Health Sciences University of Hokkaido, Ishikari-Gun, Hokkaido, Japan
- Niro Tayama
- Department of Otolaryngology and Tracheo-esophagology, National Center for Global Health and Medicine, Shinjuku-Ku, Tokyo, Japan
6
Semmler M, Kniesburges S, Pelka F, Ensthaler M, Wendler O, Schützenberger A. Influence of Reduced Saliva Production on Phonation in Patients With Ectodermal Dysplasia. J Voice 2023; 37:913-923. [PMID: 34353685 DOI: 10.1016/j.jvoice.2021.06.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2021] [Revised: 05/28/2021] [Accepted: 06/02/2021] [Indexed: 10/20/2022]
Abstract
OBJECTIVE Patients with ectodermal dysplasia (ED) suffer from an inherited disorder in the development of the ectodermal structures. Besides the main symptoms, i.e. significantly reduced formation/expression of teeth, hair and sweat glands, decreased saliva production has been objectively documented. In addition to difficulties with chewing/swallowing, ED patients frequently report a subjective impression of rough and hoarse voices. A correlation between the reduced production of saliva and an affliction of the voice has not yet been investigated objectively for this rare disease. METHODS Following an established measurement protocol, a study was conducted on 31 patients with ED and 47 controls (no ED, healthy voice). Additionally, the vocal fold oscillations were recorded by high-speed videoendoscopy (HSV@4 kHz). The glottal area waveform was determined by segmentation and objective glottal dynamic parameters were calculated. The generated acoustic signal was evaluated by objective and subjective measures. The individual impairment was documented by a standardized questionnaire (VHI). Additionally, the amount of generated saliva was measured for a defined period of time. RESULTS ED patients displayed a significantly reduced saliva production compared to the control group. Furthermore, the auditory-perceptual evaluation yielded significantly higher ratings for breathiness and hoarseness in the voices of male ED patients compared to male controls. The majority of male ED patients (67%) indicated at least minor impairment in the self-evaluation. Objective acoustic measures like Jitter and Shimmer confirmed the decreased acoustic quality in male ED patients, whereas none of the investigated HSV parameters showed significant differences between the test groups. Statistical analysis did not confirm a statistically significant correlation between reduced voice quality and amount of saliva. CONCLUSIONS An objective impairment of the acoustic outcome was demonstrated for male ED patients. However, the vocal fold dynamics in HSV recordings seem unaffected.
Affiliation(s)
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany.
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Franziska Pelka
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Maria Ensthaler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Olaf Wendler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
7
Heller Murray ES, Chao A. The Relationships Among Vocal Variability, Vocal-Articulatory Coordination, and Dysphonia in Children. J Voice 2023; 37:969.e43-969.e49. [PMID: 34272144 DOI: 10.1016/j.jvoice.2021.06.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/30/2021] [Revised: 06/01/2021] [Accepted: 06/10/2021] [Indexed: 10/20/2022]
Abstract
OBJECTIVE The purpose of this study was to evaluate the relationship between vocal variability and variability of vocal-articulatory coordination in children. Furthermore, this study examined if this relationship was impacted by pediatric dysphonia. STUDY DESIGN Retrospective analysis of speech samples in the Arizona Child Acoustic Database. METHODS Speech samples from children 2-7 years of age were selected for analysis. Vocal variability was defined as the coefficient of variation (CoV) of fundamental frequency, taken from the center of sustained vowels. Variability of vocal-articulatory coordination was defined as the CoV of voice onset time (VOT) of voiceless stop consonants. Both objective and subjective measures of dysphonia were completed for each participant. RESULTS Children had a negative correlation between VOT variability and vocal variability. Further analysis indicated that this relationship was present in children with typical developmental levels of dysphonia but absent for children with moderate to severe dysphonia. Increased dysphonia severity was associated with increased vocal variability. CONCLUSION Increased VOT variability was associated with decreased vocal variability in children with dysphonia severities consistent with typical vocal development. However, this relationship was not present in children with moderate to severe dysphonia. This study suggests that future work is needed to examine the relationships between the vocal system and vocal-articulatory coordination in children with and without diagnosed voice disorders.
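The vocal variability measure described in this abstract (coefficient of variation of fundamental frequency taken from the center of a sustained vowel) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the 50% central window, the use of the sample standard deviation, the function names, and the F0 values are all assumptions made for the example.

```python
from statistics import mean, stdev


def center_segment(track, fraction=0.5):
    """Keep the central part of a track (window size is an assumed choice)."""
    n = len(track)
    if n < 2:
        raise ValueError("need at least two samples")
    keep = min(n, max(2, round(n * fraction)))
    start = (n - keep) // 2
    return track[start:start + keep]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CoV is undefined")
    return stdev(values) / m


# Hypothetical F0 track (Hz) for one sustained vowel; numbers are illustrative.
f0_track_hz = [240.0, 251.0, 262.0, 270.0, 255.0,
               266.0, 259.0, 263.0, 248.0, 230.0]

central = center_segment(f0_track_hz)
f0_cov = coefficient_of_variation(central)
print(f"central samples: {central}")
print(f"F0 CoV: {f0_cov:.4f}")
```

Trimming the vowel edges before computing the CoV keeps onset and offset glides from inflating the variability estimate, which is presumably why the abstract specifies the vowel center.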
Affiliation(s)
- Andie Chao
- Department of Communication Sciences and Disorders, Temple University, Philadelphia, Pennsylvania
8
Colletti L, Heller Murray E. Voice Onset Time in Children With and Without Vocal Fold Nodules. J Speech Lang Hear Res 2023; 66:1467-1478. [PMID: 36940476 PMCID: PMC10457081 DOI: 10.1044/2023_jslhr-22-00463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 11/21/2022] [Accepted: 01/16/2023] [Indexed: 05/11/2023]
Abstract
PURPOSE Voice onset time (VOT) of voiceless consonants provides information on the coordination of the vocal and articulatory systems. This study examined whether vocal-articulatory coordination is affected by the presence of vocal fold nodules (VFNs) in children. METHOD The voices of children with VFNs (6-12 years) and age- and gender-matched vocally healthy controls were examined. VOT was calculated as the time between the voiceless stop consonant burst and the vocal onset of the vowel. Measures of the average VOT and VOT variability, defined as the coefficient of variation, were calculated. The acoustic measure of dysphonia, cepstral peak prominence (CPP), was also calculated. CPP provides information about the overall periodicity of the signal, with more dysphonic voices having lower CPP values. RESULTS There were no significant differences in either average VOT or VOT variability between the VFN and control groups. VOT variability and average VOT were both significantly predicted by the interaction between Group and CPP. There was a significant negative correlation between CPP and VOT variability in the VFN group, but no significant relationship was found in the control group. CONCLUSIONS Unlike previous studies with adults, there were no group differences in average VOT or VOT variability in this study. However, children with VFNs who were more dysphonic had increased VOT variability, suggestive of a relationship between dysphonia severity and control of vocal onset during speech production.
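The two VOT measures named in this abstract can be sketched from hand-labelled time stamps: VOT as the interval from the stop burst to the vocal onset of the vowel, then the average and the coefficient of variation across tokens. The function names, the choice of the sample standard deviation, and the time stamps are assumptions for illustration only.

```python
from statistics import mean, stdev


def voice_onset_time_ms(burst_s, voicing_onset_s):
    """VOT in milliseconds: vowel voicing onset minus stop burst."""
    return (voicing_onset_s - burst_s) * 1000.0


def summarize_vot(tokens):
    """Return (average VOT in ms, VOT coefficient of variation)."""
    vots = [voice_onset_time_ms(burst, onset) for burst, onset in tokens]
    average = mean(vots)
    return average, stdev(vots) / average


# Hypothetical (burst, voicing onset) labels in seconds for three tokens.
tokens = [(0.512, 0.580), (1.204, 1.266), (2.031, 2.112)]

avg_vot, vot_cov = summarize_vot(tokens)
print(f"average VOT: {avg_vot:.1f} ms, VOT CoV: {vot_cov:.3f}")
```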
Affiliation(s)
- Lauren Colletti
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, PA
- Elizabeth Heller Murray
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, PA
9
Feinstein H, Daşdöğen Ü, Awan JA, Awan SN, Abbott KV. Comparative Analysis of Two Methods of Perceptual Voice Assessment. J Voice 2023:S0892-1997(23)00005-X. [PMID: 36907680 PMCID: PMC10492895 DOI: 10.1016/j.jvoice.2023.01.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/27/2022] [Revised: 01/04/2023] [Accepted: 01/05/2023] [Indexed: 03/13/2023]
Abstract
OBJECTIVES The primary aim was to compare two methods for perceptual evaluation of voice: paired comparison (PC) and visual analog scale (VAS) ratings. Secondary aims were to assess the correspondence between two dimensions of voice (overall severity of voice quality and resonant voice) and to investigate the influence of rater experience on perceptual rating scores and rating confidence scores. STUDY DESIGN Experimental design. METHODS Voice samples from six children (pre and post therapy) were rated by 15 Speech-Language Pathologists specialized in voice. Raters completed four tasks corresponding to the two rating methods and voice qualities: PC-severity, PC-resonance, VAS-severity, and VAS-resonance. For PC tasks, raters chose the better of two voice samples (better voice quality or better resonance, depending on the task) and indicated the degree of confidence in each choice. Rating and confidence scores were combined to produce a number on a 1-10 scale (PC-confidence adjusted). VAS ratings involved rating voices on a scale for degree of severity and resonance, respectively. RESULTS PC-confidence adjusted and VAS ratings were moderately correlated for overall severity and also for vocal resonance. VAS ratings were normally distributed and had greater rater consistency than PC-confidence adjusted ratings. VAS scores reliably predicted binary PC choices (choice of voice sample only). Overall severity and vocal resonance were weakly correlated and rater experience was not linearly related to rating scores or confidence. CONCLUSIONS Results suggest that the VAS rating method holds advantages over PC, including normally distributed ratings, superior consistency of ratings, and the ability to provide more finely grained detail regarding the auditory perception of voice. Overall severity and vocal resonance were not redundant in the current data set, suggesting that resonant voice and overall severity are not isomorphic. Finally, the number of years of clinical experience was not linearly related to perceptual ratings or rating confidence.
Affiliation(s)
- Hagar Feinstein
- Department of Communication Sciences & Disorders, University of Delaware, Newark, Delaware.
- Ümit Daşdöğen
- Department of Communication Sciences & Disorders, University of Delaware, Newark, Delaware
- Jordan A Awan
- Department of Statistics, Purdue University, West Lafayette, Indiana
- Shaheen N Awan
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, Florida
- Katherine Verdolini Abbott
- Department of Communication Sciences & Disorders, University of Delaware, Newark, Delaware; Department of Linguistics & Cognitive Science, University of Delaware, Newark, Delaware
10
Quantitative Analysis of Vocal Fold Vibration using High-Speed Videoendoscopy in Children with and without Bilateral Lesions. J Voice 2022; 36:176-182. [PMID: 32712076 PMCID: PMC7854946 DOI: 10.1016/j.jvoice.2020.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2020] [Revised: 05/04/2020] [Accepted: 05/07/2020] [Indexed: 11/22/2022]
Abstract
OBJECTIVE To provide data on the measurable vocal fold vibratory differences in children with and without vocal fold lesions using high-speed videoendoscopy. DESIGN Prospective study, 24 participants (8 healthy; 16 with lesions) between the ages of 5 and 10. METHODS Rigid high-speed videoendoscopy at the rate of 8,000 frames per second was used to examine participants. Four objective vocal fold phase linearity measures were obtained to establish anterior-posterior contact and separation vibratory patterns. RESULTS All objective measures showed a difference between nonlesion and bilateral vocal fold lesion groups. Contact-separation patterns in all nonlesion girls and young pre-pubertal boys exhibited an anterior-to-posterior contact and posterior-to-anterior separation; while older boys differed. The objective measures of open quotient, left-right relative phase asymmetry and speed index, showed linear anterior-posterior patterns within the nonlesion group; while the bilateral vocal fold lesion group displayed nonlinear patterns. Patterns in the posterior region of the vocal fold were similar in both groups; while patterns in the anterior region differed. CONCLUSIONS This study suggests lesions have an effect on the anterior aspect of vocal fold vibratory patterns specifically anterior to the lesions. Age-related differences for males are also evidenced, prompting further investigation of laryngeal development in males and females from childhood to adulthood. This study could serve as a basis for the development of objective clinical measurements of vocal fold vibration in presence of lesions. Further findings could help redefine the theoretical framework of pediatric voice.
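To make the idea of an anterior-posterior pattern in open quotient concrete, the sketch below computes an open quotient at three positions along the glottis. The abstract does not define its measures, so this assumes the common convention that open quotient is the fraction of one vibratory cycle during which the glottis is open; the width values, the threshold, and the position labels are hypothetical.

```python
def open_quotient(widths, threshold=0.0):
    """Fraction of samples in one cycle where glottal width exceeds threshold."""
    if not widths:
        raise ValueError("empty cycle")
    open_samples = sum(1 for width in widths if width > threshold)
    return open_samples / len(widths)


# Hypothetical glottal widths (pixels) over one cycle at three positions.
cycle_by_position = {
    "anterior": [0, 0, 1, 2, 1, 0, 0, 0],
    "middle": [0, 1, 2, 3, 2, 1, 0, 0],
    "posterior": [1, 2, 3, 3, 2, 1, 0, 0],
}

oq_by_position = {
    position: open_quotient(widths)
    for position, widths in cycle_by_position.items()
}
for position, oq in oq_by_position.items():
    print(f"{position}: OQ = {oq:.3f}")
```

Plotting such values against position is one way to see whether they follow a roughly linear trend from anterior to posterior, which is the kind of pattern the abstract contrasts between groups.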
11
Patel RR, Ternström S. Quantitative and Qualitative Electroglottographic Wave Shape Differences in Children and Adults Using Voice Map-Based Analysis. J Speech Lang Hear Res 2021; 64:2977-2995. [PMID: 34319772 DOI: 10.1044/2021_jslhr-20-00717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]
Abstract
Purpose The purpose of this study is to identify the extent to which various measurements of contacting parameters differ between children and adults during habitual range and overlap vocal frequency/intensity, using voice map-based assessment of noninvasive electroglottography (EGG). Method EGG voice maps were analyzed from 26 adults (22-45 years) and 22 children (4-8 years) during connected speech and vowel /a/ over the habitual range and the overlap vocal frequency/intensity from the voice range profile task on the vowel /a/. Mean and standard deviations of contact quotient by integration, normalized contacting speed, quotient of speed by integration, and cycle-rate sample entropy were obtained. Group differences were evaluated using the linear mixed model analysis for the habitual range connected speech and the vowel, whereas analysis of covariance was conducted for the overlap vocal frequency/intensity from the voice range profile task. Presence of a "knee" on the EGG wave shape was determined by visual inspection of the presence of convexity along the decontacting slope of the EGG pulse and the presence of the second derivative zero-crossing. Results The contact quotient by integration, normalized contacting speed, quotient of speed by integration, and cycle-rate sample entropy were significantly different in children compared to (a) adult males for habitual range and (b) adult males and adult females for the overlap vocal frequency/intensity. None of the children had a "knee" on the decontacting slope of the EGG pulse. Conclusion EGG parameters of contact quotient by integration, normalized contacting speed, quotient of speed by integration, cycle-rate sample entropy, and absence of a "knee" on the decontacting slope characterize the wave shape differences between children and adults, whereas the normalized contacting speed, quotient of speed by integration, cycle-rate sample entropy, and presence of a "knee" on the downward pulse slope characterize the wave shape differences between adult males and adult females. Supplemental Material https://doi.org/10.23641/asha.15057345.
Affiliation(s)
- Rita R Patel
- Department of Speech, Language and Hearing Sciences, Indiana University Bloomington
- Sten Ternström
- Division of Speech, Music, and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
12
Kist AM, Gómez P, Dubrovskiy D, Schlegel P, Kunduk M, Echternach M, Patel R, Semmler M, Bohr C, Dürr S, Schützenberger A, Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. J Speech Lang Hear Res 2021; 64:1889-1903. [PMID: 34000199 DOI: 10.1044/2021_jslhr-20-00498] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 06/12/2023]
Abstract
Purpose High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because dedicated software to analyze the data is lacking. HSV allows the vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533.
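The glottal area waveform mentioned in this abstract (the changing glottal area over time) can be illustrated with a few lines of Python. This is only a sketch of the concept and does not reproduce GAT, which the abstract says is written in C#; it assumes segmentation has already produced one binary mask per frame, and the function name and the tiny 3x3 masks are hypothetical.

```python
def glottal_area_waveform(masks):
    """Count glottis pixels (value 1) in each frame's binary mask."""
    return [sum(sum(row) for row in mask) for mask in masks]


# Hypothetical 3x3 segmentation masks for five consecutive frames.
frames = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # glottis closed
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],  # opening
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],  # maximum opening
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],  # closing
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # glottis closed
]

gaw = glottal_area_waveform(frames)
print(gaw)
```

The result is in pixels per frame; converting it to physical units would need a calibration that ordinary endoscopic video does not provide.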
Affiliation(s)
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Denis Dubrovskiy
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Germany
- Rita Patel
- Department of Speech, Language and Hearing Sciences, College of Arts and Sciences, Indiana University, Bloomington
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Christopher Bohr
- Klinik und Poliklinik für Hals-Nasen-Ohren-Heilkunde Universitätsklinikum Regensburg, Germany
- Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| |
|
13
|
Falk S, Kniesburges S, Schoder S, Jakubaß B, Maurerlehner P, Echternach M, Kaltenbacher M, Döllinger M. 3D-FV-FE Aeroacoustic Larynx Model for Investigation of Functional Based Voice Disorders. Front Physiol 2021; 12:616985. [PMID: 33762964 PMCID: PMC7982522 DOI: 10.3389/fphys.2021.616985] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/13/2020] [Accepted: 02/09/2021] [Indexed: 12/02/2022]
Abstract
For the clinical analysis of underlying mechanisms of voice disorders, we developed a numerical aeroacoustic larynx model, called simVoice, that mimics commonly observed functional laryngeal disorders such as glottal insufficiency and vibrational left-right asymmetries. The model is a combination of the Finite Volume (FV) CFD solver Star-CCM+ and the Finite Element (FE) aeroacoustic solver CFS++. simVoice models turbulence using Large Eddy Simulations (LES) and the acoustic wave propagation with the perturbed convective wave equation (PCWE). Its geometry corresponds to a simplified larynx and a vocal tract model representing the vowel /a/. The oscillations of the vocal folds are externally driven. In total, 10 configurations with different degrees of functional-based disorders were simulated and analyzed. The energy transfer between the glottal airflow and the vocal folds decreases with an increasing glottal insufficiency and potentially reflects the higher effort during speech for affected patients. This loss of energy transfer may also have an essential influence on the quality of the sound signal as expressed by decreasing sound pressure level (SPL), Cepstral Peak Prominence (CPP), and Vocal Efficiency (VE). Asymmetry in the vocal fold oscillations also reduces the quality of the sound signal. However, simVoice confirmed previous clinical and experimental observations that a high level of glottal insufficiency worsens the acoustic signal quality more than oscillatory left-right asymmetry. Both symptoms in combination will further reduce the quality of the sound signal. In summary, simVoice allows for detailed analysis of the origins of disordered voice production and hence fosters the further understanding of laryngeal physiology, including occurring dependencies. Given a prospective increase in computing power, the current walltime of 10 h/cycle is promising for a future clinical use of simVoice.
Affiliation(s)
- Sebastian Falk
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Stefan Schoder
- Institute of Fundamentals and Theory in Electrical Engineering, Division Vibro- and Aeroacoustics, Graz University of Technology, Graz, Austria
| | - Bernhard Jakubaß
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Paul Maurerlehner
- Institute of Fundamentals and Theory in Electrical Engineering, Division Vibro- and Aeroacoustics, Graz University of Technology, Graz, Austria
| | - Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
| | - Manfred Kaltenbacher
- Institute of Fundamentals and Theory in Electrical Engineering, Division Vibro- and Aeroacoustics, Graz University of Technology, Graz, Austria
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| |
|
14
|
Semmler M, Berry DA, Schützenberger A, Döllinger M. Fluid-structure-acoustic interactions in an ex vivo porcine phonation model. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:1657. [PMID: 33765793 PMCID: PMC7952141 DOI: 10.1121/10.0003602] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/28/2020] [Revised: 01/29/2021] [Accepted: 02/07/2021] [Indexed: 05/02/2023]
Abstract
In the clinic, many diagnostic and therapeutic procedures focus on the oscillation patterns of the vocal folds (VF). Dynamic characteristics of the VFs, such as symmetry, periodicity, and full glottal closure, are considered essential features for healthy phonation. However, the relevance of these individual factors in the complex interaction between the airflow, laryngeal structures, and the resulting acoustics has not yet been quantified. Sustained phonation was induced in nine excised porcine larynges without vocal tract (supraglottal structures had been removed above the ventricular folds). The multimodal setup was designed to simultaneously control and monitor key aspects of phonation in the three essential parts of the larynx. More specifically, measurements comprised (1) the subglottal pressure signal, (2) high-speed recordings in the glottal plane, and (3) the acoustic signal in the supraglottal region. The automated setup regulated glottal airflow, asymmetric arytenoid adduction, and the pre-phonatory glottal gap. Statistical analysis revealed a beneficial influence of VF periodicity and glottal closure on the signal quality of the subglottal pressure and the supraglottal acoustics, whereas VF symmetry only had a negligible influence. Strong correlations were found between the subglottal and supraglottal signal quality, with significant improvement of the acoustic quality for high levels of periodicity and glottal closure.
Affiliation(s)
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
| | - David A Berry
- Laryngeal Dynamics Laboratory, Department of Head and Neck Surgery, David Geffen School of Medicine, UCLA, Los Angeles, California 90024, USA
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
| |
|
15
|
Schlegel P, Kist AM, Kunduk M, Dürr S, Döllinger M, Schützenberger A. Interdependencies between acoustic and high-speed videoendoscopy parameters. PLoS One 2021; 16:e0246136. [PMID: 33529244 PMCID: PMC7853476 DOI: 10.1371/journal.pone.0246136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2020] [Accepted: 01/13/2021] [Indexed: 02/06/2023]
Abstract
In voice research, uncovering relations between the oscillating vocal folds, being the sound source of phonation, and the resulting perceived acoustic signal is of great interest. This is especially the case in the context of voice disorders, such as functional dysphonia (FD). We investigated 250 high-speed videoendoscopy (HSV) recordings with simultaneously recorded acoustic signals (124 healthy females, 60 FD females, 44 healthy males, 22 FD males). 35 glottal area waveform (GAW) parameters and 14 acoustic parameters were calculated for each recording. Linear and non-linear relations between GAW and acoustic parameters were investigated using Pearson correlation coefficients (PCC) and distance correlation coefficients (DCC). Further, norm values for parameters obtained from 250 ms long sustained phonation data (vowel /i/) were provided. 26 PCCs in females (5.3%) and 8 in males (1.6%) were found to be statistically significant (|corr.| ≥ 0.3). Only minor differences were found between PCCs and DCCs, indicating the presence of weak non-linear dependencies between parameters. Fundamental frequency was involved in the majority of all relevant PCCs between GAW and acoustic parameters (19 in females and 7 in males). The most distinct difference between correlations in females and males was found for the parameter Period Variability Index. The study shows only weak relations between investigated acoustic and GAW-parameters. This indicates that the reduction of the complex 3D glottal dynamics to the 1D-GAW may erase laryngeal dynamic characteristics that are reflected within the acoustic signal. Hence, other GAW parameters, 2D-, 3D-laryngeal dynamics and vocal tract parameters should be further investigated for potential correlations with the acoustic signal.
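To make the contrast between PCC and DCC concrete, here is a small pure-Python sketch. It assumes the standard sample distance correlation built from double-centered pairwise distance matrices; the abstract does not state which estimator the authors used, and the toy data are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand mean."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation; 0 for constant input by convention here."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    dvar2 = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if dvar2 == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / dvar2)


# Hypothetical toy data: y depends on x, but not linearly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
print(pearson(x, y), distance_correlation(x, y))
```

In this toy case the Pearson coefficient is zero while the distance correlation is clearly positive, which is the kind of purely non-linear dependency that a DCC can reveal and a PCC cannot.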
Affiliation(s)
- Patrick Schlegel
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California, United States of America
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Andreas M. Kist
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Melda Kunduk
- Dep. of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, United States of America
| | - Stephan Dürr
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Michael Döllinger
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Anne Schützenberger
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| |
|
16
|
Murtola T, Alku P. Indicators of anterior-posterior phase difference in glottal opening measured from natural production of vowels. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:EL141. [PMID: 32873022 DOI: 10.1121/10.0001722] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/31/2020] [Accepted: 07/22/2020] [Indexed: 06/11/2023]
Abstract
Voiced speech is generated by the glottal flow interacting with vocal fold vibrations. However, the details of vibrations in the anterior-posterior direction (the so-called zipper-effect) and their correspondence with speech and other glottal signals are not fully understood due to challenges in direct measurements of vocal fold vibrations. In this proof-of-concept study, the potential of four parameters extracted from high-speed videoendoscopy (HSV), electroglottography, and speech signals to indicate the presence of a zipper-type glottal opening is investigated. Comparison with manual labeling of the HSV videos highlighted the importance of multiple parameter-signal pairs in indicating the presence of a zipper-type glottal opening.
Affiliation(s)
- Tiina Murtola
- Department of Signal Processing and Acoustics, Aalto University, Espoo
| | - Paavo Alku
- Department of Signal Processing and Acoustics, Aalto University, Espoo
| |
|
17
|
Schlegel P, Kniesburges S, Dürr S, Schützenberger A, Döllinger M. Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings. Sci Rep 2020; 10:10517. [PMID: 32601277 PMCID: PMC7324600 DOI: 10.1038/s41598-020-66405-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/28/2020] [Accepted: 05/20/2020] [Indexed: 11/13/2022]
Abstract
In voice research and clinical assessment, many objective parameters are in use. However, there is no commonly used set of parameters that reflect certain voice disorders, such as functional dysphonia (FD); i.e. disorders with no visible anatomical changes. Hence, 358 high-speed videoendoscopy (HSV) recordings (159 normal females (NF), 101 FD females (FDF), 66 normal males (NM), 32 FD males (FDM)) were analyzed. We investigated 91 quantitative HSV parameters for their significance. First, 25 highly correlated parameters were discarded. Second, a further 54 parameters were discarded by using a LogitBoost decision stumps approach. This yielded a subset of 12 parameters sufficient to reflect functional dysphonia. These parameters separated groups NF vs. FDF and NM vs. FDM with fair accuracy of 0.745 and 0.768, respectively. Parameters solely computed from the changing glottal area waveform (1D-function called GAW) between the vocal folds were less important than parameters describing the oscillation characteristics along the vocal folds (2D-function called Phonovibrogram). Regularity of GAW phases and peak shape, harmonic structure and Phonovibrogram-based vocal fold open and closing angles were mainly important. This study showed the high degree of redundancy of HSV-voice-parameters but also affirms the need for multidimensional assessment of clinical data.
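The first reduction step described above, discarding highly correlated parameters, can be sketched as a greedy filter. The abstract gives neither the correlation threshold nor the selection order, so the 0.9 cut-off, the keep-first rule, the parameter names, and the function `drop_correlated` are assumptions for illustration only. The LogitBoost step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def drop_correlated(params, threshold=0.9):
    """Keep a parameter only if its |r| with every already-kept parameter
    stays below `threshold`; parameters are visited in insertion order."""
    kept = []
    for name, values in params.items():
        if all(abs(pearson(values, params[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical parameter table: one list of per-recording values per parameter.
table = {
    "param_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "param_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # a rescaled copy of param_a
    "param_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_correlated(table))
```

A greedy filter of this kind depends on the visiting order, so a real analysis would need a documented rule for which member of a correlated pair to keep.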
Affiliation(s)
- Patrick Schlegel
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany.
| | - Stefan Kniesburges
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Stephan Dürr
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Anne Schützenberger
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Michael Döllinger
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| |
|
18
|
Gómez P, Kist AM, Schlegel P, Berry DA, Chhetri DK, Dürr S, Echternach M, Johnson AM, Kniesburges S, Kunduk M, Maryn Y, Schützenberger A, Verguts M, Döllinger M. BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation. Sci Data 2020; 7:186. [PMID: 32561845 PMCID: PMC7305104 DOI: 10.1038/s41597-020-0526-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 09/09/2019] [Accepted: 05/15/2020] [Indexed: 02/06/2023]
Abstract
Laryngeal videoendoscopy is one of the main tools in clinical examinations for voice disorders and voice research. Using high-speed videoendoscopy, it is possible to fully capture the vocal fold oscillations; however, processing the recordings typically involves a time-consuming segmentation of the glottal area by trained experts. Even though automatic methods have been proposed and the task is particularly suited for deep learning methods, there are no public datasets and benchmarks available to compare methods and to allow training of generalizing deep learning models. In an international collaboration of researchers from seven institutions from the EU and USA, we have created BAGLS, a large, multihospital dataset of 59,250 high-speed videoendoscopy frames with individually annotated segmentation masks. The frames are based on 640 recordings of healthy and disordered subjects that were recorded with varying technical equipment by numerous clinicians. The BAGLS dataset will allow an objective comparison of glottis segmentation methods and will enable interested researchers to train their own models and compare their methods.
Affiliation(s)
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany.
| | - Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany.
| | - Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
| | - David A Berry
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
| | - Dinesh K Chhetri
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
| | - Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
| | - Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
| | - Aaron M Johnson
- NYU Voice Center, Department of Otolaryngology - Head and Neck Surgery, New York University School of Medicine, New York, New York, USA
| | - Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
| | - Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, USA
| | - Youri Maryn
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Speech, Language and Hearing sciences, University of Ghent, Ghent, Belgium
- Faculty of Education, Health and Social Work, University College Ghent, Ghent, Belgium
- Faculty of Psychology and Educational Sciences, School of Logopedics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
| | - Monique Verguts
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Otorhinolaryngology and Voice Disorders, Diest General Hospital, Diest, Belgium
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
| |
|
19
|
Heller Murray ES, Segina RK, Woodnorth GH, Stepp CE. Relative Fundamental Frequency in Children With and Without Vocal Fold Nodules. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2020; 63:361-371. [PMID: 32073342 PMCID: PMC7210445 DOI: 10.1044/2019_jslhr-19-00058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Purpose Relative fundamental frequency (RFF) is an acoustic measure that is sensitive to functional voice differences in adults. The aim of the current study was to evaluate RFF in children, as there are known structural and functional differences between the pediatric and adult vocal mechanisms. Method RFF was analyzed in 28 children with vocal fold nodules (CwVN, M = 9.0 years) and 28 children with typical voices (CwTV, M = 8.9 years). RFF is the instantaneous fundamental frequency (f0) of the 10 vocalic cycles during devoicing (vocal offset) and 10 vocalic cycles during the revoicing (vocal onset) of the vowels that surround a voiceless consonant. Each cycle's f0 was normalized to a steady-state portion of the vowel. RFF values for the cycles closest to the voiceless consonant, that is, Offset Cycle 10 and Onset Cycle 1, were examined. Results Average RFF values for Offset Cycle 10 and Onset Cycle 1 did not differ between CwVN and CwTV; however, within-subject variability of Offset Cycle 10 was decreased in CwVN. Across both groups, male children had lower Offset Cycle 10 RFF values as compared to female children. Additionally, Onset Cycle 1 values were decreased in younger children as compared to those of older children. Conclusions Unlike previous work with adults, CwVN did not have significantly different RFF values than CwTV. Younger children had lower RFF values for Onset Cycle 1 than older children, suggesting that vocal onset f0 may provide information on the maturity of the laryngeal motor system.
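The normalization step can be illustrated with a short sketch. It assumes the semitone form commonly used for RFF, 12 · log2(f0 / f0_ref), where f0_ref is the steady-state reference cycle. The abstract itself only says that each cycle was normalized to a steady-state portion of the vowel, so the exact formula, the function name `relative_f0`, and the sample values below are assumptions.

```python
import math


def relative_f0(cycle_f0s, reference_f0):
    """Express each cycle's f0 in semitones relative to a steady-state reference.

    Both inputs are in Hz; a value of 0 means the cycle matches the reference.
    """
    if reference_f0 <= 0:
        raise ValueError("reference_f0 must be positive")
    return [12.0 * math.log2(f0 / reference_f0) for f0 in cycle_f0s]


# Hypothetical offset cycles (Hz) drifting downward toward the voiceless consonant.
offset_cycles = [250.0, 249.0, 247.5, 245.0, 242.0]
rff = relative_f0(offset_cycles, reference_f0=offset_cycles[0])
print([round(v, 2) for v in rff])
```

A semitone scale makes values comparable across speakers with very different habitual pitch, which matters when children and adults are compared.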
Affiliation(s)
- Elizabeth S. Heller Murray
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Otolaryngology and Communication Enhancement, Boston Children's Hospital, MA
| | - Roxanne K. Segina
- Department of Speech, Language & Hearing Sciences, Boston University, MA
| | | | - Cara E. Stepp
- Department of Speech, Language & Hearing Sciences, Boston University, MA
- Department of Otolaryngology—Head & Neck Surgery, Boston University School of Medicine, MA
- Department of Biomedical Engineering, Boston University, MA
| |
|
20
|
Maryn Y, Verguts M, Demarsin H, van Dinther J, Gomez P, Schlegel P, Döllinger M. Intersegmenter Variability in High-Speed Laryngoscopy-Based Glottal Area Waveform Measures. Laryngoscope 2019; 130:E654-E661. [PMID: 31840827 DOI: 10.1002/lary.28475] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/23/2019] [Accepted: 11/26/2019] [Indexed: 12/31/2022]
Abstract
OBJECTIVES/HYPOTHESIS High-speed videoendoscopy (HSV) has potential to objectively quantify vibratory vocal fold characteristics during phonation. Glottal Analysis Tools (GAT) version 2018, developed in Erlangen, Germany, is software for determining various glottal area waveform (GAW) quantities. Before having GAT analyze HSV videos, segmenters have to define glottis manually across videos in a semiautomatic segmentation protocol. Such interventions are hypothesized to induce variability of subsequent GAW measure computation across segmenters and may attenuate GAT measures' reliability to a certain point. This study explored intersegmenter variability in GAT's GAW measures based on semiautomatic image processing. STUDY DESIGN Cohort study of rater reliability. METHODS In total, 20 HSV videos from normophonic and dysphonic subjects with various laryngeal disorders were selected for this study and segmented by three trained segmenters. They separately segmented glottis areas in the same frame sets of the videos. Upon analysis of GAW, GAT offers 46 measures related to topologic GAW dynamic characteristics, GAW periodicity and perturbation characteristics, and GAW harmonic components. To address GAT's reliability, intersegmenter-based variability in these measures was examined with intraclass correlation coefficient (ICC). RESULTS In general, ICC behavior of the 46 GAW measures across three raters was highly acceptable. ICC of one parameter was moderate (0.5 < ICC < 0.75), good for seven parameters (0.75 < ICC < 0.9), and excellent for 38 parameters (0.9 < ICC). CONCLUSIONS Overall, high ICC values confirm clinical applicability of GAT for objective and quantitative assessment of HSV. Small intersegmenter differences with actual small parameter differences suggest that manual or semiautomatic segmentation in GAT does not noticeably influence clinical assessment outcome. To guarantee the software's performance, we suggest segmentation training before clinical application. LEVEL OF EVIDENCE 2b Laryngoscope, 130:E654-E661, 2020.
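The ICC bands quoted in the results can be written down as a small lookup. This is only a sketch: the "poor" label for values at or below 0.5 and the handling of values that fall exactly on a boundary are assumptions, since the abstract lists open intervals only. The example ICC values are invented.

```python
from collections import Counter


def icc_band(icc):
    """Map an ICC value to the reliability band named in the abstract."""
    if icc > 0.9:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc > 0.5:
        return "moderate"
    return "poor"  # assumption: the abstract does not name this band


def summarize(iccs):
    """Count how many measures fall into each band."""
    return Counter(icc_band(v) for v in iccs)


# Hypothetical ICC values for five measures.
example = [0.97, 0.93, 0.81, 0.62, 0.99]
print(summarize(example))
```

Applied to the 46 reported measures, such a tally would reproduce the 1 moderate, 7 good, and 38 excellent split given above.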
Affiliation(s)
- Youri Maryn
- Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
- Department of Speech, Language, and Hearing Sciences, University of Ghent, Ghent, Belgium
- Faculty of Education, Health, and Social Work, University College of Ghent, Ghent, Belgium
- Faculty of Psychology and Educational Sciences, School of Logopedics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Phonanium, Lokeren, Belgium
| | - Monique Verguts
- Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
- Department of Otorhinolaryngology and Voice Disorders, Diest General Hospital, Diest, Belgium
| | - Hannelore Demarsin
- Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
| | - Joost van Dinther
- Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
| | - Pablo Gomez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
| | - Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
| |
|
21
|
Diaz-Cadiz M, McKenna VS, Vojtech JM, Stepp CE. Adductory Vocal Fold Kinematic Trajectories During Conventional Versus High-Speed Videoendoscopy. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:1685-1706. [PMID: 31181175 PMCID: PMC6808372 DOI: 10.1044/2019_jslhr-s-18-0405] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 05/16/2023]
Abstract
Objective Prephonatory vocal fold angle trajectories may supply useful information about the laryngeal system but were examined in previous studies using sigmoidal curves fit to data collected at 30 frames per second (fps). Here, high-speed videoendoscopy (HSV) was used to investigate the impacts of video frame rate and sigmoidal fitting strategy on vocal fold adductory patterns for voicing onsets. Method Twenty-five participants with healthy voices performed /ifi/ sequences under flexible nasendoscopy at 1,000 fps. Glottic angles were extracted during adduction for voicing onset; resulting vocal fold trajectories (i.e., changes in glottic angle over time) were down-sampled to simulate different frame rate conditions (30-1,000 fps). Vocal fold adduction data were fit with asymmetric sigmoids using 5 fitting strategies with varying parameter restrictions. Adduction trajectories and maximum adduction velocities were compared between the fits and the actual HSV data. Adduction trajectory errors between HSV data and fits were evaluated using root-mean-square error and maximum angular velocity error. Results Simulated data were generally well fit by sigmoid models; however, when compared to the actual 1,000-fps data, sigmoid fits were found to overestimate maximum angle velocities. Errors decreased as frame rate increased, reaching a plateau by 120 fps. Conclusion In healthy adults, vocal fold kinematic behavior during adduction is generally sigmoidal, although such fits can produce substantial errors when data are acquired at frame rates lower than 120 fps.
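The processing steps named in the method (down-sampling a high-frame-rate trajectory, estimating maximum angular velocity, and scoring with root-mean-square error) can be sketched as follows. This is not the authors' pipeline: the symmetric logistic curve stands in for their asymmetric sigmoids, velocity is taken from plain finite differences without any fitting, and all parameter values and function names are hypothetical.

```python
import math


def logistic_angle(t, start_angle=40.0, midpoint=0.1, tau=0.01):
    """Toy glottic angle (degrees) closing from `start_angle` toward 0 over time t (s)."""
    return start_angle / (1.0 + math.exp((t - midpoint) / tau))


def downsample(samples, src_fps, dst_fps):
    """Keep every k-th sample; only integer frame-rate ratios are supported here."""
    if src_fps % dst_fps:
        raise ValueError("src_fps must be an integer multiple of dst_fps")
    return samples[:: src_fps // dst_fps]


def max_angular_velocity(angles, fps):
    """Largest absolute frame-to-frame change, scaled to degrees per second."""
    return max(abs(b - a) for a, b in zip(angles, angles[1:])) * fps


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


hsv = [logistic_angle(i / 1000) for i in range(200)]  # 0.2 s at 1,000 fps
low = downsample(hsv, 1000, 100)                      # simulated 100 fps
print(max_angular_velocity(hsv, 1000), max_angular_velocity(low, 100))
```

The helpers make it possible to compare a trajectory against its lower-frame-rate version or against a fitted curve sampled at the same instants; conclusions about fit bias at specific frame rates should be taken from the study, not from this toy curve.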
Affiliation(s)
- Manuel Diaz-Cadiz
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
| | | | - Jennifer M. Vojtech
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Department of Biomedical Engineering, Boston University, MA
| | - Cara E. Stepp
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Department of Biomedical Engineering, Boston University, MA
- Department of Otolaryngology–Head and Neck Surgery, Boston University School of Medicine, MA
| |
|
22
|
Gómez P, Semmler M, Schützenberger A, Bohr C, Döllinger M. Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network. Med Biol Eng Comput 2019; 57:1451-1463. [DOI: 10.1007/s11517-019-01965-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/30/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]
|
23
|
Gómez P, Schützenberger A, Semmler M, Döllinger M. Laryngeal Pressure Estimation With a Recurrent Neural Network. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2018; 7:2000111. [PMID: 30680252 PMCID: PMC6331197 DOI: 10.1109/jtehm.2018.2886021] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/09/2018] [Revised: 10/24/2018] [Accepted: 11/30/2018] [Indexed: 11/24/2022]
Abstract
Quantifying the physical parameters of voice production is essential for understanding the process of phonation and can aid in voice research and diagnosis. As an alternative to invasive measurements, they can be estimated by formulating an inverse problem using a numerical forward model. However, high-fidelity numerical models are often computationally too expensive for this. This paper presents a novel approach to train a long short-term memory network to estimate the subglottal pressure in the larynx at massively reduced computational cost using solely synthetic training data. We train the network on synthetic data from a numerical two-mass model and validate it on experimental data from 288 high-speed ex vivo video recordings of porcine vocal folds from a previous study. The training requires significantly fewer model evaluations compared with the previous optimization approach. On the test set, we maintain a comparable performance of 21.2% versus previous 17.7% mean absolute percentage error in estimating the subglottal pressure. The evaluation of one sample requires a vanishingly small amount of computation time. The presented approach is able to maintain estimation accuracy of the subglottal pressure at significantly reduced computational cost. The methodology is likely transferable to estimate other parameters and training with other numerical models. This improvement should allow the adoption of more sophisticated, high-fidelity numerical models of the larynx. The vast speedup is a critical step to enable a future clinical application and knowledge of parameters such as the subglottal pressure will aid in diagnosis and treatment selection.
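The error metric quoted in this abstract, mean absolute percentage error, can be sketched as follows. The function name and the pressure values are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
def mean_absolute_percentage_error(measured, estimated):
    """Mean of |estimate - reference| / |reference|, expressed in percent."""
    if not measured or len(measured) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - m) / abs(m) for m, e in zip(measured, estimated))
    return 100.0 * total / len(measured)

# Hypothetical subglottal pressures in pascals (illustrative values only).
measured = [1000.0, 2000.0]
estimated = [1100.0, 1800.0]
print(mean_absolute_percentage_error(measured, estimated))
```

Each error is scaled by its own reference value, so a 100 Pa miss on a 1,000 Pa recording weighs the same as a 200 Pa miss on a 2,000 Pa recording.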
Affiliation(s)
- Pablo Gómez
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Anne Schützenberger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Marion Semmler
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
|
24
|
Chou A, Schrof C, Polce E, Braden M, McMurray J, Jiang J. Comparing the Nonlinear Dynamic Acoustic Parameters of Healthy Adult and Pediatric Voices. Ann Otol Rhinol Laryngol 2018; 127:937-945. [DOI: 10.1177/0003489418803394] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Objectives: The aims of this study were to compare nondysphonic adult and pediatric voices using linear and nonlinear acoustic parameters and to evaluate the ability of adult spectrum convergence ratio (SCR) and rate of divergence (ROD) reference values to correctly identify a pediatric voice type as periodic or aperiodic. Methods: Twenty adult and 36 pediatric nondysphonic voice samples were collected and analyzed using linear and nonlinear acoustic parameters. Absence of voice disorder was confirmed using perceptual acoustic and spectral analysis. Mean values for jitter, shimmer, SCR, and ROD were compared between adults and children, across specific age groups, and within genders. Using adult reference values for SCR and ROD, samples were classified as primarily periodic or aperiodic and typed using spectral analysis. Rates of accurate typing were also compared between subject groups. Results: Overall, jitter and shimmer were similar among the adult and pediatric age groups. ROD was significantly different among the 3 pediatric and 1 adult group; the pediatric age groups were similar to one another. Adult SCR was also significantly different from all of the pediatric age groups. In adult men, ROD and SCR were significantly different from all of the pediatric age groups; the pediatric age groups were similar to one another. In female subjects, ROD was significantly different among all age groups. The ROD and SCR reference values were significantly better at categorizing adult voice types compared with pediatric voice types. Conclusions: In healthy subjects, SCR and ROD have discriminatory power for identifying adult versus pediatric voices, while jitter and shimmer cannot differentiate between the 2 groups. However, age- and gender-specific pediatric reference values must be determined to accurately classify voice types using SCR and ROD.
Affiliation(s)
- Adriana Chou
  - Department of Surgery, Division of Otolaryngology–Head and Neck Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
- Colin Schrof
  - Department of Surgery, Division of Otolaryngology–Head and Neck Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
- Evan Polce
  - Department of Surgery, Division of Otolaryngology–Head and Neck Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
- Maia Braden
  - Department of Surgery, Division of Otolaryngology–Head and Neck Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
- James McMurray
  - Department of Surgery, Division of Otolaryngology–Head and Neck Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
- Jack Jiang
  - Department of Surgery, Division of Otolaryngology–Head and Neck Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
|
25
|
Semmler M, Döllinger M, Patel RR, Ziethe A, Schützenberger A. Clinical relevance of endoscopic three-dimensional imaging for quantitative assessment of phonation. Laryngoscope 2018. [DOI: 10.1002/lary.27165] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Affiliation(s)
- Marion Semmler
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery; University Hospital Erlangen Medical School; Erlangen, Germany
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery; University Hospital Erlangen Medical School; Erlangen, Germany
- Rita R. Patel
  - Department of Speech and Hearing Sciences; Indiana University; Bloomington, Indiana, U.S.A.
- Anke Ziethe
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery; University Hospital Erlangen Medical School; Erlangen, Germany
- Anne Schützenberger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery; University Hospital Erlangen Medical School; Erlangen, Germany
|
26
|
Gómez P, Schützenberger A, Kniesburges S, Bohr C, Döllinger M. Physical parameter estimation from porcine ex vivo vocal fold dynamics in an inverse problem framework. Biomech Model Mechanobiol 2017; 17:777-792. [DOI: 10.1007/s10237-017-0992-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/30/2017] [Accepted: 11/30/2017] [Indexed: 11/28/2022]
|
27
|
Döllinger M, Gómez P, Patel RR, Alexiou C, Bohr C, Schützenberger A. Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. PLoS One 2017; 12:e0187486. [PMID: 29121085 PMCID: PMC5679561 DOI: 10.1371/journal.pone.0187486] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 08/15/2016] [Accepted: 10/18/2017] [Indexed: 12/18/2022]
Abstract
MOTIVATION Human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, endoscopic investigation of the actual phonatory process in detail is challenging. Hence the biomechanics of the human phonatory process are not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds towards vocal fold oscillations to quantify gender and age related differences expressed by computed biomechanical model parameters. METHODS The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness and subglottal pressure. For adapting the model towards the recorded vocal fold dynamics, three different optimization algorithms (Nelder-Mead, Particle Swarm Optimization and Simulated Bee Colony) in combination with three cost functions were considered for applicability. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed. RESULTS AND CONCLUSION The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function seems to be suitable for this optimization task. The gained model parameters reflect the phonatory biomechanics for men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness and higher masses than the corresponding elderly groups. Females exhibited higher subglottal pressures, smaller oscillation masses and larger stiffness than the corresponding similarly aged male groups. Optimizing numerical models towards vocal fold oscillations is useful to identify underlying laryngeal components controlling the phonatory process.
Affiliation(s)
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Pablo Gómez
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Rita R. Patel
  - Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America
- Christoph Alexiou
  - Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Christopher Bohr
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
|
28
|
Endoscopic Laser-Based 3D Imaging for Functional Voice Diagnostics. Applied Sciences-Basel 2017. [DOI: 10.3390/app7060600] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
|
29
|
Zacharias SRC, Brehm SB, Weinrich B, Kelchner L, Tabangin M, de Alarcon A. Feasibility of Clinical Endoscopy and Stroboscopy in Children With Bilateral Vocal Fold Lesions. American Journal of Speech-Language Pathology 2016; 25:598-604. [PMID: 27893084 DOI: 10.1044/2016_ajslp-15-0071] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/01/2015] [Accepted: 04/10/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE The purpose of this study was to examine the utility of flexible and rigid endoscopy and stroboscopy for the identification of anatomical and physiological features in children with bilateral vocal fold lesions. The secondary purpose was to describe the age distribution of patients who could tolerate use of the different types of endoscopes. METHOD This cross-sectional clinic-based study included 38 children (ages 5 to 12 years) diagnosed with bilateral vocal fold lesions via videoendoscopy. Vocal fold vibratory characteristics (e.g., mucosal wave) were rated by 4 clinicians by consensus. RESULTS Bilateral vocal fold lesions could be well described anatomically after visualization with both flexible and rigid endoscopes and were most commonly described as symmetrical and broad based. However, the clinicians' confidence in the accuracy of stroboscopy for rating vocal fold vibratory characteristics was limited for both flexible and rigid stroboscopes. CONCLUSIONS Videoendoscopy was adequate for viewing and characterizing anatomical structures of bilateral vocal fold lesions in pediatric patients; however, vibratory characteristics were often not fully visualized with videostroboscopy. In view of the importance of visualizing vocal fold vibration in the differential diagnosis and treatment of vocal fold lesions, other imaging modalities, such as high-speed videoendoscopy, may provide more accurate descriptions of vocal fold vibratory characteristics in this population.
Affiliation(s)
- Stephanie R C Zacharias
  - Center for Pediatric Voice Disorders, Cincinnati Children's Hospital Medical Center, OH
  - Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, OH
  - University of Cincinnati, OH
  - Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children's Hospital Medical Center, OH
- Susan Baker Brehm
  - Center for Pediatric Voice Disorders, Cincinnati Children's Hospital Medical Center, OH
  - Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, OH
  - Miami University, Oxford, OH
- Barbara Weinrich
  - Center for Pediatric Voice Disorders, Cincinnati Children's Hospital Medical Center, OH
  - Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, OH
  - Miami University, Oxford, OH
- Lisa Kelchner
  - Center for Pediatric Voice Disorders, Cincinnati Children's Hospital Medical Center, OH
  - Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, OH
  - University of Cincinnati, OH
- Meredith Tabangin
  - Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, OH
- Alessandro de Alarcon
  - Center for Pediatric Voice Disorders, Cincinnati Children's Hospital Medical Center, OH
  - University of Cincinnati, OH
  - Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children's Hospital Medical Center, OH
|
30
|
Patel RR. Vibratory onset and offset times in children: A laryngeal imaging study. Int J Pediatr Otorhinolaryngol 2016; 87:11-7. [PMID: 27368436 PMCID: PMC4930831 DOI: 10.1016/j.ijporl.2016.05.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/05/2016] [Revised: 05/10/2016] [Accepted: 05/12/2016] [Indexed: 10/21/2022]
Abstract
OBJECTIVES The aim of the study was to evaluate the differences in vibratory onset and offset times across age (adult males, adult females, and children) and waveform types (total glottal area waveform, left glottal area waveform, and right glottal area waveform) using high-speed videoendoscopy. METHODS In this prospective study, vibratory onset and offset times were evaluated in a total of 86 participants. Forty-three children (23 girls, 18 boys) between 5 and 11 years and 43 gender matched vocally normal young adults (23 females and 18 males) in the age range 21-45 years were recruited. Vibratory onset and offset times were calculated in milliseconds from the total, left, and right Glottal Area Waveform (GAW). A two-factor analysis of variance was used to compare the means among the subject groups (children, adult male, and adult female) and waveform type (total GAW, left GAW, right GAW) for onset and offset variables. Post hoc analyses were performed using Fisher's Least Significant Difference test with Bonferroni correction for multiple comparisons. RESULTS Children exhibited significantly shorter vibratory onset and offset times compared to adult males and females. Differences in vibratory onset and offset times were not statistically significant between adult males and females. Across all waveform types (i.e. total GAW, left GAW, and right GAW), no statistical significance was observed among the subject groups. CONCLUSION This is the first study reporting vibratory onset and offset times in the pediatric population. The study findings lay the foundation for the development of a large age- and gender-based database of the pediatric population to aid the study of the effects of maturation of vocal fold vibration in adulthood. The findings from this study may also provide the basis for evaluating the impact of numerous lesions on tissue pliability, and thereby have potential utility for the clinical differentiation of various lesions.
Affiliation(s)
- Rita R. Patel
  - Department of Speech and Hearing Sciences, Indiana University
|
31
|
Wang JS, Olszewski E, Devine EE, Hoffman MR, Zhang Y, Shao J, Jiang JJ. Extension and Application of High-Speed Digital Imaging Analysis Via Spatiotemporal Correlation and Eigenmode Analysis of Vocal Fold Vibration Before and After Polyp Excision. Ann Otol Rhinol Laryngol 2016; 125:660-6. [PMID: 27164969 DOI: 10.1177/0003489416644618] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
OBJECTIVES/HYPOTHESIS To evaluate the spatiotemporal correlation of vocal fold vibration using eigenmode analysis before and after polyp removal and explore the potential clinical relevance of spatiotemporal analysis of correlation length and entropy as quantitative voice parameters. We hypothesized that increased order in the vibrating signal after surgical intervention would decrease the eigenmode-based entropy and increase correlation length. STUDY DESIGN Prospective case series. METHODS Forty subjects (23 males, 17 females) with unilateral (n = 24) or bilateral (n = 16) polyps underwent polyp removal. High-speed videoendoscopy was performed preoperatively and 2 weeks postoperatively. Spatiotemporal analysis was performed to determine entropy, quantification of signal disorder, correlation length, size, and spatially ordered structure of vocal fold vibration in comparison to full spatial consistency. The signal analyzed consists of the vibratory pattern in space and time derived from the high-speed video glottal area contour. RESULTS Entropy decreased (Z = -3.871, P < .001) and correlation length increased (t = -8.913, P < .001) following polyp excision. The intraclass correlation coefficients (ICC) for correlation length and entropy were 0.84 and 0.93. CONCLUSION Correlation length and entropy are sensitive to mass lesions. These parameters could potentially be used to augment subjective visualization after polyp excision when evaluating procedural efficacy.
Affiliation(s)
- Jun-Sheng Wang
  - Fudan University, Department of Otolaryngology-Head and Neck Surgery, EENT Hospital, Shanghai, China
  - Jun-Sheng Wang and Emily Olszewski are co-first authors of the article
- Emily Olszewski
  - University of Wisconsin-Madison School of Medicine and Public Health, Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, Madison, Wisconsin, USA
  - Jun-Sheng Wang and Emily Olszewski are co-first authors of the article
- Erin E Devine
  - University of Wisconsin-Madison School of Medicine and Public Health, Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, Madison, Wisconsin, USA
- Matthew R Hoffman
  - University of Wisconsin-Madison School of Medicine and Public Health, Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, Madison, Wisconsin, USA
- Yu Zhang
  - Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, Xiamen University, Xiamen, P. R. China
- Jun Shao
  - Fudan University, Department of Otolaryngology-Head and Neck Surgery, EENT Hospital, Shanghai, China
- Jack J Jiang
  - Fudan University, Department of Otolaryngology-Head and Neck Surgery, EENT Hospital, Shanghai, China
  - University of Wisconsin-Madison School of Medicine and Public Health, Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, Madison, Wisconsin, USA
|
32
|
Patel RR, Unnikrishnan H, Donohue KD. Effects of Vocal Fold Nodules on Glottal Cycle Measurements Derived from High-Speed Videoendoscopy in Children. PLoS One 2016; 11:e0154586. [PMID: 27124157 PMCID: PMC4849744 DOI: 10.1371/journal.pone.0154586] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 09/02/2015] [Accepted: 04/17/2016] [Indexed: 11/18/2022]
Abstract
The goal of this study is to quantify the effects of vocal fold nodules on vibratory motion in children using high-speed videoendoscopy. Differences in vibratory motion were evaluated in 20 children with vocal fold nodules (5–11 years) and 20 age and gender matched typically developing children (5–11 years) during sustained phonation at typical pitch and loudness. Normalized kinematic features of vocal fold displacements from the mid-membranous vocal fold point were extracted from the steady-state high-speed video. A total of 12 kinematic features representing spatial and temporal characteristics of vibratory motion were calculated. Average values and standard deviations (cycle-to-cycle variability) of the following kinematic features were computed: normalized peak displacement, normalized average opening velocity, normalized average closing velocity, normalized peak closing velocity, speed quotient, and open quotient. Group differences between children with and without vocal fold nodules were statistically investigated. While a moderate effect size was observed for the spatial feature of speed quotient, and the temporal feature of normalized average closing velocity in children with nodules compared to vocally normal children, none of the features differed significantly between the groups after Bonferroni correction. The kinematic analysis of the mid-membranous vocal fold displacement revealed that children with nodules primarily differ from typically developing children in closing phase kinematics of the glottal cycle, whereas the opening phase kinematics are similar. Higher speed quotients and similar opening phase velocities suggest greater relative forces are acting on the vocal folds in the closing phase. These findings suggest that future large-scale studies should focus on spatial and temporal features related to the closing phase of the glottal cycle for differentiating the kinematics of children with and without vocal fold nodules.
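The abstract names speed quotient and open quotient without defining them. The sketch below assumes the commonly used definitions (open quotient: open-phase duration divided by the cycle period; speed quotient: opening-phase duration divided by closing-phase duration) and counts samples in a single, uniformly sampled displacement cycle. The function name, the `closed_level` threshold, and the example waveform are hypothetical; the study's own extraction and normalization may differ.

```python
def glottal_quotients(cycle, closed_level=0.0):
    """Return (open_quotient, speed_quotient) for one glottal cycle.

    cycle: displacement samples covering exactly one period, uniformly
    sampled, with a single contiguous open phase (an assumption).
    """
    open_idx = [i for i, d in enumerate(cycle) if d > closed_level]
    if not open_idx:
        raise ValueError("the cycle never opens")
    peak = max(open_idx, key=lambda i: cycle[i])
    opening = peak - open_idx[0] + 1   # first open sample through the peak
    closing = open_idx[-1] - peak      # samples after the peak, still open
    if closing == 0:
        raise ValueError("no closing-phase samples after the peak")
    open_quotient = len(open_idx) / len(cycle)
    speed_quotient = opening / closing
    return open_quotient, speed_quotient

# Hypothetical cycle: slower opening (4 samples) than closing (2 samples).
example_cycle = [0, 0, 1, 2, 3, 4, 3, 1, 0, 0]
oq, sq = glottal_quotients(example_cycle)
print(oq, sq)
```

Under these definitions a speed quotient above 1 means the fold spends longer opening than closing, which is how a higher speed quotient can point to faster closing-phase motion.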
Affiliation(s)
- Rita R. Patel
  - Department of Speech & Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America
- Harikrishnan Unnikrishnan
  - Department of Electrical and Computer Engineering, University of Kentucky, Lexington, Kentucky, United States of America
- Kevin D. Donohue
  - Department of Electrical and Computer Engineering, University of Kentucky, Lexington, Kentucky, United States of America
|
33
|
Unger J, Schuster M, Hecker DJ, Schick B, Lohscheller J. A generalized procedure for analyzing sustained and dynamic vocal fold vibrations from laryngeal high-speed videos using phonovibrograms. Artif Intell Med 2015; 66:15-28. [PMID: 26597002 DOI: 10.1016/j.artmed.2015.10.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 01/16/2015] [Revised: 09/28/2015] [Accepted: 10/20/2015] [Indexed: 12/01/2022]
Abstract
OBJECTIVE This work presents a computer-based approach to analyze the two-dimensional vocal fold dynamics of endoscopic high-speed videos, and constitutes an extension and generalization of a previously proposed wavelet-based procedure. While most approaches aim for analyzing sustained phonation conditions, the proposed method allows for a clinically adequate analysis of both dynamic as well as sustained phonation paradigms. MATERIALS AND METHODS The analysis procedure is based on a spatio-temporal visualization technique, the phonovibrogram, that facilitates the documentation of the visible laryngeal dynamics. From the phonovibrogram, a low-dimensional set of features is computed using a principal component analysis strategy that quantifies the type of vibration patterns, irregularity, lateral symmetry and synchronicity, as a function of time. Two different test bench data sets are used to validate the approach: (I) 150 healthy and pathologic subjects examined during sustained phonation. (II) 20 healthy and pathologic subjects that were examined twice: during sustained phonation and a glissando from a low to a higher fundamental frequency. In order to assess the discriminative power of the extracted features, a Support Vector Machine is trained to distinguish between physiologic and pathologic vibrations. The results for sustained phonation sequences are compared to the previous approach. Finally, the classification performance of the stationary analyzing procedure is compared to the transient analysis of the glissando maneuver. RESULTS For the first test bench the proposed procedure outperformed the previous approach (proposed feature set: accuracy: 91.3%, sensitivity: 80%, specificity: 97%, previous approach: accuracy: 89.3%, sensitivity: 76%, specificity: 96%). Comparing the classification performance of the second test bench further corroborates that analyzing transient paradigms provides clear additional diagnostic value (glissando maneuver: accuracy: 90%, sensitivity: 100%, specificity: 80%, sustained phonation: accuracy: 75%, sensitivity: 80%, specificity: 70%). CONCLUSIONS The incorporation of parameters describing the temporal evolution of vocal fold vibration clearly improves the automatic identification of pathologic vibration patterns. Furthermore, incorporating a dynamic phonation paradigm provides additional valuable information about the underlying laryngeal dynamics that cannot be derived from sustained conditions. The proposed generalized approach provides a better overall classification performance than the previous approach, and hence constitutes a new advantageous tool for an improved clinical diagnosis of voice disorders.
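The accuracy, sensitivity, and specificity figures quoted in this abstract follow from a binary confusion matrix, as the sketch below shows. The counts are hypothetical: they assume an even split of 10 pathologic and 10 healthy subjects, which the abstract does not state, and were chosen only because that split is consistent with the percentages reported for the glissando condition.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: pathologic cases classified correctly/incorrectly.
    tn/fp: healthy cases classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of pathologic cases detected
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
    }

# Hypothetical counts under an assumed 10/10 split (not given in the abstract).
glissando = classification_metrics(tp=10, fn=0, tn=8, fp=2)
print(glissando)
```

With only 20 subjects, each misclassified case shifts accuracy by 5 percentage points, so differences between the two conditions rest on a handful of recordings.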
Affiliation(s)
- Jakob Unger
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
- Maria Schuster
  - Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, Marchioninistr. 13, 81366 München, Germany
- Dietmar J Hecker
  - Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Bernhard Schick
  - Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Jörg Lohscheller
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
|
34
|
Patel R, Donohue KD, Unnikrishnan H, Kryscio RJ. Kinematic measurements of the vocal-fold displacement waveform in typical children and adult populations: quantification of high-speed endoscopic videos. Journal of Speech, Language, and Hearing Research: JSLHR 2015; 58:227-40. [PMID: 25652615 PMCID: PMC4675116 DOI: 10.1044/2015_jslhr-s-13-0056] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 03/07/2013] [Revised: 10/24/2013] [Accepted: 12/22/2014] [Indexed: 05/20/2023]
Abstract
PURPOSE This article presents a quantitative method for assessing instantaneous and average lateral vocal-fold motion from high-speed digital imaging, with a focus on developmental changes in vocal-fold kinematics during childhood. METHOD Vocal-fold vibrations were analyzed for 28 children (aged 5-11 years) and 28 adults (aged 21-45 years) without voice disorders. The following kinematic features were analyzed from the vocal-fold displacement waveforms: relative velocity-based features (normalized average and peak opening and closing velocities), relative acceleration-based features (normalized peak opening and closing accelerations), speed quotient, and normalized peak displacement. RESULTS Children exhibited significantly larger normalized peak displacements, normalized average and peak opening velocities, normalized average and peak closing velocities, peak opening and closing accelerations, and speed quotient compared to adult women. Values of normalized average closing velocity and speed quotient were higher in children compared to adult men. CONCLUSIONS When compared to adult men, developing children typically have higher estimates of kinematic features related to normalized displacement and its derivatives. In most cases, the kinematic features of children are closer to those of adult men than adult women. Even though boys experience greater changes in glottal length and pitch as they mature, results indicate that girls experience greater changes in kinematic features compared to boys.
|
35
|
Bohr C, Kräck A, Dubrovskiy D, Eysholdt U, Svec J, Psychogios G, Ziethe A, Döllinger M. Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. Journal of Speech, Language, and Hearing Research: JSLHR 2014; 57:1148-1161. [PMID: 24686496 DOI: 10.1044/2014_jslhr-s-12-0076] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 06/03/2023]
Abstract
PURPOSE The aim of this study was to identify parameters that would differentiate healthy from pathological organic-based vocal fold vibrations to emphasize clinical usefulness of high-speed imaging. METHOD Fifty-five men (M age = 36 years, SD = 20 years) were examined and separated into 4 groups: 1 healthy (26 individuals) and 3 pathological (10 individuals with contact granuloma, 12 with polyps, and 7 with cysts). Vocal fold vibrations were recorded using a high-speed camera during sustained phonation. Twenty objective glottal area waveform and 24 phonovibrogram parameters representing spatiotemporal characteristics were analyzed. Statistical group comparisons were performed to document spatiotemporal changes for organic lesions that cannot be determined visually. To look for specific pattern profiles within organic lesions, the authors performed linear discriminant analysis. RESULTS Thirteen parameters showed significant differences between the healthy group and at least 1 pathological group. The differences occurred more in temporal than in spatial parameters. Contact granuloma showed the fewest statistical differences (3 parameters), followed by cysts (9 parameters), and polyps (10 parameters). Linear discriminant analysis achieved accuracy performance of 76% (all groups separated) and 82% (healthy vs. pathological). CONCLUSION The results suggest that for males, the differences between healthy voices and organic voice disorders may be more pronounced within temporal characteristics that cannot be visually detected without high-speed imaging.
|
36
|
Patel R, Dubrovskiy D, Döllinger M. Characterizing vibratory kinematics in children and adults with high-speed digital imaging. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2014; 57:S674-86. [PMID: 24686982 PMCID: PMC7315516 DOI: 10.1044/2014_jslhr-s-12-0278] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 05/11/2023]
Abstract
PURPOSE The aim of this study is to quantify and identify characteristic vibratory motion in typically developing prepubertal children and young adults using high-speed digital imaging. METHOD The vibrations of the vocal folds were recorded from 27 children (ages 5-9 years) and 35 adults (ages 21-45 years), with high speed at 4,000 frames per second for sustained phonation. Kinematic features of amplitude periodicity, time periodicity, phase asymmetry, spatial symmetry, and glottal gap index were analyzed from the glottal area waveform across mean and standard deviation (i.e., intercycle variability) for each measure. RESULTS Children exhibited lower mean amplitude periodicity compared to men and women and lower time periodicity compared to men. Children and women exhibited greater variability in amplitude periodicity, time periodicity, phase asymmetry, and glottal gap index compared to men. Women had lower mean values of amplitude periodicity and time periodicity compared to men. CONCLUSION Children differed both spatially but more temporally in vocal fold motion, suggesting the need for the development of children-specific kinematic norms. Results suggest more uncontrolled vibratory motion in children, reflecting changes in the vocal fold layered structure and aero-acoustic source mechanisms.
|
37
|
Patel RR, Dubrovskiy D, Döllinger M. Measurement of glottal cycle characteristics between children and adults: physiological variations. J Voice 2014; 28:476-86. [PMID: 24629646 DOI: 10.1016/j.jvoice.2013.12.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 09/03/2013] [Accepted: 12/16/2013] [Indexed: 10/25/2022]
Abstract
OBJECTIVES The aim of this study was to quantify phases of the vibratory cycle using measurements of glottal cycle quotients and glottal cycle derivatives, in typically developing prepubertal children and young adults with the use of high-speed digital imaging (HSDI). METHODS Vocal fold vibrations were recorded from 27 children (age range 5-9 years) and 35 adults (age range 21-45 years), with HSDI at 4000 frames per second for sustained phonation. Glottal area waveform measures of Open Quotient, Closing Quotient, Speed Index (SI), Rate Quotient, and Asymmetry Quotient (AsyQ) were computed. Glottal cycle derivatives of Amplitude Quotient (AQ) and Maximum Area Declination Rate (MADR) were also computed. Group differences (adult females, adult males, and children) were statistically investigated for mean and standard deviation values of the glottal cycle quotients and glottal cycle derivatives. RESULTS Children exhibited higher values of SI and AsyQ and lower values of MADR compared with adult males. Children exhibited the highest mean value and lowest variability in AQ compared with adult males and females. Adult males showed lower values of SI, AsyQ, AQ, and higher values of MADR compared with adult females. CONCLUSIONS Glottal cycle vibratory motion in children is functionally different compared with adult males and females, suggesting the need for development of children specific norms for both normal and disordered voice qualities.
Affiliation(s)
- Rita R Patel
- Department of Speech and Hearing Sciences, College of Arts and Sciences, Indiana University, Bloomington, Indiana.
- Denis Dubrovskiy
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Erlangen, Germany
- Michael Döllinger
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Erlangen, Germany
|