1
Andrade-Miranda G, Chatzipapas K, Arias-Londoño JD, Godino-Llorente JI. GIRAFE: Glottal imaging dataset for advanced segmentation, analysis, and facilitative playbacks evaluation. Data Brief 2025; 59:111376. [PMID: 40027255 PMCID: PMC11872128 DOI: 10.1016/j.dib.2025.111376] [Received: 01/03/2025] [Revised: 01/30/2025] [Accepted: 02/03/2025] [Indexed: 03/05/2025]
Abstract
The advances in the development of Facilitative Playbacks extracted from High-Speed videoendoscopic sequences of the vocal folds are hindered by a notable lack of publicly available datasets annotated with the semantic segmentations corresponding to the area of the glottal gap. This fact also limits the reproducibility and further exploration of existing research in this field. To address this gap, GIRAFE (Glottal Imaging Repository for Advanced Segmentation, Analysis, and Facilitative Playbacks Evaluation) is a data repository designed to facilitate the development of advanced techniques for the semantic segmentation, analysis, and fast evaluation of High-Speed videoendoscopic sequences of the vocal folds. The repository includes 65 high-speed videoendoscopic recordings from a cohort of 50 patients (30 female, 20 male). The dataset comprises 15 recordings from healthy controls, 26 from patients with diagnosed voice disorders, and 24 with an unknown health condition. All of them were manually annotated by an expert, including the masks corresponding to the semantic segmentation of the glottal gap. The repository is also complemented with the automatic segmentation of the glottal area using different state-of-the-art approaches. This data set has already supported several studies, which demonstrates its usefulness for the development of new glottal gap segmentation algorithms from High-Speed-Videoendoscopic sequences to improve or create new Facilitative Playbacks. Despite these advances and others in the field, the broader challenge of performing an accurate and completely automatic semantic segmentation method of the glottal area remains open.
Affiliation(s)
- Gustavo Andrade-Miranda
  - Laboratoire de Traitement de l'Information Médicale (LaTIM), UMR 1101, INSERM, University of Brest, 29200, Brest, France
  - EuroMov Digital Health in Motion, Université de Montpellier, IMT Mines Ales, Ales, France
- Konstantinos Chatzipapas
  - Laboratoire de Traitement de l'Information Médicale (LaTIM), UMR 1101, INSERM, University of Brest, 29200, Brest, France
  - 3dmi Research Group, Department of Medical Physics, School of Medicine, University of Patras, 26504, Rion, Greece
- Julián D. Arias-Londoño
  - Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Av. Complutense, 30, 28040, Madrid, Spain
- Juan I. Godino-Llorente
  - Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Av. Complutense, 30, 28040, Madrid, Spain
2
Kist AM, Gómez P, Dubrovskiy D, Schlegel P, Kunduk M, Echternach M, Patel R, Semmler M, Bohr C, Dürr S, Schützenberger A, Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. Journal of Speech, Language, and Hearing Research: JSLHR 2021; 64:1889-1903. [PMID: 34000199 DOI: 10.1044/2021_jslhr-20-00498] [Indexed: 06/12/2023]
Abstract
Purpose High-speed videoendoscopy (HSV) is an emerging, but barely used, endoscopy technique in the clinic to assess and diagnose voice disorders because of the lack of dedicated software to analyze the data. HSV allows to quantify the vocal fold oscillations by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed a user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow a fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533.
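The glottal area waveform mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the segmentation step has already produced one binary mask per frame; the function name `glottal_area_waveform` and the tiny 3x3 frames are hypothetical and are not taken from GAT.

```python
def glottal_area_waveform(masks):
    """Return the glottal area (pixel count) for each frame.

    `masks` is a sequence of binary frames; each frame is a list of rows,
    and each row is a list of 0/1 values (1 = pixel inside the glottis).
    """
    return [sum(sum(row) for row in frame) for frame in masks]


# Three hypothetical 3x3 masks: closed, partly open, and wide open glottis.
frames = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
]
gaw = glottal_area_waveform(frames)
print(gaw)
```

Areas here are in pixels; converting them to physical units would need a calibration that the abstract does not describe.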
Affiliation(s)
- Andreas M Kist
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Pablo Gómez
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Denis Dubrovskiy
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Patrick Schlegel
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Melda Kunduk
  - Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge
- Matthias Echternach
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Germany
- Rita Patel
  - Department of Speech, Language and Hearing Sciences, College of Arts and Sciences, Indiana University, Bloomington
- Marion Semmler
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Christopher Bohr
  - Klinik und Poliklinik für Hals-Nasen-Ohren-Heilkunde, Universitätsklinikum Regensburg, Germany
- Stephan Dürr
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Anne Schützenberger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
3
Schlegel P, Kist AM, Kunduk M, Dürr S, Döllinger M, Schützenberger A. Interdependencies between acoustic and high-speed videoendoscopy parameters. PLoS One 2021; 16:e0246136. [PMID: 33529244 PMCID: PMC7853476 DOI: 10.1371/journal.pone.0246136] [Received: 08/08/2020] [Accepted: 01/13/2021] [Indexed: 02/06/2023]
Abstract
In voice research, uncovering relations between the oscillating vocal folds, being the sound source of phonation, and the resulting perceived acoustic signal are of great interest. This is especially the case in the context of voice disorders, such as functional dysphonia (FD). We investigated 250 high-speed videoendoscopy (HSV) recordings with simultaneously recorded acoustic signals (124 healthy females, 60 FD females, 44 healthy males, 22 FD males). 35 glottal area waveform (GAW) parameters and 14 acoustic parameters were calculated for each recording. Linear and non-linear relations between GAW and acoustic parameters were investigated using Pearson correlation coefficients (PCC) and distance correlation coefficients (DCC). Further, norm values for parameters obtained from 250 ms long sustained phonation data (vowel /i/) were provided. 26 PCCs in females (5.3%) and 8 in males (1.6%) were found to be statistically significant (|corr.| ≥ 0.3). Only minor differences were found between PCCs and DCCs, indicating presence of weak non-linear dependencies between parameters. Fundamental frequency was involved in the majority of all relevant PCCs between GAW and acoustic parameters (19 in females and 7 in males). The most distinct difference between correlations in females and males was found for the parameter Period Variability Index. The study shows only weak relations between investigated acoustic and GAW-parameters. This indicates that the reduction of the complex 3D glottal dynamics to the 1D-GAW may erase laryngeal dynamic characteristics that are reflected within the acoustic signal. Hence, other GAW parameters, 2D-, 3D-laryngeal dynamics and vocal tract parameters should be further investigated towards potential correlations to the acoustic signal.
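To make the difference between the two coefficients named in this abstract concrete, the following Python sketch computes a Pearson correlation and a sample distance correlation on toy data. It uses the standard textbook definitions, not the authors' code; the function names and the five-point data set are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered (row, column, grand mean)."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric: row means = column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation; it is 0 only for independent samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Toy data: y depends on x only through a non-linear (quadratic) relation.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
pcc = pearson(x, y)
dcc = distance_correlation(x, y)
print(round(pcc, 3), round(dcc, 3))
```

On this toy example the Pearson coefficient vanishes while the distance correlation does not, which is the kind of purely non-linear dependence that a gap between PCC and DCC would reveal.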
Affiliation(s)
- Patrick Schlegel
  - Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California, United States of America
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Andreas M. Kist
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Melda Kunduk
  - Dep. of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Stephan Dürr
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Michael Döllinger
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
4
Schlegel P, Kniesburges S, Dürr S, Schützenberger A, Döllinger M. Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings. Sci Rep 2020; 10:10517. [PMID: 32601277 PMCID: PMC7324600 DOI: 10.1038/s41598-020-66405-y] [Received: 01/28/2020] [Accepted: 05/20/2020] [Indexed: 11/13/2022]
Abstract
In voice research and clinical assessment, many objective parameters are in use. However, there is no commonly used set of parameters that reflect certain voice disorders, such as functional dysphonia (FD); i.e. disorders with no visible anatomical changes. Hence, 358 high-speed videoendoscopy (HSV) recordings (159 normal females (NF), 101 FD females (FDF), 66 normal males (NM), 32 FD males (FDM)) were analyzed. We investigated 91 quantitative HSV parameters towards their significance. First, 25 highly correlated parameters were discarded. Second, further 54 parameters were discarded by using a LogitBoost decision stumps approach. This yielded a subset of 12 parameters sufficient to reflect functional dysphonia. These parameters separated groups NF vs. FDF and NM vs. FDM with fair accuracy of 0.745 or 0.768, respectively. Parameters solely computed from the changing glottal area waveform (1D-function called GAW) between the vocal folds were less important than parameters describing the oscillation characteristics along the vocal folds (2D-function called Phonovibrogram). Regularity of GAW phases and peak shape, harmonic structure and Phonovibrogram-based vocal fold open and closing angles were mainly important. This study showed the high degree of redundancy of HSV-voice-parameters but also affirms the need of multidimensional based assessment of clinical data.
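The first reduction step in this abstract, discarding highly correlated parameters, could look roughly like the Python sketch below. It is a greedy filter under stated assumptions: the 0.9 threshold, the parameter names and the values are all hypothetical, and the abstract does not say which criterion or ordering the authors used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Keep a parameter only if |r| with every already kept one is below threshold.

    `features` maps a parameter name to its values across recordings.
    Constant parameters are not handled (they would divide by zero).
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical parameter values for five recordings.
features = {
    "param_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "param_b": [2.1, 4.0, 6.2, 7.9, 10.0],  # almost exactly 2 * param_a
    "param_c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(features)
print(kept)
```

The result depends on the order in which parameters are visited, so a real analysis would need a documented rule for choosing which member of a correlated pair to keep.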
Affiliation(s)
- Patrick Schlegel
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Stefan Kniesburges
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Stephan Dürr
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Michael Döllinger
  - Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
5
Schlegel P, Kist AM, Semmler M, Döllinger M, Kunduk M, Dürr S, Schützenberger A. Determination of Clinical Parameters Sensitive to Functional Voice Disorders Applying Boosted Decision Stumps. IEEE Journal of Translational Engineering in Health and Medicine 2020; 8:2100511. [PMID: 32518739 PMCID: PMC7274815 DOI: 10.1109/jtehm.2020.2985026] [Received: 12/03/2019] [Revised: 02/21/2020] [Accepted: 03/28/2020] [Indexed: 12/30/2022]
Abstract
BACKGROUND Various voice assessment tools, such as questionnaires and aerodynamic voice characteristics, can be used to assess vocal function of individuals. However, not much is known about the best combinations of these parameters in identification of functional dysphonia in clinical settings. METHODS This study investigated six scores from clinically commonly used questionnaires and seven acoustic parameters. 514 females and 277 males were analyzed. The subjects were divided into three groups: one healthy group (N01) (49 females, 50 males) and two disordered groups with perceptually hoarse (FD23) (220 females, 96 males) and perceptually not hoarse (FD01) (245 females, 131 males) sounding voices. A tree stumps Adaboost approach was applied to find the subset of parameters that best separates the groups. Subsequently, it was determined if this parameter subset reflects treatment outcome for 120 female and 51 male patients by pairwise pre- and post-treatment comparisons of parameters. RESULTS The questionnaire "Voice-related-quality-of-Life" and three objective parameters ("maximum fundamental frequency", "maximum Intensity" and "Jitter Percent") were sufficient to separate the groups (accuracy ranging from 0.690 (FD01 vs. FD23, females) to 0.961 (N01 vs. FD23, females)). Our study suggests that a reduced parameter subset (4 out of 13) is sufficient to separate these three groups. All parameters reflected treatment outcome for patients with hoarse voices, Voice-related-quality-of-Life showed improvement for the not hoarse group (FD01). CONCLUSION Results show that single parameters are insufficient to separate voice disorders but a set of several well-chosen parameters is. These findings will help to optimize and reduce clinical assessment time.
Affiliation(s)
- Patrick Schlegel
  - Department of Otorhinolaryngology, Head and Neck Surgery, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Andreas M. Kist
  - Department of Otorhinolaryngology, Head and Neck Surgery, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Marion Semmler
  - Department of Otorhinolaryngology, Head and Neck Surgery, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Michael Döllinger
  - Department of Otorhinolaryngology, Head and Neck Surgery, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Melda Kunduk
  - Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, LA 70803, USA
- Stephan Dürr
  - Department of Otorhinolaryngology, Head and Neck Surgery, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Anne Schützenberger
  - Department of Otorhinolaryngology, Head and Neck Surgery, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
6
Abstract
This review provides a comprehensive compilation, from a digital image processing point of view, of the most important techniques currently developed to characterize and quantify the vibration behaviour of the vocal folds, along with a detailed description of the laryngeal image modalities currently used in the clinic. The review presents an overview of the most significant glottal-gap segmentation and facilitative playback techniques used in the literature for this purpose, and shows the drawbacks and challenges that remain unsolved in developing robust tools for vocal fold vibration analysis based on digital image processing.
7
Turkmen HI, Karsligil ME. Advanced computing solutions for analysis of laryngeal disorders. Med Biol Eng Comput 2019; 57:2535-2552. [DOI: 10.1007/s11517-019-02031-9] [Received: 11/07/2018] [Accepted: 08/13/2019] [Indexed: 11/29/2022]
8
Sielska-Badurek EM, Jędra K, Sobol M, Niemczyk K, Osuch-Wójcikiewicz E. Laryngeal stroboscopy-Normative values for amplitude, open quotient, asymmetry and phase difference in young adults. Clin Otolaryngol 2018; 44:158-165. [PMID: 30353981 DOI: 10.1111/coa.13247] [Received: 02/18/2017] [Revised: 05/10/2018] [Accepted: 10/18/2018] [Indexed: 11/26/2022]
Abstract
OBJECTIVE To provide the normative values for laryngeal stroboscopy (LS) concerning amplitude, open quotient, asymmetry and phase difference in healthy, young subjects. STUDY DESIGN Prospective case-control study. SETTING Patients treated at a single institute. METHODS A total of 68 healthy subjects were included in the study (35 women, 33 men), aged 18-35 years. After obtaining LS recordings, image processing was performed to attain parameters of vocal fold vibration. RESULTS In women, the location of the maximum vibration amplitude is approximately in the 1/3 posterior part of the glottis, while in men, the location is moved to the glottis centre. In males, the relative amplitude vibration of the vocal folds in the 1/3 anterior part of the glottis was significantly higher than in females (P = 0.029). Women showed significantly higher open quotients (OQ) at the posterior part of the glottis than the male subjects (P < 0.001) and men presented significantly higher OQ at the anterior part of the glottis than the females (P < 0.001). The average OQ values for both sexes were almost the same. Females showed significantly higher relative glottal gap area (P = 0.044). Women presented a significantly lower amplitude asymmetry than men (P = 0.002). The weighted absolute left-right phase difference reached up to 24° and remained insignificantly higher in the men than the women (P = 0.142). CONCLUSIONS The study provides normative values for LS in young adults for the measurement of therapy outcomes in patients with voice disorders and realisation of evidence-based medicine. The LS parametrisation is easy to perform in clinical practice.
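As a rough illustration of the open quotient reported in this abstract, the Python sketch below uses the common definition of OQ as the fraction of one vibratory cycle during which the glottis is open. The function name, the threshold parameter and the sample values are hypothetical; the abstract does not describe how the authors derived OQ at each position along the glottis.

```python
def open_quotient(area_samples, closed_threshold=0.0):
    """Fraction of samples in one cycle whose glottal area exceeds the threshold."""
    if not area_samples:
        raise ValueError("at least one sample is required")
    open_count = sum(1 for a in area_samples if a > closed_threshold)
    return open_count / len(area_samples)


# Hypothetical glottal area samples covering exactly one cycle.
cycle = [0, 0, 3, 7, 9, 7, 3, 0, 0, 0]
oq = open_quotient(cycle)
print(oq)
```

A position-dependent OQ, as in the study, could be obtained by applying the same ratio to the glottal width measured at one anterior-posterior location instead of the total area.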
Affiliation(s)
- Katarzyna Jędra
  - Department of Otolaryngology, Medical University of Warsaw, Warsaw, Poland
- Maria Sobol
  - Department of Biophysics and Human Physiology, Medical University of Warsaw, Warsaw, Poland
- Kazimierz Niemczyk
  - Department of Otolaryngology, Medical University of Warsaw, Warsaw, Poland
9
Semmler M, Döllinger M, Patel RR, Ziethe A, Schützenberger A. Clinical relevance of endoscopic three-dimensional imaging for quantitative assessment of phonation. Laryngoscope 2018. [DOI: 10.1002/lary.27165] [Indexed: 11/09/2022]
Affiliation(s)
- Marion Semmler
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen Medical School, Erlangen, Germany
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen Medical School, Erlangen, Germany
- Rita R. Patel
  - Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, U.S.A.
- Anke Ziethe
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen Medical School, Erlangen, Germany
- Anne Schützenberger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen Medical School, Erlangen, Germany
10
Glottal Gap tracking by a continuous background modeling using inpainting. Med Biol Eng Comput 2017; 55:2123-2141. [DOI: 10.1007/s11517-017-1652-8] [Received: 08/18/2016] [Accepted: 04/19/2017] [Indexed: 11/26/2022]
11
Andrade-Miranda G, Henrich Bernardoni N, Godino-Llorente JI. Synthesizing the motion of the vocal folds using optical flow based techniques. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2017.01.002] [Indexed: 10/20/2022]
12
Aichinger P, Roesner I, Leonhard M, Schneider-Stickler B, Denk-Linnert DM, Bigenzahn W, Fuchs AK, Hagmüller M, Kubin G. Comparison of an audio-based and a video-based approach for detecting diplophonia. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2014.10.001] [Indexed: 11/25/2022]
13
Panek D, Skalski A, Zielinski T, Deliyski DD. Voice pathology classification based on High-Speed Videoendoscopy. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2016; 2015:735-8. [PMID: 26736367 DOI: 10.1109/embc.2015.7318467] [Indexed: 11/09/2022]
Abstract
This work presents a method for the automatic and objective classification of patients with healthy and pathological vocal fold vibration using High-Speed Videoendoscopy of the larynx. We used image segmentation and extracted a novel set of numerical parameters describing the spatio-temporal dynamics of the vocal folds to classify normal and pathological cases, achieving 73.3% cross-validation classification accuracy. This approach is promising for developing an automatic diagnosis tool for voice disorders.
14
Unger J, Schuster M, Hecker DJ, Schick B, Lohscheller J. A generalized procedure for analyzing sustained and dynamic vocal fold vibrations from laryngeal high-speed videos using phonovibrograms. Artif Intell Med 2015; 66:15-28. [PMID: 26597002 DOI: 10.1016/j.artmed.2015.10.002] [Received: 01/16/2015] [Revised: 09/28/2015] [Accepted: 10/20/2015] [Indexed: 12/01/2022]
Abstract
OBJECTIVE This work presents a computer-based approach to analyze the two-dimensional vocal fold dynamics of endoscopic high-speed videos, and constitutes an extension and generalization of a previously proposed wavelet-based procedure. While most approaches aim at analyzing sustained phonation conditions, the proposed method allows for a clinically adequate analysis of both dynamic and sustained phonation paradigms. MATERIALS AND METHODS The analysis procedure is based on a spatio-temporal visualization technique, the phonovibrogram, that facilitates the documentation of the visible laryngeal dynamics. From the phonovibrogram, a low-dimensional set of features is computed using a principal component analysis strategy that quantifies the type of vibration patterns, irregularity, lateral symmetry and synchronicity, as a function of time. Two different test bench data sets are used to validate the approach: (I) 150 healthy and pathologic subjects examined during sustained phonation. (II) 20 healthy and pathologic subjects that were examined twice: during sustained phonation and a glissando from a low to a higher fundamental frequency. In order to assess the discriminative power of the extracted features, a Support Vector Machine is trained to distinguish between physiologic and pathologic vibrations. The results for sustained phonation sequences are compared to the previous approach. Finally, the classification performance of the stationary analysis procedure is compared to the transient analysis of the glissando maneuver. RESULTS For the first test bench the proposed procedure outperformed the previous approach (proposed feature set: accuracy: 91.3%, sensitivity: 80%, specificity: 97%, previous approach: accuracy: 89.3%, sensitivity: 76%, specificity: 96%). Comparing the classification performance of the second test bench further corroborates that analyzing transient paradigms provides clear additional diagnostic value (glissando maneuver: accuracy: 90%, sensitivity: 100%, specificity: 80%, sustained phonation: accuracy: 75%, sensitivity: 80%, specificity: 70%). CONCLUSIONS The incorporation of parameters describing the temporal evolution of vocal fold vibration clearly improves the automatic identification of pathologic vibration patterns. Furthermore, incorporating a dynamic phonation paradigm provides additional valuable information about the underlying laryngeal dynamics that cannot be derived from sustained conditions. The proposed generalized approach provides a better overall classification performance than the previous approach, and hence constitutes a new advantageous tool for an improved clinical diagnosis of voice disorders.
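The accuracy, sensitivity and specificity figures quoted in this abstract follow from confusion-matrix counts, as the short Python sketch below shows. The counts are hypothetical: they assume the 20 subjects of the second test bench split into 10 pathologic and 10 healthy, which the abstract does not state, and were chosen only because they reproduce the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: pathologic cases classified correctly/incorrectly.
    tn/fp: healthy cases classified correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for 20 subjects (assumed 10 pathologic, 10 healthy).
glissando = classification_metrics(tp=10, fn=0, tn=8, fp=2)
sustained = classification_metrics(tp=8, fn=2, tn=7, fp=3)
print(glissando)
print(sustained)
```

With so few subjects, a single reclassified recording moves sensitivity or specificity by ten percentage points under this assumed split, which is worth keeping in mind when comparing the two paradigms.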
Affiliation(s)
- Jakob Unger
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
- Maria Schuster
  - Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, Marchioninistr. 13, 81366 München, Germany
- Dietmar J Hecker
  - Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Bernhard Schick
  - Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Jörg Lohscheller
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
15
Andrade-Miranda G, Godino-Llorente JI, Moro-Velázquez L, Gómez-García JA. An automatic method to detect and track the glottal gap from high speed videoendoscopic images. Biomed Eng Online 2015; 14:100. [PMID: 26510707 PMCID: PMC4625946 DOI: 10.1186/s12938-015-0096-3] [Received: 06/16/2015] [Accepted: 10/20/2015] [Indexed: 11/17/2022]
Abstract
BACKGROUND The image-based analysis of vocal fold vibration plays an important role in the diagnosis of voice disorders. The analysis is based not only on direct observation of the video sequences, but also on an objective characterization of the phonation process by means of features extracted from the recorded images. However, such analysis depends on a prior accurate identification of the glottal gap, which is the most challenging step for a further automatic assessment of vocal fold vibration. METHODS In this work, a complete framework to automatically segment and track the glottal area (or glottal gap) is proposed. The algorithm identifies a region of interest (ROI) that is adapted over time, and combines active contours and the watershed transform for the final delineation of the glottis. An automatic procedure for synthesizing different videokymograms is also proposed. RESULTS Thanks to the ROI implementation, the technique is robust to camera shifting, and the objective test proved the effectiveness and performance of the approach in the most challenging scenarios, namely when the vocal folds close inappropriately. CONCLUSIONS The novelty of the proposed algorithm lies in the use of temporal information to identify an adaptive ROI and in the use of watershed merging combined with active contours for the glottis delimitation. Additionally, an automatic procedure for synthesizing multiline videokymograms (VKG) by identifying the glottal main axis is developed.
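The videokymogram synthesis mentioned in this abstract can be pictured with a minimal Python sketch: the same image line is taken from every frame and the lines are stacked in time order. This assumes the glottal main axis is vertical in the image, so a horizontal row crosses both vocal folds; the function name, the `row_index` parameter and the tiny frames are hypothetical and do not reproduce the authors' axis detection.

```python
def videokymogram(frames, row_index):
    """Stack one image row per frame into a time-by-width kymogram.

    `frames` is a sequence of grey-level images (lists of rows);
    output row t is row `row_index` of frame t.
    """
    return [list(frame[row_index]) for frame in frames]


# Three hypothetical 3x4 frames; 0 marks dark glottal pixels, 9 bright tissue.
frames = [
    [[9, 9, 9, 9], [9, 9, 9, 9], [9, 9, 9, 9]],  # glottis closed
    [[9, 9, 9, 9], [9, 0, 9, 9], [9, 9, 9, 9]],  # glottis slightly open
    [[9, 0, 9, 9], [9, 0, 0, 9], [9, 0, 9, 9]],  # glottis wider
]
vkg = videokymogram(frames, row_index=1)
for line in vkg:
    print(line)
```

A multiline variant would simply repeat this for several values of `row_index` placed along the glottal axis.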
Affiliation(s)
- Gustavo Andrade-Miranda
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Campus de Montegancedo, Crta. M40 km, 38, Madrid, Spain
- Juan I Godino-Llorente
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Campus de Montegancedo, Crta. M40 km, 38, Madrid, Spain
- Laureano Moro-Velázquez
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Campus de Montegancedo, Crta. M40 km, 38, Madrid, Spain
- Jorge Andrés Gómez-García
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Campus de Montegancedo, Crta. M40 km, 38, Madrid, Spain
16
|
Abstract
Every phonosurgical procedure alters endolaryngeal anatomy; be it by removing tissue, or injection or implantation of autologous or foreign material. However, the effect that an altered airflow cross section and changed soft tissue elasticity will have on the voice cannot be predicted. With the aim of promoting rational indications for phonosurgery, the current article explains the biomechanisms of the normal and the disordered voice, including the complex interdependence of tissue viscoelasticity, glottal airstream and sound production. According to European Laryngological Society (ELS) recommendations, five - not entirely mutually independent - evaluation criteria form the basis of indication assessments: self-rating (by the patient), proxy rating (by the physician), technical signal analysis (computerized), aerodynamics (spirometry) and vibration analysis (stroboscopy). The ELS evaluation standards agreed upon in 2001 enable indications and - by virtue of pre- and postoperative comparisons - therapeutic successes to be assessed. The 10-year-old ELS protocol has been updated by a real-time method for visualizing vocal fold vibrations: the phonovibrogram (PVG) has replaced stroboscopy. Independently of the morphological anatomic details of the larynx, PVG visualizes the symmetry and regularity of vocal fold motion, thus allowing preoperative estimation of tissue elasticity.
Affiliation(s)
- U Eysholdt
- Abteilung für Phoniatrie und Pädaudiologie, Universitätsklinikum Erlangen, Bohlenplatz 21, 91054, Erlangen, Deutschland.
|
17
|
Correlation between the quantitative video laryngostroboscopic measurements and parameters of multidimensional voice assessment. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2014.10.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
|
18
|
Gonçalves VM, Delamaro ME, Nunes FDLDS. A systematic review on the evaluation and characteristics of computer-aided diagnosis systems. ACTA ACUST UNITED AC 2014. [DOI: 10.1590/1517-3151.0517] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
|
19
|
Unger J, Lohscheller J, Reiter M, Eder K, Betz CS, Schuster M. A Noninvasive Procedure for Early-Stage Discrimination of Malignant and Precancerous Vocal Fold Lesions Based on Laryngeal Dynamics Analysis. Cancer Res 2014; 75:31-9. [DOI: 10.1158/0008-5472.can-14-1458] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
|
20
|
Bohr C, Kräck A, Dubrovskiy D, Eysholdt U, Svec J, Psychogios G, Ziethe A, Döllinger M. Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. Journal of Speech, Language, and Hearing Research 2014; 57:1148-1161. [PMID: 24686496 DOI: 10.1044/2014_jslhr-s-12-0076] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 06/03/2023]
Abstract
PURPOSE The aim of this study was to identify parameters that would differentiate healthy from pathological organic-based vocal fold vibrations to emphasize clinical usefulness of high-speed imaging. METHOD Fifty-five men (M age = 36 years, SD = 20 years) were examined and separated into 4 groups: 1 healthy (26 individuals) and 3 pathological (10 individuals with contact granuloma, 12 with polyps, and 7 with cysts). Vocal fold vibrations were recorded using a high-speed camera during sustained phonation. Twenty objective glottal area waveform and 24 phonovibrogram parameters representing spatiotemporal characteristics were analyzed. Statistical group comparisons were performed to document spatiotemporal changes for organic lesions that cannot be determined visually. To look for specific pattern profiles within organic lesions, the authors performed linear discriminant analysis. RESULTS Thirteen parameters showed significant differences between the healthy group and at least 1 pathological group. The differences occurred more in temporal than in spatial parameters. Contact granuloma showed the fewest statistical differences (3 parameters), followed by cysts (9 parameters), and polyps (10 parameters). Linear discriminant analysis achieved accuracy performance of 76% (all groups separated) and 82% (healthy vs. pathological). CONCLUSION The results suggest that for males, the differences between healthy voices and organic voice disorders may be more pronounced within temporal characteristics that cannot be visually detected without high-speed imaging.
|
21
|
Echternach M, Dippold S, Richter B. High-speed imaging using rigid laryngoscopy for the analysis of register transitions in professional operatic tenors. Logoped Phoniatr Vocol 2014; 41:1-8. [DOI: 10.3109/14015439.2014.936499] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
|
22
|
Unger J, Hecker DJ, Kunduk M, Schuster M, Schick B, Lohscheller J. Quantifying spatiotemporal properties of vocal fold dynamics based on a multiscale analysis of phonovibrograms. IEEE Trans Biomed Eng 2014; 61:2422-33. [PMID: 24771562 DOI: 10.1109/tbme.2014.2318774] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
In order to objectively assess the laryngeal vibratory behavior, endoscopic high-speed cameras capture several thousand frames per second of the vocal folds during phonation. However, judging all inherent clinically relevant features is a challenging task and requires well-founded expert knowledge. In this study, an automated wavelet-based analysis of laryngeal high-speed videos based on phonovibrograms is presented. The phonovibrogram is an image representation of the spatiotemporal pattern of vocal fold vibration and constitutes the basis for a computer-based analysis of laryngeal dynamics. The features extracted from the wavelet transform are shown to be closely related to a basic set of video-based measurements categorized by the European Laryngological Society for a subjective assessment of pathologic voices. The wavelet-based analysis further offers information about irregularity and lateral asymmetry and asynchrony. It is demonstrated in healthy and pathologic subjects as well as for a surgical group that was examined before and after the removal of a vocal fold polyp. The features were found to not only classify glottal closure characteristics but also quantify the impact of pathologies on the vibratory behavior. The interpretability and the discriminative power of the proposed feature set show promising relevance for a computer-assisted diagnosis and classification of voice disorders.
|
23
|
Moisik SR, Esling JH. Modeling the biomechanical influence of epilaryngeal stricture on the vocal folds: a low-dimensional model of vocal-ventricular fold coupling. Journal of Speech, Language, and Hearing Research 2014; 57:S687-S704. [PMID: 24687007 DOI: 10.1044/2014_jslhr-s-12-0279] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 06/03/2023]
Abstract
PURPOSE Physiological and phonetic studies suggest that, at moderate levels of epilaryngeal stricture, the ventricular folds impinge upon the vocal folds and influence their dynamical behavior, which is thought to be responsible for constricted laryngeal sounds. In this work, the authors examine this hypothesis through biomechanical modeling. METHOD The dynamical response of a low-dimensional, lumped-element model of the vocal folds under the influence of vocal-ventricular fold coupling was evaluated. The model was assessed for F0 and cover-mass phase difference. Case studies of simulations of different constricted phonation types and of glottal stop illustrate various additional aspects of model performance. RESULTS Simulated vocal-ventricular fold coupling lowers F0 and perturbs the mucosal wave. It also appears to reinforce irregular patterns of oscillation, and it can enhance laryngeal closure in glottal stop production. CONCLUSION The effects of simulated vocal-ventricular fold coupling are consistent with sounds, such as creaky voice, harsh voice, and glottal stop, that have been observed to involve epilaryngeal stricture and apparent contact between the vocal folds and ventricular folds. This supports the view that vocal-ventricular fold coupling is important in the vibratory dynamics of such sounds and, furthermore, suggests that these sounds may intrinsically require epilaryngeal stricture.
|
24
|
Unger J, Schuster M, Hecker DJ, Schick B, Lohscheller J. A multiscale product approach for an automatic classification of voice disorders from endoscopic high-speed videos. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2013; 2013:7360-3. [PMID: 24111445 DOI: 10.1109/embc.2013.6611258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Direct observation of vocal fold vibration is indispensable for the clinical diagnosis of voice disorders. Among current imaging techniques, high-speed videoendoscopy constitutes a state-of-the-art method, capturing several thousand frames per second of the vocal folds during phonation. Recently, a method was presented for extracting descriptive features from phonovibrograms, two-dimensional images containing the spatio-temporal pattern of vocal fold dynamics. The derived features are closely related to a clinically established protocol for functional assessment of pathologic voices. The discriminative power of these features for different pathologic findings and configurations has not yet been assessed. In the current study, a cohort of 220 subjects is considered for two- and multi-class problems of healthy and pathologic findings. The performance of the proposed feature set is compared to conventional feature reduction routines and was found to clearly outperform them. As such, the proposed procedure shows great potential for the diagnosis of vocal fold disorders.
|
25
|
Abstract
PURPOSE OF REVIEW Kymographic imaging is a modern method for displaying and evaluating vibratory behaviour of the vocal folds which is crucial for voice production. This review summarizes the state of the art of this method, and focuses on the progress in this area within the last 5 years. RECENT FINDINGS Videokymography, using a special videocamera, offers high-speed (video)kymographic images in real time, which is advantageous in daily clinical practice. Two other methods use software to create kymograms retrospectively: digital kymography processes high-speed videolaryngoscopic recordings and offers numerous research possibilities, whereas strobovideokymography processes videostroboscopic recordings, and its use is limited to regular vibration patterns. Current studies reveal that high-speed kymographic images allow more reliable visual evaluation of vibrations than by watching video recordings. Image analysis procedures have been advanced to quantify the vibration properties of the vocal folds. New information has been obtained on asymmetry, mucosal waves, irregularities, phonation onset, and nonlinear dynamic phenomena in voice disorders, as well as in singing. SUMMARY High-speed kymography visualizes vibratory features which are not simply observable via traditional methods. It shows large potential in better understanding the functional origin of hoarseness and unsteady phonatory states. Further research in this area is envisioned.
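The digital kymography described in this review can be illustrated with a minimal sketch: one scan line is taken from every frame of a high-speed recording and the lines are stacked in time order. The function name `digital_kymogram`, the nested-list frame layout, and the assumption that the chosen row lies perpendicular to the glottal axis are all illustrative choices and do not come from the reviewed work.

```python
def digital_kymogram(frames, row):
    """Stack one scan line per frame into a kymogram.

    frames: list of 2-D frames, each a list of pixel rows (hypothetical layout).
    row:    index of the scan line, assumed perpendicular to the glottal axis.
    Returns a list in which entry t is the chosen scan line of frame t,
    so one axis is time and the other is position along the line.
    """
    if not frames:
        return []
    height = len(frames[0])
    if not 0 <= row < height:
        raise IndexError("scan line index outside the frame")
    # Copy each line so the kymogram does not alias the input frames.
    return [list(frame[row]) for frame in frames]


# Tiny usage example: two 2x2 frames, scan line 1.
frames = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
kymogram = digital_kymogram(frames, row=1)
```

A multiline kymogram would repeat the same extraction for several rows; choosing those rows automatically requires locating the glottal axis, which this sketch does not attempt.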
|
26
|
Bohr C, Kraeck A, Eysholdt U, Ziethe A, Döllinger M. Quantitative analysis of organic vocal fold pathologies in females by high-speed endoscopy. Laryngoscope 2013; 123:1686-93. [PMID: 23649746 DOI: 10.1002/lary.23783] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 03/12/2012] [Revised: 08/23/2012] [Accepted: 09/18/2012] [Indexed: 11/11/2022]
Abstract
OBJECTIVES/HYPOTHESIS Quantitative analysis of endoscopic high-speed video recordings of vocal fold vibrations has been growing in importance in recent years. The videos have mainly been analyzed using subjective evaluation, but this is examiner dependent, and the results show inadequate interobserver agreement. The aims of this study were therefore to identify appropriate objective parameters for analyzing high-speed recordings to differentiate healthy voice production from organic disorders. STUDY DESIGN METHODS A total of 152 females were examined, divided into 77 healthy and 75 with four different pathological conditions: laryngeal epithelial thickening, Reinke edema, vocal fold polyps, and vocal fold cysts. Vocal fold vibrations were recorded with a high-speed camera (4,000 Hz, 256 × 256 pixels) during sustained phonation. Parameters computed from the glottal area waveform (GAW) and from phonovibrogram (PVG) were analyzed. Multiparametric linear discriminant analysis was performed to classify pathological conditions versus the healthy group. RESULTS Twenty of 44 parameters were identified that are capable of distinguishing between the individual types of pathology. PVG parameters showed better performance than GAW parameters. Parameters representing vibrational periodicity via standard deviation showed better performance than absolute parameters. In addition, linear discriminant analysis achieved reliable differentiation between healthy and pathological vocal fold vibrations: 72% for the five-class problem (all groups separately) and 88% for the two-class problem (healthy vs. all pathologies taken as one class). CONCLUSIONS The study succeeded in defining objective parameters for analyzing endoscopic high-speed videos and suggesting first parameters for differentiation between healthy dynamics and dynamics of organic pathologies.
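The glottal area waveform (GAW) mentioned in this abstract is commonly obtained by counting the glottal-gap pixels in each segmented frame. The sketch below shows that idea only; the function names, the binary nested-list mask layout, and the min-max normalization are assumptions for illustration and are not the parameter set used in the study.

```python
def glottal_area_waveform(masks):
    """Return the glottal area, in pixels, for every frame.

    masks: list of binary 2-D masks (rows of 0/1 values), one per frame,
           where non-zero pixels are assumed to mark the glottal gap.
    """
    return [sum(1 for row in mask for px in row if px) for mask in masks]


def normalize_gaw(gaw):
    """Min-max scale a GAW to the range [0, 1] (illustrative choice)."""
    lo, hi = min(gaw), max(gaw)
    if hi == lo:
        # A constant waveform carries no vibration information.
        return [0.0 for _ in gaw]
    return [(a - lo) / (hi - lo) for a in gaw]


# Usage: three 2x2 masks going from closed to fully open.
masks = [[[0, 0], [0, 0]], [[0, 1], [1, 1]], [[1, 1], [1, 1]]]
gaw = glottal_area_waveform(masks)
gaw_norm = normalize_gaw(gaw)
```

Cycle-based parameters, such as the standard deviation of a measure across cycles, would be computed on top of such a waveform once the cycle boundaries are known.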
Affiliation(s)
- Christopher Bohr
- Department of Otorhinolaryngology, Head and Neck Surgery, Erlangen University Hospital, Erlangen, Germany.
|
27
|
Unger J, Meyer T, Herbst CT, Fitch WTS, Döllinger M, Lohscheller J. Phonovibrographic wavegrams: visualizing vocal fold kinematics. The Journal of the Acoustical Society of America 2013; 133:1055-1064. [PMID: 23363121 DOI: 10.1121/1.4774378] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]
Abstract
Recently, endoscopic high-speed laryngoscopy has been established for commercial use as a state-of-the-art technique to examine vocal fold kinematics. Since modern cameras provide sampling rates of several thousand frames per second, a high volume of data has to be considered for visual and objective analysis. A method is presented for visualizing endoscopic high-speed videos in three-dimensional cycle-based graphs, combining and extending the approaches of phonovibrograms and electroglottographic wavegrams. To build a phonovibrographic wavegram, individual cycles of a phonovibrogram are segmented, normalized in cycle duration, and concatenated over time. For analysis, the resulting three-dimensional scalar field is visualized with different rendering techniques that provide information about different aspects of vocal fold kinematics. The phonovibrographic wavegram incorporates information about the glottal closure type, the size and location of the amplitudes, symmetry, periodicity, and phase. The potential of the approach to visualize the characteristics of vocal fold vibration in a compact and intuitive way is demonstrated in two healthy and three pathologic subjects. The phonovibrographic wavegram allows a comprehensive analysis of vocal fold kinematics and reveals information that remains hidden with other visualization techniques.
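The three construction steps named in this abstract (segment cycles, normalize cycle duration, concatenate over time) can be sketched as follows. This is a simplified illustration under stated assumptions: the phonovibrogram is a list of spatial rows sampled over time, the cycle boundaries in `cycle_starts` are already known, and duration is normalized by plain linear interpolation. The helper names are hypothetical and the authors' actual procedure may differ.

```python
def resample_linear(values, n_out):
    """Linearly resample a 1-D sequence to n_out samples (n_out >= 2)."""
    if n_out < 2:
        raise ValueError("n_out must be at least 2")
    n_in = len(values)
    if n_in == 1:
        return [float(values[0])] * n_out
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index into values
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def pvg_wavegram(pvg, cycle_starts, samples_per_cycle):
    """Build a cycle-normalized 3-D stack from a phonovibrogram.

    pvg:          list of spatial rows, each a list of values over time.
    cycle_starts: sorted frame indices where cycles begin; the last entry
                  marks the end of the final cycle (assumed to be given).
    Returns wavegram[c][s][k] for cycle c, spatial row s, normalized phase k.
    """
    wavegram = []
    for start, end in zip(cycle_starts, cycle_starts[1:]):
        # Cut the same time window from every spatial row, then resample
        # it so that all cycles share one common length.
        wavegram.append(
            [resample_linear(row[start:end], samples_per_cycle) for row in pvg]
        )
    return wavegram


# Usage: one spatial row, two cycles of 4 and 6 frames, 3 samples per cycle.
pvg = [[0, 1, 2, 3, 0, 2, 4, 6, 8, 10]]
wavegram = pvg_wavegram(pvg, cycle_starts=[0, 4, 10], samples_per_cycle=3)
```

Normalizing every cycle to the same number of samples is what makes cycles of different duration directly comparable along the phase axis.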
Affiliation(s)
- Jakob Unger
- Department of Computer Science, University of Applied Science Trier, Schneidershof, 54293 Trier, Germany.
|
28
|
Unger J, Meyer T, Doellinger M, Hecker DJ, Schick B, Lohscheller J. A wavelet-based approach for a continuous analysis of phonovibrograms. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2013; 2012:4410-3. [PMID: 23366905 DOI: 10.1109/embc.2012.6346944] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Recently, endoscopic high-speed laryngoscopy has been established for commercial use and constitutes a state-of-the-art technique to examine vocal fold dynamics. Despite overcoming many limitations of commonly applied stroboscopy, it has not yet gained widespread clinical application. A major drawback is the lack of a methodology for extracting valuable features to support visual assessment or computer-aided diagnosis. In this paper a compact and descriptive feature set is presented. The feature extraction routines are based on two-dimensional color graphs called phonovibrograms (PVG). These graphs contain the full spatio-temporal pattern of vocal fold dynamics and are therefore suited to derive features that comprehensively describe the vibration pattern of vocal folds. Within our approach, clinically relevant features such as glottal closure type, symmetry and periodicity are quantified in a set of 10 descriptive features. The suitability for classification tasks is shown using a clinical data set comprising 50 healthy and 50 paralytic subjects. A classification accuracy of 93.2% has been achieved.
Affiliation(s)
- Jakob Unger
- Department of Computer Science, University of Applied Science Trier, Trier, Germany
|
29
|
Döllinger M, Dubrovskiy D, Patel R. Spatiotemporal analysis of vocal fold vibrations between children and adults. Laryngoscope 2012; 122:2511-8. [PMID: 22965771 DOI: 10.1002/lary.23568] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Accepted: 06/11/2012] [Indexed: 11/10/2022]
Abstract
OBJECTIVES/HYPOTHESIS The aim of the study is to quantify differences in spatiotemporal features of vibratory motion between typically developing prepubertal children and adults using high-speed digital imaging. STUDY DESIGN Prospective case-control study. METHODS Vocal fold oscillations of 31 children and 35 adults were analyzed. Endoscopic high-speed imaging was performed during sustained phonation at typical pitch and loudness. The quantitative phonovibrogram technique was used to compute spatiotemporal features. Spatial features are represented by opening and closing angles along the anterior and posterior parts of the vocal folds, as well as by the left-right symmetry ratio. Temporal features are represented by the cycle-to-cycle variability of the spatial features. Group differences (adult females, adult males, and children) were statistically investigated. RESULTS Statistical differences were more pronounced in the temporal behavior than in the spatial behavior. Children demonstrated greater cycle-to-cycle variability in oscillations than adults. Most differences between children and adults were found for temporal characteristics along the anterior parts during the closing phase. The spatiotemporal features differed more between children and males than between children and females. Both adults and children showed equally high left-right symmetry. CONCLUSIONS Results suggest a more unstable phonation in children than in adults, yielding increased perturbation in periodicity. Children demonstrated longer phase delay in the anterior/posterior and medio-lateral parts during the opening phase compared to adults. The data presented may provide the basis for differentiating normal from disordered vibratory characteristics in the pediatric population, and may eventually support the clinical utility of high-speed imaging.
Affiliation(s)
- Michael Döllinger
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Erlangen, Germany.
|
30
|
Kunduk M, Döllinger M, McWhorter AJ, Švec JG, Lohscheller J. Vocal Fold Vibratory Behavior Changes following Surgical Treatment of Polyps Investigated with High-Speed Videoendoscopy and Phonovibrography. Ann Otol Rhinol Laryngol 2012; 121:355-63. [DOI: 10.1177/000348941212100601] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022]
Abstract
Objectives: The goal of this study was to objectively quantify the changes in vocal fold vibratory characteristics before and after surgery with high-speed videoendoscopy and the image analysis tool phonovibrography. Methods: High-speed videoendoscopic data, audio recordings, and Voice Handicap Index scores were collected from 8 subjects with a diagnosis of unilateral vocal fold polyps, before operation and at 1 week and 1 to 3 months after operation. We then analyzed the objective phonovibrographic patterns and parameters describing the vocal fold vibratory behavior. Results: On phonovibrography, the visual representations of the vocal fold vibratory characteristics, from both the individual and the group data, demonstrated very different patterns before surgery and both 1 week and 1 to 3 months after surgery. The individual phonovibrograms obtained from the left and right true vocal folds clearly demonstrated the lesion site and its effects on the vocal fold vibratory characteristics for each subject. The improvements in amplitude and symmetry (relative vibratory amplitude and vibration amplitude symmetry) of vocal fold vibration were quantified; the difference was greatest between data from before surgery and data from 1 week after surgery. Conclusions: The visual phonovibrographic patterns and quantitative data revealed marked changes in vocal fold vibratory patterns after operation and continued improvement at 1 to 3 months.
|
31
|
Watts CR, Awan SN. Use of spectral/cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel and continuous speech contexts. Journal of Speech, Language, and Hearing Research 2011; 54:1525-1537. [PMID: 22180020 DOI: 10.1044/1092-4388(2011/10-0209)] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Indexed: 05/31/2023]
Abstract
PURPOSE In this study, the authors evaluated the diagnostic value of spectral/cepstral measures to differentiate dysphonic from nondysphonic voices using sustained vowels and continuous speech samples. METHODOLOGY Thirty-two age- and gender-matched individuals (16 participants with dysphonia and 16 controls) were recorded reading a standard passage (The Rainbow Passage; Fairbanks, 1960) and sustaining the vowel /α/. Recorded voices were analyzed with custom software that calculated 4 spectral/cepstral measures. RESULTS Measures of cepstral peak prominence (CPP) and low-high spectral ratio (L/H ratio) were significantly different between groups in both speaking conditions; the standard deviation of the CPP was significantly different between groups in continuous speech only. In differentiating dysphonic individuals with a hypofunctional etiology from nondysphonic individuals, receiver operating characteristic (ROC) analyses demonstrated (a) high sensitivity and high specificity for the CPP in the sustained vowel condition and (b) high sensitivity and moderate specificity for the CPP in the speech condition. CONCLUSIONS In a sample of dysphonic speakers (hypofunctional etiologies) versus typical speakers, spectral/cepstral measures of CPP and L/H ratio were able to differentiate these groups from one another in both vowel prolongation and continuous speech contexts with high sensitivity and specificity. The results of this study support the growing body of literature documenting the significant value of cepstral and other spectral-based acoustic measures to the clinical evaluation and management processes.
|
32
|
Current World Literature. Curr Opin Otolaryngol Head Neck Surg 2011; 19:229-30. [DOI: 10.1097/moo.0b013e328347afd0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
33
|
Voigt D, Döllinger M, Yang A, Eysholdt U, Lohscheller J. Automatic diagnosis of vocal fold paresis by employing phonovibrogram features and machine learning methods. Computer Methods and Programs in Biomedicine 2010; 99:275-288. [PMID: 20138386 DOI: 10.1016/j.cmpb.2010.01.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 07/13/2009] [Revised: 11/02/2009] [Accepted: 01/09/2010] [Indexed: 05/28/2023]
Abstract
The clinical diagnosis of voice disorders is based on examination of the rapidly moving vocal folds during phonation (f0: 80-300 Hz) with state-of-the-art endoscopic high-speed cameras. Commonly, analysis is performed in a subjective and time-consuming manner via slow-motion video playback and exhibits low inter- and intra-rater reliability. In this study, an objective method to overcome this drawback is presented, based on Phonovibrography, a novel image analysis technique. For a cohort of 45 normophonic and paralytic voices, the laryngeal dynamics were captured by specialized Phonovibrogram features and analyzed with different machine learning algorithms. Classification accuracies reached 93% for 2-class and 73% for 3-class discrimination. The results were validated by subjective expert ratings given the same diagnostic criteria. The automatic Phonovibrogram analysis approach exceeded the experienced raters' classifications by 9%. The presented method holds considerable potential for providing reliable vocal fold diagnosis support in the future.
Affiliation(s)
- Daniel Voigt
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Bohlenplatz 21, D-91054 Erlangen, Germany.
|