1
Alter IL, Dias C, Briano J, Rameau A. Digital health technologies in swallowing care from screening to rehabilitation: A narrative review. Auris Nasus Larynx 2025;52:319-326. PMID: 40403345. DOI: 10.1016/j.anl.2025.05.002.
Abstract
OBJECTIVES: Digital health technologies (DHTs) have rapidly advanced in the past two decades through developments in mobile and wearable devices, and most recently with the explosion of artificial intelligence (AI) capabilities and their extension into the health space. DHT has myriad potential applications in deglutology, many of which have undergone promising investigation and development in recent years. We present the first literature review on applications of DHT in swallowing health, from screening to therapeutics. Public health interventions for swallowing care are increasingly needed in the setting of aging populations in the West and East Asia, and DHT may offer a scalable and low-cost solution.
METHODS: A narrative review was performed using PubMed and Google Scholar to identify recent research on applications of AI and digital health in swallowing practice. Database searches, conducted in September 2024, included terms such as "digital," "AI," "machine learning," and "tools" in combination with "deglutition," "Otolaryngology," "Head and Neck," "speech language pathology," "swallow," and "dysphagia." Primary literature pertaining to digital health in deglutology was included for review.
RESULTS: We review the various applications of DHT in swallowing care, including prevention, screening, diagnosis, treatment planning, and rehabilitation.
CONCLUSION: DHT may offer innovative and scalable solutions for swallowing care as public health needs grow and the specialized healthcare workforce remains limited. These technological advances are also being explored as time- and resource-saving solutions at many points of care in swallowing practice. DHT could bring affordable and accurate information for self-management of dysphagia to broader patient populations that otherwise lack access to expert providers.
Affiliation(s)
- Isaac L Alter
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, NY, NY 10022, USA
- Carla Dias
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, NY, NY 10022, USA
- Jack Briano
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, NY, NY 10022, USA
- Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, NY, NY 10022, USA
2
Sarmet M, Kaczmarek E, Fauveau A, Steer K, Velasco AA, Smith A, Kennedy M, Shideler H, Wallace S, Stroud T, Blilie M, Mayerl CJ. A machine learning pipeline for automated bolus segmentation and area measurement in swallowing videofluoroscopy images of an infant pig model. Dysphagia 2025. PMID: 40293507. DOI: 10.1007/s00455-025-10829-z.
Abstract
Feeding efficiency and safety are often driven by bolus volume, one of the most common clinical measures for assessing swallow performance. However, manual measurement of bolus area is time-consuming and suffers from high inter-rater variability. This study proposes a machine learning (ML) pipeline using ilastik, an accessible bioimage analysis tool, to automate the measurement of bolus area during swallowing. The pipeline was tested on 336 swallows from videofluoroscopic recordings of 8 infant pigs during bottle feeding. Eight trained raters manually measured bolus area in ImageJ and also used ilastik's autocontext pixel-level labeling and object classification tools to train ML models for automated bolus segmentation and area calculation. The ML pipeline trained in 1 h 42 min and processed the dataset in 2 min 48 s, a 97% time saving compared to manual methods. The model exhibited strong performance, achieving a high Dice Similarity Coefficient (0.84), Intersection over Union (0.76), and inter-rater reliability (intraclass correlation coefficient = 0.79). The bolus areas from the two methods were highly correlated (R² = 0.74 overall, 0.78 without bubbles, 0.67 with bubbles), with no significant difference in measured bolus area between the methods. Our ML pipeline, requiring no ML expertise, offers a reliable and efficient method for automatically measuring bolus area. While human confirmation remains valuable, this pipeline accelerates analysis and improves reproducibility compared to manual methods. Future refinements can further enhance precision and broaden its application in dysphagia research.
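The Dice Similarity Coefficient and Intersection over Union reported in this abstract are standard overlap metrics for comparing a predicted segmentation mask against a reference mask. As an illustration only (this is not the authors' pipeline, and the toy masks are invented), they can be computed from binary arrays like so:

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Compute Dice = 2|A∩B|/(|A|+|B|) and IoU = |A∩B|/|A∪B| for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    dice = 2.0 * inter / total if total else 1.0  # two empty masks agree perfectly
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

# Toy 4x4 "bolus" masks differing by one pixel
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
truth = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
dice, iou = dice_and_iou(pred, truth)  # Dice = 6/7 ≈ 0.857, IoU = 3/4
```

Dice weighs the overlap against the average mask size and is therefore always at least as large as IoU, which is why the paper's Dice (0.84) exceeds its IoU (0.76) on the same predictions.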
Affiliation(s)
- Max Sarmet
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA.
- Graduate Department of Health Science and Technology, University of Brasilia, Brasilia, 70910-900, Brazil.
- Elska Kaczmarek
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Alexane Fauveau
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Kendall Steer
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Alex-Ann Velasco
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Ani Smith
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Maressa Kennedy
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Hannah Shideler
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Skyler Wallace
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Thomas Stroud
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Morgan Blilie
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Christopher J Mayerl
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
3
Cubero L, Tessier C, Castelli J, Robert K, de Crevoisier R, Jégoux F, Pascau J, Acosta O. Automated dysphagia characterization in head and neck cancer patients using videofluoroscopic swallowing studies. Comput Biol Med 2025;187:109759. PMID: 39914196. DOI: 10.1016/j.compbiomed.2025.109759.
Abstract
BACKGROUND: Dysphagia is one of the most common toxicities following head and neck cancer (HNC) radiotherapy (RT). Videofluoroscopic Swallowing Studies (VFSS) are the gold standard for diagnosing and assessing dysphagia, but current evaluation methods are manual, subjective, and time-consuming. This study introduces a novel framework for the automated analysis of VFSS to characterize dysphagia in HNC patients.
METHODS: The proposed methodology integrates three key steps: (i) a deep learning-based labeling framework, trained iteratively to identify ten regions of interest; (ii) extraction of 23 swallowing dynamic parameters, followed by comparison across diverse cohorts; and (iii) machine learning (ML) classification of the extracted parameters into four dysphagia-related impairments.
RESULTS: The labeling framework achieved high accuracy, with a mean error of 1.6 pixels across the ten regions of interest in an independent test dataset. Analysis of the extracted parameters revealed significant differences in swallowing dynamics between healthy individuals, HNC patients before and after RT, and patients with non-HNC-related dysphagia. The ML classifiers achieved accuracies ranging from 0.60 to 0.87 for the four dysphagia-related impairments.
CONCLUSIONS: Despite challenges related to dataset size and VFSS variability, our framework demonstrates substantial potential for automatically identifying ten regions of interest and four dysphagia-related impairments from VFSS. This work sets the foundation for future research aimed at refining dysphagia analysis and characterization using VFSS, particularly in the context of HNC RT.
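The "mean error of 1.6 pixels" reported for the labeling framework is the usual landmark-localization metric: the Euclidean distance between each predicted and reference landmark, averaged over landmarks. A minimal sketch of that computation (illustrative only; the coordinates below are invented, not from the study):

```python
import numpy as np

def mean_landmark_error(pred: np.ndarray, truth: np.ndarray) -> float:
    """Mean Euclidean distance in pixels between predicted and reference
    landmarks, each given as an (n_landmarks, 2) array of (x, y) coordinates."""
    return float(np.linalg.norm(pred - truth, axis=1).mean())

# Two toy landmarks, off by 2 px and 5 px respectively
pred = np.array([[10.0, 12.0], [30.0, 44.0]])
truth = np.array([[10.0, 10.0], [33.0, 40.0]])
err = mean_landmark_error(pred, truth)  # (2 + 5) / 2 = 3.5 pixels
```

In practice this per-frame error would be further averaged over all frames and test cases to yield a single dataset-level figure like the 1.6 pixels quoted above.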
Affiliation(s)
- Lucía Cubero
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France; Departamento de Bioingeniería, Universidad Carlos III de Madrid, Madrid, Spain.
- Christophe Tessier
- Service d'ORL et Chirurgie Maxillo-Faciale, CHU Pontchaillou, Université Rennes, 35033, Rennes, France
- Joël Castelli
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France
- Kilian Robert
- Service d'ORL et Chirurgie Maxillo-Faciale, CHU Pontchaillou, Université Rennes, 35033, Rennes, France
- Renaud de Crevoisier
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France
- Franck Jégoux
- Service d'ORL et Chirurgie Maxillo-Faciale, CHU Pontchaillou, Université Rennes, 35033, Rennes, France
- Javier Pascau
- Departamento de Bioingeniería, Universidad Carlos III de Madrid, Madrid, Spain; Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
- Oscar Acosta
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France
4
Torborg SR, Kim AYE, Rameau A. New developments in the application of artificial intelligence to laryngology. Curr Opin Otolaryngol Head Neck Surg 2024;32:391-397. PMID: 39146248. PMCID: PMC11613154. DOI: 10.1097/moo.0000000000000999.
Abstract
PURPOSE OF REVIEW: The purpose of this review is to summarize the existing literature on artificial intelligence technology utilization in laryngology, highlighting recent advances and current barriers to implementation.
RECENT FINDINGS: The volume of publications studying applications of artificial intelligence in laryngology has rapidly increased, demonstrating a strong interest in utilizing this technology. Vocal biomarkers for disease screening, deep learning analysis of videolaryngoscopy for lesion identification, and auto-segmentation of videofluoroscopy for detection of aspiration are a few of the new ways in which artificial intelligence is poised to transform clinical care in laryngology. Increasing collaboration is ongoing to establish guidelines and standards for the field to ensure generalizability.
SUMMARY: Artificial intelligence tools have the potential to greatly advance laryngology care by creating novel screening methods, improving how data-heavy diagnostics of laryngology are analyzed, and standardizing outcome measures. However, physician and patient trust in artificial intelligence must improve for the technology to be successfully implemented. Additionally, most existing studies lack the large and diverse datasets, external validation, and consistent ground-truth references necessary to produce generalizable results. Collaborative, large-scale studies will fuel technological innovation and bring artificial intelligence to the forefront of patient care in laryngology.
Affiliation(s)
- Stefan R. Torborg
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine
- Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program, New York, New York, USA
- Ashley Yeo Eun Kim
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine
- Anaïs Rameau
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine
5
Li W, Mao S, Mahoney AS, Coyle JL, Sejdić E. Automatic tracking of hyoid bone displacement and rotation relative to cervical vertebrae in videofluoroscopic swallow studies using deep learning. J Imaging Inform Med 2024;37:1922-1932. PMID: 38383805. PMCID: PMC11300761. DOI: 10.1007/s10278-024-01039-4.
Abstract
Hyoid bone displacement and rotation are critical kinematic events of the swallowing process in the assessment of videofluoroscopic swallow studies (VFSS). However, quantitative analysis of these events requires frame-by-frame manual annotation, which is labor-intensive and time-consuming. Our work aims to develop a method for automatically tracking hyoid bone displacement and rotation in VFSS. We proposed a full high-resolution network, a deep learning architecture, to detect the anterior and posterior ends of the hyoid bone and thereby identify its location and rotation. Meanwhile, the anterior-inferior corners of the C2 and C4 vertebrae were detected simultaneously to automatically establish a new coordinate system and eliminate the effect of posture change. The proposed model was developed on 59,468 VFSS frames collected from 1488 swallowing samples, and it achieved an average landmark localization error of 2.38 pixels (around 0.5% of the 448 × 448 pixel image) and an average angle prediction error of 0.065 radians in predicting C2-C4 and hyoid bone angles. In addition, the displacement of the hyoid bone center was automatically tracked frame by frame, achieving an average mean absolute error of 2.22 pixels and 2.78 pixels on the x-axis and y-axis, respectively. The results of this study support the effectiveness and accuracy of the proposed method in detecting hyoid bone displacement and rotation. Our study provides an automatic method for analyzing hyoid bone kinematics during VFSS, which could contribute to early diagnosis and effective disease management.
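The vertebra-anchored coordinate system described in this abstract can be sketched as follows. This is not the authors' code, and the exact axis conventions (origin at C4, y-axis along C4→C2) are assumptions for illustration; the idea is that expressing the hyoid position in a spine-aligned frame removes global head translation and rotation from the measurement:

```python
import numpy as np

def hyoid_in_vertebral_frame(hyoid: np.ndarray, c2: np.ndarray, c4: np.ndarray) -> np.ndarray:
    """Express a hyoid-bone point in a frame anchored on the cervical spine:
    origin at the C4 anterior-inferior corner, y-axis along C4 -> C2,
    x-axis perpendicular to it. Inputs are (x, y) image-pixel coordinates."""
    y_axis = c2 - c4
    y_axis = y_axis / np.linalg.norm(y_axis)
    x_axis = np.array([y_axis[1], -y_axis[0]])  # y-axis rotated by 90 degrees
    rel = hyoid - c4
    return np.array([rel @ x_axis, rel @ y_axis])

# Toy example: spine vertical in the image (image y grows downward, so C2 is
# above C4); hyoid sits 30 px to the left of and 40 px above C4.
c4 = np.array([100.0, 200.0])
c2 = np.array([100.0, 100.0])
hyoid = np.array([70.0, 160.0])
coords = hyoid_in_vertebral_frame(hyoid, c2, c4)  # -> [30.0, 40.0]
```

Because the frame rotates and translates with the spine, the same hyoid excursion yields the same coordinates whether or not the patient's head is tilted between frames, which is the posture-invariance the paper relies on.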
Affiliation(s)
- Wuqi Li
- Edward S. Rogers Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- Shitong Mao
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amanda S Mahoney
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- James L Coyle
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Ervin Sejdić
- Edward S. Rogers Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- North York General Hospital, Toronto, ON, Canada
6
Park D, Kim Y, Kang H, Lee J, Choi J, Kim T, Lee S, Son S, Kim M, Kim I. PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference. Comput Biol Med 2024;172:108241. PMID: 38489987. DOI: 10.1016/j.compbiomed.2024.108241.
Abstract
Bolus segmentation is crucial for the automated detection of swallowing disorders in videofluoroscopic swallowing studies (VFSS). However, accurately segmenting the bolus region in a VFSS image is difficult because VFSS images are translucent, have low contrast and unclear region boundaries, and lack color information. To overcome these challenges, we propose PECI-Net, a network architecture for VFSS image analysis that combines two novel techniques: a preprocessing ensemble network (PEN) and a cascaded inference network (CIN). PEN enhances the sharpness and contrast of the VFSS image by combining multiple preprocessing algorithms in a learnable way. CIN reduces ambiguity in bolus segmentation by using context from other regions through cascaded inference, and prevents undesirable side effects from unreliably segmented regions by referring to that context in an asymmetric way. In experiments, PECI-Net exhibited higher performance than four recently developed baseline models, outperforming TernausNet, the best of the baselines, by 4.54% and the widely used UNet by 10.83%. Ablation studies confirm that CIN and PEN are both effective in improving bolus segmentation performance.
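The idea behind PEN, combining several preprocessing algorithms "in a learnable way," can be sketched in miniature: produce multiple preprocessed views of the image and blend them with softmax-normalized weights. In PECI-Net those weights are learned end-to-end with the segmentation network; in this illustrative sketch (not the authors' implementation, with made-up preprocessors and fixed weights) the blending is shown with plain numpy:

```python
import numpy as np

def softmax(w: np.ndarray) -> np.ndarray:
    e = np.exp(w - w.max())
    return e / e.sum()

def contrast_stretch(img: np.ndarray) -> np.ndarray:
    """Rescale intensities so the darkest pixel maps to 0 and brightest to 1."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)

def gamma_correct(img: np.ndarray, g: float = 0.5) -> np.ndarray:
    """Brighten mid-tones; assumes intensities already in [0, 1]."""
    return img ** g

def preprocessing_ensemble(img: np.ndarray, weights) -> np.ndarray:
    """Blend several preprocessed views of a grayscale image with
    softmax-normalized weights (learned in the real model, fixed here)."""
    views = np.stack([img, contrast_stretch(img), gamma_correct(img)])
    w = softmax(np.asarray(weights, dtype=float))
    return np.tensordot(w, views, axes=1)

img = np.array([[0.2, 0.4], [0.6, 0.8]])
out = preprocessing_ensemble(img, [0.0, 0.0, 0.0])  # zero logits -> equal weights
```

Because the weights pass through a softmax, the blend is always a convex combination of the views, so the output stays in the valid intensity range while gradients can still flow back to the weight logits during training.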
Affiliation(s)
- Dougho Park
- Pohang Stroke and Spine Hospital, Pohang, Republic of Korea; School of Convergence Science and Technology, Pohang University of Science and Technology, Pohang, Republic of Korea
- Younghun Kim
- School of CSEE, Handong Global University, Pohang, Republic of Korea
- Harim Kang
- School of CSEE, Handong Global University, Pohang, Republic of Korea
- Junmyeoung Lee
- School of CSEE, Handong Global University, Pohang, Republic of Korea
- Jinyoung Choi
- School of CSEE, Handong Global University, Pohang, Republic of Korea
- Taeyeon Kim
- Pohang Stroke and Spine Hospital, Pohang, Republic of Korea
- Sangeok Lee
- Pohang Stroke and Spine Hospital, Pohang, Republic of Korea
- Seokil Son
- Pohang Stroke and Spine Hospital, Pohang, Republic of Korea
- Minsol Kim
- Pohang Stroke and Spine Hospital, Pohang, Republic of Korea
- Injung Kim
- School of CSEE, Handong Global University, Pohang, Republic of Korea