Lee JH, Seok J, Kim JY, Kim HC, Kwon TK. Evaluating the Diagnostic Potential of Connected Speech for Benign Laryngeal Disease Using Deep Learning Analysis.
J Voice 2024:S0892-1997(24)00018-3. [PMID: 38350806 DOI: 10.1016/j.jvoice.2024.01.015]
[Received: 11/30/2023] [Revised: 01/26/2024] [Accepted: 01/26/2024] [Indexed: 02/15/2024]
Abstract
OBJECTIVES
This study aimed to evaluate the performance of artificial intelligence (AI) models using connected speech and vowel sounds in detecting benign laryngeal diseases.
STUDY DESIGN
Retrospective.
METHODS
Voice samples from 772 patients, including 502 with normal voices and 270 with vocal cord polyps, cysts, or nodules, were analyzed. We employed deep learning architectures, including convolutional neural networks (CNNs) and time series models, to process the speech data. The primary endpoint was the area under the receiver operating characteristic curve for binary classification.
RESULTS
CNN models analyzing speech segments significantly outperformed those using vowel sounds in distinguishing patients with and without benign laryngeal diseases. The best-performing CNN model achieved areas under the receiver operating characteristic curve of 0.895 and 0.845 for speech and vowel sounds, respectively. Correlations between AI-generated disease probabilities and perceptual assessments were more pronounced in the connected-speech analyses. However, the time series models performed worse than the CNNs.
CONCLUSION
Connected speech analysis is more effective than traditional vowel sound analysis for the diagnosis of laryngeal voice disorders. This study highlights the potential of AI technologies in enhancing the diagnostic capabilities of speech and advocates further exploration and validation in this field.