1. Chang D. Vocal performance evaluation of the intelligent note recognition method based on deep learning. Sci Rep 2025;15:13927. PMID: 40263420; PMCID: PMC12015219; DOI: 10.1038/s41598-025-99357-2.
Abstract
This study aims to optimize note recognition and improve the accuracy of vocal performance evaluation. First, basic music theory is analyzed. Second, a convolutional neural network (CNN) from deep learning (DL) is selected and integrated with gated recurrent units for optimization. An attention mechanism is then added to the optimized model to implement an intelligent note recognition model, and its note recognition results are compared with those of common models. Finally, according to the results of audio signal classification, a vocal performance evaluation model optimized with the attention mechanism is constructed, and its accuracy under different feature inputs is compared. The results indicate that the models show clear differences in F-value, accuracy, precision, and recall. The attention mechanism-gated recurrent convolutional neural network (A-GRCNN) model performs best on all indicators: its accuracy, recall, F-value, and precision reach 0.961, 0.958, 0.963, and 0.970, respectively. Incorporating multiple feature inputs markedly enhances the accuracy of vocal performance evaluation, with combinations that include constant-Q transform (CQT) features performing best. This study improves the accuracy and reliability of music information processing, promotes the application of DL technology in music, and contributes to optimizing vocal performance evaluation.
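For readers comparing the four indicators reported across these entries, the standard definitions can be computed from binary confusion-matrix counts. A minimal Python sketch for illustration only; the counts below are hypothetical, not data from the study:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-value (F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_value = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return accuracy, precision, recall, f_value

# Hypothetical counts chosen purely for illustration
acc, prec, rec, f1 = classification_metrics(tp=95, fp=3, fn=4, tn=98)
```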
Affiliation(s)
- Dongyun Chang
- School of Music, Qinghai Normal University, Xining, China.
2. Wei R, Liang Y, Geng L, Wang W, Wei M. A non-local dual-stream fusion network for laryngoscope recognition. Am J Otolaryngol 2025;46:104565. PMID: 39729791; DOI: 10.1016/j.amjoto.2024.104565.
Abstract
PURPOSE To use deep learning to design and implement a model that automatically classifies laryngoscope images and assists doctors in diagnosing laryngeal diseases. MATERIALS AND METHODS The experiment was based on 3057 images (normal, glottic cancer, granuloma, Reinke's edema, vocal cord cyst, leukoplakia, nodules, and polyps) from the Laryngoscope8 dataset. A classification model based on deep neural networks was developed and tested. Model performance was verified by a variety of evaluation measures, including accuracy, recall, specificity, F1-score, and area under the receiver operating characteristic curve. In addition, Grad-CAM was used to visualize the model's feature maps and improve the interpretability of the network. RESULTS The model has high classification accuracy and robustness and can accurately classify various types of laryngoscope images. On a test set of independent individuals, the overall accuracy reaches 86.51%, and the average area under the curve is 0.954. The performance of the model is significantly better than that of other existing algorithms. CONCLUSION This paper proposes a deep learning-based automatic classification model for laryngoscope images. By fusing the output features of a ResNet deep neural network and a Transformer, eight laryngeal diseases can be accurately classified. This indicates that the proposed method can be effectively applied to the study of laryngeal diseases.
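The "area under the receiver operating characteristic curve" reported above has a simple rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal Python sketch of that interpretation; the labels and scores below are hypothetical, not data from the study:

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores: one positive case is ranked below one negative case
print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # → 0.75
```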
Affiliation(s)
- Ran Wei
- Department of Rehabilitation Engineering, Beijing College of Social Administration (Training Center of the Ministry of Civil Affairs), Beijing 102600, China
- Yan Liang
- School of Electronic and Information Engineering, Tiangong University, Tianjin 300387, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and Systems, Tianjin 300387, China
- Lei Geng
- Tianjin Key Laboratory of Optoelectronic Detection Technology and Systems, Tianjin 300387, China; School of Life Sciences, Tiangong University, Tianjin 300387, China.
- Wei Wang
- Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin 300192, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; Otolaryngology Clinical Quality Control Centre, Tianjin, China
- Mei Wei
- Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin 300192, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; Otolaryngology Clinical Quality Control Centre, Tianjin, China
3. Pandey A, Kaur J, Kaushal D. Transforming ENT Healthcare: Advancements and Implications of Artificial Intelligence. Indian J Otolaryngol Head Neck Surg 2024;76:4986-4996. PMID: 39376323; PMCID: PMC11456104; DOI: 10.1007/s12070-024-04885-4.
Abstract
This systematic literature review studies the role and impact of artificial intelligence (AI) in transforming Ear, Nose, and Throat (ENT) healthcare. It compares and analyses literature that applied AI algorithms for ENT disease prediction and detection in terms of effectiveness, methods, datasets, and performance. We also discuss the challenges ENT specialists face and AI's role in solving them, as well as the challenges faced by AI researchers. This systematic review was completed using PRISMA guidelines. Data were extracted from several reputable digital databases, including PubMed, Medline, SpringerLink, Elsevier, Google Scholar, ScienceDirect, and IEEE Xplore. The search criteria included studies published between 2018 and 2024 related to the application of AI in ENT healthcare. After removing duplicate studies and performing quality assessment, we reviewed the eligible articles and answered the research questions. This review provides a comprehensive overview of the current state of AI applications in ENT healthcare. Among 3257 unique studies, 27 were selected as primary studies. About 62.5% of the included studies were effective in providing disease predictions. We found that pretrained DL models are applied more often than CNN algorithms for ENT disease prediction. The accuracy of the models ranged between 75% and 97%. We also observed the effectiveness of conversational AI models such as ChatGPT in the ENT discipline. Research in AI for ENT is advancing rapidly, and most models have achieved accuracy above 90%. However, the lack of good-quality data and data variability limit the overall ability of AI models to perform better at ENT disease prediction. Further research needs to consider factors such as external validation and the issue of class imbalance.
Affiliation(s)
- Ayushmaan Pandey
- Department of Computer Science and Engineering, Dr B R Ambedkar National Institute of Technology, G. T. Road, Jalandhar, Punjab 144008 India
- Jagdeep Kaur
- Department of Computer Science and Engineering, Dr B R Ambedkar National Institute of Technology, G. T. Road, Jalandhar, Punjab 144008 India
- Darwin Kaushal
- Department of Otorhinolaryngology and Head Neck Surgery, All India Institute of Medical Sciences, Vijaypur, Jammu, Jammu and Kashmir 180001 India
4. Barlow J, Sragi Z, Rivera-Rivera G, Al-Awady A, Daşdöğen Ü, Courey MS, Kirke DN. The Use of Deep Learning Software in the Detection of Voice Disorders: A Systematic Review. Otolaryngol Head Neck Surg 2024;170:1531-1543. PMID: 38168017; DOI: 10.1002/ohn.636.
Abstract
OBJECTIVE To summarize the use of deep learning in the detection of voice disorders using acoustic and laryngoscopic input, compare specific neural networks in terms of accuracy, and assess their effectiveness compared to expert clinical visual examination. DATA SOURCES Embase, MEDLINE, and Cochrane Central. REVIEW METHODS Databases were screened through November 11, 2023 for relevant studies. The inclusion criteria required studies to utilize a specified deep learning method, use laryngoscopy or acoustic input, and measure the accuracy of binary classification between healthy patients and those with voice disorders. RESULTS Thirty-four studies met the inclusion criteria, with 18 focusing on voice analysis, 15 on imaging analysis, and 1 on both. Across the 18 acoustic studies, 21 programs were used for identification of organic and functional voice disorders. These technologies included 10 convolutional neural networks (CNNs), 6 multilayer perceptrons (MLPs), and 5 other neural networks. The binary classification systems yielded a mean accuracy of 89.0% overall, including 93.7% for MLP programs and 84.5% for CNNs. Among the 15 imaging analysis studies, a total of 23 programs were utilized, resulting in a mean accuracy of 91.3%. Specifically, the 20 CNNs achieved a mean accuracy of 92.6%, compared to 83.0% for the 3 MLPs. CONCLUSION Deep learning models were shown to be highly accurate in the detection of voice pathology, with CNNs most effective for assessing laryngoscopy images and MLPs most effective for assessing acoustic input. While deep learning methods outperformed expert clinical exam in limited comparisons, further studies integrating external validation are necessary.
Affiliation(s)
- Joshua Barlow
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Zara Sragi
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Gabriel Rivera-Rivera
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Abdurrahman Al-Awady
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Ümit Daşdöğen
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Mark S Courey
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Diana N Kirke
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
5. Kryukov AI, Sudarev PA, Romanenko SG, Kurbanova DI, Lesogorova EV, Krasilnikova EN, Pavlikhin OG, Ivanova AA, Osadchiy AP, Shevyrina NG. [Diagnosis of benign laryngeal tumors using neural network]. Vestn Otorinolaringol 2024;89:24-28. PMID: 39104269; DOI: 10.17116/otorino20248903124.
Abstract
The article describes our experience in developing and training an artificial neural network, based on artificial intelligence algorithms, to recognize the characteristic features of benign laryngeal tumors and normal laryngeal variants from laryngoscopic images obtained during patient examination. To prepare data for training the neural network, a dataset of 1471 digital images of the larynx (jpg, bmp) was collected, labeled, and loaded. The neural network was then trained and tested to recognize images of the normal larynx and laryngeal neoplasms. The developed and trained artificial neural network demonstrated an accuracy of 86% in recognizing benign laryngeal tumors and normal laryngeal variants. The proposed technology can be further used in practical healthcare to control and improve the quality of diagnosis of laryngeal pathologies.
Affiliation(s)
- A I Kryukov
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- P A Sudarev
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- S G Romanenko
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- D I Kurbanova
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- E V Lesogorova
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- E N Krasilnikova
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- O G Pavlikhin
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
6. Semmler M, Lasar S, Kremer F, Reinwald L, Wittig F, Peters G, Schraut T, Wendler O, Seyferth S, Schützenberger A, Dürr S. Extent and Effect of Covering Laryngeal Structures with Synthetic Laryngeal Mucus via Two Different Administration Techniques. J Voice 2023:S0892-1997(23)00228-X. PMID: 37648625; DOI: 10.1016/j.jvoice.2023.07.019.
Abstract
OBJECTIVE The first goal of this study was to investigate the coverage of laryngeal structures achieved by two potential administration techniques for synthetic mucus: inhalation and lozenge ingestion. As a second research question, the study investigated the potential effects of these techniques on standardized voice assessment parameters. METHODS Fluorescein was added to throat lozenges and to an inhalation solution to visualize the coverage of laryngeal structures through blue-light imaging. The study included 70 vocally healthy subjects: 50 underwent administration via lozenge ingestion and 20 performed the inhalation process. For the first research question, the recordings from the blue-light imaging system were categorized to compare the extent of coverage of individual laryngeal structures objectively. Second, a standardized voice evaluation protocol was performed before and after each administration to determine any measurable effects on typical voice parameters. RESULTS Administration via inhalation demonstrated complete coverage of all laryngeal structures, including the vocal folds, ventricular folds, and arytenoid cartilages, as visualized by the fluorescent dye. In contrast, the lozenge predominantly covered the pharynx and the laryngeal surface toward the aryepiglottic fold, but not the inferior structures. Overall, the comparison before and after administration showed no clear effect, although a minor deterioration of the acoustic signal was noted in shimmer and cepstral peak prominence after inhalation. CONCLUSIONS Our findings indicate that inhalation is the more effective technique for covering deeper laryngeal structures such as the vocal folds and ventricular folds with synthetic mucus. This knowledge enables further in vivo studies on the role of laryngeal mucus in phonation in general, and on how it can be substituted or supplemented for patients with reduced glandular activity as well as for heavy voice users.
Affiliation(s)
- Marion Semmler
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Sarina Lasar
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Franziska Kremer
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Laura Reinwald
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Fiori Wittig
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Gregor Peters
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Tobias Schraut
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Olaf Wendler
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Stefan Seyferth
- Department of Chemistry and Pharmacy, Chair of Pharmaceutics, Friedrich-Alexander-University Erlangen-Nürnberg, Cauerstr. 4, 91058 Erlangen, Germany.
- Anne Schützenberger
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Stephan Dürr
- University Hospital Regensburg, Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany.
7. Kim GH, Hwang YJ, Lee H, Sung ES, Nam KW. Convolutional neural network-based vocal cord tumor classification technique for home-based self-prescreening purpose. Biomed Eng Online 2023;22:81. PMID: 37596652; PMCID: PMC10439563; DOI: 10.1186/s12938-023-01139-2.
Abstract
BACKGROUND In this study, we proposed a deep learning technique that can simultaneously detect suspicious positions of benign vocal cord tumors in laryngoscopic images and classify the tumors into cysts, granulomas, leukoplakia, nodules, and polyps. This technique is useful for simplified home-based self-prescreening to detect tumors around the vocal cord early, in the benign stage. RESULTS We implemented four convolutional neural network (CNN) models (two Mask R-CNNs, Yolo V4, and a single-shot detector) that were trained, validated, and tested using 2183 laryngoscopic images. The experimental results demonstrated that among the four applied models, Yolo V4 showed the highest F1-score for all tumor types (0.7664, cyst; 0.9875, granuloma; 0.8214, leukoplakia; 0.8119, nodule; and 0.8271, polyp). The model with the lowest false-negative rate differed by tumor type (Yolo V4 for cysts/granulomas and Mask R-CNN for leukoplakia/nodules/polyps). In addition, the embedded-operated Yolo V4 model showed an F1-score (0.8529) approximately equivalent to that of the computer-operated Yolo V4 model (0.8683). CONCLUSIONS Based on these results, we conclude that the proposed deep-learning-based home screening technique has the potential to aid in the early detection of tumors around the vocal cord and can improve the long-term survival of patients with vocal cord tumors.
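The per-tumor-type F1-scores and false-negative rates discussed above can be derived from one-vs-rest detection counts per class. A minimal Python sketch for illustration; the class name and counts below are hypothetical, not results from the study:

```python
def per_class_metrics(counts):
    """counts: {class_name: (tp, fp, fn)} one-vs-rest detection counts.
    Returns {class_name: (f1, false_negative_rate)}."""
    out = {}
    for cls, (tp, fp, fn) in counts.items():
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        out[cls] = (f1, 1.0 - recall)  # false-negative rate = 1 - recall
    return out

# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives
metrics = per_class_metrics({"polyp": (8, 2, 2)})
```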
Affiliation(s)
- Gun Ho Kim
- Medical Research Institute, Pusan National University, Yangsan, Korea
- Department of Biomedical Engineering, Pusan National University Yangsan Hospital, Yangsan, Korea
- Young Jun Hwang
- Department of Biomedical Engineering, School of Medicine, Pusan National University, 49, Busandaehak-Ro, Mulgeum-Eup, Yangsan, 50629, Korea
- Hongje Lee
- Department of Nuclear Medicine, Dongnam Institute of Radiological & Medical Sciences, Busan, Korea
- Eui-Suk Sung
- Department of Otolaryngology-Head and Neck Surgery, Pusan National University Yangsan Hospital, Yangsan, Korea.
- Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Pusan National University, Yangsan, Korea.
- Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Korea.
- Kyoung Won Nam
- Department of Biomedical Engineering, Pusan National University Yangsan Hospital, Yangsan, Korea.
- Department of Biomedical Engineering, School of Medicine, Pusan National University, 49, Busandaehak-Ro, Mulgeum-Eup, Yangsan, 50629, Korea.
- Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Korea.