1. Xia Z, Yuan R, Cao Y, Sun T, Xiong Y, Xu K. A systematic review of the application of machine learning techniques to ultrasound tongue imaging analysis. The Journal of the Acoustical Society of America 2024; 156:1796-1819. PMID: 39287468. DOI: 10.1121/10.0028610.
Abstract
B-mode ultrasound has emerged as a prevalent tool for observing tongue motion in speech production, gaining traction in speech therapy applications. However, the effective analysis of ultrasound tongue image frame sequences (UTIFs) encounters many challenges, such as the presence of high levels of speckle noise and obscured views. Recently, the application of machine learning, especially deep learning techniques, to UTIF interpretation has shown promise in overcoming these hurdles. This paper presents a thorough examination of the existing literature, focusing on UTIF analysis. The scope of our work encompasses four key areas: a foundational introduction to deep learning principles, an exploration of motion tracking methodologies, a discussion of feature extraction techniques, and an examination of cross-modality mapping. The paper concludes with a detailed discussion of insights gleaned from the comprehensive literature review, outlining potential trends and challenges that lie ahead in the field.
Affiliation(s)
- Zhen Xia
- National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Ruicheng Yuan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
- Yuan Cao
- National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Tao Sun
- National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Yunsheng Xiong
- National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Kele Xu
- National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
2. Wang X, Lu W, Liu H, Zhang W, Li Q. DAFT-Net: Dual Attention and Fast Tongue Contour Extraction Using Enhanced U-Net Architecture. Entropy (Basel, Switzerland) 2024; 26:482. PMID: 38920489. PMCID: PMC11202898. DOI: 10.3390/e26060482.
Abstract
In most silent speech research, continuously observing tongue movements is crucial, thus requiring the use of ultrasound to extract tongue contours. Extracting ultrasound tongue contours precisely and in real time presents a major challenge. To tackle this challenge, the novel end-to-end lightweight network DAFT-Net is introduced for ultrasound tongue contour extraction. Integrating the Convolutional Block Attention Module (CBAM) and Attention Gate (AG) module with entropy-based optimization strategies, DAFT-Net establishes a comprehensive attention mechanism with dual functionality. This approach enhances feature representation by replacing the traditional skip-connection architecture, leveraging entropy and information-theoretic measures to ensure efficient and precise feature selection. Additionally, the U-Net's encoder and decoder layers have been streamlined to reduce computational demands, a reduction guided by information theory so as not to compromise the network's ability to capture and utilize critical information. Ablation studies confirm the efficacy of the integrated attention module and its components. Comparative analysis on the NS, TGU, and TIMIT datasets shows that DAFT-Net extracts relevant features efficiently and significantly reduces extraction time. These findings demonstrate the practical advantages of applying entropy and information theory principles, improving the performance of medical image segmentation networks and paving the way for real-world applications.
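As an illustration of the attention-gated skip connections this abstract describes, the sketch below shows a minimal additive attention gate in PyTorch. The layer sizes, names, and gating form are generic assumptions for illustration only, not the authors' DAFT-Net implementation.

```python
# Hypothetical sketch of an attention-gated skip connection (assumed form, not DAFT-Net itself).
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # project encoder (skip) features
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)    # project decoder gating signal
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)          # collapse to a single attention map
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, skip: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # gate is assumed to have been upsampled to the skip features' spatial size
        attention = self.sigmoid(self.psi(self.relu(self.theta(skip) + self.phi(gate))))
        return skip * attention  # re-weighted skip features passed on to the decoder
```

In a U-Net-style decoder, the gated output would be concatenated with the upsampled decoder features in place of the plain skip connection.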
Affiliation(s)
- Xinqiang Wang
- Tianjin Key Lab of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- School of Software and Communication, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
- Wenhuan Lu
- College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Hengxin Liu
- School of Microelectronics, Tianjin University, Tianjin 300072, China
- Wei Zhang
- Nanjing Research Institute of Electronic Engineering, Nanjing 210023, China
- Qiang Li
- School of Microelectronics, Tianjin University, Tianjin 300072, China
3. Roon KD, Chen WR, Iwasaki R, Kang J, Kim B, Shejaeya G, Tiede MK, Whalen DH. Comparison of auto-contouring and hand-contouring of ultrasound images of the tongue surface. Clinical Linguistics & Phonetics 2022; 36:1112-1131. PMID: 34974782. PMCID: PMC9250540. DOI: 10.1080/02699206.2021.1998633.
Abstract
Contours traced by trained phoneticians have been considered to be the most accurate way to identify the midsagittal tongue surface from ultrasound video frames. In this study, inter-measurer reliability was evaluated using measures that quantified both how closely human-placed contours approximated each other as well as how consistent measurers were in defining the start and end points of contours. High reliability across three measurers was found for all measures, consistent with treating contours placed by trained phoneticians as the 'gold standard.' However, due to the labour-intensive nature of hand-placing contours, automatic algorithms that detect the tongue surface are increasingly being used to extract tongue-surface data from ultrasound videos. Contours placed by six automatic algorithms (SLURP, EdgeTrak, EPCS, and three different configurations of the algorithm provided in Articulate Assistant Advanced) were compared to human-placed contours, with the same measures used to evaluate the consistency of the trained phoneticians. We found that contours defined by SLURP, EdgeTrak, and two of the AAA configurations closely matched the hand-placed contours along sections of the image where the algorithms and humans agreed that there was a discernible contour. All of the algorithms were much less reliable than humans in determining the anterior (tongue-tip) edge of tongue contours. Overall, the contours produced by SLURP, EdgeTrak, and AAA should be useable in a variety of clinical applications, subject to spot-checking. Additional practical considerations of these algorithms are also discussed.
Affiliation(s)
- Kevin D. Roon
- Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
- Haskins Laboratories, New Haven, Connecticut, USA
- Rion Iwasaki
- Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
- Haskins Laboratories, New Haven, Connecticut, USA
- Jaekoo Kang
- Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
- Haskins Laboratories, New Haven, Connecticut, USA
- Boram Kim
- Haskins Laboratories, New Haven, Connecticut, USA
- Program in Linguistics, CUNY Graduate Center, New York, USA
- Ghada Shejaeya
- Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
- Haskins Laboratories, New Haven, Connecticut, USA
- D. H. Whalen
- Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
- Haskins Laboratories, New Haven, Connecticut, USA
- Department of Linguistics, Yale University, New Haven, Connecticut, USA
4. Al-hammuri K, Gebali F, Thirumarai Chelvan I, Kanan A. Tongue Contour Tracking and Segmentation in Lingual Ultrasound for Speech Recognition: A Review. Diagnostics (Basel) 2022; 12:2811. PMID: 36428870. PMCID: PMC9689563. DOI: 10.3390/diagnostics12112811.
Abstract
Lingual ultrasound imaging is essential in linguistic research and speech recognition. It has been used widely in applications such as visual feedback for language learning by non-native speakers, the study and remediation of speech-related disorders, articulation research and analysis, swallowing studies, 3D tongue modelling, and silent speech interfaces. This article provides a comparative analysis and review, based on quantitative and qualitative criteria, of the two main streams of tongue contour segmentation from ultrasound images. The first stream utilizes traditional computer vision and image processing algorithms for tongue segmentation; the second uses machine and deep learning algorithms. The results show that tongue tracking using machine learning-based techniques is superior to traditional techniques in terms of performance and generalization ability. Meanwhile, traditional techniques remain helpful for implementing interactive image segmentation to extract valuable features during training and postprocessing. We recommend a hybrid approach that combines machine learning and traditional techniques to implement a real-time tongue segmentation tool.
Affiliation(s)
- Khalid Al-hammuri
- Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 2Y2, Canada
- Fayez Gebali
- Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 2Y2, Canada
- Awos Kanan
- Department of Computer Engineering, Princess Sumaya University for Technology, Amman 11941, Jordan
5. Wrench A, Balch-Tomes J. Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut. Sensors 2022; 22:1133. PMID: 35161879. PMCID: PMC8838804. DOI: 10.3390/s22031133.
Abstract
Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, trained using DeepLabCut and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation among three physical sensor positions and the corresponding estimated keypoints, and requires further investigation. The accuracy of estimating lip aperture from camera video was also high, with a mean MSD of 0.70, s.d. 0.56 mm, compared with 0.57, s.d. 0.48 mm, for two human labellers. DeepLabCut was found to be a fast, accurate and fully automatic method of providing unique kinematic data for the tongue, hyoid, jaw, and lips.
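For reference, the mean sum of distances (MSD) figures quoted above compare two contours via nearest-point distances. A minimal sketch of one common symmetric formulation follows; the exact variant used in the paper is an assumption here.

```python
# Illustrative symmetric mean-sum-of-distances between two contours (assumed variant).
import numpy as np

def mean_sum_of_distances(contour_a: np.ndarray, contour_b: np.ndarray) -> float:
    """contour_a: (N, 2) and contour_b: (M, 2) arrays of (x, y) points, e.g. in mm."""
    diff = contour_a[:, None, :] - contour_b[None, :, :]   # (N, M, 2) pairwise offsets
    dists = np.linalg.norm(diff, axis=-1)                  # (N, M) pairwise Euclidean distances
    a_to_b = dists.min(axis=1)                             # nearest point on B for each point of A
    b_to_a = dists.min(axis=0)                             # nearest point on A for each point of B
    return 0.5 * (a_to_b.mean() + b_to_a.mean())
```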
Affiliation(s)
- Alan Wrench
- Clinical Audiology, Speech and Language Research Centre, Queen Margaret University, Musselburgh EH21 6UU, UK
- Articulate Instruments Ltd., Musselburgh EH21 6UU, UK
- Correspondence: Tel.: +44-131-474-0000
6. Paris A, Hafiane A. Shape constraint function for artery tracking in ultrasound images. Comput Med Imaging Graph 2021; 93:101970. PMID: 34428649. DOI: 10.1016/j.compmedimag.2021.101970.
Abstract
Ultrasound guided regional anesthesia (UGRA) has emerged as a powerful technique for pain management in the operating theatre. It uses ultrasound imaging to visualize anatomical structures, the needle insertion and the delivery of the anesthetic around the targeted nerve block. Detection of the nerves is a difficult task, however, due to the poor quality of the ultrasound images. Recent developments in pattern recognition and machine learning have heightened the need for computer-aided systems in many applications, and this type of system can improve UGRA practice. In many imaging situations nerves are not salient in images. Generally, practitioners rely on the arteries as key anatomical structures to confirm the positions of the nerves, making artery tracking an important aspect of the UGRA procedure. However, artery tracking in a noisy environment is a challenging problem, due to the instability of the features. This paper proposes a new method for real-time artery tracking in ultrasound images. It is based on shape information to correct tracker location errors. A new objective function is proposed, which defines an artery as an elliptical shape, enabling its robust fitting in a noisy environment. This approach is incorporated in two well-known tracking algorithms, and shows a systematic improvement over the original trackers. Evaluations were performed on 71 videos of different axillary nerve blocks. The results obtained demonstrated the validity of the proposed method.
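To make the shape idea concrete, the sketch below scores how well a set of candidate boundary points fits a general conic by total least squares, which is one generic way to penalize non-elliptical candidates; it is a stand-in for, not a reproduction of, the objective function proposed in the paper.

```python
# Generic "how elliptical is this boundary" score via an algebraic conic fit (illustrative only).
import numpy as np

def ellipticity_score(points: np.ndarray) -> float:
    """points: (N, 2) boundary coordinates, N >= 6; lower score = closer to a conic."""
    x, y = points[:, 0], points[:, 1]
    # design matrix for the conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    # total least squares: the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(D, full_matrices=False)
    conic = vt[-1]
    residuals = D @ conic          # algebraic distance of each point from the fitted conic
    return float(np.mean(residuals ** 2))
```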
Affiliation(s)
- Arnaud Paris
- INSA Centre Val de Loire, University of Orléans, Laboratory PRISME EA 4229, 88 boulevard Lahitolle, F-18020 Bourges, France
- Adel Hafiane
- INSA Centre Val de Loire, University of Orléans, Laboratory PRISME EA 4229, 88 boulevard Lahitolle, F-18020 Bourges, France
7. Genna CW, Saperstein Y, Siegel SA, Laine AF, Elad D. Quantitative imaging of tongue kinematics during infant feeding and adult swallowing reveals highly conserved patterns. Physiol Rep 2021; 9:e14685. PMID: 33547883. PMCID: PMC7866619. DOI: 10.14814/phy2.14685.
Abstract
Tongue motility is an essential physiological component of human feeding from infancy through adulthood. At present, it is a challenge to distinguish among the many pathologies of swallowing due to the absence of quantitative tools. We objectively quantified tongue kinematics from ultrasound imaging during infant and adult feeding. The functional advantage of this method is presented in several subjects with swallowing difficulties. We demonstrated for the first time the differences in tongue kinematics during breast- and bottle-feeding, showing the arrhythmic sucking pattern during bottle-feeding as compared with breastfeeding in the same infant with torticollis. The method clearly displayed the improvement of tongue motility after frenotomy in infants with either tongue-tie or restrictive labial frenulum. The analysis also revealed the absence of posterior tongue peristalsis required for safe swallowing in an infant with dysphagia. We also analyzed for the first time the tongue kinematics in an adult during water bolus swallowing demonstrating tongue peristaltic-like movements in both anterior and posterior segments. First, the anterior segment undulates to close off the oral cavity and the posterior segment held the bolus, and then, the posterior tongue propelled the bolus to the pharynx. The present methodology of quantitative imaging revealed highly conserved patterns of tongue kinematics that can differentiate between swallowing pathologies and evaluate treatment interventions. The method is novel and objective and has the potential to advance knowledge about the normal swallowing and management of feeding disorders.
Affiliation(s)
- Yiela Saperstein
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Scott A. Siegel
- School of Medicine/School of Dental Medicine, Stony Brook University, Suffolk County, NY, USA
- Andrew F. Laine
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- David Elad
- Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
8. Hamed Mozaffari M, Lee WS. Encoder-decoder CNN models for automatic tracking of tongue contours in real-time ultrasound data. Methods 2020; 179:26-36. PMID: 32450205. DOI: 10.1016/j.ymeth.2020.05.011.
Abstract
One application of medical ultrasound imaging is to visualize and characterize human tongue shape and motion in real time to study healthy or impaired speech production. Because of the low contrast and noisy nature of ultrasound images, users need knowledge of tongue structure and ultrasound data interpretation to recognize tongue locations and gestures easily. Moreover, quantitative analysis of tongue motion requires the tongue contour to be extracted, tracked and visualized instead of the whole tongue region. Manual tongue contour extraction is a cumbersome, subjective, and error-prone task, and it is not feasible for real-time applications where the tongue contour moves rapidly with nuanced gestures. This paper presents two new deep neural networks (named BowNet models) that combine the global prediction ability of encoder-decoder fully convolutional neural networks with the full-resolution feature extraction capability of dilated convolutions. Qualitative and quantitative studies over datasets from two ultrasound machines demonstrated the strong performance of the proposed deep learning models in terms of speed and robustness. Experimental results also revealed a significant improvement in the accuracy of prediction maps due to the better exploration and exploitation ability of the proposed network models.
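For readers unfamiliar with the building blocks named above, the sketch below shows a dilated convolution block of the kind that can be combined with an encoder-decoder network to retain full-resolution context; the channel counts and dilation rates are illustrative assumptions and do not reproduce the BowNet architecture.

```python
# Illustrative dilated 3x3 convolution block (assumed layout, not the BowNet models).
import torch.nn as nn

def dilated_block(in_ch: int, out_ch: int, dilation: int) -> nn.Sequential:
    # padding equal to the dilation rate keeps the spatial resolution unchanged,
    # while the dilation enlarges the receptive field without extra parameters
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# e.g. a small context module could stack blocks with growing dilation rates:
# context = nn.Sequential(dilated_block(64, 64, 1), dilated_block(64, 64, 2), dilated_block(64, 64, 4))
```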
Affiliation(s)
- M Hamed Mozaffari
- School of Electrical Engineering and Computer Science, University of Ottawa, 800 King-Edward Avenue, Ottawa, Ontario K1N-6N5, Canada.
- Won-Sook Lee
- School of Electrical Engineering and Computer Science, University of Ottawa, 800 King-Edward Avenue, Ottawa, Ontario K1N-6N5, Canada
9. Naga Karthik EMV, Karimi E, Lulich SM, Laporte C. Automatic tongue surface extraction from three-dimensional ultrasound vocal tract images. The Journal of the Acoustical Society of America 2020; 147:1623. PMID: 32237834. DOI: 10.1121/10.0000891.
Abstract
Three-dimensional (3D/4D) ultrasound (US) imaging of the tongue has emerged as a useful instrument for articulatory studies. However, extracting quantitative measurements of the shape of the tongue surface remains challenging and time-consuming. In response to these challenges, this paper documents and evaluates the first automated method for extracting tongue surfaces from 3D/4D US data. The method draws on established methods in computer vision, and combines image phase symmetry measurements, eigen-analysis of the image Hessian matrix, and a fast marching method for surface evolution towards the automatic detection of the sheet-like surface of the tongue amidst noisy US data. The method was tested on US recordings from eight speakers and the resulting automatically extracted tongue surfaces were generally found to lie within 1 to 2 mm from their corresponding manually delineated surfaces in terms of mean-sum-of-distances error. Further experiments demonstrate that the accuracy of 2D midsagittal tongue contour extraction is also improved using 3D data and methods. This is likely because the additional information afforded by 3D US compared to 2D US images strongly constrains the possible location of the midsagittal contour. Thus, the proposed method seems appropriate for immediate practical use in the analysis of 3D/4D US recordings of the tongue.
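The Hessian eigen-analysis step can be sketched as follows: in a 3D volume, a sheet-like structure such as the tongue surface has one Hessian eigenvalue of large magnitude and two of small magnitude. The scale and the simple sheetness score below are generic assumptions, not the paper's exact formulation.

```python
# Illustrative Hessian-based "sheetness" map for a 3D volume (assumed formulation).
import numpy as np
from scipy import ndimage

def sheetness(volume: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    v = volume.astype(np.float64)
    # smoothed second derivatives (Hessian entries) via Gaussian derivative filters
    Hxx = ndimage.gaussian_filter(v, sigma, order=(2, 0, 0))
    Hyy = ndimage.gaussian_filter(v, sigma, order=(0, 2, 0))
    Hzz = ndimage.gaussian_filter(v, sigma, order=(0, 0, 2))
    Hxy = ndimage.gaussian_filter(v, sigma, order=(1, 1, 0))
    Hxz = ndimage.gaussian_filter(v, sigma, order=(1, 0, 1))
    Hyz = ndimage.gaussian_filter(v, sigma, order=(0, 1, 1))
    H = np.stack([np.stack([Hxx, Hxy, Hxz], axis=-1),
                  np.stack([Hxy, Hyy, Hyz], axis=-1),
                  np.stack([Hxz, Hyz, Hzz], axis=-1)], axis=-2)   # (..., 3, 3) Hessian per voxel
    eig = np.linalg.eigvalsh(H)                                   # eigenvalues per voxel
    order = np.argsort(np.abs(eig), axis=-1)                      # re-order by magnitude
    lam = np.take_along_axis(eig, order, axis=-1)
    # sheet-like voxels have |lam3| much larger than |lam2|; this crude score rewards that pattern
    return np.abs(lam[..., 2]) - np.abs(lam[..., 1])
```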
Affiliation(s)
- Elham Karimi
- Department of Electrical Engineering, École de technologie supérieure, Montréal, Québec, Canada
- Steven M Lulich
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA
- Catherine Laporte
- Department of Electrical Engineering, École de technologie supérieure, Montréal, Québec, Canada
10. Alkhatib M, Hafiane A, Vieyres P, Delbos A. Deep visual nerve tracking in ultrasound images. Comput Med Imaging Graph 2019; 76:101639. DOI: 10.1016/j.compmedimag.2019.05.007.
11. Karimi E, Ménard L, Laporte C. Fully-automated tongue detection in ultrasound images. Comput Biol Med 2019; 111:103335. PMID: 31279163. DOI: 10.1016/j.compbiomed.2019.103335.
Abstract
Tracking the tongue in ultrasound images provides information about its shape and kinematics during speech. Current methods for detecting/tracking the tongue require manual initialization or training using large amounts of labeled images. In this article, we propose a solution to convert a semi-automatic tongue contour tracking system to a fully automatic one. This work introduces a new method for extracting tongue contours in ultrasound images that requires neither training nor manual intervention. The method consists of an image enhancement step based on phase symmetry, followed by skeletonization and clustering steps, leading to a set of candidate points that can be used to fit an active contour to the image and subsequently initialize a tracking algorithm. Two novel quality measures were also developed that predict the reliability of the segmentation result, so that an image with a reliable contour can be chosen to confidently initialize fully automated tongue tracking. This is achieved by automatically generating and choosing a set of points that can replace the manually segmented points of a semi-automated tracking approach. Tracking accuracy is further improved by incorporating two criteria for periodically resetting the tracking algorithm. Experiments show that the fully automated and semi-automated methods result in very similar mean sum of distances errors, indicating that the proposed automatic initialization does not significantly alter accuracy. Moreover, further results show that tracking accuracy is improved when using the new segmentation technique within the proposed re-initialization scheme.
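The candidate-point generation pipeline described above (enhancement, skeletonization, clustering) might look roughly like the sketch below; the threshold and the choice to keep the largest connected cluster are assumptions for illustration, not the paper's exact procedure.

```python
# Rough sketch of candidate-point extraction from an enhanced ultrasound frame (assumed details).
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def candidate_points(enhanced: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """enhanced: 2D phase-symmetry (or similarly enhanced) image scaled to [0, 1]."""
    skeleton = skeletonize(enhanced > threshold)        # thin bright ridges to 1-pixel curves
    labels, n = ndimage.label(skeleton)                 # group skeleton pixels into connected clusters
    if n == 0:
        return np.empty((0, 2))
    sizes = ndimage.sum(skeleton, labels, index=range(1, n + 1))
    best = int(np.argmax(sizes)) + 1                    # keep the largest cluster as the candidate set
    rows, cols = np.nonzero(labels == best)
    return np.column_stack([cols, rows])                # (x, y) points to seed an active contour
```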
Affiliation(s)
- Elham Karimi
- Department of Electrical Engineering, École de technologie supérieure, 1100 Rue Notre-Dame O, Montréal, QC, H3C 1K3, Canada.
- Lucie Ménard
- Department of Linguistics, Université du Québec à Montréal, Montreal, QC, H2X 1L7, Canada; Center for Research on Brain, Language, and Music, Montreal, H3G 2A8, Canada; Sainte-Justine University Hospital Research Centre, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1C4, Canada
- Catherine Laporte
- Department of Electrical Engineering, École de technologie supérieure, 1100 Rue Notre-Dame O, Montréal, QC, H3C 1K3, Canada; Sainte-Justine University Hospital Research Centre, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1C4, Canada
12. Alkhatib M, Hafiane A, Tahri O, Vieyres P, Delbos A. Adaptive median binary patterns for fully automatic nerves tracking in ultrasound images. Computer Methods and Programs in Biomedicine 2018; 160:129-140. PMID: 29728240. DOI: 10.1016/j.cmpb.2018.03.013.
Abstract
BACKGROUND AND OBJECTIVE: In the last decade, Ultrasound-Guided Regional Anesthesia (UGRA) has gained importance in surgical procedures and pain management, owing to its ability to deliver local anesthetics to the target under direct sonographic visualization. However, practicing UGRA can be challenging, since it requires a highly skilled and experienced operator. Among the difficult tasks the operator faces is the tracking of the nerve structure in ultrasound images, which is very challenging due to noise and other artifacts. METHODS: In this paper, we introduce a new and robust tracking technique that uses the Adaptive Median Binary Pattern (AMBP) as a texture feature for tracking algorithms (particle filter, mean-shift, and Kanade-Lucas-Tomasi (KLT)). Moreover, we propose to incorporate a Kalman filter as prediction and correction steps for the tracking algorithms, in order to enhance accuracy, reduce computational cost, and handle target disappearance. RESULTS: The proposed method was applied to real data and evaluated in different situations. The results show that tracking with AMBP features outperforms other descriptors and achieved the best performance, with 95% accuracy. CONCLUSIONS: This paper presents the first fully automatic nerve tracking method in ultrasound images. AMBP features outperform other descriptors in all situations, including noisy and filtered images.
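The Kalman prediction and correction steps mentioned in the METHODS section can be sketched for a simple constant-velocity position tracker as below; the state layout and noise settings are illustrative assumptions, not the paper's configuration.

```python
# Minimal constant-velocity Kalman predict/correct step around a tracker's measurement (assumed setup).
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """x: state [px, py, vx, vy]; P: 4x4 covariance; z: measured position [px, py]."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
    Q, R = q * np.eye(4), r * np.eye(2)
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # correct using the position reported by the underlying tracker
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```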
Affiliation(s)
- Mohammad Alkhatib
- INSA Centre Val de Loire, Laboratoire PRISME EA 4229, Bourges F-18000, France; Université d'Orléans, Laboratoire PRISME EA 4229, Bourges F-18000, France
- Adel Hafiane
- INSA Centre Val de Loire, Laboratoire PRISME EA 4229, Bourges F-18000, France
- Omar Tahri
- INSA Centre Val de Loire, Laboratoire PRISME EA 4229, Bourges F-18000, France
- Pierre Vieyres
- Université d'Orléans, Laboratoire PRISME EA 4229, Bourges F-18000, France
- Alain Delbos
- Clinique Medipôle Garonne, Toulouse F-31036, France
13. Laporte C, Ménard L. Multi-hypothesis tracking of the tongue surface in ultrasound video recordings of normal and impaired speech. Med Image Anal 2018; 44:98-114. DOI: 10.1016/j.media.2017.12.003.
14. Chien CY, Chen JW, Chang CH, Huang CC. Tracking Dynamic Tongue Motion in Ultrasound Images for Obstructive Sleep Apnea. Ultrasound in Medicine & Biology 2017; 43:2791-2805. PMID: 28942270. DOI: 10.1016/j.ultrasmedbio.2017.08.001.
Abstract
Obstructive sleep apnea (OSA), a breathing disorder characterized by repetitive collapse of the pharyngeal airway during sleep, can cause intermittent hypoxemia and frequent arousal. The evaluation of dynamic tongue motion not only provides the biomechanics and pathophysiology for OSA diagnosis, but also helps doctors to determine treatment strategies for these patients with OSA. The purpose of this study was to develop and verify a dedicated tracking algorithm, called the modified optical flow (OF)-based method, for monitoring the dynamic motion of the tongue base in ultrasound image sequences derived from controls and patients with OSA. The performance of the proposed method was verified by phantom and synthetic data. A common tracking method, the normalized cross-correlation method, was included for comparison. The efficacy of the algorithms was evaluated by calculating the estimated displacement error. All results indicated that the modified OF-based method exhibited higher accuracy in verification experiments. In the human subject experiment, all participants performed the Müller maneuver (MM) to simulate the contour changes of the tongue base with a negative pharyngeal airway pressure in sleep apnea. Ultrasound image sequences of the tongue were obtained during 10 s of a transition from normal breathing to the MM, and these were measured using the modified OF-based method. The results indicated that the displacement of the tongue base during the MM was larger in the controls than in the patients with OSA (p < 0.05); the calculated areas of the tongue in the controls and patients with OSA were 24.9 ± 3.0 and 27.6 ± 3.3 cm2, respectively, during normal breathing (p < 0.05), and 24.7 ± 3.6 and 27.3 ± 3.8 cm2, respectively, at the end of the MM. The percentage changes in the tongue area were 2.2% and 1.3% in the controls and patients with OSA, respectively. We found that quantitative assessment of tongue motion by ultrasound imaging is suitable for evaluating pharyngeal airway behavior in OSA patients with minimal invasiveness and easy accessibility.
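For context, the normalized cross-correlation baseline mentioned above amounts to block matching between consecutive frames; a minimal sketch follows, with block and search sizes chosen arbitrarily for illustration.

```python
# Illustrative NCC block matching between two frames (block/search sizes are arbitrary choices).
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)

def ncc_displacement(prev, curr, center, block=15, search=10):
    """Estimate the (dy, dx) shift of the block around `center` (assumed well inside `prev`)."""
    cy, cx = center
    h = block // 2
    template = prev[cy - h:cy + h + 1, cx - h:cx + h + 1]
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = cy + dy - h, cx + dx - h
            if y0 < 0 or x0 < 0:
                continue                                   # candidate block falls outside the frame
            patch = curr[y0:y0 + block, x0:x0 + block]
            if patch.shape != template.shape:
                continue
            score = ncc(template, patch)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```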
Affiliation(s)
- Chih-Yen Chien
- Department of Biomedical Engineering, National Cheng Kung University, Tainan, Taiwan
- Jeng-Wen Chen
- Department of Otolaryngology Head and Neck Surgery, Cardinal Tien Hospital, New Taipei City, Taiwan; School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan
- Chun-Hsiang Chang
- Department of Otolaryngology Head and Neck Surgery, Cardinal Tien Hospital, New Taipei City, Taiwan
- Chih-Chung Huang
- Department of Biomedical Engineering, National Cheng Kung University, Tainan, Taiwan
15. Xu K, Gábor Csapó T, Roussel P, Denby B. A comparative study on the contour tracking algorithms in ultrasound tongue images with automatic re-initialization. The Journal of the Acoustical Society of America 2016; 139:EL154. PMID: 27250201. DOI: 10.1121/1.4951024.
Abstract
The feasibility of automatic re-initialization of contour tracking is explored using an image similarity-based method on ultrasound tongue sequences. To this end, the re-initialization method was incorporated into current state-of-the-art tongue tracking algorithms, and a quantitative comparison was made between the algorithms by computing mean sum of distances errors. The results demonstrate that with automatic re-initialization, the tracking error can be reduced from an average of 5-6 pixels to about 4 pixels; this result was obtained by using a large number of hand-labeled frames and similarity measurements to extract the contours.
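A hedged sketch of similarity-triggered re-initialization: when an incoming frame is sufficiently similar to a stored reference frame whose contour is hand-labeled, tracking restarts from that known contour. SSIM is used here only as a stand-in for the similarity measure, and the threshold and the `tracker.reset` call are hypothetical.

```python
# Illustrative similarity-triggered re-initialization (SSIM stands in for the similarity index).
from skimage.metrics import structural_similarity as ssim

def maybe_reinitialize(frame, reference_frame, reference_contour, tracker, threshold=0.8):
    # assumes 8-bit grayscale frames; adjust data_range for other image types
    score = ssim(frame, reference_frame, data_range=255)
    if score >= threshold:
        tracker.reset(reference_contour)   # hypothetical tracker API for restarting from a known contour
        return True
    return False
```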
Affiliation(s)
- Kele Xu
- Department of Engineering, Université Pierre et Marie Curie; Langevin Institute, ESPCI-ParisTech, Paris, 75005, France
- Tamás Gábor Csapó
- Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
- Pierre Roussel
- Langevin Institute, ESPCI-ParisTech, Paris, 75005, France
16. Xu K, Yang Y, Stone M, Jaumard-Hakoun A, Leboullenger C, Dreyfus G, Roussel P, Denby B. Robust contour tracking in ultrasound tongue image sequences. Clinical Linguistics & Phonetics 2016; 30:313-327. PMID: 26786063. DOI: 10.3109/02699206.2015.1110714.
Abstract
A new contour-tracking algorithm is presented for ultrasound tongue image sequences, which can follow the motion of tongue contours over long durations with good robustness. To cope with missing segments caused by noise, or by the tongue midsagittal surface being parallel to the direction of ultrasound wave propagation, active contours with a contour-similarity constraint are introduced, which can be used to provide 'prior' shape information. Also, in order to address accumulation of tracking errors over long sequences, we present an automatic re-initialization technique, based on the complex wavelet image similarity index. Experiments on synthetic data and on real 60 frame per second (fps) data from different subjects demonstrate that the proposed method gives good contour tracking for ultrasound image sequences even over durations of minutes, which can be useful in applications such as speech recognition where very long sequences must be analyzed in their entirety.
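As a starting point for the constrained snakes described above, a generic (unconstrained) active-contour fit can be run with scikit-image as below; the smoothing and elasticity parameters are arbitrary, and the contour-similarity prior from the paper is not included.

```python
# Generic open-ended snake fit on one ultrasound frame (no contour-similarity constraint).
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def fit_snake(frame: np.ndarray, init_rc: np.ndarray) -> np.ndarray:
    """frame: 2D image; init_rc: (N, 2) initial contour as (row, col) points."""
    smoothed = gaussian(frame, sigma=3, preserve_range=True)   # suppress speckle before fitting
    return active_contour(smoothed, init_rc, alpha=0.01, beta=1.0,
                          boundary_condition="fixed")          # open contour with pinned endpoints
```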
Affiliation(s)
- Kele Xu
- Faculty of Engineering, Université Pierre et Marie Curie, Paris, France
- Signal Processing and Machine Learning (SIGMA) Lab, ESPCI ParisTech, Paris, France
- Yin Yang
- Electrical and Computer Engineering Department, University of New Mexico, Albuquerque, New Mexico, USA
- Maureen Stone
- Vocal Tract Visualization Lab, University of Maryland Dental School, Baltimore, Maryland, USA
- Aurore Jaumard-Hakoun
- Faculty of Engineering, Université Pierre et Marie Curie, Paris, France
- Signal Processing and Machine Learning (SIGMA) Lab, ESPCI ParisTech, Paris, France
- Clémence Leboullenger
- Faculty of Engineering, Université Pierre et Marie Curie, Paris, France
- Signal Processing and Machine Learning (SIGMA) Lab, ESPCI ParisTech, Paris, France
- Gérard Dreyfus
- Signal Processing and Machine Learning (SIGMA) Lab, ESPCI ParisTech, Paris, France
- Pierre Roussel
- Signal Processing and Machine Learning (SIGMA) Lab, ESPCI ParisTech, Paris, France
- Bruce Denby
- Cognitive Computing and Applications Lab, Tianjin University, Tianjin, China
17. Ibragimov B, Prince JL, Murano EZ, Woo J, Stone M, Likar B, Pernuš F, Vrtovec T. Segmentation of tongue muscles from super-resolution magnetic resonance images. Med Image Anal 2014; 20:198-207. PMID: 25487963. DOI: 10.1016/j.media.2014.11.006.
Abstract
Imaging and quantification of tongue anatomy is helpful in surgical planning, post-operative rehabilitation of tongue cancer patients, and studying of how humans adapt and learn new strategies for breathing, swallowing and speaking to compensate for changes in function caused by disease, medical interventions or aging. In vivo acquisition of high-resolution three-dimensional (3D) magnetic resonance (MR) images with clearly visible tongue muscles is currently not feasible because of breathing and involuntary swallowing motions that occur over lengthy imaging times. However, recent advances in image reconstruction now allow the generation of super-resolution 3D MR images from sets of orthogonal images, acquired at a high in-plane resolution and combined using super-resolution techniques. This paper presents, to the best of our knowledge, the first attempt towards automatic tongue muscle segmentation from MR images. We devised a database of ten super-resolution 3D MR images, in which the genioglossus and inferior longitudinalis tongue muscles were manually segmented and annotated with landmarks. We demonstrate the feasibility of segmenting the muscles of interest automatically by applying the landmark-based game-theoretic framework (GTF), where a landmark detector based on Haar-like features and an optimal assignment-based shape representation were integrated. The obtained segmentation results were validated against an independent manual segmentation performed by a second observer, as well as against B-splines and demons atlasing approaches. The segmentation performance resulted in mean Dice coefficients of 85.3%, 81.8%, 78.8% and 75.8% for the second observer, GTF, B-splines atlasing and demons atlasing, respectively. The obtained level of segmentation accuracy indicates that computerized tongue muscle segmentation may be used in surgical planning and treatment outcome analysis of tongue cancer patients, and in studies of normal subjects and subjects with speech and swallowing problems.
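The Dice coefficients reported above measure volume overlap between an automatic segmentation and a manual one; a minimal version for binary masks is sketched below.

```python
# Dice overlap between two binary segmentation masks.
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total
```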
Affiliation(s)
- Bulat Ibragimov
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia; Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Jerry L Prince
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Emi Z Murano
- Department of Otolaryngology, Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
- Jonghye Woo
- Department of Radiology, Harvard Medical School/MGH, Boston, MA, USA
- Maureen Stone
- Department of Oral and Craniofacial Biological Sciences, University of Maryland, Baltimore, MD, USA; Department of Orthodontics, University of Maryland, Baltimore, MD, USA
- Boštjan Likar
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
- Franjo Pernuš
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
- Tomaž Vrtovec
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia