1
Tawhari Y, Shukla C, Ren J. An Image Processing Approach to Quality Control of Drop-on-Demand Electrohydrodynamic (EHD) Printing. Micromachines 2024; 15:1376. [PMID: 39597188] [PMCID: PMC11596032] [DOI: 10.3390/mi15111376]
Abstract
Droplet quality in drop-on-demand (DoD) electrohydrodynamic (EHD) inkjet printing plays a crucial role in the overall performance and manufacturing quality of the operation. The current approach to droplet printing analysis involves manually outlining/labeling the printed dots on the substrate under a microscope and then using microscope software to estimate the dot sizes, assuming the dots have a standard circular shape; it is therefore prone to error. Moreover, it provides no dot-spacing information, which is also important for EHD DoD printing processes such as micro-array manufacturing. To address these issues, this paper explores feature extraction methods that identify characteristics of the printed droplets to enhance the detection, evaluation, and delineation of significant structures and edges in printed images. The proposed method involves three main stages: (1) image pre-processing, where edge detection techniques such as Canny filtering are applied for printed dot boundary detection; (2) contour detection, which is used to accurately quantify the dot sizes (such as dot perimeter and area); and (3) centroid detection and distance calculation, where the spacing between neighboring dots is quantified as the Euclidean distance between the dots' geometric centers. These stages collectively improve the precision and efficiency of EHD DoD printing analysis in terms of dot size and spacing. Edge and contour detection strategies are implemented to minimize edge discrepancies and accurately delineate droplet perimeters for quality analysis, enhancing measurement precision. The proposed image processing approach was first tested on simulated EHD-printed droplet arrays with specified dot sizes and spacing, achieving a quantification accuracy of over 98% for dot size and spacing, highlighting the high precision of the approach. It was then demonstrated on experimentally EHD-printed droplets, showing its superiority over conventional microscope-based measurements.
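The three-stage pipeline described above maps naturally onto standard image-processing primitives. The sketch below uses OpenCV under illustrative assumptions (a grayscale micrograph as input; Gaussian smoothing, Canny thresholds, and a morphological closing chosen arbitrarily); it is not the authors' implementation.

```python
import cv2
import numpy as np
from scipy.spatial.distance import pdist

def analyze_droplets(image_path, canny_lo=50, canny_hi=150):
    """Sketch of the three stages: edges -> contours -> centroid spacing."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # suppress noise before edge detection
    edges = cv2.Canny(blurred, canny_lo, canny_hi)     # stage 1: dot boundary detection
    # close small gaps so each dot yields a single closed contour
    edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    dots = []
    for c in contours:                                 # stage 2: per-dot size metrics
        m = cv2.moments(c)
        if m["m00"] == 0:
            continue
        dots.append({
            "area": cv2.contourArea(c),
            "perimeter": cv2.arcLength(c, True),
            "centroid": (m["m10"] / m["m00"], m["m01"] / m["m00"]),
        })
    centroids = np.array([d["centroid"] for d in dots])
    # stage 3: Euclidean distances between dot centers (all pairs)
    spacing = pdist(centroids) if len(centroids) > 1 else np.array([])
    return dots, spacing
```

Comparing the nearest-neighbour entries of `spacing` against the commanded dot pitch would give the spacing deviation the abstract targets.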
Affiliation(s)
- Yahya Tawhari
- Mechanical Engineering Department, Iowa State University, Ames, IA 50011, USA
- Department of Mechanical Engineering, College of Engineering and Computer Sciences, Jazan University, Jazan 45142, Saudi Arabia
- Charchit Shukla
- Mechanical Engineering Department, Iowa State University, Ames, IA 50011, USA
- Juan Ren
- Mechanical Engineering Department, Iowa State University, Ames, IA 50011, USA
2
Shi X, Feng T, Huang K, Kadiri SR, Lee J, Lu Y, Zhang Y, Goldstein L, Narayanan S. Direct articulatory observation reveals phoneme recognition performance characteristics of a self-supervised speech model. JASA Express Lett 2024; 4:114801. [PMID: 39556017] [DOI: 10.1121/10.0034430]
Abstract
Variability in speech pronunciation is widely observed across different linguistic backgrounds, which impacts modern automatic speech recognition performance. Here, we evaluate the performance of a self-supervised speech model in phoneme recognition using direct articulatory evidence. Findings indicate significant differences in phoneme recognition, especially in front vowels, between American English and Indian English speakers. To gain a deeper understanding of these differences, we conduct real-time MRI-based articulatory analysis, revealing distinct velar region patterns during the production of specific front vowels. This underscores the need to deepen the scientific understanding of self-supervised speech model variances to advance robust and inclusive speech technology.
Affiliation(s)
- Xuan Shi
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
- Tiantian Feng
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
- Kevin Huang
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
- Sudarsana Reddy Kadiri
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
- Jihwan Lee
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
- Yijing Lu
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Yubin Zhang
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Louis Goldstein
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
3
Belyk M, Carignan C, McGettigan C. An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images. Behav Res Methods 2024; 56:2623-2635. [PMID: 37507650] [PMCID: PMC10990993] [DOI: 10.3758/s13428-023-02171-9]
Abstract
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
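The core idea, tissue classification constrained to a researcher-specified region of interest, can be illustrated compactly. The sketch below uses scikit-image with an Otsu threshold standing in for the toolbox's classifier; the function and its parameters are assumptions for illustration, not toolbox code.

```python
import numpy as np
from skimage import measure
from skimage.filters import threshold_otsu

def outline_vocal_tract(frame, roi_mask):
    """frame: 2D mid-sagittal rtMRI image; roi_mask: boolean analyst-drawn ROI."""
    t = threshold_otsu(frame[roi_mask])          # tissue/airway split inside the ROI only
    tissue = np.zeros(frame.shape, dtype=bool)
    tissue[roi_mask] = frame[roi_mask] > t       # classify nothing outside the ROI
    contours = measure.find_contours(tissue.astype(float), 0.5)
    return max(contours, key=len)                # longest iso-contour as the tract outline
```

Restricting classification to the ROI is what lets a simple threshold work: the analyst's anatomical knowledge excludes tissue that would otherwise confuse a global classifier.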
Affiliation(s)
- Michel Belyk
- Department of Psychology, Edge Hill University, Ormskirk, UK
- Christopher Carignan
- Department of Speech Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech Hearing and Phonetic Sciences, University College London, London, UK
4
Ruthven M, Peplinski AM, Adams DM, King AP, Miquel ME. Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Sci Data 2023; 10:860. [PMID: 38042857] [PMCID: PMC10693552] [DOI: 10.1038/s41597-023-02766-z]
Abstract
The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech, these do not include ground-truth (GT) segmentations, a key requirement for the development of deep-learning-based segmentation methods. To begin to address this barrier, this work presents rt-MRI speech datasets of five healthy adult volunteers with corresponding GT segmentations and velopharyngeal closure patterns. The images were acquired using standard clinical MRI scanners, coils and sequences to facilitate acquisition of similar images in other centres. The datasets include manually created GT segmentations of six anatomical features including the tongue, soft palate and vocal tract. In addition, this work makes code and instructions to implement a current state-of-the-art deep-learning-based method to segment rt-MRI speech datasets publicly available, thus providing the community and others with a starting point for developing such methods.
Affiliation(s)
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- David M Adams
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Andrew P King
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- Marc Eric Miquel
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London, E1 1HH, UK
- Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London, EC1M 6BQ, UK
5
Erattakulangara S, Kelat K, Meyer D, Priya S, Lingala SG. Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model. Bioengineering (Basel) 2023; 10:623. [PMID: 37237693] [DOI: 10.3390/bioengineering10050623]
Abstract
Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80-100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and an in-house airway labeled dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated on data acquired from three fast speech MRI protocols. Protocol 1: a 3 T-based radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers produced French speech tokens; Protocol 2: a 1.5 T-based uniform density spiral acquisition scheme coupled with a temporal finite difference (FD) sparsity regularization, where speakers produced fluent speech tokens in English; and Protocol 3: a 3 T-based variable density spiral acquisition scheme coupled with manifold regularization, where speakers produced various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those from an expert human user (a vocologist) and to the conventional U-NET model without transfer learning. Segmentations from a second expert human user (a radiologist) were used as ground truth. Evaluations were performed using the quantitative Dice similarity metric, the Hausdorff distance metric, and a segmentation count metric. The approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (e.g., on the order of 20 images), and provided accurate segmentations similar to those of an expert human.
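The stacking idea, keeping transferred low- and mid-level features fixed while fitting high-level features to a handful of protocol-specific images, can be sketched in PyTorch as below. The tiny two-level U-Net, the choice of which layers to freeze, and the commented weight-loading step are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 16), conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)                  # binary vocal-tract mask

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        return self.head(self.dec(torch.cat([self.up(e2), e1], dim=1)))

model = TinyUNet()
# model.enc1.load_state_dict(...)  # hypothetical: low-level weights from pre-training on other modalities
for p in model.enc1.parameters():  # freeze transferred low-level features
    p.requires_grad = False
# only the unfrozen (high-level) layers are fit to the ~20 protocol-specific images
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```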
Affiliation(s)
- Subin Erattakulangara
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Karthika Kelat
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- David Meyer
- Janette Ogg Voice Research Center, Shenandoah University, Winchester, VA 22601, USA
- Sarv Priya
- Department of Radiology, University of Iowa, Iowa City, IA 52242, USA
- Sajan Goud Lingala
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Department of Radiology, University of Iowa, Iowa City, IA 52242, USA
6
Ruthven M, Miquel ME, King AP. A segmentation-informed deep learning framework to register dynamic two-dimensional magnetic resonance images of the vocal tract during speech. Biomed Signal Process Control 2023; 80:104290. [PMID: 36743699] [PMCID: PMC9746295] [DOI: 10.1016/j.bspc.2022.104290]
Abstract
Objective Dynamic magnetic resonance (MR) imaging enables visualisation of articulators during speech. There is growing interest in quantifying articulator motion in two-dimensional MR images of the vocal tract, to better understand speech production and potentially inform patient management decisions. Image registration is an established way to achieve this quantification. Recently, segmentation-informed deformable registration frameworks have been developed and have achieved state-of-the-art accuracy. This work aims to adapt such a framework and optimise it for estimating displacement fields between dynamic two-dimensional MR images of the vocal tract during speech. Methods A deep-learning-based registration framework was developed and compared with current state-of-the-art registration methods and frameworks (two traditional methods and three deep-learning-based frameworks, two of which are segmentation informed). The accuracy of the methods and frameworks was evaluated using the Dice coefficient (DSC), average surface distance (ASD) and a metric based on velopharyngeal closure. The metric evaluated if the fields captured a clinically relevant and quantifiable aspect of articulator motion. Results The segmentation-informed frameworks achieved higher DSCs and lower ASDs and captured more velopharyngeal closures than the traditional methods and the framework that was not segmentation informed. All segmentation-informed frameworks achieved similar DSCs and ASDs. However, the proposed framework captured the most velopharyngeal closures. Conclusions A framework was successfully developed and found to more accurately estimate articulator motion than five current state-of-the-art methods and frameworks. Significance The first deep-learning-based framework specifically for registering dynamic two-dimensional MR images of the vocal tract during speech has been developed and evaluated.
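The two segmentation-accuracy metrics named here, the Dice coefficient (DSC) and average surface distance (ASD), have standard definitions; the sketch below is the textbook formulation in NumPy/SciPy, not code from the paper.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    """Overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * (a & b).sum() / (a.sum() + b.sum())

def average_surface_distance(a, b):
    """Symmetric mean distance between the boundaries of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    surf_a, surf_b = a & ~binary_erosion(a), b & ~binary_erosion(b)  # boundary voxels
    d_to_b = distance_transform_edt(~surf_b)  # distance of every voxel to b's surface
    d_to_a = distance_transform_edt(~surf_a)
    return 0.5 * (d_to_b[surf_a].mean() + d_to_a[surf_b].mean())
```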
Affiliation(s)
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom
- School of Biomedical Engineering & Imaging Sciences, King’s College London, King’s Health Partners, St Thomas’ Hospital, London SE1 7EH, United Kingdom
- Marc E. Miquel
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom
- Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London E1 1HH, United Kingdom
- Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Andrew P. King
- School of Biomedical Engineering & Imaging Sciences, King’s College London, King’s Health Partners, St Thomas’ Hospital, London SE1 7EH, United Kingdom
7
Lim Y, Toutios A, Bliesener Y, Tian Y, Lingala SG, Vaz C, Sorensen T, Oh M, Harper S, Chen W, Lee Y, Töger J, Monteserin ML, Smith C, Godinez B, Goldstein L, Byrd D, Nayak KS, Narayanan SS. A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images. Sci Data 2021; 8:187. [PMID: 34285240] [PMCID: PMC8292336] [DOI: 10.1038/s41597-021-00976-x]
Abstract
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically-relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Asterios Toutios
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yannick Bliesener
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Ye Tian
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Sajan Goud Lingala
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Colin Vaz
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Tanner Sorensen
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Miran Oh
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Sarah Harper
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Weiyi Chen
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yoonjeong Lee
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Johannes Töger
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Mairym Lloréns Monteserin
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Caitlin Smith
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Bianca Godinez
- Department of Linguistics, California State University Long Beach, Long Beach, California, USA
- Louis Goldstein
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Shrikanth S Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
8
A deep neural network based correction scheme for improved air-tissue boundary prediction in real-time magnetic resonance imaging video. Comput Speech Lang 2021. [DOI: 10.1016/j.csl.2020.101160]
9
Ruthven M, Miquel ME, King AP. Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech. Comput Methods Programs Biomed 2021; 198:105814. [PMID: 33197740] [PMCID: PMC7732702] [DOI: 10.1016/j.cmpb.2020.105814]
Abstract
BACKGROUND AND OBJECTIVE Magnetic resonance (MR) imaging is increasingly used in studies of speech as it enables non-invasive visualisation of the vocal tract and articulators, thus providing information about their shape, size, motion and position. Extraction of this information for quantitative analysis is achieved using segmentation. Methods have been developed to segment the vocal tract; however, none of these also fully segments any articulators. The objective of this work was to develop a method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech, thus overcoming the limitations of existing methods. METHODS Five speech MR image sets (392 MR images in total), each of a different healthy adult volunteer, were used in this work. A fully convolutional network with an architecture similar to the original U-Net was developed to segment the following six regions in the image sets: the head, soft palate, jaw, tongue, vocal tract and tooth space. A five-fold cross-validation was performed to investigate the segmentation accuracy and generalisability of the network. The segmentation accuracy was assessed using standard overlap-based metrics (Dice coefficient and general Hausdorff distance) and a novel clinically relevant metric based on velopharyngeal closure. RESULTS The segmentations created by the method had a median Dice coefficient of 0.92 and a median general Hausdorff distance of 5 mm. The method segmented the head most accurately (median Dice coefficient of 0.99), and the soft palate and tooth space least accurately (median Dice coefficients of 0.92 and 0.93 respectively). The segmentations created by the method correctly showed 90% (27 out of 30) of the velopharyngeal closures in the MR image sets. CONCLUSIONS An automatic method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech was successfully developed. The method is intended for use in clinical and non-clinical speech studies which involve quantitative analysis of the shape, size, motion and position of the vocal tract and articulators. In addition, a novel clinically relevant metric for assessing the accuracy of vocal tract and articulator segmentation methods was developed.
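The velopharyngeal-closure metric lends itself to a simple operationalisation: a closure is present when the soft-palate segment makes contact with the posterior pharyngeal wall. The sketch below is one plausible reading of such a check, with the head segment standing in for the pharyngeal wall and a one-pixel contact criterion; both are assumptions, not the paper's exact definition.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def velopharyngeal_closure(soft_palate_mask, head_mask):
    """True if the soft-palate segment touches the head (pharyngeal wall) segment."""
    return bool(np.any(binary_dilation(soft_palate_mask) & head_mask))
```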
Affiliation(s)
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London SE1 7EH, United Kingdom
- Marc E Miquel
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom
- Centre for Advanced Cardiovascular Imaging, NIHR Barts Biomedical Research Centre, William Harvey Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Andrew P King
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London SE1 7EH, United Kingdom
10
Lim Y, Bliesener Y, Narayanan S, Nayak KS. Deblurring for spiral real-time MRI using convolutional neural networks. Magn Reson Med 2020; 84:3438-3452. [PMID: 32710516] [PMCID: PMC7722023] [DOI: 10.1002/mrm.28393]
Abstract
PURPOSE To develop and evaluate a fast and effective method for deblurring spiral real-time MRI (RT-MRI) using convolutional neural networks. METHODS We demonstrate a 3-layer residual convolutional neural network that corrects image-domain off-resonance artifacts in speech production spiral RT-MRI without knowledge of field maps. The architecture is motivated by traditional deblurring approaches. Spatially varying off-resonance blur is synthetically generated using discrete object approximation and field maps, with data augmentation, from a large database of 2D human speech production RT-MRI. The effects of off-resonance range, shift-invariance of blur, and readout duration on deblurring performance are investigated. The proposed method is validated using synthetic and real data with longer readouts, quantitatively using image quality metrics and qualitatively via visual inspection, and with a comparison to conventional deblurring methods. RESULTS Deblurring performance was found to be superior to a current autocalibrated method for in vivo data and only slightly worse than an ideal reconstruction with perfect knowledge of the field map for synthetic test data. Convolutional neural network deblurring made it possible to visualize articulator boundaries with readouts up to 8 ms at 1.5 T, which is 3-fold longer than the current standard practice. The computation time was 12.3 ± 2.2 ms per frame, enabling low-latency processing for RT-MRI applications. CONCLUSION Convolutional neural network deblurring is a practical, efficient, and field-map-free approach for the deblurring of spiral RT-MRI. In the context of speech production imaging, this can enable a 1.7-fold improvement in scan efficiency and the use of spiral readouts at higher field strengths such as 3 T.
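A 3-layer residual CNN of the kind described can be written in a few lines of PyTorch; the residual connection means the network only has to learn the blur correction rather than the full image. The channel width, kernel size, and the real/imaginary two-channel input representation are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualDeblur(nn.Module):
    def __init__(self, channels=2, width=64):        # 2 channels: real and imaginary parts
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)                       # residual: learn only the correction

blurred = torch.randn(1, 2, 84, 84)                  # one complex frame as 2 real channels
deblurred = ResidualDeblur()(blurred)
```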
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yannick Bliesener
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Krishna S. Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
11
Martin J, Ruthven M, Boubertakh R, Miquel ME. Realistic Dynamic Numerical Phantom for MRI of the Upper Vocal Tract. J Imaging 2020; 6:86. [PMID: 34460743] [PMCID: PMC8320850] [DOI: 10.3390/jimaging6090086]
Abstract
Dynamic and real-time MRI (rtMRI) of human speech is an active field of research, with interest from both the linguistics and clinical communities. At present, different research groups are investigating a range of rtMRI acquisition and reconstruction approaches to visualise the speech organs. As with other moving organs, it is difficult to create a physical phantom of the speech organs to optimise these approaches; the optimisation therefore requires extensive scanner access and imaging of volunteers. As previously demonstrated in cardiac imaging, realistic numerical phantoms can be useful tools for optimising rtMRI approaches, reducing reliance on scanner access and the imaging of volunteers. However, no such speech rtMRI phantom currently exists. In this work, a numerical phantom for optimising speech rtMRI approaches was developed and tested on different reconstruction schemes. The novel phantom comprised a dynamic image series and corresponding k-space data of a single mid-sagittal slice with a temporal resolution of 30 frames per second (fps). The phantom was developed from images of a volunteer acquired at a frame rate of 10 fps. Its creation involved the following steps: image acquisition, image enhancement, segmentation, mask optimisation, through-time and spatial interpolation, and finally derivation of the corresponding k-space data. The phantom was used to: (1) test different k-space sampling schemes (Cartesian, radial and spiral); (2) create lower frame rate acquisitions by simulating segmented k-space acquisitions; and (3) simulate parallel imaging reconstructions (SENSE and GRAPPA). This demonstrated how such a numerical phantom could be used to optimise images and test multiple sampling strategies without extensive scanner access.
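Deriving k-space from a dynamic image series and retrospectively simulating slower, segmented acquisitions is the step that makes such a phantom useful. The NumPy sketch below illustrates the principle with a placeholder image series and an assumed interleaved line ordering; it is not the phantom's actual data or sampling code.

```python
import numpy as np

frames = np.random.rand(30, 128, 128)   # placeholder for 1 s of images at 30 fps
kspace = np.fft.fftshift(np.fft.fft2(frames, axes=(-2, -1)), axes=(-2, -1))

def segmented_cartesian(kspace, lines_per_frame=32):
    """Keep only some phase-encode lines per time point, cycling through the
    grid over successive frames, to emulate a slower segmented acquisition."""
    nt, ny, _ = kspace.shape
    out = np.zeros_like(kspace)
    for t in range(nt):
        rows = (np.arange(lines_per_frame) * (ny // lines_per_frame) + t) % ny
        out[t, rows] = kspace[t, rows]
    return out

undersampled = segmented_cartesian(kspace)
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled, axes=(-2, -1)), axes=(-2, -1)))
```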
Affiliation(s)
- Joe Martin
- MR Physics, Guy’s and St Thomas’ NHS Foundation Trust, St Thomas’s Hospital, London SE1 7EH, UK
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, St Bartholomew’s Hospital, London EC1A 7BE, UK
- Redha Boubertakh
- Singapore Bioimaging Consortium (SBIC), Singapore 138667, Singapore
- Marc E. Miquel
- Clinical Physics, Barts Health NHS Trust, St Bartholomew’s Hospital, London EC1A 7BE, UK
- Centre for Advanced Cardiovascular Imaging, NIHR Barts Biomedical Research Centre (BRC), William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
12
Alexander R, Sorensen T, Toutios A, Narayanan S. A modular architecture for articulatory synthesis from gestural specification. J Acoust Soc Am 2019; 146:4458. [PMID: 31893678] [PMCID: PMC7043897] [DOI: 10.1121/1.5139413]
Abstract
This paper proposes a modular architecture for articulatory synthesis from a gestural specification, comprising relatively simple models for the vocal tract, the glottis, aero-acoustics, and articulatory control. The vocal tract module combines a midsagittal statistical articulatory model, derived by factor analysis of air-tissue boundaries in real-time magnetic resonance imaging data, with an αβ model that converts midsagittal sections to area-function specifications. The aero-acoustics and glottis models were based on a software implementation of classic work by Maeda. The articulatory control module uses dynamical systems, which implement articulatory gestures, to animate the statistical articulatory model, inspired by the task dynamics model. Results are presented on synthesizing vowel-consonant-vowel sequences with plosive consonants, using models that were built on data from, and simulate the behavior of, two different speakers.
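The αβ model referred to here is conventionally the power-law mapping from midsagittal distance to cross-sectional area, A(x) = α(x)·d(x)^β(x), with the coefficients varying along the tract; the values in the sketch below are placeholders, not the paper's fitted parameters.

```python
import numpy as np

def area_function(midsag_dist, alpha, beta):
    """Convert midsagittal distances (cm) to cross-sectional areas (cm^2)."""
    return alpha * np.power(midsag_dist, beta)

# e.g. three sections from lips toward pharynx with placeholder coefficients
areas = area_function(np.array([0.5, 1.2, 0.8]), alpha=1.5, beta=1.4)
```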
Affiliation(s)
- Rachel Alexander
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Tanner Sorensen
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Asterios Toutios
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Shrikanth Narayanan
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
13
Turkmen HI, Karsligil ME. Advanced computing solutions for analysis of laryngeal disorders. Med Biol Eng Comput 2019; 57:2535-2552. [DOI: 10.1007/s11517-019-02031-9]
14
Sorensen T, Toutios A, Goldstein L, Narayanan S. Task-dependence of articulator synergies. J Acoust Soc Am 2019; 145:1504. [PMID: 31067947] [PMCID: PMC6910022] [DOI: 10.1121/1.5093538]
Abstract
In speech production, the motor system organizes articulators such as the jaw, tongue, and lips into synergies whose function is to produce speech sounds by forming constrictions at the phonetic places of articulation. The present study tests whether synergies for different constriction tasks differ in terms of inter-articulator coordination. The test is conducted on utterances [ɑpɑ], [ɑtɑ], [ɑiɑ], and [ɑkɑ] with a real-time magnetic resonance imaging biomarker that is computed using a statistical model of the forward kinematics of the vocal tract. The present study is the first to estimate the forward kinematics of the vocal tract from speech production data. Using the imaging biomarker, the study finds that the jaw contributes least to the velar stop for [k], more to pharyngeal approximation for [ɑ], still more to palatal approximation for [i], and most to the coronal stop for [t]. Additionally, the jaw contributes more to the coronal stop for [t] than to the bilabial stop for [p]. Finally, the study investigates how this pattern of results varies by participant. The study identifies differences in inter-articulator coordination by constriction task, which support the claim that inter-articulator coordination differs depending on the active articulator synergy.
Affiliation(s)
- Tanner Sorensen
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
- Asterios Toutios
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
- Louis Goldstein
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Shrikanth Narayanan
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
15
Kim YC. Fast upper airway magnetic resonance imaging for assessment of speech production and sleep apnea. Precision and Future Medicine 2018. [DOI: 10.23838/pfm.2018.00100]
16
Ramanarayanan V, Tilsen S, Proctor M, Töger J, Goldstein L, Nayak KS, Narayanan S. Analysis of speech production real-time MRI. Comput Speech Lang 2018. [DOI: 10.1016/j.csl.2018.04.002]
17
Oh M, Lee Y. ACT: An Automatic Centroid Tracking tool for analyzing vocal tract actions in real-time magnetic resonance imaging speech production data. J Acoust Soc Am 2018; 144:EL290. [PMID: 30404513] [PMCID: PMC6192793] [DOI: 10.1121/1.5057367]
Abstract
Real-time magnetic resonance imaging (MRI) speech production data have expanded the understanding of vocal tract actions. This letter presents an Automatic Centroid Tracking tool, ACT, which obtains both spatial and temporal information characterizing multi-directional articulatory movement. ACT auto-segments an articulatory object composed of connected pixels in a real-time MRI video by finding its intensity centroids over time, and returns kinematic profiles including direction and magnitude information for the object. This letter discusses the utility of ACT, which outperforms other similar object tracking techniques, by demonstrating its successful online tracking of vertical larynx movement. ACT can be deployed generally for dynamic image processing and analysis.
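Intensity-centroid tracking of a connected pixel object can be sketched compactly; the thresholding, seeding, and frame-to-frame hand-off below are illustrative assumptions rather than ACT's actual logic.

```python
import numpy as np
from scipy.ndimage import label

def track_centroid(frames, seed, threshold=0.5):
    """frames: (T, H, W) array scaled to [0, 1]; seed: (row, col) inside the object."""
    trajectory = []
    for frame in frames:
        labels, _ = label(frame > threshold)       # connected bright-pixel components
        obj = labels == labels[seed]               # the component containing the seed
        rows, cols = np.nonzero(obj)
        w = frame[rows, cols]                      # intensity weights
        cy = (rows * w).sum() / w.sum()            # intensity-weighted centroid
        cx = (cols * w).sum() / w.sum()
        trajectory.append((cy, cx))
        seed = (int(round(cy)), int(round(cx)))    # follow the object into the next frame
    return np.array(trajectory)                    # (T, 2): a kinematic profile over time
```

Differencing successive rows of the returned trajectory yields the direction and magnitude profiles the letter describes.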
Affiliation(s)
- Miran Oh
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Yoonjeong Lee
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
18
Lim Y, Lingala SG, Narayanan SS, Nayak KS. Dynamic off-resonance correction for spiral real-time MRI of speech. Magn Reson Med 2018; 81:234-246. [PMID: 30058147] [DOI: 10.1002/mrm.27373]
Abstract
PURPOSE To improve the depiction and tracking of vocal tract articulators in spiral real-time MRI (RT-MRI) of speech production by estimating and correcting for dynamic changes in off-resonance. METHODS The proposed method computes a dynamic field map from the phase of single-TE dynamic images after a coil phase compensation, where complex coil sensitivity maps are estimated from the single-TE dynamic scan itself. This method is tested using simulations and in vivo data. The depiction of air-tissue boundaries is evaluated quantitatively using a sharpness metric and visual inspection. RESULTS Simulations demonstrate that the proposed method provides robust off-resonance correction for spiral readout durations up to 5 ms at 1.5 T. In vivo experiments during human speech production demonstrate that image sharpness is improved in a majority of data sets at air-tissue boundaries including the upper lip, hard palate, soft palate, and tongue, whereas the lower lip shows little improvement in edge sharpness after correction. CONCLUSION Dynamic off-resonance correction is feasible from single-TE spiral RT-MRI data, and provides a practical performance improvement in articulator sharpness when applied to speech production imaging.
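The estimate rests on a simple relation: once coil phase is removed, the residual phase of a single-TE image is approximately 2π·Δf·TE, so dividing by the echo time gives an off-resonance map per frame. A sketch under that assumption (variable names are illustrative):

```python
import numpy as np

def dynamic_field_map(frames_complex, coil_phase, te_s):
    """frames_complex: (T, H, W) complex images; coil_phase: (H, W) radians;
    te_s: echo time in seconds. Returns off-resonance in Hz for each frame."""
    compensated = frames_complex * np.exp(-1j * coil_phase)  # remove coil phase
    return np.angle(compensated) / (2 * np.pi * te_s)        # residual phase -> Hz

# e.g. 100 Hz off-resonance at TE = 3 ms gives a phase of 2*pi*100*0.003 ≈ 1.9 rad,
# safely inside the (-pi, pi] range where no phase unwrapping is needed
```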
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Sajan Goud Lingala
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Shrikanth S Narayanan
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Krishna S Nayak
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
19
Neelapu BC, Kharbanda OP, Sardana HK, Gupta A, Vasamsetti S, Balachandran R, Rana SS, Sardana V. The reliability of different methods of manual volumetric segmentation of pharyngeal and sinonasal subregions. Oral Surg Oral Med Oral Pathol Oral Radiol 2017; 124:577-587. [DOI: 10.1016/j.oooo.2017.08.020]
20
Hoole P, Pouplier M. Öhman returns: New horizons in the collection and analysis of imaging data in speech production research. Comput Speech Lang 2017. [DOI: 10.1016/j.csl.2017.03.002]
21
Töger J, Sorensen T, Somandepalli K, Toutios A, Lingala SG, Narayanan S, Nayak K. Test-retest repeatability of human speech biomarkers from static and real-time dynamic magnetic resonance imaging. J Acoust Soc Am 2017; 141:3323. [PMID: 28599561] [PMCID: PMC5436977] [DOI: 10.1121/1.4983081]
Abstract
Static anatomical and real-time dynamic magnetic resonance imaging (RT-MRI) of the upper airway is a valuable method for studying speech production in research and clinical settings. The test-retest repeatability of quantitative imaging biomarkers is an important parameter, since it limits the effect sizes and intragroup differences that can be studied. Therefore, this study aims to present a framework for determining the test-retest repeatability of quantitative speech biomarkers from static MRI and RT-MRI, and apply the framework to healthy volunteers. Subjects (n = 8, 4 females, 4 males) are imaged in two scans on the same day, including static images and dynamic RT-MRI of speech tasks. The inter-study agreement is quantified using intraclass correlation coefficient (ICC) and mean within-subject standard deviation (σe). Inter-study agreement is strong to very strong for static measures (ICC: min/median/max 0.71/0.89/0.98, σe: 0.90/2.20/6.72 mm), poor to strong for dynamic RT-MRI measures of articulator motion range (ICC: 0.26/0.75/0.90, σe: 1.6/2.5/3.6 mm), and poor to very strong for velocities (ICC: 0.21/0.56/0.93, σe: 2.2/4.4/16.7 cm/s). In conclusion, this study characterizes repeatability of static and dynamic MRI-derived speech biomarkers using state-of-the-art imaging. The introduced framework can be used to guide future development of speech biomarkers. Test-retest MRI data are provided free for research use.
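The two agreement statistics can be computed as below for a two-session design. This sketch uses a one-way random-effects ICC(1,1) formulation, which may differ from the exact ICC variant used in the study.

```python
import numpy as np

def icc_and_sigma_e(scan1, scan2):
    """scan1, scan2: biomarker values for n subjects in sessions 1 and 2."""
    x = np.stack([scan1, scan2], axis=1)                     # shape (n, 2)
    n, k = x.shape
    ms_between = k * ((x.mean(axis=1) - x.mean()) ** 2).sum() / (n - 1)
    ms_within = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    sigma_e = np.sqrt(ms_within)                             # mean within-subject SD
    return icc, sigma_e
```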
Affiliation(s)
- Johannes Töger
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Tanner Sorensen
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Krishna Somandepalli
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Asterios Toutios
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Sajan Goud Lingala
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
- Krishna Nayak
- Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, EEB 400, Los Angeles, California 90089-2560, USA
22
Prasad A, Ghosh PK. Information theoretic optimal vocal tract region selection from real time magnetic resonance images for broad phonetic class recognition. Comput Speech Lang 2016. [DOI: 10.1016/j.csl.2016.03.003]
23
Carey D, McGettigan C. Magnetic resonance imaging of the brain and vocal tract: Applications to the study of speech production and language learning. Neuropsychologia 2016; 98:201-211. [PMID: 27288115] [DOI: 10.1016/j.neuropsychologia.2016.06.003]
Abstract
The human vocal system is highly plastic, allowing for the flexible expression of language, mood and intentions. However, this plasticity is not stable throughout the life span, and it is well documented that adult learners encounter greater difficulty than children in acquiring the sounds of foreign languages. Researchers have used magnetic resonance imaging (MRI) to interrogate the neural substrates of vocal imitation and learning, and the correlates of individual differences in phonetic "talent". In parallel, a growing body of work using MR technology to directly image the vocal tract in real time during speech has offered primarily descriptive accounts of phonetic variation within and across languages. In this paper, we review the contribution of neural MRI to our understanding of vocal learning, and give an overview of vocal tract imaging and its potential to inform the field. We propose methods by which our understanding of speech production and learning could be advanced through the combined measurement of articulation and brain activity using MRI; specifically, we describe a novel paradigm, developed in our laboratory, that uses both MRI techniques to map directly, for the first time, between neural, articulatory and acoustic data in the investigation of vocalisation. This non-invasive, multimodal imaging method could be used to track central and peripheral correlates of spoken language learning and speech recovery in clinical settings, as well as provide insights into potential sites for targeted neural interventions.
Affiliation(s)
- Daniel Carey
- Department of Psychology, Royal Holloway, University of London, Egham, UK
- Carolyn McGettigan
- Department of Psychology, Royal Holloway, University of London, Egham, UK
24
Tong Y, Udupa JK, Odhner D, Wu C, Sin S, Wagshul ME, Arens R. Minimally interactive segmentation of 4D dynamic upper airway MR images via fuzzy connectedness. Med Phys 2016; 43:2323. [PMID: 27147344] [DOI: 10.1118/1.4945698]
Abstract
PURPOSE There are several disease conditions that lead to upper airway restrictive disorders. In the study of these conditions, it is important to take into account the dynamic nature of the upper airway. Currently, dynamic magnetic resonance imaging is the modality of choice for studying these diseases. Unfortunately, the contrast resolution obtainable in the images poses many challenges for an effective segmentation of the upper airway structures. No viable methods have been developed to date to solve this problem. In this paper, the authors demonstrate a practical solution by employing an iterative relative fuzzy connectedness delineation algorithm as a tool. METHODS 3D dynamic images were collected at ten equally spaced instances over the respiratory cycle (i.e., 4D) in 20 female subjects with obstructive sleep apnea syndrome. The proposed segmentation approach consists of the following steps. First, image background nonuniformities are corrected which is then followed by a process to correct for the nonstandardness of MR image intensities. Next, standardized image intensity statistics are gathered for the nasopharynx and oropharynx portions of the upper airway as well as the surrounding soft tissue structures including air outside the body region, hard palate, soft palate, tongue, and other soft structures around the airway including tonsils (left and right) and adenoid. The affinity functions needed for fuzzy connectedness computation are derived based on these tissue intensity statistics. In the next step, seeds for fuzzy connectedness computation are specified for the airway and the background tissue components. Seed specification is needed in only the 3D image corresponding to the first time instance of the 4D volume; from this information, the 3D volume corresponding to the first time point is segmented. Seeds are automatically generated for the next time point from the segmentation of the 3D volume corresponding to the previous time point, and the process continues and runs without human interaction and completes in 10 s for segmenting the airway structure in the whole 4D volume. RESULTS Qualitative evaluations performed to examine smoothness and continuity of motions of the entire upper airway as well as its transverse sections at critical anatomic locations indicate that the segmentations are consistent. Quantitative evaluations of the separate 200 3D volumes and the 20 4D volumes yielded true positive and false positive volume fractions around 95% and 0.1%, respectively, and mean boundary placement errors under 0.5 mm. The method is robust to variations in the subjective action of seed specification. Compared with a segmentation approach based on a registration technique to propagate segmentations, the proposed method is more efficient, accurate, and less prone to error propagation from one respiratory time point to the next. CONCLUSIONS The proposed method is the first demonstration of a viable and practical approach for segmenting the upper airway structures in dynamic MR images. Compared to registration-based methods, it effectively reduces error propagation and consequently achieves not only more accurate segmentations but also more consistent motion representation in the segmentations. The method is practical, requiring minimal user interaction and computational time.
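At its core, fuzzy connectedness assigns each pixel the strength of its best path to a seed, where a path is only as strong as its weakest affinity link. The heavily simplified 2D sketch below uses a Gaussian intensity affinity in place of the paper's full affinity model, and omits the relative (object-versus-background seed) competition and the 4D propagation across respiratory time points.

```python
import heapq
import numpy as np

def fuzzy_connectedness(img, seed, mu, sigma):
    """Max-min path strength from `seed` to every pixel of a 2D image."""
    aff = np.exp(-0.5 * ((img - mu) / sigma) ** 2)   # affinity to the tissue class
    conn = np.zeros(img.shape)
    conn[seed] = 1.0
    heap = [(-1.0, seed)]                            # max-heap via negated strengths
    while heap:
        neg, (r, c) = heapq.heappop(heap)
        if -neg < conn[r, c]:
            continue                                 # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1]:
                s = min(-neg, aff[r, c], aff[rr, cc])  # path as strong as its weakest link
                if s > conn[rr, cc]:
                    conn[rr, cc] = s
                    heapq.heappush(heap, (-s, (rr, cc)))
    return conn   # relative FC would compare maps from airway vs background seeds
```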
Affiliation(s)
- Yubing Tong
- Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Jayaram K Udupa
- Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Dewey Odhner
- Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Caiyun Wu
- Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Sanghun Sin
- Division of Respiratory and Sleep Medicine, The Children's Hospital at Montefiore, Albert Einstein College of Medicine, Bronx, New York 10467
- Mark E Wagshul
- Department of Radiology, Gruss MRRC, Albert Einstein College of Medicine, Bronx, New York 10467
- Raanan Arens
- Division of Respiratory and Sleep Medicine, The Children's Hospital at Montefiore, Albert Einstein College of Medicine, Bronx, New York 10467
25
Toutios A, Narayanan SS. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research. APSIPA Trans Signal Inf Process 2016; 5:e6. [PMID: 27833745] [PMCID: PMC5100697] [DOI: 10.1017/atsip.2016.5]
Abstract
Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development.
Affiliation(s)
- Asterios Toutios
- Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
- Shrikanth S Narayanan
- Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
26
27
Lingala SG, Zhu Y, Kim YC, Toutios A, Narayanan S, Nayak KS. A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn Reson Med 2016; 77:112-125. [PMID: 26778178] [DOI: 10.1002/mrm.26090]
Abstract
PURPOSE The aim of this work was to develop and evaluate an MRI-based system for the study of dynamic vocal tract shaping during speech production that provides high spatial and temporal resolution. METHODS The proposed system utilizes (a) custom eight-channel upper airway coils that have high sensitivity to upper airway regions of interest, (b) two-dimensional golden angle spiral gradient echo acquisition, (c) on-the-fly view-sharing reconstruction, and (d) off-line temporal finite difference constrained reconstruction. The system also provides simultaneous noise-cancelled and temporally aligned audio. The system was evaluated in 3 healthy volunteers and 1 tongue cancer patient with a broad range of speech tasks. RESULTS We report spatiotemporal resolutions of 2.4 × 2.4 mm² every 12 ms for single-slice imaging and 2.4 × 2.4 mm² every 36 ms for three-slice imaging, reflecting roughly 7-fold acceleration over Nyquist sampling. The system demonstrates improved temporal fidelity in capturing rapid vocal tract shaping for tasks such as producing consonant clusters in speech and beat-boxing sounds. Novel acoustic-articulatory analysis was also demonstrated. CONCLUSION A synergistic combination of custom coils, spiral acquisitions, and constrained reconstruction enables visualization of rapid speech with high spatiotemporal resolution in multiple planes. Magn Reson Med 77:112-125, 2017. © 2016 Wiley Periodicals, Inc.
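Step (d), the off-line temporal finite difference constrained reconstruction, is typically posed as a regularized least-squares problem. A minimal sketch of one standard formulation of the cost being minimized follows; this is an assumption, and the paper's exact objective and solver may differ.

```python
# Minimal sketch of one standard formulation: data fidelity plus an
# L1 penalty on the temporal finite difference of the image series.
# Assumed formulation, not necessarily the authors' exact objective.
import numpy as np

def recon_cost(x: np.ndarray, forward_op, y: np.ndarray, lam: float) -> float:
    """x: (nt, ny, nx) image series; forward_op: callable mapping the
    series to sampled (coil-weighted, spiral) k-space; y: measured
    data; lam: regularization weight."""
    fidelity = np.linalg.norm(forward_op(x) - y) ** 2
    temporal_l1 = np.abs(np.diff(x, axis=0)).sum()  # finite difference in t
    return fidelity + lam * temporal_l1
```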
Collapse
Affiliation(s)
- Sajan Goud Lingala
- Electrical Engineering, University of Southern California, Los Angeles, CA
| | - Yinghua Zhu
- Electrical Engineering, University of Southern California, Los Angeles, CA
| | | | - Asterios Toutios
- Electrical Engineering, University of Southern California, Los Angeles, CA
| | | | - Krishna S Nayak
- Electrical Engineering, University of Southern California, Los Angeles, CA
| |
Collapse
|
28
|
M. Harandi N, Abugharbieh R, Fels S. 3D segmentation of the tongue in MRI: a minimally interactive model-based approach. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2015. [DOI: 10.1080/21681163.2013.864958] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
29
|
|
30
|
Lammert A, Goldstein L, Ramanarayanan V, Narayanan S. Gestural Control in the English Past-Tense Suffix: An Articulatory Study Using Real-Time MRI. PHONETICA 2015; 71:229-248. [PMID: 25997724 DOI: 10.1159/000371820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/08/2014] [Accepted: 12/31/2014] [Indexed: 06/04/2023]
Abstract
The English past tense allomorph following a coronal stop (e.g., /bɑndəd/) includes a vocoid that has traditionally been transcribed as a schwa or as a barred i. Previous evidence has suggested that this entity does not involve a specific articulatory gesture of any kind. Rather, its presence may simply result from temporal coordination of the two temporally adjacent coronal gestures, while the interval between those two gestures remains voiced and is acoustically reminiscent of a schwa. The acoustic and articulatory characteristics of this vocoid are reexamined in this work using real-time MRI with synchronized audio, which affords complete midsagittal views of the vocal tract. A novel statistical analysis is developed to address the issue of articulatory targetlessness, based on previous models that predict articulatory action from segmental context. Results reinforce the idea that this vocoid is different, both acoustically and articulatorily, from lexical schwa, but its targetless nature is not supported. The data suggest that an articulatory target does exist, especially in the pharynx, where it is revealed by the new data acquisition methodology. Moreover, substantial articulatory differences are observed between subjects, which highlights both the difficulty in characterizing this entity previously and the need for further study with additional subjects.
Collapse
Affiliation(s)
- Adam Lammert
- Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, Calif., USA
| | | | | | | |
Collapse
|
31
|
Ibragimov B, Prince JL, Murano EZ, Woo J, Stone M, Likar B, Pernuš F, Vrtovec T. Segmentation of tongue muscles from super-resolution magnetic resonance images. Med Image Anal 2014; 20:198-207. [PMID: 25487963 DOI: 10.1016/j.media.2014.11.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2014] [Revised: 11/11/2014] [Accepted: 11/15/2014] [Indexed: 10/24/2022]
Abstract
Imaging and quantification of tongue anatomy is helpful in surgical planning, post-operative rehabilitation of tongue cancer patients, and the study of how humans adapt and learn new strategies for breathing, swallowing and speaking to compensate for changes in function caused by disease, medical interventions or aging. In vivo acquisition of high-resolution three-dimensional (3D) magnetic resonance (MR) images with clearly visible tongue muscles is currently not feasible because of breathing and involuntary swallowing motions that occur over lengthy imaging times. However, recent advances in image reconstruction now allow the generation of super-resolution 3D MR images from sets of orthogonal images, acquired at a high in-plane resolution and combined using super-resolution techniques. This paper presents, to the best of our knowledge, the first attempt towards automatic tongue muscle segmentation from MR images. We devised a database of ten super-resolution 3D MR images, in which the genioglossus and inferior longitudinalis tongue muscles were manually segmented and annotated with landmarks. We demonstrate the feasibility of segmenting the muscles of interest automatically by applying the landmark-based game-theoretic framework (GTF), where a landmark detector based on Haar-like features and an optimal assignment-based shape representation were integrated. The obtained segmentation results were validated against an independent manual segmentation performed by a second observer, as well as against B-splines and demons atlasing approaches. The segmentation performance resulted in mean Dice coefficients of 85.3%, 81.8%, 78.8% and 75.8% for the second observer, GTF, B-splines atlasing and demons atlasing, respectively. The obtained level of segmentation accuracy indicates that computerized tongue muscle segmentation may be used in surgical planning and treatment outcome analysis of tongue cancer patients, and in studies of normal subjects and subjects with speech and swallowing problems.
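For reference, the Dice coefficient reported in the validation above is straightforward to compute; the following is a minimal illustrative helper, not the authors' code.

```python
# Minimal sketch of the Dice similarity coefficient used in the
# validation above (illustrative helper, not the authors' code).
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```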
Collapse
Affiliation(s)
- Bulat Ibragimov
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia; Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA.
| | - Jerry L Prince
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Emi Z Murano
- Department of Otolaryngology, Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
| | - Jonghye Woo
- Department of Radiology, Harvard Medical School/MGH, Boston, MA, USA
| | - Maureen Stone
- Department of Oral and Craniofacial Biological Sciences, University of Maryland, Baltimore, MD, USA; Department of Orthodontics, University of Maryland, Baltimore, MD, USA
| | - Boštjan Likar
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
| | - Franjo Pernuš
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
| | - Tomaž Vrtovec
- Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
| |
Collapse
|
32
|
Abstract
The laryngeal video stroboscope is an important instrument for examining glottal disease and for assessing vocal fold images and voice quality in clinical diagnosis. This study aimed to develop a medical system capable of automatic, intelligent recognition of dynamic images. Static images of the glottis at its widest opening and narrowest closure were screened automatically using color space transformation and image preprocessing, and the glottal area was quantized. Because tongue base movements affect the position of the laryngoscope and saliva can render images unclear, this study used an adaptive gray-scale entropy value to set a threshold for eliminating unusable frames. The proposed system improves the automatic capture of glottal images, achieving an accuracy rate of 96%. In addition, the glottal area and the area segmentation threshold were calculated effectively, the glottal area segmentation was corrected, and the glottal area waveform was drawn automatically to assist in vocal fold diagnosis. For the intelligent recognition of vocal fold disorders, the study analyzed the characteristic feature values of four vocal fold patterns, namely normal vocal fold, vocal fold paralysis, vocal fold polyp, and vocal fold cyst, and used a support vector machine classifier to identify the disorders, achieving an identification accuracy rate of 98.75%. The results can serve as a valuable reference for diagnosis.
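The final classification stage maps characteristic feature vectors of the four patterns to a diagnosis with a support vector machine. A minimal sketch of such a pipeline follows, assuming scikit-learn and placeholder features; the study's actual feature set, kernel, and hyperparameters are not reproduced here.

```python
# Minimal sketch of the SVM classification stage, assuming
# scikit-learn and placeholder features (illustrative only).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))     # placeholder feature vectors
y = rng.integers(0, 4, size=80)  # 0..3: normal, paralysis, polyp, cyst

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))        # predicted disorder classes
```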
Collapse
|
33
|
Narayanan S, Toutios A, Ramanarayanan V, Lammert A, Kim J, Lee S, Nayak K, Kim YC, Zhu Y, Goldstein L, Byrd D, Bresch E, Ghosh P, Katsamanis A, Proctor M. Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:1307. [PMID: 25190403 PMCID: PMC4165284 DOI: 10.1121/1.4890284] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460 sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community.
Collapse
Affiliation(s)
- Shrikanth Narayanan
- Signal Analysis and Interpretation Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Asterios Toutios
- Signal Analysis and Interpretation Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Vikram Ramanarayanan
- Signal Analysis and Interpretation Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Adam Lammert
- Signal Analysis and Interpretation Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Jangwon Kim
- Signal Analysis and Interpretation Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Sungbok Lee
- Signal Analysis and Interpretation Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Krishna Nayak
- Magnetic Resonance Engineering Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Yoon-Chul Kim
- Magnetic Resonance Engineering Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Yinghua Zhu
- Magnetic Resonance Engineering Laboratory, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089-2564
| | - Louis Goldstein
- Department of Linguistics, University of Southern California, 3601 Watt Way, Los Angeles, California 90089-1693
| | - Dani Byrd
- Department of Linguistics, University of Southern California, 3601 Watt Way, Los Angeles, California 90089-1693
| | - Erik Bresch
- Philips Research, High Tech Campus 5, 5656 AE, Eindhoven, Netherlands
| | - Prasanta Ghosh
- Department of Electrical Engineering, Indian Institute of Science, Bangalore, Karnataka, 560012, India
| | - Athanasios Katsamanis
- School of Electrical and Computer Engineering, National Technical University of Athens, Iroon Polytexneiou Street, Athens 15773, Greece
| | - Michael Proctor
- ARC Centre of Excellence in Cognition and its Disorders and Department of Linguistics, Macquarie University, New South Wales 2109, Australia
| |
Collapse
|
34
|
Semi-automatic segmentation for 3D motion analysis of the tongue with dynamic MRI. Comput Med Imaging Graph 2014; 38:714-24. [PMID: 25155697 DOI: 10.1016/j.compmedimag.2014.07.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2014] [Revised: 06/06/2014] [Accepted: 07/21/2014] [Indexed: 11/23/2022]
Abstract
Dynamic MRI has been widely used to track the motion of the tongue and measure its internal deformation during speech and swallowing. Accurate segmentation of the tongue is a prerequisite step to define the target boundary and constrain the tracking to tissue points within the tongue. Segmentation of 2D slices or 3D volumes is challenging because of the large number of slices and time frames involved, as well as the numerous local deformations that occur throughout the tongue during motion. In this paper, we propose a semi-automatic approach to segment 3D dynamic MRI of the tongue. The algorithm proceeds by seeding a few slices at one time frame, propagating the seeds to the same slices at other time frames using deformable registration, and applying random walker segmentation based on these seed positions. This method was validated on the tongues of five normal subjects carrying out the same speech task, with multi-slice 2D dynamic cine-MR images obtained at three orthogonal orientations and 26 time frames. The resulting semi-automatic segmentations of a total of 130 volumes showed an average Dice similarity coefficient (DSC) of 0.92, with less segmented volume variability between time frames than in manual segmentations.
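The last step, random walker segmentation from propagated seeds, can be sketched with scikit-image; the beta value and label convention below are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of the random walker step with scikit-image; the
# beta value and label convention are illustrative assumptions.
import numpy as np
from skimage.segmentation import random_walker

def segment_frame(volume: np.ndarray, seeds: np.ndarray) -> np.ndarray:
    """volume: 3D image at one time frame. seeds: integer label volume
    with 0 = unlabeled, 1 = tongue, 2 = background (e.g., propagated
    from the previous frame by deformable registration)."""
    return random_walker(volume, seeds, beta=130, mode='bf')
```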
Collapse
|
35
|
Scott AD, Wylezinska M, Birch MJ, Miquel ME. Speech MRI: morphology and function. Phys Med 2014; 30:604-18. [PMID: 24880679 DOI: 10.1016/j.ejmp.2014.05.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/03/2014] [Revised: 04/24/2014] [Accepted: 05/01/2014] [Indexed: 11/27/2022] Open
Abstract
Magnetic Resonance Imaging (MRI) plays an increasing role in the study of speech. This article reviews the MRI literature of anatomical imaging, imaging for acoustic modelling and dynamic imaging. It describes existing imaging techniques attempting to meet the challenges of imaging the upper airway during speech and examines the remaining hurdles and future research directions.
Collapse
Affiliation(s)
- Andrew D Scott
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; NIHR Cardiovascular Biomedical Research Unit, The Royal Brompton Hospital, Sydney Street, London SW3 6NP, United Kingdom
| | - Marzena Wylezinska
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom
| | - Malcolm J Birch
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom
| | - Marc E Miquel
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom.
| |
Collapse
|
36
|
Ramanarayanan V, Goldstein L, Byrd D, Narayanan SS. An investigation of articulatory setting using real-time magnetic resonance imaging. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 134:510-9. [PMID: 23862826 PMCID: PMC3724797 DOI: 10.1121/1.4807639] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
This paper presents an automatic procedure to analyze articulatory setting in speech production using real-time magnetic resonance imaging of the moving human vocal tract. The procedure extracts frames corresponding to inter-speech pauses, speech-ready intervals and absolute rest intervals from magnetic resonance imaging sequences of read and spontaneous speech elicited from five healthy speakers of American English and uses automatically extracted image features to quantify vocal tract posture during these intervals. Statistical analyses show significant differences between vocal tract postures adopted during inter-speech pauses and those at absolute rest before speech; the latter also exhibits a greater variability in the adopted postures. In addition, the articulatory settings adopted during inter-speech pauses in read and spontaneous speech are distinct. The results suggest that adopted vocal tract postures differ on average during rest positions, ready positions and inter-speech pauses, and might, in that order, involve an increasing degree of active control by the cognitive speech planning mechanism.
Collapse
Affiliation(s)
- Vikram Ramanarayanan
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA.
| | | | | | | |
Collapse
|
37
|
Srinivasan A, Sundaram S. Applications of deformable models for in-depth analysis and feature extraction from medical images—A review. PATTERN RECOGNITION AND IMAGE ANALYSIS 2013. [DOI: 10.1134/s1054661813020132] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
38
|
Birkholz P. Modeling consonant-vowel coarticulation for articulatory speech synthesis. PLoS One 2013; 8:e60603. [PMID: 23613734 PMCID: PMC3628899 DOI: 10.1371/journal.pone.0060603] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2012] [Accepted: 02/28/2013] [Indexed: 11/19/2022] Open
Abstract
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
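A minimal sketch of one reading of this interpolation scheme follows, assuming vocal tract shapes are represented as vectors and the weights come from a least-squares projection onto the corner-vowel subspace with a sum-to-one constraint; this is illustrative, not Birkholz's code.

```python
# Minimal sketch (one assumed reading, not Birkholz's code): the
# context vowel is expressed in the corner-vowel subspace, and the
# same weights blend the consonant reference shapes.
import numpy as np

def context_weights(v: np.ndarray, corners: np.ndarray) -> np.ndarray:
    """corners: (3, d) shapes for /a/, /i/, /u/; v: (d,) context vowel.
    Returns weights w with sum(w) ~= 1 minimizing ||corners.T @ w - v||."""
    A = np.vstack([corners.T, np.ones((1, 3))])   # append sum-to-one row
    b = np.concatenate([v, [1.0]])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def consonant_target(v: np.ndarray, corners: np.ndarray,
                     consonant_refs: np.ndarray) -> np.ndarray:
    """consonant_refs: (3, d) consonant targets measured in /a/, /i/,
    /u/ context. Returns the context-adapted consonant target."""
    return context_weights(v, corners) @ consonant_refs
```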
Collapse
Affiliation(s)
- Peter Birkholz
- Department of Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Aachen, Germany.
| |
Collapse
|
39
|
Lammert A, Goldstein L, Narayanan S, Iskarous K. Statistical Methods for Estimation of Direct and Differential Kinematics of the Vocal Tract. SPEECH COMMUNICATION 2013; 55:147-161. [PMID: 24052685 PMCID: PMC3774006 DOI: 10.1016/j.specom.2012.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
We present and evaluate two statistical methods for estimating kinematic relationships of the speech production system: Artificial Neural Networks and Locally-Weighted Regression. The work is motivated by the need to characterize this motor system, with particular focus on estimating differential aspects of kinematics. Kinematic analysis will facilitate progress in a variety of areas, including the nature of speech production goals, articulatory redundancy and, relatedly, acoustic-to-articulatory inversion. Statistical methods must be used to estimate these relationships from data since they are infeasible to express in closed form. Statistical models are optimized and evaluated - using a held-out data validation procedure - on two sets of synthetic speech data. The theoretical and practical advantages of both methods are also discussed. It is shown that both direct and differential kinematics can be estimated with high accuracy, even for complex, nonlinear relationships. Locally-Weighted Regression displays the best overall performance, which may be due to practical advantages in its training procedure. Moreover, accurate estimation can be achieved using only a modest amount of training data, as judged by convergence of performance. The algorithms are also applied to real-time MRI data, and the results are generally consistent with those obtained from synthetic data.
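Locally-weighted regression of this kind is compact enough to sketch directly; in the sketch below, the Gaussian kernel, bandwidth, and affine local model are illustrative assumptions. A useful property for this application is that the fitted local coefficients, excluding the intercept, approximate the local Jacobian, i.e., the differential kinematics.

```python
# Minimal sketch of locally-weighted regression; kernel, bandwidth,
# and the affine local model are illustrative assumptions.
import numpy as np

def lwr_predict(Xq: np.ndarray, X: np.ndarray, Y: np.ndarray,
                bandwidth: float = 1.0) -> np.ndarray:
    """Xq: (q, d) queries; X: (n, d) articulator inputs; Y: (n, m)
    outputs. A weighted affine fit is solved at each query point."""
    A = np.hstack([X, np.ones((len(X), 1))])      # affine design matrix
    preds = []
    for xq in Xq:
        w = np.exp(-np.sum((X - xq) ** 2, axis=1) / (2 * bandwidth ** 2))
        Aw = A * w[:, None]                        # row-weighted design
        # Weighted normal equations: (A^T W A) theta = A^T W Y
        theta, *_ = np.linalg.lstsq(A.T @ Aw, A.T @ (Y * w[:, None]),
                                    rcond=None)
        preds.append(np.append(xq, 1.0) @ theta)   # theta[:-1] ~ Jacobian
    return np.array(preds)
```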
Collapse
Affiliation(s)
- Adam Lammert
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
| | - Louis Goldstein
- Department of Linguistics, University of Southern California, Grace Ford Salvatory 301, Los Angeles, CA 90089-1693, USA
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT, 06511, USA
| | - Shrikanth Narayanan
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA
- Department of Linguistics, University of Southern California, Grace Ford Salvatory 301, Los Angeles, CA 90089-1693, USA
| | - Khalil Iskarous
- Department of Linguistics, University of Southern California, Grace Ford Salvatory 301, Los Angeles, CA 90089-1693, USA
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT, 06511, USA
| |
Collapse
|
40
|
Kim YC, Narayanan SS, Nayak KS. Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order. Magn Reson Med 2010; 65:1365-71. [PMID: 21500262 DOI: 10.1002/mrm.22714] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2010] [Revised: 10/08/2010] [Accepted: 10/12/2010] [Indexed: 11/09/2022]
Abstract
In speech production research using real-time magnetic resonance imaging (MRI), the analysis of articulatory dynamics is performed retrospectively. A flexible selection of temporal resolution is highly desirable because of natural variations in speech rate and variations in the speed of different articulators. The purpose of the study is to demonstrate a first application of golden-ratio spiral temporal view order to real-time speech MRI and investigate its performance by comparison with conventional bit-reversed temporal view order. Golden-ratio view order proved to be more effective at capturing the dynamics of rapid tongue tip motion. A method for automated blockwise selection of temporal resolution is presented that enables the synthesis of a single video from multiple temporal resolution videos and potentially facilitates subsequent vocal tract shape analysis.
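A minimal sketch of a golden-ratio view order with retrospective binning follows. The angular increment used below is the common 2D golden angle; the appropriate value depends on the trajectory's symmetry and is an assumption here, not necessarily the paper's.

```python
# Minimal sketch of a golden-ratio view order with retrospective
# binning; the increment is the common 2D golden angle (assumption).
import numpy as np

GOLDEN_ANGLE = np.deg2rad(137.508)  # 2*pi * (1 - 1/phi)

def view_angles(n_views: int) -> np.ndarray:
    """Rotation of each successive spiral interleaf, in radians."""
    return (np.arange(n_views) * GOLDEN_ANGLE) % (2 * np.pi)

def bin_views(n_views: int, views_per_frame: int) -> list:
    """Group consecutive views into frames after the scan; choosing
    views_per_frame retrospectively (even per block) trades temporal
    against spatial fidelity."""
    return [np.arange(i, i + views_per_frame)
            for i in range(0, n_views - views_per_frame + 1, views_per_frame)]
```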
Collapse
Affiliation(s)
- Yoon-Chul Kim
- Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089-2564, USA.
| | | | | |
Collapse
|
41
|
Bresch E, Narayanan S. Real-time magnetic resonance imaging investigation of resonance tuning in soprano singing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:EL335-41. [PMID: 21110548 PMCID: PMC2997814 DOI: 10.1121/1.3499700] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
This article uses real-time magnetic resonance imaging to investigate the vocal tract shaping of 5 soprano singers during the production of two-octave scales of sung vowels. A systematic shift of the first vocal tract resonance frequency with respect to the fundamental is shown to exist for high vowels across all subjects. No consistent systematic effect on the vocal tract resonance could be shown across all subjects for the other vowels or for the second vocal tract resonance.
Collapse
Affiliation(s)
- Erik Bresch
- Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089, USA.
| | | |
Collapse
|
42
|
Ramanarayanan V, Bresch E, Byrd D, Goldstein L, Narayanan SS. Analysis of pausing behavior in spontaneous speech using real-time magnetic resonance imaging of articulation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:EL160-5. [PMID: 19894792 PMCID: PMC2776778 DOI: 10.1121/1.3213452] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/17/2009] [Accepted: 07/30/2009] [Indexed: 05/22/2023]
Abstract
It is hypothesized that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these junctures. Real-time magnetic resonance imaging is used to analyze articulation at and around grammatical and ungrammatical pauses in spontaneous speech. Measures quantifying the speed of articulators were developed and applied during these pauses as well as during their immediate neighborhoods. Grammatical pauses were found to have an appreciable drop in speed at the pause itself as compared to ungrammatical pauses, which is consistent with our hypothesis that grammatical pauses are indeed choreographed by a central cognitive planner.
Collapse
|