1. Xing F, Jin R, Gilbert I, El Fakhri G, Perry J, Sutton B, Woo J. Quantifying Velopharyngeal Motion Variation in Speech Sound Production Using an Audio-Informed Dynamic MRI Atlas. Proc SPIE Int Soc Opt Eng 2023; 12464:124642M. PMID: 37621417; PMCID: PMC10448831; DOI: 10.1117/12.2654082.
Abstract
New developments in dynamic magnetic resonance imaging (MRI) facilitate high-quality data acquisition of human velopharyngeal deformations in real-time speech. With recently established speech motion atlases, group analysis is made possible via spatially and temporally aligned datasets in the atlas space from a desired population of interest. In practice, when analyzing motion characteristics from various subjects performing a designated speech task, different subjects' velopharyngeal deformation patterns can vary during pronunciation of the same utterance, even after spatial and temporal alignment of their MRI. Because such variation can be subtle, identifying and extracting unique patterns from these high-dimensional datasets is challenging. In this work, we present a method that computes and visualizes subtle deformation variation patterns as principal components of a subject group's dynamic motion fields in the atlas space. Coupled with the real-time speech audio recorded during image acquisition, the key time frames containing maximum speech variation are identified from the principal components of the temporally aligned audio waveforms, which in turn indicate the temporal location of the maximum spatial deformation variation. The motion fields between the key frames and the reference frame for each subject are then computed and warped into the common atlas space, enabling direct extraction of motion variation patterns via quantitative analysis. The method was evaluated on a dataset of twelve healthy subjects. Subtle velopharyngeal motion differences were quantified and visualized, revealing pronunciation-specific patterns among subjects.
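The key-frame step described above (principal components of temporally aligned audio waveforms pinpointing the time of maximum inter-subject variation) can be sketched as follows. The twelve-subject waveform matrix, the planted variation region, and all parameters are illustrative assumptions, not the paper's data.

```python
import numpy as np

# Illustrative stand-in data: one temporally aligned audio envelope per
# subject (rows), sampled at the atlas time frames (columns), with extra
# inter-subject variation planted near frames 60-69.
rng = np.random.default_rng(0)
n_subjects, n_frames = 12, 100
base = np.sin(np.linspace(0.0, np.pi, n_frames))       # shared utterance shape
bump = np.zeros(n_frames)
bump[60:70] = 1.0                                      # region of extra variation
waveforms = (base
             + 0.05 * rng.standard_normal((n_subjects, n_frames))
             + 0.5 * np.outer(rng.standard_normal(n_subjects), bump))

# Principal components of the subject-by-frame matrix: the largest-magnitude
# loading of the leading component marks the frame of maximum variation.
centered = waveforms - waveforms.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
key_frame = int(np.argmax(np.abs(vt[0])))
```

On this synthetic data the recovered key frame falls inside the planted region, which is the same mechanism the paper uses to direct the spatial analysis to the most variable part of the utterance.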
Affiliation(s)
- Fangxu Xing, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, MA 02114, USA
- Riwei Jin, Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Imani Gilbert, Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27858, USA
- Georges El Fakhri, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, MA 02114, USA
- Jamie Perry, Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27858, USA
- Bradley Sutton, Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Jonghye Woo, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, MA 02114, USA
2. 3D Dynamic Spatiotemporal Atlas of the Vocal Tract during Consonant–Vowel Production from 2D Real Time MRI. J Imaging 2022; 8:227. PMID: 36135393; PMCID: PMC9504642; DOI: 10.3390/jimaging8090227.
Abstract
In this work, we address the problem of creating a 3D dynamic atlas of the vocal tract that captures the dynamics of the articulators in all three dimensions, in order to create a global speaker model independent of speaker-specific characteristics. The core steps of the proposed method are the temporal alignment of the real-time MR images acquired in several sagittal planes and their combination with adaptive kernel regression. As a preprocessing step, a reference space was created to remove speaker-specific anatomical information and retain only the variability in speech production for the construction of the atlas. The adaptive kernel regression makes the choice of atlas time points independent of the time points of the input frames used for the construction. The atlas construction method was evaluated by mapping two new speakers to the atlas and measuring how similar the resulting mapped images are. The use of the atlas helps reduce subject variability. The results show that the proposed atlas can capture the dynamic behavior of the articulators and is able to generalize the speech production process by creating a universal-speaker reference space.
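The kernel-regression step (atlas frames synthesized as kernel-weighted averages of input frames at arbitrary time points) might look like the following minimal sketch. The Gaussian kernel, the fixed bandwidth, and the toy one-dimensional "images" are assumptions for illustration; the paper's version adapts the kernel to the data.

```python
import numpy as np

# Kernel regression for atlas frame synthesis: each atlas time point is a
# Gaussian-weighted average of the input frames, so atlas times need not
# coincide with acquired frame times.
def kernel_regress(frame_times, frames, atlas_times, h=0.05):
    frames = np.asarray(frames, dtype=float)
    out = []
    for t in atlas_times:
        w = np.exp(-0.5 * ((frame_times - t) / h) ** 2)   # Gaussian weights
        out.append((w[:, None] * frames).sum(axis=0) / w.sum())
    return np.array(out)

frame_times = np.linspace(0.0, 1.0, 21)
frames = np.outer(frame_times, np.ones(4))   # toy 4-pixel "images", value = time
atlas = kernel_regress(frame_times, frames, np.array([0.5]))
```

Because the weights are symmetric around t = 0.5 and the toy frames vary linearly with time, the synthesized atlas frame takes the value 0.5 at every pixel, regardless of whether 0.5 coincides with an acquired frame time.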
3. Xing F, Jin R, Gilbert IR, Perry JL, Sutton BP, Liu X, El Fakhri G, Shosted RK, Woo J. 4D magnetic resonance imaging atlas construction using temporally aligned audio waveforms in speech. J Acoust Soc Am 2021; 150:3500. PMID: 34852570; PMCID: PMC8580575; DOI: 10.1121/10.0007064.
Abstract
Magnetic resonance (MR) imaging is becoming an established tool for capturing articulatory and physiological motion of the structures and muscles throughout the vocal tract, enabling visual and quantitative assessment of real-time speech activities. Although motion capture speed has been steadily improved by continual developments in high-speed MR technology, quantitative analysis of multi-subject group data remains challenging due to variations in speaking rate and imaging time among different subjects. In this paper, a workflow of post-processing methods that matches different MR image datasets within a study group is proposed. Each subject's audio waveform, recorded during speech, is used to extract temporal-domain information and generate temporal alignment mappings from their matching pattern. The corresponding image data are resampled by deformable registration and interpolation of the deformation fields, achieving inter-subject temporal alignment between image sequences. A four-dimensional dynamic MR speech atlas is constructed using aligned volumes from four human subjects. Similarity tests between subject and target domains using the squared error, cross-correlation, and mutual information measures all show an overall score increase after spatiotemporal alignment. The amount of image variability in atlas construction is reduced, indicating a quality increase in the multi-subject data for groupwise quantitative analysis.
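A minimal sketch of the temporal-alignment idea: match two subjects' audio feature sequences to obtain a frame-to-frame mapping, which could then drive resampling of the image sequences. Classic dynamic time warping on 1D features is used here as a stand-in; the paper's actual matching procedure and feature choice may differ.

```python
import numpy as np

# Classic dynamic time warping between two 1D feature sequences; the
# returned index pairs define the temporal mapping that could then be
# used to resample one subject's image sequence onto another's timeline.
def dtw_path(a, b):
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:                      # backtrack the optimal path
        path.append((i - 1, j - 1))
        step = min((cost[i - 1, j - 1], i - 1, j - 1),
                   (cost[i - 1, j], i - 1, j),
                   (cost[i, j - 1], i, j - 1))
        i, j = step[1], step[2]
    return path[::-1]

# Toy "audio envelopes": the same utterance, one spoken slightly slower.
a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 2.0, 1.0, 0.0]
mapping = dtw_path(a, b)
```

The path runs monotonically from the first index pair to the last, absorbing the speaking-rate difference; in the full pipeline the analogous mapping is applied to the deformation fields rather than the raw images.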
Affiliation(s)
- Fangxu Xing, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, MA 02114, USA
- Riwei Jin, Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Imani R Gilbert, Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27858, USA
- Jamie L Perry, Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27858, USA
- Bradley P Sutton, Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Xiaofeng Liu, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, MA 02114, USA
- Georges El Fakhri, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, MA 02114, USA
- Ryan K Shosted, Department of Linguistics, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
- Jonghye Woo, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, MA 02114, USA
4. Chu CA, Chen YJ, Chang KV, Wu WT, Özçakar L. Reliability of Sonoelastography Measurement of Tongue Muscles and Its Application on Obstructive Sleep Apnea. Front Physiol 2021; 12:654667. PMID: 33841189; PMCID: PMC8027470; DOI: 10.3389/fphys.2021.654667.
Abstract
Few studies have explored the feasibility of shear-wave ultrasound elastography (SWUE) for evaluating the upper airways of patients with obstructive sleep apnea (OSA). This study aimed to establish a reliable SWUE protocol for evaluating tongue muscle elasticity and to assess its feasibility and utility in differentiating patients with OSA. Inter-rater and intra-rater reliability of SWUE measurements were tested using intraclass correlation coefficients. Submental ultrasound was used to measure tongue thickness and stiffness. The association between the ultrasound measurements and the presence of OSA was analyzed using multivariate logistic regression. One-way analysis of variance was used to examine whether the ultrasound parameters varied among patients with different severities of OSA. Overall, 37 healthy subjects and 32 patients with OSA were recruited. The intraclass correlation coefficients of intra-rater and inter-rater reliability for SWUE of tongue stiffness ranged from 0.84 to 0.90. After adjusting for age, sex, neck circumference, and body mass index, the risk of OSA was positively associated with tongue thickness [odds ratio 1.16 (95% confidence interval 1.01–1.32)] and negatively associated with tongue muscle stiffness on coronal imaging [odds ratio 0.72 (95% confidence interval 0.54–0.95)]. There were no significant differences in tongue stiffness among OSA patients of varying disease severity. SWUE provided a reliable evaluation of tongue muscle stiffness, which appeared to be softer in patients with OSA. Future longitudinal studies are needed to investigate the relationship between tongue softening and OSA, as well as the response to treatment.
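For reference, a two-way random-effects, absolute-agreement intraclass correlation ICC(2,1), one common form of the reliability statistic reported above, can be computed from a subjects-by-raters matrix as follows. The tiny rating matrix is made-up demonstration data, and the abstract does not specify which ICC form the study used.

```python
import numpy as np

# ICC(2,1): two-way random effects, absolute agreement, single rater.
# Rows are subjects, columns are raters.
def icc_2_1(x):
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ssr = k * ((x.mean(axis=1) - grand) ** 2).sum()       # between subjects
    ssc = n * ((x.mean(axis=0) - grand) ** 2).sum()       # between raters
    sse = ((x - grand) ** 2).sum() - ssr - ssc            # residual
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two raters in near-perfect agreement up to a small constant offset;
# absolute-agreement ICC is penalized slightly by that offset.
icc_demo = icc_2_1([[1.0, 1.1], [2.0, 2.1], [3.0, 3.1]])
```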
Affiliation(s)
- Cheng-An Chu, Department of Dentistry, School of Dentistry, National Taiwan University Hospital, Taipei, Taiwan
- Yunn-Jy Chen, Department of Dentistry, School of Dentistry, National Taiwan University Hospital, Taipei, Taiwan
- Ke-Vin Chang, Department of Physical Medicine and Rehabilitation and Community and Geriatric Research Center, National Taiwan University Hospital, Bei-Hu Branch and National Taiwan University College of Medicine, Taipei, Taiwan; Center for Regional Anesthesia and Pain Medicine, Wang-Fang Hospital, Taipei Medical University, Taipei, Taiwan
- Wei-Ting Wu, Department of Physical Medicine and Rehabilitation and Community and Geriatric Research Center, National Taiwan University Hospital, Bei-Hu Branch and National Taiwan University College of Medicine, Taipei, Taiwan
- Levent Özçakar, Department of Physical and Rehabilitation Medicine, Hacettepe University Medical School, Ankara, Turkey
5. Gomez AD, Stone ML, Woo J, Xing F, Prince JL. Analysis of fiber strain in the human tongue during speech. Comput Methods Biomech Biomed Engin 2020; 23:312-322. PMID: 32031425; DOI: 10.1080/10255842.2020.1722808.
Abstract
This study investigates mechanical cooperation among tongue muscles. Five volunteers were imaged using tagged magnetic resonance imaging to quantify spatiotemporal kinematics while speaking. Waveforms of strain in the line of action of fibers (SLAF) were estimated by projecting strain tensors onto a model of fiber directionality. SLAF waveforms were temporally aligned to determine consistency across subjects and correlation across muscles. The cohort exhibited consistent patterns of SLAF, and muscular extension and contraction were correlated. Volume-preserving tongue movement in speech can be achieved through multiple paths, but the study reveals similarities in motion patterns and muscular action, despite anatomical (and other) dissimilarities.
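The SLAF projection (a strain tensor contracted with a unit fiber direction) reduces to the quadratic form f^T E f. A minimal sketch under illustrative assumptions, using a uniform 10% stretch as the deformation:

```python
import numpy as np

# Project a Green-Lagrange strain tensor E onto a unit fiber direction f:
# the scalar f^T E f is the strain in the fiber's line of action (SLAF).
def fiber_strain(E, f):
    f = np.asarray(f, dtype=float)
    f = f / np.linalg.norm(f)
    return float(f @ np.asarray(E, dtype=float) @ f)

# Illustrative deformation: 10% uniaxial stretch along x,
# F = diag(1.1, 1, 1), E = (F^T F - I) / 2.
F = np.diag([1.1, 1.0, 1.0])
E = 0.5 * (F.T @ F - np.eye(3))
s_along = fiber_strain(E, [1.0, 0.0, 0.0])   # fiber along x lengthens
s_cross = fiber_strain(E, [0.0, 1.0, 0.0])   # fiber across the stretch: zero
```

A fiber aligned with the stretch reports a positive SLAF of (1.1^2 - 1)/2 = 0.105, while a perpendicular fiber reports zero, which is the sign convention (extension positive) used in such analyses.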
Affiliation(s)
- Arnold D Gomez, Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Maureen L Stone, Department of Neural and Pain Sciences, University of Maryland, Baltimore, MD, USA
- Jonghye Woo, Department of Radiology, Harvard Medical School, Boston, MA, USA
- Fangxu Xing, Department of Radiology, Harvard Medical School, Boston, MA, USA
- Jerry L Prince, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
6. Gomez AD, Stone ML, Bayly PV, Prince JL. Quantifying Tensor Field Similarity With Global Distributions and Optimal Transport. Med Image Comput Comput Assist Interv 2018; 11071:428-436. PMID: 33196063; DOI: 10.1007/978-3-030-00934-2_48.
Abstract
Strain tensor fields quantify tissue deformation and are important for functional analysis of moving organs such as the heart and the tongue. Strain data can be readily obtained using medical imaging. However, quantifying the similarity between different datasets is difficult. Strain patterns vary in space and time, and are inherently multidimensional. Also, the same type of mechanical deformation can be applied to different shapes; hence, automatic quantification of similarity should be unaffected by the geometry of the objects being deformed. This work introduces, in the context of tensorial strain data, the application of global distributions used to classify shapes and vector fields in the pattern recognition literature. In particular, the distribution of mechanical properties of a field is approximated using a 3D histogram, and the Wasserstein distance from optimal transport theory is used to measure the similarity between histograms. To measure the method's consistency in matching deformations across different objects, the proposed approach was evaluated by sorting strain fields according to their similarity. Performance was compared to sorting via maximum shear distribution (a 1D histogram) and tensor residual magnitude (in perfectly registered objects). The technique was also applied to correlate muscle activation to muscular contraction observed via tagged MRI. The results show that the proposed approach accurately matches deformations regardless of the shape of the object being deformed. Sorting accuracy surpassed that of the 1D shear distribution and was on par with residual magnitude, but without the need for registration between objects.
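As a simplified one-dimensional illustration of the histogram-comparison step: on a shared uniform 1D grid, the 1-Wasserstein (earth mover's) distance reduces to the area between cumulative distributions. The paper itself uses 3D histograms with full optimal transport, so this degenerate 1D case only illustrates the metric.

```python
import numpy as np

# 1-Wasserstein (earth mover's) distance between two histograms on the
# same uniform 1D grid: the area between their cumulative distributions.
def wasserstein_1d(p, q, bin_width=1.0):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()            # normalize to probability mass
    return float(np.abs(np.cumsum(p - q)).sum() * bin_width)

a = np.array([1.0, 0.0, 0.0])   # all mass in bin 0
b = np.array([0.0, 0.0, 1.0])   # all mass in bin 2
d = wasserstein_1d(a, b)        # mass must travel two bins
```

Unlike bin-wise measures, the distance grows with how far the mass must move (here, two bins), which is why it discriminates histograms whose supports barely overlap.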
Affiliation(s)
- Arnold D Gomez, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Maureen L Stone, Department of Neural and Pain Sciences, University of Maryland, Baltimore, USA
- Philip V Bayly, Department of Mechanical Engineering, Washington University in St. Louis, St. Louis, USA
- Jerry L Prince, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
7. Tolpadi AA, Stone ML, Carass A, Prince JL, Gomez AD. Inverse Biomechanical Modeling of the Tongue via Machine Learning and Synthetic Training Data. Proc SPIE Int Soc Opt Eng 2018; 10576. PMID: 29997406; DOI: 10.1117/12.2296927.
Abstract
The tongue's deformation during speech can be measured using tagged magnetic resonance imaging, but there is no current method to directly measure the pattern of muscles that activate to produce a given motion. In this paper, the activation pattern of the tongue's muscles is estimated by solving an inverse problem using a random forest. Examples describing different activation patterns and the resulting deformations are generated using a finite-element model of the tongue. These examples form training data for a random forest comprising 30 decision trees that estimates contractions in 262 contractile elements. The method was evaluated on tagged magnetic resonance data from actual speech and on simulated data mimicking flaps that might result from glossectomy surgery. The estimation accuracy was modest (5.6% error), but it surpassed a semi-manual approach (8.1% error). The results suggest that a machine learning approach to contraction pattern estimation in the tongue is feasible, even in the presence of flaps.
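The inverse-modeling idea (a random forest trained on simulated activation/deformation pairs, then applied in the inverse direction at test time) can be sketched as below. The fixed linear "forward model" stands in for the paper's finite-element simulations, and all sizes and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# A random forest learns the inverse map from deformation features back to
# muscle activations, trained on synthetic (activation, deformation) pairs.
# A real pipeline would generate the pairs with a finite-element tongue
# model; the fixed linear map M below is a made-up stand-in forward model.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3))                    # toy forward model
activations = rng.uniform(0.0, 1.0, (300, 3))      # training activations
deformations = activations @ M.T                   # simulated deformation features

forest = RandomForestRegressor(n_estimators=30, random_state=0)
forest.fit(deformations, activations)

# Invert: from new deformation features, estimate the activations.
test_act = rng.uniform(0.2, 0.8, (20, 3))
pred = forest.predict(test_act @ M.T)
err = float(np.mean(np.abs(pred - test_act)))
```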
Affiliation(s)
- Aniket A Tolpadi, Department of Bioengineering, Rice University, Houston, TX 77005, USA
- Maureen L Stone, Department of Neural and Pain Sciences and Department of Orthodontics, University of Maryland Dental School, Baltimore, MD 21201, USA
- Aaron Carass, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jerry L Prince, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Arnold D Gomez, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
8. Woo J, Xing F, Stone M, Green J, Reese TG, Brady TJ, Wedeen VJ, Prince JL, El Fakhri G. Speech Map: A Statistical Multimodal Atlas of 4D Tongue Motion During Speech from Tagged and Cine MR Images. Comput Methods Biomech Biomed Eng Imaging Vis 2017; 7:361-373. PMID: 31328049; DOI: 10.1080/21681163.2017.1382393.
Abstract
Quantitative measurement of functional and anatomical traits of 4D tongue motion in the course of speech or other lingual behaviors remains a major challenge in scientific research and clinical applications. Here, we introduce a statistical multimodal atlas of 4D tongue motion using healthy subjects, which enables a combined quantitative characterization of tongue motion in a reference anatomical configuration. This atlas framework, termed Speech Map, combines cine- and tagged-MRI in order to provide both the anatomic reference and motion information during speech. Our approach involves a series of steps including (1) construction of a common reference anatomical configuration from cine-MRI, (2) motion estimation from tagged-MRI, (3) transformation of the motion estimations to the reference anatomical configuration, and (4) computation of motion quantities such as Lagrangian strain. Using this framework, the anatomic configuration of the tongue appears motionless, while the motion fields and associated strain measurements change over the time course of speech. In addition, to form a succinct representation of the high-dimensional and complex motion fields, principal component analysis is carried out to characterize the central tendencies and variations of motion fields of our speech tasks. Our proposed method provides a platform to quantitatively and objectively explain the differences and variability of tongue motion by illuminating internal motion and strain that have so far been intractable. The findings are used to understand how tongue function for speech is limited by abnormal internal motion and strain in glossectomy patients.
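Step (4) of the framework, computing Lagrangian strain from a motion field, can be sketched with finite differences via the deformation gradient F = I + du/dX and E = (F^T F - I)/2. The uniform 2D stretch below is an illustrative displacement field, not tongue data.

```python
import numpy as np

# Lagrangian (Green) strain from a displacement field u(X): build the
# deformation gradient F = I + du/dX by finite differences, then take
# E = (F^T F - I) / 2 at each grid point. The field below is a uniform
# 5% stretch along x, so E_xx should equal (1.05**2 - 1)/2 = 0.05125
# everywhere.
nx, ny, h = 8, 8, 1.0
X, Y = np.meshgrid(np.arange(nx) * h, np.arange(ny) * h, indexing="ij")
ux = 0.05 * X                        # displacement in x
uy = np.zeros_like(Y)                # no displacement in y

dux_dx, dux_dy = np.gradient(ux, h)
duy_dx, duy_dy = np.gradient(uy, h)

E_xx = np.empty((nx, ny))
for i in range(nx):
    for j in range(ny):
        F = np.eye(2) + np.array([[dux_dx[i, j], dux_dy[i, j]],
                                  [duy_dx[i, j], duy_dy[i, j]]])
        E = 0.5 * (F.T @ F - np.eye(2))
        E_xx[i, j] = E[0, 0]
```

In the atlas framework the same per-point computation runs on 3D motion fields in the reference anatomical configuration, so the strain maps are directly comparable across subjects.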
Affiliation(s)
- Jonghye Woo, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Fangxu Xing, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Maureen Stone, Department of Neural and Pain Sciences, University of Maryland Dental School, Baltimore, MD 21201, USA
- Jordan Green, Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA 02129, USA
- Timothy G Reese, Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02129, USA
- Thomas J Brady, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Van J Wedeen, Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02129, USA
- Jerry L Prince, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Georges El Fakhri, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
9. Woo J, Xing F, Lee J, Stone M, Prince JL. A Spatio-Temporal Atlas and Statistical Model of the Tongue During Speech from Cine-MRI. Comput Methods Biomech Biomed Eng Imaging Vis 2016; 6:520-531. PMID: 30034953; DOI: 10.1080/21681163.2016.1169220.
Abstract
Statistical modeling of tongue motion during speech using cine magnetic resonance imaging (MRI) provides key information about the relationship between the structure and motion of the tongue. To study the variability of tongue shape and motion in populations, a consistent integration and characterization of inter-subject variability is needed. In this paper, a method to construct a spatio-temporal atlas comprising a mean motion model and statistical modes of variation during speech is presented. The model is based on cine-MRI from twenty-two normal speakers and consists of several steps that address the spatial and temporal alignment problems separately. First, all images are registered into a common reference space, which is taken to be a neutral resting position of the tongue. Second, the tongue shapes of each individual relative to this reference space are produced. Third, a time warping approach (several are evaluated) is used to align the time frames of each subject to a common time series of initial mean images. Finally, the spatio-temporal atlas is created by time-warping each subject, generating new mean images at each time, and producing shape statistics around these mean images using principal component analysis at each reference time frame. Experimental results compare various parameters and methods used in creating the atlas and demonstrate the final modes of variation at key time frames in a sample phrase.
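The final step, per-time-frame shape statistics via principal component analysis, can be sketched as follows. The synthetic contours and the single planted mode of variation are illustrative assumptions, not the study's data.

```python
import numpy as np

# At one reference time frame, stack the subjects' contour coordinates and
# extract principal modes of variation about the mean shape.
rng = np.random.default_rng(1)
n_subjects, n_points = 22, 40
t = np.linspace(0.0, np.pi, n_points)
mean_shape = np.c_[np.cos(t), np.sin(t)].ravel()   # template contour, (x, y) pairs
mode = np.tile([0.0, 1.0], n_points)               # planted vertical-shift mode
mode = mode / np.linalg.norm(mode)
shapes = mean_shape + 0.3 * np.outer(rng.standard_normal(n_subjects), mode)

# PCA via SVD of the centered shape matrix; by construction the first
# mode should capture essentially all of the inter-subject variance.
centered = shapes - shapes.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = float(s[0] ** 2 / (s ** 2).sum())
```

Repeating this at every reference time frame yields the time-indexed family of mean shapes and variation modes that constitutes the statistical part of the atlas.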
Affiliation(s)
- Jonghye Woo, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Fangxu Xing, Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Junghoon Lee, Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, MD 21231, USA
- Maureen Stone, Department of Neural and Pain Sciences and Department of Orthodontics, University of Maryland, Baltimore, MD 21201, USA
- Jerry L Prince, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA