1.
Belyk M, Carignan C, McGettigan C. An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images. Behav Res Methods 2024;56:2623-2635. PMID: 37507650; PMCID: PMC10990993; DOI: 10.3758/s13428-023-02171-9.
Abstract
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, and that it produces reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make the pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
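The pipeline described above pairs simple tissue classification with an analyst-drawn region of interest. A minimal sketch of that combination, assuming intensity thresholding inside a boolean ROI mask (the function name, the midpoint threshold, and the synthetic frame are illustrative assumptions, not the toolbox's actual implementation):

```python
import numpy as np

def classify_tissue_in_roi(frame, roi_mask, threshold=None):
    """Binary tissue classification restricted to a region of interest.

    frame:     2D array of MR image intensities.
    roi_mask:  2D boolean array marking the analyst-drawn region of interest.
    threshold: intensity cut-off; if None, use the midpoint of the ROI's
               intensity range (a crude stand-in for a learned threshold).
    """
    roi_values = frame[roi_mask]
    if threshold is None:
        # Midpoint between the darkest (airway) and brightest (tissue)
        # intensities inside the ROI.
        threshold = (roi_values.min() + roi_values.max()) / 2.0
    tissue = np.zeros(frame.shape, dtype=bool)
    tissue[roi_mask] = frame[roi_mask] > threshold
    return tissue

# Synthetic 6x6 "frame": bright tissue (100) surrounding a dark airway (10).
frame = np.full((6, 6), 100.0)
frame[2:4, 2:4] = 10.0
roi = np.zeros((6, 6), dtype=bool)
roi[1:5, 1:5] = True  # analyst-specified ROI around the airway

tissue = classify_tissue_in_roi(frame, roi)
print(tissue[1, 1], tissue[2, 2])  # True False
```

Restricting the threshold to the ROI keeps bright structures outside the analyst's region from polluting the classification, which is what lets the human analyst retain domain-specific control.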
Affiliation(s)
- Michel Belyk
- Department of Psychology, Edge Hill University, Ormskirk, UK
- Christopher Carignan
- Department of Speech Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech Hearing and Phonetic Sciences, University College London, London, UK
2.
Ribeiro V, Isaieva K, Leclere J, Felblinger J, Vuissoz PA, Laprie Y. Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging. Comput Methods Programs Biomed 2024;243:107907. PMID: 37976615; DOI: 10.1016/j.cmpb.2023.107907.
Abstract
BACKGROUND AND OBJECTIVES: The characterization of vocal tract geometry during speech is of interest to various research topics, including speech production modeling, motor control analysis, and speech therapy design. Real-time MRI is a reliable and non-invasive tool for this purpose. In most cases, it is necessary to know the contours of the individual articulators from the glottis to the lips. Several techniques have been proposed for segmenting vocal tract articulators, but most are limited to specific applications. Moreover, they often do not provide individualized contours for all soft-tissue articulators in a multi-speaker configuration. METHODS: A Mask R-CNN network was trained to detect and segment vocal tract articulator contours in two real-time MRI (RT-MRI) datasets with speech recordings of multiple speakers. Two post-processing algorithms were then proposed to convert the network's outputs into geometrical curves. Nine articulators were considered: the two lips, tongue, soft palate, pharynx, arytenoid cartilage, epiglottis, thyroid cartilage, and vocal folds. A leave-one-out cross-validation protocol was used to evaluate inter-speaker generalization. The evaluation metrics were the point-to-closest-point (P2CP) distance and the Jaccard index (for articulators annotated as closed contours). RESULTS: The proposed method accurately segmented the vocal tract articulators, with an average root mean square P2CP distance of less than 2.2 mm for all articulators in the leave-one-out cross-validation setting. The minimum RMS P2CP distance was 0.91 mm for the upper lip, and the maximum was 2.18 mm for the tongue. The Jaccard indices for the thyroid cartilage and vocal folds were 0.60 and 0.61, respectively. Additionally, the method adapted to a new subject with only ten annotated samples. CONCLUSIONS: Our research introduced a method for individually segmenting nine non-rigid vocal tract articulators in real-time MRI movies. The software is openly available to the speech community as an installable package. It is designed to support the development of speech applications and clinical and non-clinical research in fields that require vocal tract geometry, such as speech, singing, and human beatboxing.
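The two evaluation metrics named above, the point-to-closest-point distance and the Jaccard index, can be sketched generically (hypothetical code; pooling both matching directions into one RMS value is an assumption and may differ from the paper's exact definition):

```python
import numpy as np

def p2cp_rms(pred, ref):
    """RMS of point-to-closest-point distances between two contours.

    pred, ref: (N, 2) arrays of contour points in mm. Each point is
    matched to the closest point of the other contour; both directions
    are pooled into a single RMS value.
    """
    d = np.linalg.norm(pred[:, None, :] - ref[None, :, :], axis=-1)
    fwd = d.min(axis=1)  # pred -> closest ref point
    bwd = d.min(axis=0)  # ref  -> closest pred point
    return np.sqrt(np.mean(np.concatenate([fwd, bwd]) ** 2))

def jaccard(mask_a, mask_b):
    """Jaccard index (intersection over union) of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union

# A contour and the same contour shifted by 1 mm: every point's closest
# counterpart is exactly 1 mm away, so the RMS distance is 1.0.
a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
b = a + np.array([0.0, 1.0])
print(p2cp_rms(a, b))  # 1.0

# Two half-overlapping masks: intersection 4 cells, union 12 cells.
m1 = np.zeros((4, 4), dtype=bool); m1[:2, :] = True
m2 = np.zeros((4, 4), dtype=bool); m2[1:3, :] = True
print(jaccard(m1, m2))  # 4/12 ≈ 0.333
```

The P2CP distance suits open curves (lips, tongue), where overlap-based measures are undefined, while the Jaccard index applies to the articulators annotated as closed contours.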
Affiliation(s)
- Vinicius Ribeiro
- Universite de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Karyna Isaieva
- Universite de Lorraine, INSERM, U1254, IADI, Nancy, F-54000, France
- Justine Leclere
- Universite de Lorraine, INSERM, U1254, IADI, Nancy, F-54000, France; Service de Medecine Bucco-dentaire, Hopital Maison Blanche, Reims, F-51100, France
- Jacques Felblinger
- Universite de Lorraine, INSERM, U1254, IADI, Nancy, F-54000, France; CIC-IT 1433, INSERM, CHRU, Nancy, F-54000, France
- Yves Laprie
- Universite de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
3.
Ruthven M, Peplinski AM, Adams DM, King AP, Miquel ME. Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Sci Data 2023;10:860. PMID: 38042857; PMCID: PMC10693552; DOI: 10.1038/s41597-023-02766-z.
Abstract
The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech, these do not include ground-truth (GT) segmentations, a key requirement for the development of deep-learning-based segmentation methods. To begin to address this barrier, this work presents rt-MRI speech datasets of five healthy adult volunteers with corresponding GT segmentations and velopharyngeal closure patterns. The images were acquired using standard clinical MRI scanners, coils and sequences to facilitate acquisition of similar images in other centres. The datasets include manually created GT segmentations of six anatomical features including the tongue, soft palate and vocal tract. In addition, this work makes code and instructions to implement a current state-of-the-art deep-learning-based method to segment rt-MRI speech datasets publicly available, thus providing the community and others with a starting point for developing such methods.
Affiliation(s)
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- David M Adams
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Andrew P King
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- Marc Eric Miquel
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London, E1 1HH, UK
- Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London, EC1M 6BQ, UK
4.
Isaieva K, Odille F, Laprie Y, Drouot G, Felblinger J, Vuissoz PA. Super-Resolved Dynamic 3D Reconstruction of the Vocal Tract during Natural Speech. J Imaging 2023;9:233. PMID: 37888339; PMCID: PMC10607793; DOI: 10.3390/jimaging9100233.
Abstract
MRI is the gold-standard modality for speech imaging. However, it remains relatively slow, which complicates imaging of fast movements; vocal tract MRI is therefore often performed in 2D. While 3D MRI provides more information, the quality of such images is often insufficient. The goal of this study was to test the applicability of super-resolution algorithms to dynamic vocal tract MRI. In total, 25 sagittal slices of 8 mm thickness with an in-plane resolution of 1.6 × 1.6 mm2 were acquired consecutively using a highly undersampled radial 2D FLASH sequence. The volunteers read a text in French under two different protocols. The slices were aligned using the simultaneously recorded sound. A super-resolution strategy was used to reconstruct 1.6 × 1.6 × 1.6 mm3 isotropic volumes. The resulting images were less sharp than the native 2D images but demonstrated a higher signal-to-noise ratio. It was also shown that super-resolution eliminates inter-slice inconsistencies, yielding smooth transitions between slices. Additionally, using visual stimuli and shorter text fragments improved inter-slice consistency and super-resolved image sharpness. Therefore, with an appropriate choice of speech task, the proposed method allows the reconstruction of high-quality dynamic 3D volumes of the vocal tract during natural speech.
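The reported trade-off (super-resolved volumes that are less sharp but have a higher SNR than native 2D slices) is partly a consequence of combining many noisy acquisitions. The study's super-resolution reconstruction is considerably more elaborate than plain averaging; this sketch, with entirely synthetic data, only illustrates the √N noise-reduction effect of pooling N acquisitions:

```python
import numpy as np

rng = np.random.default_rng(0)

def snr(x):
    """Signal-to-noise ratio of a nominally constant signal."""
    return x.mean() / x.std()

signal = np.ones(1000)  # idealized constant tissue intensity
noise_sd = 0.5

# One noisy acquisition vs. the average of 25 noisy acquisitions
# (loosely analogous to pooling 25 slices into one reconstruction).
single = signal + rng.normal(0.0, noise_sd, size=1000)
stack = (signal + rng.normal(0.0, noise_sd, size=(25, 1000))).mean(axis=0)

ratio = snr(stack) / snr(single)
print(round(ratio, 1))  # close to sqrt(25) = 5
```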
Affiliation(s)
- Karyna Isaieva
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- Freddy Odille
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Yves Laprie
- LORIA, Université de Lorraine, CNRS, INRIA, F-54000 Nancy, France
- Guillaume Drouot
- CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Jacques Felblinger
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Pierre-André Vuissoz
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
5.
Erattakulangara S, Kelat K, Meyer D, Priya S, Lingala SG. Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model. Bioengineering (Basel) 2023;10:623. PMID: 37237693; DOI: 10.3390/bioengineering10050623.
Abstract
Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80-100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and an in-house airway-labeled dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated on data acquired from three fast speech MRI protocols. Protocol 1: a 3 T radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers produced French speech tokens; Protocol 2: a 1.5 T uniform-density spiral acquisition scheme coupled with temporal finite difference (FD) sparsity regularization, where speakers produced fluent speech tokens in English; and Protocol 3: a 3 T variable-density spiral acquisition scheme coupled with manifold regularization, where speakers produced various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those of an expert human user (a vocologist) and of a conventional U-NET model without transfer learning. Segmentations from a second expert human user (a radiologist) were used as ground truth. Evaluations were performed using the quantitative Dice similarity coefficient, the Hausdorff distance, and a segmentation count metric. The approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (on the order of 20), and provided accurate segmentations similar to those of an expert human.
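The Dice similarity coefficient and the Hausdorff distance used in the evaluation are standard segmentation metrics; a generic sketch (not the authors' code):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient of two boolean masks (1.0 = identical)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hausdorff(p, q):
    """Symmetric Hausdorff distance between two point sets of shape (N, 2):
    the worst-case distance from a point in one set to the other set."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Two half-overlapping masks: Dice = 2*4 / (8 + 8) = 0.5.
m1 = np.zeros((4, 4), dtype=bool); m1[:2, :] = True
m2 = np.zeros((4, 4), dtype=bool); m2[1:3, :] = True
print(dice(m1, m2))  # 0.5

# The outlier point (0, 2) dominates the Hausdorff distance.
p = np.array([[0.0, 0.0], [1.0, 0.0]])
q = np.array([[0.0, 2.0], [1.0, 0.0]])
print(hausdorff(p, q))  # 2.0
```

The two metrics complement each other: Dice measures overall overlap, while the Hausdorff distance flags the single worst boundary error, which overlap measures can hide.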
Affiliation(s)
- Subin Erattakulangara
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Karthika Kelat
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- David Meyer
- Janette Ogg Voice Research Center, Shenandoah University, Winchester, VA 22601, USA
- Sarv Priya
- Department of Radiology, University of Iowa, Iowa City, IA 52242, USA
- Sajan Goud Lingala
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Department of Radiology, University of Iowa, Iowa City, IA 52242, USA
6.
Isaieva K, Leclère J, Felblinger J, Gillet R, Dubernard X, Vuissoz PA. Methodology for quantitative evaluation of mandibular condyles motion symmetricity from real-time MRI in the axial plane. Magn Reson Imaging 2023;102:115-125. PMID: 37187265; DOI: 10.1016/j.mri.2023.05.006.
Abstract
Diagnosis of temporomandibular disorders is currently based on clinical examination and static MRI. Real-time MRI enables tracking of condylar motion and thus evaluation of its symmetry, the loss of which may be associated with temporomandibular joint disorders. The purpose of this work was threefold: to propose an acquisition protocol, an image processing approach, and a set of parameters enabling objective assessment of motion asymmetry; to check the reliability and limitations of the approach; and to verify whether the automatically calculated parameters are associated with motion symmetry. A rapid radial FLASH sequence was used to acquire a dynamic set of axial images for 10 subjects. An additional subject was included to estimate the dependence of the motion parameters on slice placement. The images were segmented with a semi-automatic approach based on a U-Net convolutional neural network, and the condyles' mass centers were projected onto the mid-sagittal axis. The resulting projection curves were used to extract various motion parameters, including latency, velocity peak delay, and maximal displacement difference between the right and left condyles. These automatically calculated parameters were compared with physicians' scores. The proposed segmentation approach allowed reliable center-of-mass tracking. Latency and velocity peak delay were found to be invariant to slice position, whereas the maximal displacement difference varied considerably. The automatically calculated parameters demonstrated a significant correlation with the experts' scores. The proposed acquisition and data processing protocol enables automated extraction of quantitative parameters that characterize the symmetry of condylar motion.
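Two steps of the processing chain above, tracking the condylar centers of mass and estimating the latency between the two sides, can be sketched as follows (hypothetical code; estimating latency from the cross-correlation peak is an assumed simplification of the authors' parameter extraction):

```python
import numpy as np

def center_of_mass(mask):
    """Mass center (row, col) of a boolean segmentation mask."""
    rows, cols = np.nonzero(mask)
    return np.array([rows.mean(), cols.mean()])

def latency_frames(left, right):
    """Lag of the right displacement curve relative to the left one,
    in frames, from the peak of their cross-correlation
    (positive value: the right condyle lags behind the left)."""
    l = left - left.mean()
    r = right - right.mean()
    xcorr = np.correlate(r, l, mode="full")
    return int(np.argmax(xcorr)) - (len(l) - 1)

# Center of mass of a 2x2 blob spanning rows 2-3, cols 1-2.
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 1:3] = True
print(center_of_mass(mask))  # [2.5 1.5]

# Two Gaussian displacement pulses, the right one delayed by 3 frames.
t = np.arange(64)
left = np.exp(-((t - 20.0) ** 2) / 20.0)
right = np.exp(-((t - 23.0) ** 2) / 20.0)
print(latency_frames(left, right))  # 3
```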
Affiliation(s)
- Karyna Isaieva
- IADI, University of Lorraine, INSERM U1254, Nancy, France
- Justine Leclère
- IADI, University of Lorraine, INSERM U1254, Nancy, France; Oral Medicine Department, University Hospital of Reims, Reims, France
- Jacques Felblinger
- IADI, University of Lorraine, INSERM U1254, Nancy, France; CIC-IT 1433, INSERM, CHRU de Nancy, Nancy, France
- Romain Gillet
- IADI, University of Lorraine, INSERM U1254, Nancy, France; Guilloz Imaging Department, CHRU of Nancy, Nancy, France
7.
Meyer D, Rusho RZ, Alam W, Christensen GE, Howard DM, Atha J, Hoffman EA, Story B, Titze IR, Lingala SG. High-Resolution Three-Dimensional Hybrid MRI + Low Dose CT Vocal Tract Modeling: A Cadaveric Pilot Study. J Voice 2022. DOI: 10.1016/j.jvoice.2022.09.013.
8.
Isaieva K, Fauvel M, Weber N, Vuissoz PA, Felblinger J, Oster J, Odille F. A hardware and software system for MRI applications requiring external device data. Magn Reson Med 2022;88:1406-1418. PMID: 35506503; DOI: 10.1002/mrm.29280.
Abstract
PURPOSE: Numerous MRI applications require data from external devices. Such devices are often independent of the MRI system, so synchronizing their data with the MRI data is often tedious and limited to offline use. In this work, a hardware and software system is proposed for acquiring data from external devices during MR imaging, for use online (in real time) or offline. METHODS: The hardware includes a set of external devices (electrocardiography (ECG) devices, respiration sensors, a microphone, MR system electronics, etc.) using various channels for data transmission (analog, digital, optical fiber), all connected to a server through a universal serial bus (USB) hub. The software is based on a flexible client-server architecture, allowing real-time processing pipelines to be configured and executed. Communication protocols and data formats are proposed, in particular for transferring the external device data to an open-source reconstruction software (Gadgetron) for online image reconstruction using external physiological data. The system performance is evaluated in terms of the accuracy of the recorded signals and the delays involved in the real-time processing tasks. Its flexibility is shown with various applications. RESULTS: The real-time system had low delays and jitter (on the order of 1 ms). Example MRI applications using external devices included prospectively gated cardiac cine imaging, multi-modal acquisition of the vocal tract (image, sound, and respiration), and online image reconstruction with nonrigid motion correction. CONCLUSION: The performance of the system and its versatile architecture make it suitable for a wide range of MRI applications requiring online or offline use of external device data.
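At its simplest, using external device data in reconstruction requires aligning each MRI frame with the nearest external sample in time. A minimal, stdlib-only sketch of nearest-neighbour timestamp alignment (illustrative only; the actual system streams data through a client-server architecture over dedicated channels):

```python
import bisect

def align_to_frames(frame_times, sample_times, samples):
    """For each MRI frame timestamp, return the external-device sample
    whose timestamp is closest. sample_times must be sorted ascending."""
    aligned = []
    for t in frame_times:
        i = bisect.bisect_left(sample_times, t)
        # Candidates: the sample just before t and the one just after.
        best = min(
            (j for j in (i - 1, i) if 0 <= j < len(sample_times)),
            key=lambda j: abs(sample_times[j] - t),
        )
        aligned.append(samples[best])
    return aligned

# ECG samples every 10 ms; two frames acquired at 4 ms and 21 ms.
sample_times = [0.000, 0.010, 0.020, 0.030]
samples = ["qrs-a", "qrs-b", "qrs-c", "qrs-d"]
print(align_to_frames([0.004, 0.021], sample_times, samples))
# ['qrs-a', 'qrs-c']
```

With millisecond-level jitter, as reported above, nearest-neighbour alignment errors stay well below the 20 ms frame interval typical of real-time vocal tract imaging.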
Affiliation(s)
- Karyna Isaieva
- IADI, Université de Lorraine, INSERM U1254, Nancy, France
- Marc Fauvel
- CIC-IT 1433, Université de Lorraine, INSERM, CHRU de Nancy, Nancy, France
- Nicolas Weber
- IADI, Université de Lorraine, INSERM U1254, Nancy, France
- Jacques Felblinger
- IADI, Université de Lorraine, INSERM U1254, Nancy, France; CIC-IT 1433, Université de Lorraine, INSERM, CHRU de Nancy, Nancy, France
- Julien Oster
- IADI, Université de Lorraine, INSERM U1254, Nancy, France
- Freddy Odille
- IADI, Université de Lorraine, INSERM U1254, Nancy, France; CIC-IT 1433, Université de Lorraine, INSERM, CHRU de Nancy, Nancy, France
9.
Isaieva K, Laprie Y, Leclère J, Douros IK, Felblinger J, Vuissoz PA. Multimodal dataset of real-time 2D and static 3D MRI of healthy French speakers. Sci Data 2021;8:258. PMID: 34599194; PMCID: PMC8486854; DOI: 10.1038/s41597-021-01041-3.
Abstract
The study of articulatory gestures has a wide spectrum of applications, notably in speech production and recognition. Sets of phonemes, as well as their articulation, are language-specific; however, existing MRI databases mostly include English speakers. In the present work, we introduce a dataset acquired with MRI from 10 healthy native French speakers. A corpus consisting of synthetic sentences was used to ensure good coverage of the French phonetic context. A real-time MRI technology with a temporal resolution of 20 ms was used to acquire vocal tract images of the participants while speaking. The sound was recorded simultaneously with the MRI, denoised, and temporally aligned with the images. The speech was transcribed to obtain a phoneme-wise segmentation of the sound. We also acquired static 3D MR images for a wide range of French phonemes. In addition, we include annotations of spontaneous swallowing.
Measurement(s): vocal tract images; speech. Technology Type(s): magnetic resonance imaging; microphone device. Sample Characteristic (Organism): Homo sapiens.
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.16404453
Affiliation(s)
- Karyna Isaieva
- Université de Lorraine, INSERM, IADI, Nancy, F-54000, France
- Yves Laprie
- Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Justine Leclère
- Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; Oral Medicine Department, University Hospital of Reims, 45 rue Cognacq-Jay, 51092 Reims Cedex, France
- Ioannis K Douros
- Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Jacques Felblinger
- Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; CIC-IT, INSERM, CHRU de Nancy, Nancy, F-54000, France