1. Isaieva K, Odille F, Laprie Y, Drouot G, Felblinger J, Vuissoz PA. Super-Resolved Dynamic 3D Reconstruction of the Vocal Tract during Natural Speech. J Imaging 2023; 9:233. PMID: 37888339; PMCID: PMC10607793; DOI: 10.3390/jimaging9100233.
Abstract
MRI is the gold-standard modality for speech imaging, but it remains relatively slow, which complicates imaging of fast movements; vocal tract MRI is therefore often performed in 2D. While 3D MRI provides more information, the quality of such images is often insufficient. The goal of this study was to test the applicability of super-resolution algorithms to dynamic vocal tract MRI. In total, 25 sagittal slices of 8 mm with an in-plane resolution of 1.6 × 1.6 mm2 were acquired consecutively using a highly undersampled radial 2D FLASH sequence. The volunteers read a text in French under two different protocols. The slices were aligned using the simultaneously recorded sound, and a super-resolution strategy was used to reconstruct 1.6 × 1.6 × 1.6 mm3 isotropic volumes. The resulting images were less sharp than the native 2D images but demonstrated a higher signal-to-noise ratio. Super-resolution also eliminated inter-slice inconsistencies, yielding smooth transitions between slices. Additionally, using visual stimuli and shorter text fragments improved inter-slice consistency and super-resolved image sharpness. Therefore, with an appropriate choice of speech task, the proposed method allows the reconstruction of high-quality dynamic 3D volumes of the vocal tract during natural speech.
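The slice-to-volume idea behind this abstract can be sketched numerically: each thick slice acts as an average over several thin slices, and overlapping thick-slice measurements can be inverted by regularized least squares. The sizes, boxcar slice profile, and regularization weight below are illustrative assumptions, not the authors' reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

n_thin = 40            # thin (target) slice positions
thick = 5              # each thick slice averages 5 thin slices

# Slice-profile operator A: one row per thick-slice measurement,
# acquired at every possible 1-thin-slice shift (overlapping stacks).
rows = []
for start in range(n_thin - thick + 1):
    r = np.zeros(n_thin)
    r[start:start + thick] = 1.0 / thick   # idealized boxcar slice profile
    rows.append(r)
A = np.array(rows)

# Ground-truth through-plane profile and noisy thick-slice data
x_true = np.sin(np.linspace(0, 3 * np.pi, n_thin)) + 1.5
y = A @ x_true + 0.01 * rng.standard_normal(A.shape[0])

# Tikhonov (ridge) solution: x = (A^T A + lam I)^-1 A^T y
lam = 1e-2
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n_thin), A.T @ y)

rmse = float(np.sqrt(np.mean((x_hat - x_true) ** 2)))
```

The regularization stabilizes the components the overlapping boxcars cannot determine; real super-resolution reconstruction adds a measured slice profile and edge-preserving priors.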
Affiliation(s)
- Karyna Isaieva: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- Freddy Odille: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France; CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Yves Laprie: LORIA, Université de Lorraine, CNRS, INRIA, F-54000 Nancy, France
- Guillaume Drouot: CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Jacques Felblinger: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France; CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Pierre-André Vuissoz: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
2. Pan W, Deng F, Wang X, Hang B, Zhou W, Zhu T. Exploring the ability of vocal biomarkers in distinguishing depression from bipolar disorder, schizophrenia, and healthy controls. Front Psychiatry 2023; 14:1079448. PMID: 37575564; PMCID: PMC10415910; DOI: 10.3389/fpsyt.2023.1079448.
Abstract
Background Vocal features have been exploited to distinguish depression from healthy controls. While there have been some claims of success, the degree to which changes in vocal features are specific to depression has not been systematically studied. We therefore examined the performance of vocal features in differentiating depression from bipolar disorder (BD), schizophrenia, and healthy controls, as well as in pairwise classifications among the three disorders. Methods We sampled 32 patients with bipolar disorder, 106 patients with depression, 114 healthy controls, and 20 patients with schizophrenia. We extracted i-vectors from Mel-frequency cepstral coefficients (MFCCs) and built logistic regression models with ridge regularization and 5-fold cross-validation on the training set, then applied the models to the test set. There were seven classification tasks: any disorder versus healthy controls; depression versus healthy controls; BD versus healthy controls; schizophrenia versus healthy controls; depression versus BD; depression versus schizophrenia; and BD versus schizophrenia. Results The area under the curve (AUC) for classifying depression versus bipolar disorder was 0.5 (F-score = 0.44). For the other comparisons, AUC scores ranged from 0.75 to 0.92 and F-scores from 0.73 to 0.91. Model performance (AUC) for classifying depression versus bipolar disorder was significantly worse than for classifying bipolar disorder versus schizophrenia (corrected p < 0.05); there were no significant differences among the remaining pairwise comparisons of the seven classification tasks. Conclusion Vocal features showed discriminatory potential for distinguishing depression from healthy controls, as well as from other mental disorders. Future research should systematically examine the mechanisms by which voice features distinguish depression from other mental disorders and develop more sophisticated machine learning models so that voice analysis can better assist clinical diagnosis.
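The evaluation protocol the authors describe (feature vectors, ridge-regularized logistic regression, 5-fold cross-validation, AUC) can be sketched with plain numpy. The i-vector extraction itself is not reproduced; the synthetic two-class "feature" data, learning rate, and penalty weight below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ridge_logreg(X, y, lam=1.0, lr=0.1, n_iter=500):
    """Logistic regression with an L2 penalty, plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (p - y) / len(y) + lam * w / len(y)
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def auc_score(y_true, scores):
    """AUC via the rank formula: P(positive outranks negative)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic stand-in for i-vectors: two classes with shifted means
n, d = 200, 20
X = rng.standard_normal((n, d))
y = (np.arange(n) % 2).astype(float)
X[y == 1] += 0.8

# 5-fold cross-validation
idx = rng.permutation(n)
aucs = []
for k in range(5):
    test = idx[k::5]
    train = np.setdiff1d(idx, test)
    w, b = fit_ridge_logreg(X[train], y[train])
    aucs.append(auc_score(y[test], X[test] @ w + b))

mean_auc = float(np.mean(aucs))
```

The rank-based AUC only depends on score ordering, which is why even a modestly trained linear model separates these well-shifted synthetic classes.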
Affiliation(s)
- Wei Pan: Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China; School of Psychology, Central China Normal University, Wuhan, China; Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Fusong Deng: Wuhan Wuchang Hospital, Wuchang Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Xianbin Wang: Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China; School of Psychology, Central China Normal University, Wuhan, China; Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Bowen Hang: Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China; School of Psychology, Central China Normal University, Wuhan, China; Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Wenwei Zhou: Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China; School of Psychology, Central China Normal University, Wuhan, China; Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Tingshao Zhu: Institute of Psychology, Chinese Academy of Sciences, Beijing, China; CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
3. Eysenbach G, Jang EH, Lee SH, Choi KY, Park JG, Shin HC. Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: Deep Convolutional Neural Network Approach. J Med Internet Res 2023; 25:e34474. PMID: 36696160; PMCID: PMC9909514; DOI: 10.2196/34474.
Abstract
BACKGROUND Automatic diagnosis of depression based on speech can complement mental health treatment methods in the future. Previous studies have reported that acoustic properties can be used to identify depression. However, few studies have attempted a large-scale differential diagnosis of patients with depressive disorders using acoustic characteristics of non-English speakers. OBJECTIVE This study proposes a framework for automatic depression detection using large-scale acoustic characteristics based on the Korean language. METHODS We recruited 153 patients who met the criteria for major depressive disorder and 165 healthy controls without current or past mental illness. Participants' voices were recorded on a smartphone while they read predefined text-based sentences. Three approaches were evaluated and compared for detecting depression from text-dependent read speech: conventional machine learning models based on acoustic features; a proposed model that trains and classifies log-Mel spectrograms with a deep convolutional neural network (CNN) having a relatively small number of parameters; and models that train and classify log-Mel spectrograms with well-known pretrained networks. RESULTS The proposed CNN model automatically detected depression from the acoustic characteristics of the predefined text-based sentence reading, with a highest accuracy of 78.14% on the speech data. Deep-learned acoustic characteristics led to better performance than both the conventional approach and the pretrained models. CONCLUSIONS Monitoring the mood of patients with major depressive disorder and detecting the consistency of objective descriptions are important research topics. This study suggests that analyzing speech recorded while reading text-dependent sentences could help predict depression status automatically by capturing the characteristics of depression. The method is smartphone based, easily accessible, and can contribute to the automatic identification of depressive states.
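A log-Mel spectrogram of the kind fed to such CNNs can be computed from first principles: frame the waveform, window it, take the power spectrum, pool through a triangular mel filterbank, and take the log. The sampling rate, FFT size, hop, and mel-band count below are common illustrative choices, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal and apply a Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank spanning 0 Hz to Nyquist
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        if c > l:
            fb[m - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[m - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return np.log(power @ fb.T + 1e-10)

sr = 16000
t = np.arange(sr) / sr                 # 1 s test tone
tone = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(tone)          # shape: (frames, mel bands)
```

Each row of `S` is one time frame; a 2D CNN treats the whole array as a single-channel image.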
Affiliation(s)
- Eun Hye Jang: Medical Information Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
- Seung-Hwan Lee: Clinical Emotion and Cognition Research Laboratory, Inje University, Goyang, Republic of Korea; Department of Psychiatry, Inje University, Ilsan-Paik Hospital, Goyang, Republic of Korea; Bwave Inc, Goyang, Republic of Korea
- Kwang-Yeon Choi: Department of Psychiatry, College of Medicine, Chungnam National University, Daejeon, Republic of Korea
- Jeon Gue Park: Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea; Tutorus Labs Inc, Seoul, Republic of Korea
- Hyun-Chool Shin: Department of Electronics Engineering, Soongsil University, Seoul, Republic of Korea
4. Isaieva K, Fauvel M, Weber N, Vuissoz PA, Felblinger J, Oster J, Odille F. A hardware and software system for MRI applications requiring external device data. Magn Reson Med 2022; 88:1406-1418. PMID: 35506503; DOI: 10.1002/mrm.29280.
Abstract
PURPOSE Numerous MRI applications require data from external devices. Such devices are often independent of the MRI system, so synchronizing these data with the MRI data is often tedious and limited to offline use. In this work, a hardware and software system is proposed for acquiring data from external devices during MR imaging, for use online (in real time) or offline. METHODS The hardware includes a set of external devices (electrocardiography (ECG) devices, respiration sensors, a microphone, electronics of the MR system, etc.) using various channels for data transmission (analog, digital, optical fibers), all connected to a server through a universal serial bus (USB) hub. The software is based on a flexible client-server architecture, allowing real-time processing pipelines to be configured and executed. Communication protocols and data formats are proposed, in particular for transferring the external device data to an open-source reconstruction software (Gadgetron) for online image reconstruction using external physiological data. System performance is evaluated in terms of the accuracy of the recorded signals and the delays involved in the real-time processing tasks; its flexibility is shown with various applications. RESULTS The real-time system had low delays and jitters (on the order of 1 ms). Example MRI applications using external devices included prospectively gated cardiac cine imaging, multi-modal acquisition of the vocal tract (image, sound, and respiration), and online image reconstruction with nonrigid motion correction. CONCLUSION The performance of the system and its versatile architecture make it suitable for a wide range of MRI applications requiring online or offline use of external device data.
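One elementary step such a system performs is aligning external-sensor samples to the MRI time base. The sketch below invents a jittered 1 kHz respiration-like signal and a 5 ms TR purely for illustration; none of the described client-server or Gadgetron machinery is reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# External respiration-like signal on its own clock (1 kHz, jittered timestamps)
fs_ext = 1000.0
t_ext = np.arange(0, 10, 1 / fs_ext)
t_jit = t_ext + 0.0005 * rng.standard_normal(t_ext.size)
resp = np.sin(2 * np.pi * 0.25 * t_jit)       # 0.25 Hz breathing trace

# MRI acquisition time base: one readout every TR = 5 ms
tr = 0.005
t_mri = np.arange(0, 10, tr)

# Align: sort the jittered timestamps, then interpolate onto the MRI clock
order = np.argsort(t_jit)
resp_on_mri = np.interp(t_mri, t_jit[order], resp[order])

# Timestamp jitter of the external clock, in milliseconds
jitter_ms = float(np.std((t_jit - t_ext) * 1000))
max_align_err = float(np.max(np.abs(resp_on_mri - np.sin(2 * np.pi * 0.25 * t_mri))))
```

Because the physiological signal varies slowly relative to both clocks, sub-millisecond jitter produces only a tiny alignment error after interpolation.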
Affiliation(s)
- Karyna Isaieva: IADI, Université de Lorraine, INSERM U1254, Nancy, France
- Marc Fauvel: CIC-IT 1433, Université de Lorraine, INSERM, CHRU de Nancy, Nancy, France
- Nicolas Weber: IADI, Université de Lorraine, INSERM U1254, Nancy, France
- Jacques Felblinger: IADI, Université de Lorraine, INSERM U1254, Nancy, France; CIC-IT 1433, Université de Lorraine, INSERM, CHRU de Nancy, Nancy, France
- Julien Oster: IADI, Université de Lorraine, INSERM U1254, Nancy, France
- Freddy Odille: IADI, Université de Lorraine, INSERM U1254, Nancy, France; CIC-IT 1433, Université de Lorraine, INSERM, CHRU de Nancy, Nancy, France
5. Nayak KS, Lim Y, Campbell-Washburn AE, Steeden J. Real-Time Magnetic Resonance Imaging. J Magn Reson Imaging 2022; 55:81-99. PMID: 33295674; PMCID: PMC8435094; DOI: 10.1002/jmri.27411.
Abstract
Real-time magnetic resonance imaging (RT-MRI) allows dynamic processes to be imaged as they occur, without relying on any repetition or synchronization. This is made possible by modern MRI technology such as fast-switching gradients and parallel imaging. It is compatible with many (but not all) MRI sequences, including spoiled gradient echo, balanced steady-state free precession, and single-shot rapid acquisition with relaxation enhancement. RT-MRI has earned an important role in both diagnostic imaging and image guidance of invasive procedures. Its unique diagnostic value is prominent in areas of the body that undergo substantial and often irregular motion, such as the heart, gastrointestinal system, upper airway and vocal tract, and joints. Its value in interventional procedure guidance is prominent for procedures that require multiple forms of soft-tissue contrast, as well as flow information. In this review, we discuss the history of RT-MRI, fundamental tradeoffs, enabling technology, established applications, and current trends. Level of Evidence: 5. Technical Efficacy: Stage 1.
Affiliation(s)
- Krishna S. Nayak: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Yongwan Lim: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Adrienne E. Campbell-Washburn: Cardiovascular Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Jennifer Steeden: Institute of Cardiovascular Science, Centre for Cardiovascular Imaging, University College London, London, UK
6. Zhao Q, Fan HZ, Li YL, Liu L, Wu YX, Zhao YL, Tian ZX, Wang ZR, Tan YL, Tan SP. Vocal Acoustic Features as Potential Biomarkers for Identifying/Diagnosing Depression: A Cross-Sectional Study. Front Psychiatry 2022; 13:815678. PMID: 35573349; PMCID: PMC9095973; DOI: 10.3389/fpsyt.2022.815678.
Abstract
BACKGROUND At present, there is no established biomarker for the diagnosis of depression. Meanwhile, studies show that acoustic features convey emotional information. This study therefore explored differences in acoustic characteristics between depressed patients and healthy individuals to investigate whether these characteristics can identify depression. METHODS Participants included 71 patients diagnosed with depression from a regional hospital in Beijing, China, and 62 normal controls from the greater community. We assessed clinical symptoms of depression in all participants using the Hamilton Depression Scale (HAMD), Hamilton Anxiety Scale (HAMA), and Patient Health Questionnaire (PHQ-9), and recorded each participant's voice as they read positive, neutral, and negative texts. OpenSMILE was used to analyze the recordings and extract acoustic characteristics. RESULTS There were significant differences between the depression and control groups in all acoustic characteristics (p < 0.05). Several mel-frequency cepstral coefficients (MFCCs), including MFCC2, MFCC3, MFCC8, and MFCC9, differed significantly between emotion tasks; MFCC4 and MFCC7 correlated positively with PHQ-9 scores, and these correlations were stable across all emotion tasks. The zero-crossing rate in the positive-emotion task correlated positively with the HAMA total score and HAMA somatic anxiety score (r = 0.31 and r = 0.34, respectively), and MFCC9 in the neutral-emotion task correlated negatively with HAMD anxiety/somatization scores (r = -0.34). Linear regression showed that MFCC7 in the negative-emotion task was predictive of the PHQ-9 score (β = 0.90, p = 0.01) and MFCC9 in the neutral-emotion task was predictive of the HAMD anxiety/somatization score (β = -0.45, p = 0.049). Logistic regression showed a strong discriminant effect, with a discrimination accuracy of 89.66%. CONCLUSION The acoustic expression of emotion among patients with depression differs from that of normal controls. Some acoustic characteristics are related to the severity of depressive symptoms and may be objective biomarkers of depression. A systematic method of assessing vocal acoustic characteristics could provide an accurate and discreet means of screening for depression; it may be used instead of, or in conjunction with, traditional screening methods, as it is not subject to the limitations of self-reported assessments, wherein subjects may give socially acceptable rather than truthful responses.
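Two of the measures used in this study, the zero-crossing rate of a speech frame and the Pearson correlation between a feature and a scale score, are simple to compute. The tone and score data below are synthetic placeholders, not study recordings.

```python
import numpy as np

rng = np.random.default_rng(3)

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))

def pearson_r(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# A pure tone has a predictable ZCR: about 2 crossings per cycle,
# i.e. roughly 2 * f0 / sr per sample pair.
sr, f0 = 16000, 200.0
t = np.arange(sr) / sr
frame = np.sin(2 * np.pi * f0 * t)
zcr = zero_crossing_rate(frame)            # expected near 2 * 200 / 16000 = 0.025

# Feature vs. score: synthetic data constructed to correlate positively
score = rng.standard_normal(100)
feature = 0.5 * score + 0.5 * rng.standard_normal(100)
r = pearson_r(feature, score)
```

On voiced speech the ZCR is low and rises sharply for fricative noise, which is why it carries some affect-related information.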
Affiliation(s)
- Qing Zhao, Hong-Zhen Fan, Yan-Li Li, Lei Liu, Ya-Xue Wu, Yan-Li Zhao, Zhan-Xiao Tian, Zhi-Ren Wang, Yun-Long Tan, and Shu-Ping Tan: Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
7. Martin J, Ruthven M, Boubertakh R, Miquel ME. Realistic Dynamic Numerical Phantom for MRI of the Upper Vocal Tract. J Imaging 2020; 6:86. PMID: 34460743; PMCID: PMC8320850; DOI: 10.3390/jimaging6090086.
Abstract
Dynamic and real-time MRI (rtMRI) of human speech is an active field of research, with interest from both the linguistics and clinical communities. At present, different research groups are investigating a range of rtMRI acquisition and reconstruction approaches to visualise the speech organs. As with other moving organs, it is difficult to create a physical phantom of the speech organs to optimise these approaches; optimisation therefore requires extensive scanner access and imaging of volunteers. As previously demonstrated in cardiac imaging, realistic numerical phantoms can be useful tools for optimising rtMRI approaches and can reduce reliance on scanner access and volunteer imaging. However, no such speech rtMRI phantom currently exists. In this work, a numerical phantom for optimising speech rtMRI approaches was developed and tested on different reconstruction schemes. The novel phantom comprised a dynamic image series and corresponding k-space data of a single mid-sagittal slice with a temporal resolution of 30 frames per second (fps), developed from images of a volunteer acquired at 10 fps. Creating the numerical phantom involved image acquisition, image enhancement, segmentation, mask optimisation, through-time and spatial interpolation, and finally derivation of the k-space phantom. The phantom was used to (1) test different k-space sampling schemes (Cartesian, radial, and spiral); (2) create lower frame-rate acquisitions by simulating segmented k-space acquisitions; and (3) simulate parallel imaging reconstructions (SENSE and GRAPPA). This demonstrated how such a numerical phantom could be used to optimise images and test multiple sampling strategies without extensive scanner access.
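The central mechanism of such a phantom, deriving k-space from a known image and retrospectively applying a sampling mask, can be sketched in a few lines. The disc "phantom" and simple Cartesian mask below are toy stand-ins for the published dynamic phantom and its sampling schemes.

```python
import numpy as np

n = 128
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
image = (xx ** 2 + yy ** 2 < (n // 4) ** 2).astype(float)   # toy disc "phantom"

# Fully sampled k-space of one frame, derived from the known image
kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))

# Cartesian undersampling: every 2nd phase-encode line plus a fully
# sampled central band (a common variable-density pattern)
mask = np.zeros((n, n))
mask[::2, :] = 1.0
mask[n // 2 - 8:n // 2 + 8, :] = 1.0
kspace_us = kspace * mask

# Zero-filled reconstruction of the undersampled data
recon = np.abs(np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace_us))))

sampling_fraction = float(mask.mean())   # 72 of 128 lines kept
```

Because the ground-truth image is known exactly, any reconstruction of `kspace_us` can be scored against `image`, which is the whole point of a numerical phantom.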
Affiliation(s)
- Joe Martin: MR Physics, Guy’s and St Thomas’ NHS Foundation Trust, St Thomas’s Hospital, London SE1 7EH, UK
- Matthieu Ruthven: Clinical Physics, Barts Health NHS Trust, St Bartholomew’s Hospital, London EC1A 7BE, UK
- Redha Boubertakh: Singapore Bioimaging Consortium (SBIC), Singapore 138667, Singapore
- Marc E. Miquel: Clinical Physics, Barts Health NHS Trust, St Bartholomew’s Hospital, London EC1A 7BE, UK; Centre for Advanced Cardiovascular Imaging, NIHR Barts Biomedical Research Centre (BRC), William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
8. Measurement of Tongue Tip Velocity from Real-Time MRI and Phase-Contrast Cine-MRI in Consonant Production. J Imaging 2020; 6:jimaging6050031. PMID: 34460733; PMCID: PMC8321019; DOI: 10.3390/jimaging6050031.
Abstract
We evaluate the velocity of the tongue tip with magnetic resonance imaging (MRI) using two independent approaches. The first consists of acquisition with a real-time technique in the mid-sagittal plane: tracking the tongue tip, manually and with a computer vision method, yields its trajectory, and velocity is calculated as the derivative of the coordinate. We also propose a second approach, phase-contrast MRI, which measures the velocities of moving tissues directly. Sound was recorded simultaneously with the MR acquisition, enabling conclusions about the relation between the movements and the sound. We acquired data from two French-speaking subjects articulating /tata/. The results of the two methods are in qualitative agreement and are consistent with other reported techniques for evaluating tongue tip velocity.
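The first approach reduces to tracking a coordinate and differentiating it. A minimal numpy version on a synthetic trajectory (the frame rate and the /tata/-like gesture below are assumed, not the study's data) looks like this:

```python
import numpy as np

fps = 50.0                        # assumed frame rate of the dynamic series
t = np.arange(0, 1, 1 / fps)

# Synthetic tongue-tip coordinate in mm: an up-down gesture at 4 Hz
y_mm = 5.0 * np.sin(2 * np.pi * 4.0 * t)

# Light moving-average smoothing to suppress tracking noise
kernel = np.ones(3) / 3
y_smooth = np.convolve(y_mm, kernel, mode="same")

# Central-difference velocity in mm/s
v = np.gradient(y_smooth, 1 / fps)

# Ideal peak speed is 2*pi*4*5 ~ 126 mm/s; smoothing and discrete
# differentiation attenuate it somewhat
peak_speed = float(np.max(np.abs(v)))
```

The attenuation from the 3-point smoother and the finite-difference derivative is the usual trade-off against tracking noise; phase-contrast MRI avoids it by encoding velocity directly in the signal phase.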
9. Wang J, Zhang L, Liu T, Pan W, Hu B, Zhu T. Acoustic differences between healthy and depressed people: a cross-situation study. BMC Psychiatry 2019; 19:300. PMID: 31615470; PMCID: PMC6794822; DOI: 10.1186/s12888-019-2300-7.
Abstract
BACKGROUND Abnormalities in vocal expression during a depressed episode have frequently been reported in people with depression, but less is known about whether these abnormalities exist only in specific situations. In addition, previous studies did not control for the impact of irrelevant demographic variables on voice. This study therefore compares vocal differences between depressed and healthy people across various situations, with irrelevant variables treated as covariates. METHODS To examine whether vocal abnormalities in people with depression exist only in specific situations, this study compared vocal differences between healthy people and patients with unipolar depression in 12 situations (speech scenarios). Positive, negative, and neutral voice expressions of depressed and healthy people were compared across four tasks. Multivariate analysis of covariance (MANCOVA) was used to evaluate the main effects of group (depressed vs. healthy) on acoustic features. Acoustic features were evaluated by both statistical significance and magnitude of effect size. RESULTS Multivariate analysis of covariance showed significant differences between the two groups in all 12 speech scenarios. Although the significant acoustic features differed across scenarios, three acoustic features (loudness, MFCC5, and MFCC7) were consistently different between people with and without depression, with large effect magnitudes. CONCLUSIONS Vocal differences between depressed and healthy people exist across all 12 scenarios. Acoustic features including loudness, MFCC5, and MFCC7 have the potential to serve as indicators for identifying depression via voice analysis. These findings support the view that depressed people's voices exhibit both situation-specific and cross-situational patterns of acoustic features.
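The study's emphasis on judging features by both statistical significance and effect magnitude can be illustrated with Welch's t statistic and Cohen's d for one acoustic feature. The group means, spreads, and sizes below are invented for the sketch, not study values.

```python
import numpy as np

rng = np.random.default_rng(4)

def cohens_d(a, b):
    """Standardized mean difference with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return float((a.mean() - b.mean()) / pooled)

def welch_t(a, b):
    """Welch's t statistic (no equal-variance assumption)."""
    return float((a.mean() - b.mean())
                 / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)))

# Synthetic "loudness" values in dB for two groups
healthy = rng.normal(60.0, 5.0, 120)
depressed = rng.normal(56.0, 5.0, 110)     # lower on average

d = cohens_d(healthy, depressed)
t_stat = welch_t(healthy, depressed)
```

Reporting d alongside t guards against the situation the authors highlight: with large samples, a trivially small group difference can still be "significant".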
Affiliation(s)
- Jingying Wang: Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Lei Zhang: Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Tianli Liu: Institute of Population Research, Peking University, Beijing, China
- Wei Pan: Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Bin Hu: School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu Province, China
- Tingshao Zhu: Institute of Psychology, Chinese Academy of Sciences, Beijing, China
10.
Abstract
Purpose
Speech production is a complex 3-dimensional (3D) process, and yet most of what is known about it is derived from 2D midsagittal data. The relatively recent development of safe 3D imaging technologies (including magnetic resonance imaging and ultrasound) provides new opportunities to revisit and reformulate what is already known and to push the boundaries of current knowledge still further. A particularly useful imaging modality for this purpose is 3D/4D ultrasound, which until very recently was not well suited for studies in speech research. This technical report presents an overview of what 3D/4D ultrasound can contribute to speech research, with a focus on 2 demonstrations.
Conclusion
The 1st demonstration illustrates how 3D/4D ultrasound makes it possible to image certain vocal tract anatomical structures and planes that conventional 2D ultrasound is not capable of imaging. The 2nd demonstration illustrates how 3D/4D ultrasound can be combined with static 3D magnetic resonance imaging to provide new insight into the temporal pervasiveness and spatial extensiveness of lateral contact between the tongue and palate–teeth during speech production.
Affiliation(s)
- Steven M. Lulich: Department of Speech & Hearing Sciences, Indiana University, Bloomington
- William G. Pearson: Department of Cellular Biology and Anatomy, Medical College of Georgia, Augusta
11. Chen W, Byrd D, Narayanan S, Nayak KS. Intermittently tagged real-time MRI reveals internal tongue motion during speech production. Magn Reson Med 2019; 82:600-613. PMID: 30919494; PMCID: PMC6510652; DOI: 10.1002/mrm.27745.
Abstract
PURPOSE To demonstrate a tagging method compatible with RT-MRI for the study of speech production. METHODS Tagging is applied as a brief interruption to a continuous real-time spiral acquisition. Tagging can be initiated manually by the operator, cued to the speech stimulus, or applied automatically at a fixed frequency. We use a standard 2D 1-3-3-1 binomial SPAtial Modulation of Magnetization (SPAMM) sequence with 1 cm spacing in both in-plane directions. Tag persistence in tongue muscle is simulated and validated in vivo. The ability to capture internal tongue deformations is tested during production of American English diphthongs by native speakers. RESULTS We achieved an imaging window of 650-800 ms at 1.5T, with imaging signal-to-noise ratio ≥ 17 and tag contrast-to-noise ratio ≥ 5 in human tongue, providing 36 frames/s temporal resolution and 2 mm in-plane spatial resolution with real-time interactive acquisition and view-sharing reconstruction. The proposed method captured tongue motion patterns and their relative timing with adequate spatiotemporal resolution during the production of American English diphthongs and consonants. CONCLUSION Intermittent tagging during real-time MRI of speech production reveals internal deformations of the tongue. This capability will allow new investigations of valuable spatiotemporal information on the biomechanics of the lingual subsystems during speech, without reliance on binning repeated speech utterances.
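The SPAMM grid itself is easy to visualize: the binomial tagging pulse train modulates longitudinal magnetization approximately as a cosine of position, applied along both in-plane axes. The toy below applies that ideal modulation to a uniform image; it is a geometric illustration only, not a Bloch simulation of the 1-3-3-1 sequence, and the FOV is an assumption.

```python
import numpy as np

n = 128
fov_mm = 128.0                         # assumed field of view
x = np.linspace(0, fov_mm, n, endpoint=False)
spacing_mm = 10.0                      # 1 cm tag spacing, as in the paper
kx = 2 * np.pi / spacing_mm

tissue = np.ones((n, n))               # uniform "tissue" stand-in

# Ideal SPAMM modulation 0..1 along one axis; applying it along both
# in-plane axes imprints a grid of dark tag lines
tag_x = 0.5 * (1 + np.cos(kx * x))
tagged = tissue * tag_x[None, :] * tag_x[:, None]

# Dark lines fall at odd half-multiples of the spacing: 5, 15, ..., 125 mm
n_dark_lines = int(round(fov_mm / spacing_mm))
```

Tracking how these dark lines bend between frames is what reveals the internal tongue deformation that untagged magnitude images cannot show.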
Affiliation(s)
- Weiyi Chen: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Dani Byrd: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Shrikanth Narayanan: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA; Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S. Nayak: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
12
|
Kim YC. Fast upper airway magnetic resonance imaging for assessment of speech production and sleep apnea. PRECISION AND FUTURE MEDICINE 2018. [DOI: 10.23838/pfm.2018.00100] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
13
Lim Y, Zhu Y, Lingala SG, Byrd D, Narayanan S, Nayak KS. 3D dynamic MRI of the vocal tract during natural speech. Magn Reson Med 2018; 81:1511-1520. [PMID: 30390319] [DOI: 10.1002/mrm.27570]
Abstract
PURPOSE To develop and evaluate a technique for 3D dynamic MRI of the full vocal tract at high temporal resolution during natural speech. METHODS We demonstrate 2.4 × 2.4 × 5.8 mm3 spatial resolution, 61-ms temporal resolution, and a 200 × 200 × 70 mm3 FOV. The proposed method uses 3D gradient-echo imaging with a custom upper-airway coil, a minimum-phase slab excitation, stack-of-spirals readout, pseudo golden-angle view order in kx-ky, linear Cartesian order along kz, and spatiotemporal finite-difference constrained reconstruction, with 13-fold acceleration. This technique is evaluated using in vivo vocal tract airway data from 2 healthy subjects acquired on a 1.5T scanner, 1 with synchronized audio, with 2 natural speech tasks, and via comparison with interleaved multislice 2D dynamic MRI. RESULTS This technique captured known dynamics of vocal tract articulators during natural speech tasks, including tongue gestures during the production of the consonants "s" and "l" and of consonant-vowel syllables, and was additionally consistent with 2D dynamic MRI. Coordination of lingual (tongue) movements for consonants is demonstrated via volume-of-interest analysis. Vocal tract area function dynamics revealed critical lingual constriction events along the length of the vocal tract for consonants and vowels. CONCLUSION We demonstrate feasibility of 3D dynamic MRI of the full vocal tract, with spatiotemporal resolution adequate to visualize lingual movements for consonants and vocal tract shaping during natural productions of consonant-vowel syllables, without requiring multiple repetitions.
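The golden-angle view ordering mentioned in the methods can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it uses the irrational golden-ratio increment for views spanning a full 360° (the paper's "pseudo" variant is a rational approximation of it), and checks that consecutive views spread nearly uniformly over the circle, which is what makes retrospective selection of the temporal window flexible.

```python
import numpy as np

GOLDEN_RATIO = (1.0 + np.sqrt(5.0)) / 2.0
# Golden-angle increment for views covering 360° (e.g. spiral interleaves);
# radial spokes with 180° ambiguity conventionally use half of this, ~111.25°.
GOLDEN_ANGLE_DEG = 360.0 * (1.0 - 1.0 / GOLDEN_RATIO)   # ≈ 137.51°

def view_angles(n_views, increment=GOLDEN_ANGLE_DEG):
    """Rotation angle of each acquired view, in degrees."""
    return (np.arange(n_views) * increment) % 360.0

def max_gap_deg(angles):
    """Largest angular gap left uncovered by a set of view angles."""
    s = np.sort(angles)
    gaps = np.diff(np.concatenate([s, [s[0] + 360.0]]))
    return gaps.max()

angles = view_angles(34)   # 34 is a Fibonacci number: the most uniform case
```

Because any contiguous run of golden-angle views is nearly uniform, the reconstruction window length can be chosen after acquisition, trading temporal resolution against undersampling artifact.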
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Yinghua Zhu
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Sajan Goud Lingala
- Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, Iowa
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Krishna Shrinivas Nayak
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
14
Taguchi T, Tachikawa H, Nemoto K, Suzuki M, Nagano T, Tachibana R, Nishimura M, Arai T. Major depressive disorder discrimination using vocal acoustic features. J Affect Disord 2018; 225:214-220. [PMID: 28841483] [DOI: 10.1016/j.jad.2017.08.038]
Abstract
BACKGROUND The voice carries various information produced by vibrations of the vocal cords and shaped by the vocal tract. Although many studies have reported a relationship between vocal acoustic features and depression, including mel-frequency cepstrum coefficients (MFCCs) used in speech recognition, few studies have shown that acoustic features allow discrimination of patients with depressive disorder. Vocal acoustic features acting as biomarkers of depression could support the differential diagnosis of patients in a depressive state. As a step toward such differential diagnosis, this preliminary study examined whether vocal acoustic features could discriminate between depressive patients and healthy controls. METHODS Subjects were 36 patients who met the criteria for major depressive disorder and 36 healthy controls with no current or past psychiatric disorders. Voices reading out digits before and after a verbal fluency task were recorded. Voices were analyzed using openSMILE. The extracted acoustic features, including MFCCs, were used for group comparison and discriminant analysis between patients and controls. RESULTS The second dimension of the MFCC (MFCC 2) was significantly different between groups and allowed discrimination between patients and controls with a sensitivity of 77.8% and a specificity of 86.1%. The difference in MFCC 2 between the two groups reflected an energy difference at frequencies around 2000-3000 Hz. CONCLUSIONS MFCC 2 was significantly different between depressive patients and controls. This feature could be a useful biomarker for detecting major depressive disorder. LIMITATIONS The sample size was relatively small. Psychotropics could have a confounding effect on voice.
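The discriminant step reported above reduces to thresholding a single acoustic feature. The sketch below uses made-up illustrative numbers, not the study's data, and shows how sensitivity and specificity are computed for such a one-feature rule on MFCC 2-style values.

```python
import numpy as np

def sens_spec(patient_vals, control_vals, threshold):
    """Sensitivity/specificity of a one-feature threshold rule:
    classify as 'patient' when the feature exceeds the threshold."""
    patient_vals = np.asarray(patient_vals, dtype=float)
    control_vals = np.asarray(control_vals, dtype=float)
    sensitivity = np.mean(patient_vals > threshold)    # true-positive rate
    specificity = np.mean(control_vals <= threshold)   # true-negative rate
    return float(sensitivity), float(specificity)

# Hypothetical MFCC 2 values for 4 patients and 4 controls (illustrative only)
patients = [2.0, 1.5, 0.2, 1.8]
controls = [0.1, 0.3, 1.7, 0.0]
sens, spec = sens_spec(patients, controls, threshold=1.0)  # 0.75, 0.75
```

With the study's 36-versus-36 design, a sensitivity of 77.8% corresponds to 28/36 patients correctly flagged, and a specificity of 86.1% to 31/36 controls correctly cleared.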
Affiliation(s)
- Takaya Taguchi
- Department of Psychiatry, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Japan; University of Tsukuba Hospital, Japan
- Hirokazu Tachikawa
- Department of Psychiatry, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Japan; Department of Psychiatry, Faculty of Medicine, University of Tsukuba, Japan
- Kiyotaka Nemoto
- Department of Psychiatry, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Japan; Department of Psychiatry, Faculty of Medicine, University of Tsukuba, Japan
- Masafumi Nishimura
- Graduate School of Integrated Science and Technology, Shizuoka University, Japan
- Tetsuaki Arai
- Department of Psychiatry, Graduate School of Comprehensive Human Sciences, University of Tsukuba, Japan; Department of Psychiatry, Faculty of Medicine, University of Tsukuba, Japan
15
Traser L, Birkholz P, Flügge TV, Kamberger R, Burdumy M, Richter B, Korvink JG, Echternach M. Relevance of the Implementation of Teeth in Three-Dimensional Vocal Tract Models. J Speech Lang Hear Res 2017; 60:2379-2393. [PMID: 28898358] [DOI: 10.1044/2017_jslhr-s-16-0395]
Abstract
PURPOSE Recently, efforts have been made to investigate the vocal tract using magnetic resonance imaging (MRI). Due to technical limitations, teeth were omitted in many previous studies on vocal tract acoustics. However, the knowledge of how teeth influence vocal tract acoustics might be important in order to estimate the necessity of implementing teeth in vocal tract models. The aim of this study was therefore to estimate the effect of teeth on vocal tract acoustics. METHOD The acoustic properties of 18 solid (3-dimensional printed) vocal tract models without teeth were compared to the same 18 models including teeth in terms of resonance frequencies (fRn). The fRn were obtained from the transfer functions of these models excited by white noise at the glottis level. The models were derived from MRI data of 2 trained singers performing 3 different vowel conditions (/i/, /a/, and /u/) in speech and low-pitched and high-pitched singing. RESULTS Depending on the oral configuration, models exhibiting side cavities or side branches were characterized by major changes in the transfer function when teeth were implemented via the introduction of pole-zero pairs. CONCLUSIONS To avoid errors in modeling, teeth should be included in 3-dimensional vocal tract models for acoustic evaluation. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.5386771.
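The resonance-extraction step described above, obtaining the fRn from measured transfer functions, amounts to peak-picking on a magnitude spectrum. A minimal illustrative sketch follows, using a synthetic spectrum rather than the study's data; the resonance frequencies and bandwidths are made up.

```python
import numpy as np

def resonance_peaks(magnitude):
    """Indices of local maxima of a transfer-function magnitude spectrum."""
    interior = (magnitude[1:-1] > magnitude[:-2]) & (magnitude[1:-1] > magnitude[2:])
    return np.where(interior)[0] + 1

# Synthetic |H(f)|: two Lorentzian resonances near 500 Hz and 1500 Hz
f = np.linspace(0.0, 4000.0, 4001)                       # 1 Hz grid
mag = (1.0 / (1.0 + ((f - 500.0) / 80.0) ** 2)
       + 0.8 / (1.0 + ((f - 1500.0) / 120.0) ** 2))
peaks = resonance_peaks(mag)
resonances_hz = f[peaks]
```

On real white-noise excitation measurements, the spectrum would first be smoothed (or a minimum prominence enforced) before peak-picking; pole-zero pairs introduced by side cavities show up as closely spaced peak-dip features in such a spectrum.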
Affiliation(s)
- Louisa Traser
- Institute of Musicians' Medicine, Freiburg University Medical Center, Germany
- Department of Otolaryngology, Freiburg University Medical Center, Germany
- Faculty of Medicine, University of Freiburg, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
- Tabea Viktoria Flügge
- Faculty of Medicine, University of Freiburg, Germany
- Department of Craniomaxillofacial Surgery, Freiburg University Medical Center, Germany
- Robert Kamberger
- Laboratory of Simulation, Department of Microsystems Engineering-IMTEK, University of Freiburg, Germany
- Michael Burdumy
- Faculty of Medicine, University of Freiburg, Germany
- Department of Medical Physics, Radiology, Freiburg University Medical Center, Germany
- Bernhard Richter
- Institute of Musicians' Medicine, Freiburg University Medical Center, Germany
- Faculty of Medicine, University of Freiburg, Germany
- Jan Gerrit Korvink
- Institute of Microstructure Technology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Matthias Echternach
- Institute of Musicians' Medicine, Freiburg University Medical Center, Germany
- Faculty of Medicine, University of Freiburg, Germany
16
Fleck RJ, Ishman SL, Shott SR, Gutmark EJ, McConnell KB, Mahmoud M, Mylavarapu G, Subramaniam DR, Szczesniak R, Amin RS. Dynamic Volume Computed Tomography Imaging of the Upper Airway in Obstructive Sleep Apnea. J Clin Sleep Med 2017; 13:189-196. [PMID: 27784422] [PMCID: PMC5263074] [DOI: 10.5664/jcsm.6444]
Abstract
STUDY OBJECTIVES To describe a dynamic three-dimensional (3D) computed tomography (CT) technique for the upper airway and compare the required radiation dose to that used for common clinical studies of a similar anatomical area, such as routine clinical facial CT. METHODS Dynamic upper-airway CT was performed on eight subjects with persistent obstructive sleep apnea, four of whom also underwent magnetic resonance imaging and four of whom had a contraindication to magnetic resonance imaging. This Health Insurance Portability and Accountability Act-compliant study was approved by our institutional review board, and informed consent was obtained. The control subjects (n = 41) for comparison of radiation dose were obtained from a retrospective review of the clinical picture-archiving computer system to identify 10 age-matched patients undergoing facial CT per age-based control group. RESULTS Dynamic 3D CT can be performed with an effective radiation dose of less than 0.38 mSv, a dose that is less than or comparable to that used for clinical facial CT. The resulting dataset is a uniquely complete, dynamic 3D volume of the upper airway through a full respiratory cycle that can be processed for clinical and modeling analyses. CONCLUSIONS A dynamic 3D CT technique of the upper airway is described that can be performed with a clinically reasonable radiation dose and sets a benchmark for future use.
Affiliation(s)
- Robert J. Fleck
- Division of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Stacey L. Ishman
- Division of Pediatric Otolaryngology - Head and Neck Surgery, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati School of Medicine, Cincinnati, OH
- Sally R. Shott
- Division of Pediatric Otolaryngology - Head and Neck Surgery, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati School of Medicine, Cincinnati, OH
- Ephraim J. Gutmark
- Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati School of Medicine, Cincinnati, OH
- Department of Aerospace Engineering and Engineering Mechanics, CEAS, University of Cincinnati, Cincinnati, OH
- Keith B. McConnell
- Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Mohamed Mahmoud
- Division of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Department of Anesthesia, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Goutham Mylavarapu
- Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Dhananjay R. Subramaniam
- Department of Aerospace Engineering and Engineering Mechanics, CEAS, University of Cincinnati, Cincinnati, OH
- Rhonda Szczesniak
- Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Division of Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Raouf S. Amin
- Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
- Department of Pediatrics, University of Cincinnati School of Medicine, Cincinnati, OH
17
Whole heart coronary imaging with flexible acquisition window and trigger delay. PLoS One 2015; 10:e0112020. [PMID: 25719750] [PMCID: PMC4342264] [DOI: 10.1371/journal.pone.0112020]
Abstract
Coronary magnetic resonance imaging (MRI) requires a correctly timed trigger delay, derived from a scout cine scan, to synchronize k-space acquisition with the quiescent period of the cardiac cycle. However, heart rate changes between the breath-held cine and free-breathing coronary imaging may result in timing errors. Additionally, the determined trigger delay may not reflect the period of minimal motion for both left and right coronary arteries or for different segments. In this work, we present a whole-heart coronary imaging approach that allows flexible selection of trigger delay timings by performing k-space sampling over an enlarged acquisition window. Our approach addresses coronary motion in an interactive manner by allowing the operator to determine the temporal window with minimal cardiac motion for each artery region. An electrocardiogram-gated, k-space segmented 3D radial stack-of-stars sequence that employs a custom rotation angle is developed. An interactive reconstruction and visualization platform is then employed to determine the subset of the enlarged acquisition window with minimal coronary motion. Coronary MRI was acquired on eight healthy subjects (5 male, mean age = 37 ± 18 years), where an enlarged acquisition window of 166–220 ms was set 50 ms prior to the scout-derived trigger delay. Coronary visualization and sharpness scores were compared between the standard 120 ms window set at the trigger delay and those reconstructed using a manually adjusted window. The proposed method using manual adjustment was able to recover delineation of five mid and distal right coronary artery regions that were otherwise not visible from the standard window, and sharpness scores improved in all coronary regions using the proposed method. This paper demonstrates the feasibility of a whole-heart coronary imaging approach that allows interactive selection of any subset of the enlarged acquisition window for a tailored reconstruction for each branch region.
18
A Comparison of Different Methods to Generate Tooth Surface Models Without Applying Ionizing Radiation for Digital 3-Dimensional Image Fusion With Magnetic Resonance Imaging–Based Data of the Head and Neck Region. J Comput Assist Tomogr 2015; 39:882-9. [PMID: 26295193] [DOI: 10.1097/rct.0000000000000293]
19
Traser L, Burdumy M, Richter B, Vicari M, Echternach M. Weight-bearing MR imaging as an option in the study of gravitational effects on the vocal tract of untrained subjects in singing phonation. PLoS One 2014; 9:e112405. [PMID: 25379885] [PMCID: PMC4224454] [DOI: 10.1371/journal.pone.0112405]
Abstract
Magnetic Resonance Imaging (MRI) of subjects in a supine position can be used to evaluate the configuration of the vocal tract during phonation. However, studies of speech phonation have shown that gravity can affect vocal tract shape and bias measurements. This is one of the reasons that MRI studies of singing phonation have used professionally trained singers as subjects, because they are generally considered to be less affected by the supine body position and environmental distractions. A study of untrained singers might not only contribute to the understanding of intuitive singing function and aid the evaluation of potential hazards to vocal health, but also provide insights into the effect of the supine position on singers in general. In the present study, an open-configuration 0.25 T MRI system with a rotatable examination bed was used to study the effect of body position in 20 vocally untrained subjects. The subjects were asked to sing sustained tones in both supine and upright body positions on different pitches and in different register conditions. Morphometric measurements were taken from the acquired images of a sagittal slice depicting the vocal tract. The analysis of vocal tract configuration in the two body positions revealed differences in 5 out of 10 measured articulatory parameters. In the upright position the jaw was less protruded, the uvula was elongated, the larynx was more tilted, and the tongue was positioned more to the front of the mouth than in the supine position. These findings are in agreement with several studies on gravitational effects in speech phonation, but contrast with the results of a previous study by our group on professional singers, in which only minor differences between upright and supine body posture were observed. The present study demonstrates that weight-bearing MR imaging of the vocal tract is a feasible tool for the study of sustained phonation in singing for vocally untrained subjects.
Affiliation(s)
- Louisa Traser
- Institute of Musicians' Medicine, University Medical Center, Freiburg, Germany; Department of Oto-Rhino-Laryngology, Head and Neck Surgery, University Medical Center, Freiburg, Germany
- Michael Burdumy
- Institute of Musicians' Medicine, University Medical Center, Freiburg, Germany; Department of Radiology, Medical Physics, University Medical Center, Freiburg, Germany
- Bernhard Richter
- Institute of Musicians' Medicine, University Medical Center, Freiburg, Germany
- Marco Vicari
- Fraunhofer MEVIS, Bremen, Germany; Esaote S.p.A., Genoa, Italy
- Matthias Echternach
- Institute of Musicians' Medicine, University Medical Center, Freiburg, Germany
20
Scott AD, Wylezinska M, Birch MJ, Miquel ME. Speech MRI: morphology and function. Phys Med 2014; 30:604-18. [PMID: 24880679] [DOI: 10.1016/j.ejmp.2014.05.001]
Abstract
Magnetic Resonance Imaging (MRI) plays an increasing role in the study of speech. This article reviews the MRI literature of anatomical imaging, imaging for acoustic modelling and dynamic imaging. It describes existing imaging techniques attempting to meet the challenges of imaging the upper airway during speech and examines the remaining hurdles and future research directions.
Affiliation(s)
- Andrew D Scott
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; NIHR Cardiovascular Biomedical Research Unit, The Royal Brompton Hospital, Sydney Street, London SW3 6NP, United Kingdom
- Marzena Wylezinska
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom
- Malcolm J Birch
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom
- Marc E Miquel
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom; Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom
21
Zhu Y, Kim YC, Proctor MI, Narayanan SS, Nayak KS. Dynamic 3-D visualization of vocal tract shaping during speech. IEEE Trans Med Imaging 2013; 32:838-848. [PMID: 23204279] [PMCID: PMC3896513] [DOI: 10.1109/tmi.2012.2230017]
Abstract
Noninvasive imaging is widely used in speech research as a means to investigate the shaping and dynamics of the vocal tract during speech production. 3-D dynamic MRI would be a major advance, as it would provide 3-D dynamic visualization of the entire vocal tract. We present a novel method for the creation of 3-D dynamic movies of vocal tract shaping based on the acquisition of 2-D dynamic data from parallel slices and temporal alignment of the image sequences using audio information. Multiple sagittal 2-D real-time movies with synchronized audio recordings are acquired for English vowel-consonant-vowel stimuli /ala/, /aɹa/, /asa/, and /aʃa/. Audio data are aligned using mel-frequency cepstral coefficients (MFCC) extracted from windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using dynamic time warping (DTW). The aligned image sequences enable dynamic 3-D visualization by creating synthesized movies of the moving airway in the coronal planes, visualizing desired tissue surfaces and the tube-shaped vocal tract airway after manual segmentation of targeted articulators and smoothing. The resulting volumes allow for dynamic 3-D visualization of salient aspects of lingual articulation, including the formation of tongue grooves and sublingual cavities, with a temporal resolution of 78 ms.
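The temporal-alignment step described above, pairing slice acquisitions via MFCC features and dynamic time warping, can be sketched with a textbook DTW implementation. This is an illustrative sketch, not the authors' code; in the paper the inputs would be sequences of MFCC vectors extracted from each slice's synchronized audio.

```python
import numpy as np

def dtw_align(a, b):
    """Dynamic time warping between two feature sequences.

    a: (n, d) or (n,) array; b: (m, d) or (m,) array.
    Returns (total_cost, path), where path is a list of (i, j) frame pairs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # Local cost: Euclidean distance between frames (e.g. MFCC vectors)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Accumulated-cost matrix with the usual match/insert/delete step pattern
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack the optimal warping path from the end
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```

Once a path is found between a reference slice's audio features and another slice's, the same (i, j) mapping can be used to resample that slice's image frames onto the reference timeline.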
Affiliation(s)
- Yinghua Zhu
- Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA