1
Fan X, Chen R, Huang H, Zhang G, Zhou S, Chen X, Zhao Y, Diao Y, Pan S, Zhang F, Sun Y, Zhou F. Classification and prognostic factors of patients with cervical spondylotic myelopathy after surgical treatment: a cluster analysis. Sci Rep 2024; 14:99. [PMID: 38167939 PMCID: PMC10762243 DOI: 10.1038/s41598-023-49477-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024]
Abstract
Identifying potential prognostic factors in patients with cervical spondylotic myelopathy (CSM) could improve clinical decision-making. This study retrospectively collected baseline data on population characteristics, clinical symptoms, physical examination, neurological function, and quality-of-life scores of patients with CSM from a clinical big-data research platform. The modified Japanese Orthopedic Association (mJOA) score and SF-36 score from short-term follow-up were entered into a cluster analysis to characterize postoperative residual symptoms and quality of life. Four clusters were obtained, representing different patterns of residual symptoms and quality of life. Patients in cluster 2 (mJOA recovery rate [RR] 55.8%) and cluster 4 (mJOA RR 55.8%) improved substantially and had a better quality of life. The factors associated with the better prognosis in cluster 2 were younger age (50.1 ± 11.8 years), a low incidence of disabling claudication (5.0%) and of pathological signs (63.0%), and good preoperative SF-36 physical function (73.1 ± 24.0) and mJOA (13.7 ± 2.8) scores; in cluster 4, the main factor was a low incidence of neck and shoulder pain (11.7%). We preliminarily verified the reliability of the clustering results with long-term follow-up data and identified preoperative features that help predict patient prognosis. This study provides a reference and a basis for further research with larger samples, more patient features, more follow-up time points, and improved clustering algorithms.
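The abstract reports postoperative improvement as an mJOA recovery rate (RR) without stating the formula. As a minimal sketch, assuming the widely used Hirabayashi-style definition on the 18-point mJOA scale, RR = (postoperative − preoperative) / (18 − preoperative) × 100%, a per-patient helper could look like this. The function name and example scores are hypothetical and are not taken from the study.

```python
# Hypothetical helper: mJOA recovery rate, assuming the Hirabayashi-style formula
# RR = (post - pre) / (max_score - pre) * 100 on an 18-point mJOA scale.
MJOA_MAX = 18.0


def mjoa_recovery_rate(pre: float, post: float, max_score: float = MJOA_MAX) -> float:
    """Return the recovery rate in percent for one patient."""
    for score in (pre, post):
        if not 0.0 <= score <= max_score:
            raise ValueError(f"score {score} is outside 0..{max_score}")
    if pre == max_score:
        # No room for improvement: the ratio would divide by zero.
        raise ValueError("recovery rate is undefined for a maximal preoperative score")
    return (post - pre) / (max_score - pre) * 100.0


if __name__ == "__main__":
    # Hypothetical patient: preoperative mJOA 12, postoperative mJOA 15.
    print(f"{mjoa_recovery_rate(12, 15):.1f}%")  # 50.0%
```

A negative value indicates postoperative deterioration; cluster-level figures such as those in the abstract would be averages of these per-patient values.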
Affiliation(s)
- Xiao Fan
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Rui Chen
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Haoge Huang
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Gangqiang Zhang
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Shuai Zhou
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Xin Chen
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Yanbin Zhao
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Yinze Diao
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Shengfa Pan
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Fengshan Zhang
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Yu Sun
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Feifei Zhou
- Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Engineering Research Center of Bone and Joint Precision Medicine, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Beijing Key Laboratory of Spinal Disease Research, 49 North Garden Road, Haidian District, Beijing, 100191, China
2
Ito T, Morooka H, Takahashi H, Fujii H, Iwaki M, Hayashi H, Toyoda H, Oeda S, Hyogo H, Kawanaka M, Morishita A, Munekage K, Kawata K, Tsutsumi T, Sawada K, Maeshiro T, Tobita H, Yoshida Y, Naito M, Araki A, Arakaki S, Kawaguchi T, Noritake H, Ono M, Masaki T, Yasuda S, Tomita E, Yoneda M, Tokushige A, Ishigami M, Kamada Y, Ueda S, Aishima S, Sumida Y, Nakajima A, Okanoue T. Identification of clinical phenotypes associated with poor prognosis in patients with nonalcoholic fatty liver disease via unsupervised machine learning. J Gastroenterol Hepatol 2023; 38:1832-1839. [PMID: 37596843 DOI: 10.1111/jgh.16326] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/11/2023] [Revised: 07/22/2023] [Accepted: 08/01/2023] [Indexed: 08/20/2023]
Abstract
BACKGROUND AND AIMS Both fibrosis status and body weight are important for assessing prognosis in nonalcoholic fatty liver disease (NAFLD). The aim of this study was to identify population clusters with specific clinical outcomes based on the fibrosis-4 (FIB-4) index and body mass index (BMI) using an unsupervised machine learning method. METHODS We conducted a multicenter study of 1335 patients with biopsy-proven NAFLD in Japan. Using a Gaussian mixture model to divide the cohort into clusters based on FIB-4 index and BMI, we investigated the prognosis of each cluster. RESULTS The cohort included 223 cases (16.0%) with advanced fibrosis (F3-4) on liver biopsy. Median values of BMI and FIB-4 index were 27.3 kg/m² and 1.67, respectively. The patients were divided into four clusters according to the Bayesian information criterion, and all-cause mortality was highest in cluster d, followed by cluster b (P = 0.001). Clusters d and b had a high FIB-4 index (median 5.23 and 2.23), cluster a had the lowest FIB-4 index (median 0.78), and cluster c had a moderate FIB-4 index (median 1.30) and the highest BMI (median 34.3 kg/m²). Clusters a and c had lower mortality rates than clusters b and d; however, the causes of death in clusters a and c were unrelated to liver disease. CONCLUSIONS Our clustering approach found that the FIB-4 index is an important predictor of mortality in NAFLD patients regardless of BMI. Additionally, non-liver-related diseases were identified as the causes of death in NAFLD patients with a low FIB-4 index.
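The Methods pair a Gaussian mixture model with the Bayesian information criterion (BIC) to choose the number of clusters. A minimal standard-library sketch of that idea follows: expectation-maximization (EM) for a one-dimensional mixture, and BIC = p·ln(n) − 2·ln L with p = 3k − 1 free parameters. This is a deliberate simplification: the study clustered on FIB-4 and BMI jointly, and its software, covariance settings, and initialization are not stated in the abstract. The deterministic quantile-based start and the toy data are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _log_normal_pdf(x: float, mu: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _point_logs(x, weights, means, variances) -> List[float]:
    # log(w_j) + log N(x | mu_j, var_j) for every component j
    return [math.log(w) + _log_normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]


def fit_gmm_1d(data: Sequence[float], k: int, n_iter: int = 500, tol: float = 1e-9,
               var_floor: float = 1e-6) -> Tuple[List[float], List[float], List[float], float]:
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, variances, loglik)."""
    xs = sorted(data)
    n = len(xs)
    if n < k:
        raise ValueError("need at least k observations")
    # Deterministic start (assumption): means at evenly spaced order statistics, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu0 = sum(xs) / n
    variances = [max(sum((x - mu0) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities computed in log space to avoid underflow.
        resp, loglik = [], 0.0
        for x in xs:
            logs = _point_logs(x, weights, means, variances)
            total = _logsumexp(logs)
            loglik += total
            resp.append([math.exp(lg - total) for lg in logs])
        if loglik - prev < tol:
            break  # log-likelihood has stopped improving
        prev = loglik
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nk = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nk / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[j] = max(spread, var_floor)
    final = sum(_logsumexp(_point_logs(x, weights, means, variances)) for x in xs)
    return weights, means, variances, final


def bic_1d(data: Sequence[float], k: int) -> float:
    """BIC = p ln(n) - 2 ln L, with p = 3k - 1 free parameters for a 1-D mixture."""
    *_, loglik = fit_gmm_1d(data, k)
    return (3 * k - 1) * math.log(len(data)) - 2.0 * loglik


if __name__ == "__main__":
    # Hypothetical values forming two well-separated groups (not study data).
    base = [0.0, 0.2, -0.2, 0.1, -0.1, 0.3, -0.3, 0.05, -0.05, 0.15]
    data = base + [10.0 + v for v in base]
    for k in (1, 2, 3):
        print(k, round(bic_1d(data, k), 1))  # the lowest BIC marks the preferred k
```

For real analyses, scikit-learn's `GaussianMixture` estimator exposes a `bic` method that applies the same criterion to multivariate mixtures.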
Affiliation(s)
- Takanori Ito
- Department of Gastroenterology and Hepatology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Hikaru Morooka
- Department of Emergency and Critical Care Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Hideki Fujii
- Department of Hepatology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Michihiro Iwaki
- Division of Gastroenterology and Hepatology, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hideki Hayashi
- Department of Gastroenterology and Hepatology, Gifu Municipal Hospital, Gifu, Japan
- Hidenori Toyoda
- Department of Gastroenterology, Ogaki Municipal Hospital, Ogaki, Japan
- Satoshi Oeda
- Liver Center, Saga University Hospital, Saga, Japan
- Department of Laboratory Medicine, Saga University Hospital, Saga, Japan
- Hideyuki Hyogo
- Department of Gastroenterology, JA Hiroshima Kouseiren General Hospital, Hiroshima, Japan
- Hyogo Life Care Clinic Hiroshima, Hiroshima, Japan
- Miwa Kawanaka
- Department of General Internal Medicine 2, Kawasaki Medical Center, Kawasaki Medical School, Okayama, Japan
- Asahiro Morishita
- Department of Gastroenterology and Neurology, Faculty of Medicine, Kagawa University, Takamatsu, Japan
- Kensuke Munekage
- Department of Gastroenterology and Hepatology, Kochi Medical School, Kochi, Japan
- Kazuhito Kawata
- Hepatology Division, Department of Internal Medicine II, Hamamatsu University School of Medicine, Shizuoka, Japan
- Tsubasa Tsutsumi
- Division of Gastroenterology, Department of Medicine, Kurume University School of Medicine, Kurume, Japan
- Koji Sawada
- Division of Metabolism and Biosystemic Science, Gastroenterology, and Hematology/Oncology, Asahikawa Medical University, Asahikawa, Japan
- Tatsuji Maeshiro
- First Department of Internal Medicine, University of the Ryukyus Hospital, Okinawa, Japan
- Hiroshi Tobita
- Division of Hepatology, Shimane University Hospital, Izumo, Japan
- Yuichi Yoshida
- Department of Gastroenterology and Hepatology, Suita Municipal Hospital, Osaka, Japan
- Masafumi Naito
- Department of Gastroenterology and Hepatology, Suita Municipal Hospital, Osaka, Japan
- Asuka Araki
- Division of Hepatology, Shimane University Hospital, Izumo, Japan
- Shingo Arakaki
- First Department of Internal Medicine, University of the Ryukyus Hospital, Okinawa, Japan
- Takumi Kawaguchi
- Division of Gastroenterology, Department of Medicine, Kurume University School of Medicine, Kurume, Japan
- Hidenao Noritake
- Hepatology Division, Department of Internal Medicine II, Hamamatsu University School of Medicine, Shizuoka, Japan
- Masafumi Ono
- Division of Innovative Medicine for Hepatobiliary and Pancreatology, Faculty of Medicine, Kagawa University, Takamatsu, Japan
- Tsutomu Masaki
- Department of Gastroenterology and Neurology, Faculty of Medicine, Kagawa University, Takamatsu, Japan
- Satoshi Yasuda
- Department of Gastroenterology, Ogaki Municipal Hospital, Ogaki, Japan
- Eiichi Tomita
- Department of Gastroenterology and Hepatology, Gifu Municipal Hospital, Gifu, Japan
- Masato Yoneda
- Division of Gastroenterology and Hepatology, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Akihiro Tokushige
- Department of Cardiovascular Medicine and Hypertension, Kagoshima University Graduate School of Medical and Dental Sciences, Kagoshima, Japan
- Masatoshi Ishigami
- Department of Gastroenterology and Hepatology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Yoshihiro Kamada
- Department of Advanced Metabolic Hepatology, Osaka University Graduate School of Medicine, Osaka, Japan
- Shinichiro Ueda
- Department of Clinical Pharmacology and Therapeutics, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan
- Shinichi Aishima
- Department of Pathology and Microbiology, Faculty of Medicine, Saga University, Saga, Japan
- Yoshio Sumida
- Division of Hepatology and Pancreatology, Department of Internal Medicine, Aichi Medical University, Nagakute, Japan
- Atsushi Nakajima
- Division of Gastroenterology and Hepatology, Yokohama City University Graduate School of Medicine, Yokohama, Japan
3
Hudon A, Beaudoin M, Phraxayavong K, Potvin S, Dumais A. Unsupervised Machine Learning Driven Analysis of Verbatims of Treatment-Resistant Schizophrenia Patients Having Followed Avatar Therapy. J Pers Med 2023; 13:801. [PMID: 37240971 DOI: 10.3390/jpm13050801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 05/28/2023]
Abstract
(1) Background: The therapeutic mechanisms underlying psychotherapeutic interventions for individuals with treatment-resistant schizophrenia are mostly unknown. One such intervention is avatar therapy (AT), in which the patient engages in immersive sessions while interacting with an avatar representing their primary persistent auditory verbal hallucination. The first aim of this study was to conduct an unsupervised machine-learning analysis of the verbatims of treatment-resistant schizophrenia patients who followed AT. The second aim was to compare the data clusters obtained from this analysis with a previously conducted qualitative analysis. (2) Methods: A k-means algorithm was applied to the immersive-session verbatims of 18 patients with treatment-resistant schizophrenia who followed AT, to cluster the interactions of the avatar and the patient. Data were pre-processed using vectorization and data reduction. (3) Results: Three clusters were identified for the avatar's interactions, whereas four clusters were identified for the patient's interactions. (4) Conclusion: This study was the first attempt to apply unsupervised machine learning to AT and provided quantitative insight into the interactions that take place within immersive sessions. Unsupervised machine learning could yield a better understanding of the types of interactions that occur in AT and their clinical implications.
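The Methods describe vectorizing the verbatims and then running k-means. A minimal standard-library sketch of that vectorize-then-cluster step follows, assuming plain bag-of-words counts (L2-normalized) and Lloyd's k-means initialized from the first k utterances. The study's actual vectorizer, data-reduction method, and initialization are not given in the abstract; the data-reduction step is omitted here, and the utterances are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def vectorize(texts: List[str]) -> List[List[float]]:
    """Bag-of-words term counts over a shared vocabulary, each vector L2-normalised."""
    tokenised = [re.findall(r"\w+", t.lower()) for t in texts]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            vec[index[word]] = float(count)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, n_iter: int = 100) -> Optional[List[int]]:
    """Lloyd's algorithm; deterministic start from the first k points (an assumption)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical utterances, not data from the study.
    utterances = [
        "you are worthless",
        "i am not afraid of you",
        "you are useless and worthless",
        "i am not afraid anymore",
    ]
    print(kmeans(vectorize(utterances), k=2))  # [0, 1, 0, 1]
```

Because k-means depends on its starting centroids, production code usually runs several random or k-means++ initializations and keeps the lowest-inertia solution.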
Affiliation(s)
- Alexandre Hudon
- Centre de Recherche de l'Institut Universitaire en Santé Mentale de Montréal, Montreal, QC H1N 3J4, Canada
- Department of Psychiatry and Addictology, Faculty of Medicine, Université de Montréal, Montreal, QC H3T 1J4, Canada
- Mélissa Beaudoin
- Centre de Recherche de l'Institut Universitaire en Santé Mentale de Montréal, Montreal, QC H1N 3J4, Canada
- Department of Psychiatry and Addictology, Faculty of Medicine, Université de Montréal, Montreal, QC H3T 1J4, Canada
- Faculty of Medicine and Health Sciences, McGill University, Montreal, QC H3G 2M1, Canada
- Kingsada Phraxayavong
- Centre de Recherche de l'Institut Universitaire en Santé Mentale de Montréal, Montreal, QC H1N 3J4, Canada
- Services et Recherches Psychiatriques AD, Montreal, QC H1C 1H1, Canada
- Stéphane Potvin
- Centre de Recherche de l'Institut Universitaire en Santé Mentale de Montréal, Montreal, QC H1N 3J4, Canada
- Department of Psychiatry and Addictology, Faculty of Medicine, Université de Montréal, Montreal, QC H3T 1J4, Canada
- Alexandre Dumais
- Centre de Recherche de l'Institut Universitaire en Santé Mentale de Montréal, Montreal, QC H1N 3J4, Canada
- Department of Psychiatry and Addictology, Faculty of Medicine, Université de Montréal, Montreal, QC H3T 1J4, Canada
- Services et Recherches Psychiatriques AD, Montreal, QC H1C 1H1, Canada
- Institut National de Psychiatrie Légale Philippe-Pinel, Montreal, QC H1C 1H1, Canada
4
Zhang JK, Javeed S, Greenberg JK, Dibble CF, Song SK, Ray WZ. Diffusion Basis Spectrum Imaging Identifies Clinically Relevant Disease Phenotypes of Cervical Spondylotic Myelopathy. Clin Spine Surg 2023; 36:134-142. [PMID: 36959182 PMCID: PMC10042585 DOI: 10.1097/bsd.0000000000001451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Accepted: 01/29/2023] [Indexed: 03/25/2023]
Abstract
STUDY DESIGN Prospective cohort study. OBJECTIVE To apply a machine learning clustering algorithm to baseline imaging data to identify clinically relevant cervical spondylotic myelopathy (CSM) patient phenotypes. SUMMARY OF BACKGROUND DATA A major shortcoming in improving care for CSM patients is the lack of robust quantitative imaging tools to guide surgical decision-making. Advanced diffusion-weighted magnetic resonance imaging (MRI) techniques, such as diffusion basis spectrum imaging (DBSI), may help address this limitation by providing detailed evaluations of white matter injury in CSM. METHODS Fifty CSM patients underwent comprehensive clinical assessments and diffusion-weighted MRI, followed by DBSI modeling. DBSI metrics included fractional anisotropy, axial and radial diffusivity, fiber fraction, extra-axonal fraction, restricted fraction, and nonrestricted fraction. Neurofunctional status was assessed with the modified Japanese Orthopedic Association score, the myelopathic disability index, and the disabilities of the arm, shoulder, and hand questionnaire. Quality of life was measured with the 36-Item Short Form Survey physical component summary and mental component summary. The neck disability index was used to measure self-reported neck pain. K-means clustering was applied to baseline DBSI measures to identify 3 clinically relevant CSM disease phenotypes. Baseline demographic, clinical, radiographic, and patient-reported outcome measures were compared among clusters using one-way analysis of variance (ANOVA). RESULTS Twenty-three (55%) mild, 9 (21%) moderate, and 10 (24%) severe myelopathy patients were enrolled. Eight patients were excluded because of insufficient MRI data quality. Of the remaining 42 patients, 3 groups were generated by k-means clustering. Compared with clusters 1 and 2, cluster 3 performed significantly worse on the modified Japanese Orthopedic Association score and all patient-reported outcome measures (P<0.001), except the 36-Item Short Form Survey mental component summary (P>0.05). Cluster 3 also had the highest proportion of non-Caucasian patients (43%, P=0.04), the worst hand dynamometer measurements (P<0.05), and significantly higher intra-axonal axial diffusivity and extra-axonal fraction values (P<0.001). CONCLUSIONS Using baseline imaging data, we delineated a clinically meaningful CSM disease phenotype characterized by worse neurofunctional status, quality of life, and pain, and by more severe imaging markers of vasogenic edema. LEVEL OF EVIDENCE II.
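The clusters are compared with one-way ANOVA. A minimal sketch of the test statistic follows: F is the between-cluster mean square divided by the within-cluster mean square, with (k − 1, N − k) degrees of freedom. The p-value then comes from the F distribution, which the Python standard library does not provide; SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value. The scores below are hypothetical, not study data.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of the cluster means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        raise ZeroDivisionError("no within-group variance; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical baseline mJOA scores for three clusters (not study data).
    clusters = [[16, 15, 17, 16], [14, 15, 13, 14], [10, 11, 9, 12]]
    print(one_way_anova_f(clusters))
```

ANOVA assumes roughly normal outcomes with similar variances across clusters; for skewed patient-reported scores, a Kruskal-Wallis test is a common alternative.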
Affiliation(s)
- Justin K. Zhang
- Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
- Saad Javeed
- Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
- Jacob K. Greenberg
- Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
- Christopher F. Dibble
- Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
- Sheng-Kwei Song
- Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
- Wilson Z. Ray
- Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA
5
Boulcourt S, Badel A, Pionnier R, Neder Y, Ilharreborde B, Simon AL. A gait functional classification of adolescent idiopathic scoliosis (AIS) based on spatio-temporal parameters (STP). Gait Posture 2023; 102:50-55. [PMID: 36905785 DOI: 10.1016/j.gaitpost.2023.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 02/04/2023] [Accepted: 03/06/2023] [Indexed: 03/13/2023]
Abstract
BACKGROUND Therapeutic decisions for patients with adolescent idiopathic scoliosis (AIS) are mostly based on static measurements performed on two-dimensional standing full-spine radiographs. However, the trunk plays an essential role in human locomotion, and the functional consequences of this specific and common spinal deformity during daily activities are not taken into account. RESEARCH QUESTION Do patients with AIS have specific gait patterns based on spatio-temporal parameter measurements? METHODS Ninety AIS patients (aged 10-18 years) who underwent a preoperative simplified gait analysis were retrospectively included between 2017 and 2020. Spatio-temporal parameters (STP) were measured on a 3-m baropodometric gaitway and comprised 15 normalized gait parameters. A hierarchical cluster analysis was performed to identify groups of patients based on the similarity of their gait patterns, and inter-group differences in functional variables were also measured. The distribution of subjects across clusters was calculated to identify their structural characteristics according to gait pattern. RESULTS Three gait patterns were identified: Cluster 1 (46%) was defined by asymmetry, Cluster 2 (16%) by instability, and Cluster 3 (36%) by variability. Each cluster differed significantly from the others on at least 6 parameters (p < 0.05). Furthermore, each cluster was associated with one curve type: Lenke 1 for Cluster 1 (57.5%), Lenke 6 for Cluster 2 (40%), and Lenke 5 for Cluster 3 (43.5%). SIGNIFICANCE Patients with severe AIS have a dynamic gait signature identifiable from STP. Understanding the consequences of this deformity on gait may be an interesting avenue for studying the pathological mechanisms involved in their dynamic motor organization. These results might also be a first step toward studying the effectiveness of different therapies.
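The Methods rely on hierarchical cluster analysis of normalized gait parameters but do not state the distance measure, the linkage, or how normalization was done. A minimal sketch follows, assuming z-scored parameters, Euclidean distance, and average linkage (Ward's method is another common choice), with the tree cut at k clusters. The two-parameter patient data are hypothetical; the study used 15 parameters.

```python
import math
from statistics import mean, pstdev
from typing import List


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardise each parameter (column) to mean 0 and unit population SD."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def average_linkage(rows: List[List[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage and Euclidean distance, cut at k clusters."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    clusters = [[i] for i in range(len(rows))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        return mean(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find and merge the closest pair of clusters (b > a, so deleting b keeps a valid).
        _, a, b = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical patients: [stride length in m, cadence in steps/min].
    gait = [[1.20, 110], [1.22, 112], [1.18, 108], [0.90, 130], [0.92, 128], [0.88, 132]]
    print(average_linkage(zscore_columns(gait), k=2))  # [[0, 1, 2], [3, 4, 5]]
```

This naive loop recomputes all pairwise linkages at every merge, which is fine for a sketch; for cohort-sized data, `scipy.cluster.hierarchy.linkage` with `method="average"` or `method="ward"` is the usual route.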
Affiliation(s)
- Sarah Boulcourt
- Plateforme d'Analyse de la Marche (PAM), Robert Debré University Hospital, Assistance Publique des Hôpitaux de Paris (AP-HP), Paris, France
- Anne Badel
- Unité de Biologie Fonctionnelle et Adaptative (BFA), UMR 8251, CNRS, ERL 1133, Inserm, Paris, France; Université Paris Cité, Paris, France
- Raphaël Pionnier
- Unité Fonctionnelle d'Analyse du Mouvement (UFAM), Hôpitaux Nationaux de Saint-Maurice, Saint-Maurice, France
- Yamile Neder
- Plateforme d'Analyse de la Marche (PAM), Robert Debré University Hospital, Assistance Publique des Hôpitaux de Paris (AP-HP), Paris, France; Department of Pediatric Orthopaedic Surgery, Robert Debré University Hospital, Assistance Publique des Hôpitaux de Paris (AP-HP), Paris, France
- Brice Ilharreborde
- Université Paris Cité, Paris, France; Department of Pediatric Orthopaedic Surgery, Robert Debré University Hospital, Assistance Publique des Hôpitaux de Paris (AP-HP), Paris, France
- Anne-Laure Simon
- Plateforme d'Analyse de la Marche (PAM), Robert Debré University Hospital, Assistance Publique des Hôpitaux de Paris (AP-HP), Paris, France; Université Paris Cité, Paris, France; Department of Pediatric Orthopaedic Surgery, Robert Debré University Hospital, Assistance Publique des Hôpitaux de Paris (AP-HP), Paris, France
6
Tong X, Feng X, Peng F, Niu H, Zhang X, Li X, Zhao Y, Liu A, Duan C. Rupture discrimination of multiple small (< 7 mm) intracranial aneurysms based on machine learning-based cluster analysis. BMC Neurol 2023; 23:45. [PMID: 36709247 PMCID: PMC9883873 DOI: 10.1186/s12883-023-03088-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023]
Abstract
BACKGROUND Small multiple intracranial aneurysms (SMIAs) are known to be more prone to rupture than single aneurysms. However, the guidelines of the American Heart Association and American Stroke Association do not include specific recommendations for patients with SMIAs. In this study, we aimed to evaluate the feasibility of machine learning-based cluster analysis for discriminating the rupture risk of SMIAs. METHODS This multi-institutional cross-sectional study included 1,427 SMIAs from 660 patients. Hierarchical cluster analysis guided patient classification based on patient-level characteristics. Based on the clusters and morphological features, machine learning models were constructed and compared to select the optimal model for discriminating aneurysm rupture. RESULTS Three clusters with markedly different features were identified. Cluster 1 (n = 45) had the highest risk of subarachnoid hemorrhage (SAH) (75.6%) and was characterized by a higher prevalence of familial intracranial aneurysms. Cluster 2 (n = 110) had a moderate risk of SAH (38.2%) and was characterized by the highest rate of SAH history and the highest number of vascular risk factors. Cluster 3 (n = 505) had a relatively low risk of SAH (17.6%) and was characterized by a lower prevalence of SAH history and fewer vascular risk factors. Lasso regression analysis showed that, compared with cluster 3, clusters 1 (odds ratio [OR], 7.391; 95% confidence interval [CI], 4.074-13.150) and 2 (OR, 3.014; 95% CI, 1.827-4.970) were at higher risk of aneurysm rupture. The area under the curve of the model was 0.828 (95% CI, 0.770-0.833). CONCLUSIONS An unsupervised machine learning-based algorithm successfully identified three distinct clusters with different SAH risks in patients with SMIAs. Based on the morphological factors and identified clusters, our proposed model showed good discrimination of SMIA rupture.
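The model's discrimination is summarized as an area under the ROC curve (AUC). One way to see what that number means: the AUC equals the probability that a randomly chosen ruptured aneurysm receives a higher predicted score than a randomly chosen unruptured one, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it by direct pairwise comparison; it does not reproduce the authors' lasso model or their confidence interval, and the scores are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as one half (Mann-Whitney form)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical predicted rupture probabilities (not study output).
    ruptured = [0.9, 0.8]
    unruptured = [0.1, 0.85]
    print(auc_mann_whitney(ruptured, unruptured))  # 0.75
```

The pairwise loop is O(n·m); a rank-based computation gives the same value in O((n + m) log(n + m)) for large samples.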
Affiliation(s)
- Xin Tong
- Department of Neurosurgery, Beijing Neurosurgical Institute and Beijing Tiantan Hospital, Capital Medical University, China National Clinical Research Center for Neurological Diseases, 119 Fanyang Road, Beijing, 100070, China
- Xin Feng
- National Key Clinical Specialty, Department of Neurosurgery, Engineering Technology Research Center of Education Ministry of China, Guangdong Provincial Key Laboratory On Brain Function Repair and Regeneration, Neurosurgery Institute, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Fei Peng
- Department of Neurosurgery, Beijing Neurosurgical Institute and Beijing Tiantan Hospital, Capital Medical University, China National Clinical Research Center for Neurological Diseases, 119 Fanyang Road, Beijing, 100070, China
- Hao Niu
- Department of Neurosurgery, Beijing Neurosurgical Institute and Beijing Tiantan Hospital, Capital Medical University, China National Clinical Research Center for Neurological Diseases, 119 Fanyang Road, Beijing, 100070, China
- Xin Zhang
- National Key Clinical Specialty, Department of Neurosurgery, Engineering Technology Research Center of Education Ministry of China, Guangdong Provincial Key Laboratory On Brain Function Repair and Regeneration, Neurosurgery Institute, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Xifeng Li
- National Key Clinical Specialty, Department of Neurosurgery, Engineering Technology Research Center of Education Ministry of China, Guangdong Provincial Key Laboratory On Brain Function Repair and Regeneration, Neurosurgery Institute, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Yuanli Zhao
- Department of Neurosurgery, Peking University International Hospital, Beijing, China
- Aihua Liu
- Department of Neurosurgery, Beijing Neurosurgical Institute and Beijing Tiantan Hospital, Capital Medical University, China National Clinical Research Center for Neurological Diseases, 119 Fanyang Road, Beijing, 100070, China
- Chuanzhi Duan
- National Key Clinical Specialty, Department of Neurosurgery, Engineering Technology Research Center of Education Ministry of China, Guangdong Provincial Key Laboratory On Brain Function Repair and Regeneration, Neurosurgery Institute, Zhujiang Hospital, Southern Medical University, Guangzhou, China
7
Morooka H, Tanaka A, Inaguma D, Maruyama S. Clustering phosphate and iron-related markers and prognosis in dialysis patients. Clin Kidney J 2022; 15:328-337. [PMID: 35145647 PMCID: PMC8824794 DOI: 10.1093/ckj/sfab207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Indexed: 11/13/2022]
Abstract
Background Hyperphosphatemia is common in patients undergoing dialysis and is associated with mortality. Recently, the link between phosphate metabolism and iron dynamics has received increasing attention. However, the association between this relationship and prognosis remains largely unexplored. Methods We conducted an observational study of patients who initiated dialysis at the 17 centers participating in the Aichi Cohort Study of the Prognosis in Patients Newly Initiated into Dialysis. Data were available on sex, age, phosphate binder use, drug history, medical history and laboratory data. After excluding patients with missing values for phosphate, hemoglobin, ferritin and transferrin saturation, we used a Gaussian mixture model to divide the cohort into clusters based on phosphate, hemoglobin, log-transformed ferritin and transferrin saturation. We investigated the prognosis of patients in these clusters. The primary outcome was all-cause death. The prognostic impact of phosphate binders was also studied within each cluster. Results The study included 1175 patients with chronic kidney disease who initiated dialysis between October 2011 and September 2013; 785 were men and 390 were women, with a mean ± SD age of 67.9 ± 13.0 years. The patients were divided into three clusters, and mortality was higher in cluster c than in cluster a (P = 0.005). Moreover, phosphate binder use was associated with a lower risk of all-cause death in two clusters (a and c), which were characterized by, among other features, older age and a higher prevalence of diabetes mellitus. Conclusions We used an unsupervised machine learning method to cluster patients using phosphate, hemoglobin and iron-related markers. In two of the clusters, oral phosphate binder use might improve prognosis.
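Before fitting the mixture model, the Methods exclude records with missing markers and log-transform ferritin. A minimal sketch of that preparation step follows. It assumes a natural logarithm and column-wise z-score standardization, neither of which is specified in the abstract; the field names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

# Hypothetical field names; the cohort's actual variable names are not given.
FEATURES = ("phosphate", "hemoglobin", "ferritin", "tsat")

Record = Dict[str, Optional[float]]


def build_feature_matrix(records: List[Record]) -> Tuple[List[Record], List[List[float]]]:
    """Keep complete cases, log-transform ferritin, then z-score each column."""
    complete = [r for r in records if all(r.get(f) is not None for f in FEATURES)]
    rows = []
    for r in complete:
        if r["ferritin"] <= 0:
            raise ValueError("ferritin must be positive to take its logarithm")
        # Natural log of ferritin; the base used in the study is an assumption.
        rows.append([r["phosphate"], r["hemoglobin"], math.log(r["ferritin"]), r["tsat"]])
    if not rows:
        return complete, []
    # Column-wise standardisation (also an assumption) so no marker dominates by scale.
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    scaled = [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]
    return complete, scaled


if __name__ == "__main__":
    cohort = [
        {"phosphate": 5.0, "hemoglobin": 10.0, "ferritin": 100.0, "tsat": 25.0},
        {"phosphate": 6.0, "hemoglobin": 11.0, "ferritin": 200.0, "tsat": 30.0},
        {"phosphate": None, "hemoglobin": 9.5, "ferritin": 150.0, "tsat": 20.0},
    ]
    kept, X = build_feature_matrix(cohort)
    print(len(kept), X)
```

When the column is subsequently z-scored, the choice of logarithm base does not matter: changing the base only rescales the log-ferritin column, and standardization removes that scale.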
Affiliation(s)
- Hikaru Morooka
- Division of Nephrology, Nagoya University Hospital, Nagoya, Japan
- Akihito Tanaka
- Division of Nephrology, Nagoya University Hospital, Nagoya, Japan
- Daijo Inaguma
- Division of Internal Medicine, Fujita Health University Bantane Hospital, Nagoya, Japan
- Shoichi Maruyama
- Division of Nephrology, Nagoya University Graduate School of Medicine, Nagoya, Japan
8
Thongprayoon C, Mao MA, Kattah AG, Keddis MT, Pattharanitima P, Erickson SB, Dillon JJ, Garovic VD, Cheungpasitporn W. Subtyping hospitalized patients with hypokalemia by machine learning consensus clustering and associated mortality risks. Clin Kidney J 2022; 15:253-261. [PMID: 35145640 PMCID: PMC8825225 DOI: 10.1093/ckj/sfab190] [Received: 05/10/2021] [Indexed: 12/18/2022]
Abstract
Background Hospitalized patients with hypokalemia are heterogeneous, and cluster analysis, an unsupervised machine learning methodology, may discover more precise and specific homogeneous groups within this population of interest. Our study aimed to cluster patients with hypokalemia at hospital admission using an unsupervised machine learning approach and to assess the mortality risk among these distinct clusters. Methods We performed consensus clustering analysis based on demographic information, principal diagnoses, comorbidities and laboratory data among 4763 hospitalized adult patients with admission serum potassium ≤3.5 mEq/L. We calculated the standardized mean difference of each variable and used a cutoff of ±0.3 to identify each cluster's key features. We assessed the association of the hypokalemia clusters with hospital and 1-year mortality. Results Consensus cluster analysis identified three distinct clusters that best represented patients' baseline characteristics. Cluster 1 had 1510 (32%) patients, cluster 2 had 1344 (28%) patients and cluster 3 had 1909 (40%) patients. Based on the standardized difference, patients in cluster 1 were younger, had a lower comorbidity burden, and had higher estimated glomerular filtration rate (eGFR) and hemoglobin; patients in cluster 2 were older, were more likely to be admitted for cardiovascular disease, and had higher serum sodium and chloride levels but lower eGFR, serum bicarbonate, strong ion difference (SID) and hemoglobin; patients in cluster 3 were older, had a greater comorbidity burden, and had higher serum bicarbonate and SID but lower serum sodium, chloride and eGFR. Compared with cluster 1, cluster 2 had both higher hospital and 1-year mortality, whereas cluster 3 had higher 1-year mortality but comparable hospital mortality.
Conclusion Our study demonstrated the use of consensus clustering analysis in the heterogeneous cohort of hospitalized hypokalemic patients to characterize their patterns of baseline clinical and laboratory data into three clinically distinct clusters with different mortality risks.
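The key-feature rule described in this abstract (a variable is a key feature of a cluster when its standardized mean difference lies beyond ±0.3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each cluster is compared with all remaining patients, the pooled SD is taken as sqrt((s1² + s2²)/2), and the variable names and values are hypothetical.

```python
import math
from statistics import mean, stdev


def standardized_mean_difference(group, reference):
    """SMD = (mean_group - mean_reference) / pooled SD.

    Assumption: pooled SD = sqrt((s1^2 + s2^2) / 2); both lists need >= 2 values.
    """
    pooled_sd = math.sqrt((stdev(group) ** 2 + stdev(reference) ** 2) / 2)
    return (mean(group) - mean(reference)) / pooled_sd


def key_features(cluster, rest, cutoff=0.3):
    """Return {variable: SMD} for variables whose |SMD| exceeds the cutoff."""
    features = {}
    for name, values in cluster.items():
        smd = standardized_mean_difference(values, rest[name])
        if abs(smd) > cutoff:
            features[name] = round(smd, 2)
    return features


# Hypothetical example: one cluster versus all other patients.
cluster = {"age": [50, 55, 60, 45], "egfr": [80, 85, 90, 95]}
rest = {"age": [70, 75, 80, 65], "egfr": [82, 88, 86, 94]}
print(key_features(cluster, rest))  # {'age': -3.1}
```

Here the hypothetical cluster is markedly younger (SMD about -3.1) while its mean eGFR matches the rest of the cohort, so only age is flagged as a key feature.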
Affiliation(s)
- Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Michael A Mao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Jacksonville, FL, USA
- Andrea G Kattah
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Mira T Keddis
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Phoenix, AZ, USA
- Stephen B Erickson
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- John J Dillon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Vesna D Garovic
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA
9
Global longitudinal strain in heart failure with reduced ejection fraction: Prognostic relevance across disease severity as assessed by automated cluster analysis. Int J Cardiol 2021; 332:91-98. [PMID: 33713708 DOI: 10.1016/j.ijcard.2021.02.072] [Received: 02/17/2021] [Accepted: 02/24/2021] [Indexed: 11/20/2022]
Abstract
BACKGROUND Ejection fraction (EF) is still widely used to categorize heart failure (HF) patients but has limitations. Global longitudinal strain (GLS) has emerged as a new prognosticator in HF, independent of EF. AIM We investigated the incremental predictive benefit of GLS across different risk profiles identified by automated cluster analysis of simple echocardiographic parameters. METHODS AND RESULTS In 797 HFrEF patients (age 66 ± 12 y; mean EF 30 ± 7%), unsupervised cluster analysis of 10 routine echocardiographic variables (without GLS) was performed. Median follow-up was 37 months. The end-point was all-cause mortality. The association between risk profiles, GLS, and mortality was assessed by Cox proportional-hazards modeling with an interaction term. Cluster analysis allocated patients to 3 different risk phenogroups (PG): PG-1 (mild diastolic dysfunction [DD], moderate systolic dysfunction, no pulmonary hypertension, normal right ventricular [RV] function); PG-2 (moderate DD, mild pulmonary hypertension, normal RV function); PG-3 (severe DD, advanced systolic dysfunction, pulmonary hypertension, RV dysfunction). Compared with PG-1, PG-2 and PG-3 showed increased adjusted hazard ratios (1.71; 95% CI: 1.05-2.77, P = 0.030; and 2.58; 95% CI: 1.50-4.44, P < 0.001, respectively). GLS was independently associated with outcome in the whole population (adjusted HR: 1.11; 95% CI: 1.05-1.17, P = 0.001); however, profile membership modified the relationship between GLS and outcome, which was no longer significant in PG-3 (P for interaction = 0.003). CONCLUSIONS Within HFrEF populations, clustering of routine echocardiographic parameters can automatically identify patients with different risk profiles; further assessment by GLS may be useful for patients with non-advanced disease.
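When a ratio estimate and its 95% CI are reported, an approximate two-sided P value can be recovered as a consistency check. The sketch below assumes a Wald-type interval built symmetrically on the log scale (log HR ± 1.96·SE); it is a back-of-the-envelope check, not a reanalysis of the study. For PG-2 (HR 1.71; 95% CI 1.05-2.77) it gives P ≈ 0.03, and for PG-3 (HR 2.58; 95% CI 1.50-4.44) it gives P < 0.001.

```python
import math


def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value for a ratio (e.g. a hazard ratio) from its 95% CI.

    Assumes the CI is log(estimate) +/- z_crit * SE on the log scale,
    so SE can be recovered from the width of the interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)


print(round(p_from_ratio_ci(1.71, 1.05, 2.77), 3))  # PG-2 vs PG-1, about 0.03
print(p_from_ratio_ci(2.58, 1.50, 4.44) < 0.001)    # PG-3 vs PG-1: True
```

A CI that excludes 1 should correspond to P < 0.05 under this approximation, which makes the check useful for spotting transcription slips in reported P values.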
10
Abrams ZB, Tally DG, Zhang L, Coombes CE, Payne PRO, Abruzzo LV, Coombes KR. Pattern recognition in lymphoid malignancies using CytoGPS and Mercator. BMC Bioinformatics 2021; 22:100. [PMID: 33648439 PMCID: PMC7923511 DOI: 10.1186/s12859-021-03992-1] [Received: 02/07/2020] [Accepted: 02/02/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers. RESULTS In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database. CONCLUSION Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.
Affiliation(s)
- Zachary B Abrams
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Dwayne G Tally
- The Center for Genomic Advocacy At Indiana State University, Terre Haute, IN, 47809, USA
- Lin Zhang
- Institute for Informatics, Washington University School of Medicine in St. Louis, St. Louis, MO, 63108, USA
- Caitlin E Coombes
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Philip R O Payne
- Institute for Informatics, Washington University School of Medicine in St. Louis, St. Louis, MO, 63108, USA
- Lynne V Abruzzo
- Department of Pathology, The Ohio State University, Columbus, OH, 43210, USA
- Kevin R Coombes
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
11
Saha DK, Damaraju E, Rashid B, Abrol A, Plis SM, Calhoun VD. A Classification-Based Approach to Estimate the Number of Resting Functional Magnetic Resonance Imaging Dynamic Functional Connectivity States. Brain Connect 2021; 11:132-145. [PMID: 33317408 DOI: 10.1089/brain.2020.0794] [Indexed: 01/12/2023]
Abstract
Aim: To determine the optimal number of connectivity states in dynamic functional connectivity analysis. Introduction: Recent work has focused on the study of dynamic (vs. static) brain connectivity in resting functional magnetic resonance imaging data. In this work, we focus on functional network connectivity (FNC), the temporal correlation between time courses extracted from coherent networks. Dynamic FNC is most commonly estimated using a sliding window-based approach to capture short periods of FNC change. These data are then clustered to estimate transient connectivity patterns, or states. Determining the number of states is a challenging problem; the elbow criterion is one of the most widely used approaches for choosing it. Materials and Methods: In our work, we present an alternative approach that uses classification performance (e.g., healthy controls [HCs] vs. patients) as the measure for selecting the optimal number of states (clusters). We apply different classification strategies to classify HCs and patients with schizophrenia for different numbers of states (i.e., varying the model order in the clustering algorithm). We compute cross-validated accuracy for different model orders to evaluate classification performance. Results: Our results are consistent with our earlier work, which showed that overall accuracy improves when dynamic connectivity measures are used separately or in combination with static connectivity measures. Results also show that the optimal model order for classification differs from that obtained with the standard k-means model selection method, and that such optimization improves cross-validated accuracy. The optimal model order obtained from the proposed approach also gives significantly better classification performance than the traditional model selection method.
Conclusion: The observed results suggest that if one's goal is to perform classification, using the proposed approach as a criterion for selecting the optimal number of states in dynamic connectivity analysis leads to improved accuracy in hold-out data.
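The sliding-window estimate of dynamic FNC described in this abstract amounts to computing a correlation between network time courses within each window. The sketch below assumes a plain rectangular window and Pearson correlation for a single pair of networks; published pipelines often add refinements (for example tapered windows and regularized covariance estimates) before the windowed values are clustered, and the time courses shown are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sliding_window_fnc(tc1, tc2, width, step=1):
    """Correlation between two network time courses in each rectangular window."""
    return [
        pearson(tc1[start:start + width], tc2[start:start + width])
        for start in range(0, len(tc1) - width + 1, step)
    ]


# Hypothetical time courses: coupled in the first half, anti-coupled in the second.
tc_a = [0, 1, 2, 3, 4, 5, 6, 7]
tc_b = [0, 1, 2, 3, 3, 2, 1, 0]
print(sliding_window_fnc(tc_a, tc_b, width=4, step=4))  # [1.0, -1.0]
```

With step=1 the windows overlap, which is the usual sliding-window setup. Clustering the resulting windowed connectivity values at several model orders would then give the candidate states whose number the authors propose to select by cross-validated classification accuracy.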
Affiliation(s)
- Debbrata K Saha
- Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, Georgia, USA
- Eswar Damaraju
- Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, Georgia, USA
- Anees Abrol
- Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, Georgia, USA
- Sergey M Plis
- Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, Georgia, USA
- Vince D Calhoun
- Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, Georgia, USA
12
Abrams ZB, Coombes CE, Li S, Coombes KR. Mercator: A Pipeline For Multi-Method, Unsupervised Visualization And Distance Generation. Bioinformatics 2021; 37:2780-2781. [PMID: 33515233 PMCID: PMC8428582 DOI: 10.1093/bioinformatics/btab037] [Received: 04/07/2020] [Revised: 01/12/2021] [Accepted: 01/22/2021] [Indexed: 11/13/2022]
Abstract
Summary Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with its own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, which together cover binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics. Availability and implementation Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
Affiliation(s)
- Zachary B Abrams
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
- Caitlin E Coombes
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
- College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Suli Li
- Department of Operations Research and Information Engineering, College of Engineering, Cornell, New York, NY 10044, USA
- Kevin R Coombes
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
13
Lamsal R, Katiyar S. cs-means: Determining optimal number of clusters based on a level-of-similarity. SN Applied Sciences 2020. [DOI: 10.1007/s42452-020-03582-5] [Indexed: 10/23/2022]
14
Abrams ZB, Li S, Zhang L, Coombes CE, Payne PRO, Heerema NA, Abruzzo LV, Coombes KR. CytoGPS: A large-scale karyotype analysis of CML data. Cancer Genet 2020; 248-249:34-38. [PMID: 33059160 DOI: 10.1016/j.cancergen.2020.09.005] [Received: 06/12/2020] [Revised: 09/11/2020] [Accepted: 09/25/2020] [Indexed: 01/19/2023]
Abstract
Karyotyping, the practice of visually examining and recording chromosomal abnormalities, is commonly used to diagnose diseases of genetic origin, including cancers. Karyotypes are recorded as text written in the International System for Human Cytogenetic Nomenclature (ISCN). Downstream analysis of karyotypes is conducted manually, due to the visual nature of analysis and the linguistic structure of the ISCN. The ISCN has not been computer-readable and, as such, prevents the full potential of these genomic data from being realized. In response, we developed CytoGPS, a platform to analyze large volumes of cytogenetic data using a Loss-Gain-Fusion model that converts the human-readable ISCN karyotypes into a machine-readable binary format. As proof of principle, we applied CytoGPS to cytogenetic data from the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer, a National Cancer Institute hosted database of over 69,000 karyotypes of human cancers. Using the Jaccard coefficient to determine similarity between karyotypes structured as binary vectors, we were able to identify novel patterns from 4,968 Mitelman CML karyotypes, such as the co-occurrence of trisomy 19 and 21. The CytoGPS platform unlocks the potential for large-scale, comparative analysis of cytogenetic data. This methodological platform is freely available at CytoGPS.org.
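Because CytoGPS represents each karyotype as a binary vector, similarity between two karyotypes can be scored with the Jaccard coefficient: the number of features present in both divided by the number present in either. A minimal sketch; the four-position vectors and their feature labels are hypothetical stand-ins for the much longer Loss-Gain-Fusion representation, and treating two all-zero vectors as identical is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length binary (0/1) vectors.

    |A and B| / |A or B|; two all-zero vectors are treated as identical (1.0),
    which is a convention chosen for this sketch, not taken from CytoGPS.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


# Hypothetical features: [gain chr19, gain chr21, loss chr7, fusion t(9;22)]
karyotype_1 = [1, 1, 0, 1]
karyotype_2 = [1, 0, 0, 1]
print(round(jaccard(karyotype_1, karyotype_2), 3))      # 0.667
print(round(1 - jaccard(karyotype_1, karyotype_2), 3))  # 0.333, the Jaccard distance
```

Jaccard ignores positions where both vectors are 0, which suits sparse abnormality data: two karyotypes are not made more similar merely because both lack thousands of possible aberrations.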
Affiliation(s)
- Zachary B Abrams
- Department of Biomedical Informatics, Wexner Medical Center, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA
- Suli Li
- Department of Biomedical Informatics, Wexner Medical Center, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA
- Lin Zhang
- Institute for Informatics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
- Caitlin E Coombes
- Department of Biomedical Informatics, Wexner Medical Center, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA
- Philip R O Payne
- Institute for Informatics, Washington University School of Medicine in St. Louis, St. Louis, MO 63108, USA
- Nyla A Heerema
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Lynne V Abruzzo
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Kevin R Coombes
- Department of Biomedical Informatics, Wexner Medical Center, The Ohio State University, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA
15
Coombes CE, Abrams ZB, Li S, Abruzzo LV, Coombes KR. Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia. J Am Med Inform Assoc 2020; 27:1019-1027. [PMID: 32483590 PMCID: PMC7647286 DOI: 10.1093/jamia/ocaa060] [Received: 01/25/2020] [Revised: 04/08/2020] [Accepted: 04/24/2020] [Indexed: 12/22/2022]
Abstract
OBJECTIVE Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes. METHODS To address this challenge, we applied k-medoids clustering with 10 distance metrics to 2 experiments ("A" and "B") with mixed clinical features collapsed to binary vectors, and visualized the results with both multidimensional scaling and t-distributed stochastic neighbor embedding. To assess prognostic utility, we performed survival analysis using a Cox proportional hazards model, the log-rank test, and Kaplan-Meier curves. RESULTS In both experiments, survival analysis revealed a statistically significant association between clusters and survival outcomes (A: overall survival, P = .0164; B: time from diagnosis to treatment, P = .0039). Multidimensional scaling separated clusters along a gradient mirroring the order of overall survival. Longer survival was associated with mutated immunoglobulin heavy-chain variable region gene (IGHV) status, absent ZAP70 expression, female sex, and younger age. CONCLUSIONS This approach to mixed-type data handling and selection of distance metric captured well-understood, binary, prognostic markers in chronic lymphocytic leukemia (sex, IGHV mutation status, ZAP70 expression status) with high fidelity.
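The Kaplan-Meier curves used here to compare survival between clusters come from the product-limit estimator: at each event time the running survival probability is multiplied by (1 - deaths / number at risk). A minimal sketch for right-censored data, assuming the common convention that patients censored at an event time are still counted as at risk at that time; the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed at that time, 0 if censored
    Returns [(time, survival)] at each distinct event time. Patients censored
    at an event time are counted as still at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for one cluster; 0 = censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
# approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Running the estimator separately for each cluster gives one curve per cluster; a log-rank test or Cox model, as in the study, is then needed to judge whether the curves differ.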
MESH Headings
- Adult
- Aged
- Aged, 80 and over
- Female
- Humans
- Immunoglobulin Heavy Chains/genetics
- Kaplan-Meier Estimate
- Leukemia, Lymphocytic, Chronic, B-Cell/immunology
- Leukemia, Lymphocytic, Chronic, B-Cell/metabolism
- Leukemia, Lymphocytic, Chronic, B-Cell/mortality
- Male
- Middle Aged
- Mutation
- Prognosis
- Proportional Hazards Models
- Unsupervised Machine Learning
- ZAP-70 Protein-Tyrosine Kinase/metabolism
Affiliation(s)
- Caitlin E Coombes
- The Ohio State University College of Medicine, Columbus, Ohio, USA
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Zachary B Abrams
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Suli Li
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA
- Lynne V Abruzzo
- Department of Pathology, The Ohio State University, Columbus, Ohio, USA
- Kevin R Coombes
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
16
Way GP, Zietz M, Rubinetti V, Himmelstein DS, Greene CS. Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations. Genome Biol 2020; 21:109. [PMID: 32393369 PMCID: PMC7212571 DOI: 10.1186/s13059-020-02021-3] [Received: 05/07/2019] [Accepted: 04/16/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND Unsupervised compression algorithms applied to gene expression data extract latent or hidden signals representing technical and biological sources of variation. However, these algorithms require a user to select a biologically appropriate latent space dimensionality. In practice, most researchers fit a single algorithm and latent dimensionality. We sought to determine the extent to which selecting only one fit limits the biological features captured in the latent representations and, consequently, limits what can be discovered with subsequent analyses. RESULTS We compress gene expression data from three large datasets consisting of adult normal tissue, adult cancer tissue, and pediatric cancer tissue. We train many different models across a large range of latent space dimensionalities and observe various performance differences. We identify more curated pathway gene sets significantly associated with individual dimensions in denoising autoencoder and variational autoencoder models trained using an intermediate number of latent dimensionalities. Combining compressed features across algorithms and dimensionalities captures the most pathway-associated representations. When trained with different latent dimensionalities, models learn strongly associated and generalizable biological representations including sex, neuroblastoma MYCN amplification, and cell types. Stronger signals, such as tumor type, are best captured in models trained at lower dimensionalities, while more subtle signals such as pathway activity are best identified in models trained with more latent dimensionalities. CONCLUSIONS There is no single best latent dimensionality or compression algorithm for analyzing gene expression data. Instead, using features derived from different compression models across multiple latent space dimensionalities enhances biological representations.
Affiliation(s)
- Gregory P Way
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Vincent Rubinetti
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA, 19104, USA
- Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, 19102, USA
17
Asiaee A, Abrams ZB, Nakayiza S, Sampath D, Coombes KR. Explaining Gene Expression Using Twenty-One MicroRNAs. J Comput Biol 2019; 27:1157-1170. [PMID: 31794247 DOI: 10.1089/cmb.2019.0321] [Indexed: 12/19/2022]
Abstract
The transcriptome of a tumor contains detailed information about the disease. Although advances in sequencing technologies have generated larger data sets, there are still many questions about exactly how the transcriptome is regulated. One class of regulatory elements consists of microRNAs (miRs), many of which are known to be associated with cancer. To better understand the relationships between miRs and cancers, we analyzed ∼9000 samples from 32 cancer types studied in The Cancer Genome Atlas. Our feature reduction algorithm found evidence for 21 biologically interpretable clusters of miRs, many of which were statistically associated with a specific type of cancer. Moreover, the clusters contain sufficient information to distinguish between most types of cancer. We then used linear models to measure, genome-wide, how much variation in gene expression could be explained by the 21 average expression values ("scores") of the clusters. Based on the ∼20,000 per-gene R² values, we found that (1) mean differences between tissues of origin explain about 36% of the variation; (2) the 21 miR cluster scores explain about 30% of the variation; and (3) combining tissue type with the miR scores explains about 56% of the total genome-wide variation in gene expression. Our analysis of poorly explained genes shows that they are enriched for olfactory receptor processes, sensory perception, and nervous system processing, which are necessary to receive and interpret signals from outside the organism. It is therefore plausible that these genes are always active and are not downregulated by miRs. In contrast, highly explained genes encode proteins involved in transport, the plasma membrane, or metabolic processes, which are heavily regulated inside the cell. Other genetic regulatory elements, such as transcription factors and methylation, might help explain some of the remaining variation in gene expression.
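Each per-gene R² value summarizes how much of that gene's expression variance a linear model explains, R² = 1 - SS_res/SS_tot. The study fit models with all 21 miR cluster scores (and, separately, tissue type); to keep the example short, this sketch assumes a single hypothetical predictor and two made-up genes, fit by ordinary least squares.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_res / ss_tot


def simple_ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns the fitted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [a + b * xi for xi in x]


# Hypothetical data: one miR-cluster score and two genes measured in 4 samples.
score = [1, 2, 3, 4]
genes = {"gene_A": [2, 4, 6, 8], "gene_B": [1, 3, 2, 4]}
per_gene_r2 = {g: r_squared(y, simple_ols_fit(score, y)) for g, y in genes.items()}
print(per_gene_r2)  # gene_A is 1.0; gene_B is close to 0.64
```

With many predictors the same R² definition applies; only the fitting step changes to multiple regression, and adjusted R² becomes worth reporting because plain R² never decreases as predictors are added.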
Affiliation(s)
- Amir Asiaee
- Mathematical Biosciences Institute, The Ohio State University, Columbus, Ohio, USA
- Zachary B Abrams
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Samantha Nakayiza
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Deepa Sampath
- Division of Hematology, Department of Internal Medicine, The Ohio State University, Columbus, Ohio, USA
- Kevin R Coombes
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
18
Guilamet MCV, Bernauer M, Micek ST, Kollef MH. Cluster analysis to define distinct clinical phenotypes among septic patients with bloodstream infections. Medicine (Baltimore) 2019; 98:e15276. [PMID: 31008972 PMCID: PMC6494365 DOI: 10.1097/md.0000000000015276] [Indexed: 11/26/2022]
Abstract
Prior attempts at identifying outcome determinants associated with bloodstream infection have employed a priori determined classification schemes based on readily identifiable microbiology, infection site, and patient characteristics. We hypothesized that even amongst this heterogeneous population, clinically relevant groupings can be described that transcend old a priori classifications. We applied cluster analysis to variables from three domains: patient characteristics, acuity of illness/clinical presentation and infection characteristics. We validated our clusters based on both content validity and predictive validity. Among 3715 patients with bloodstream infections from Barnes-Jewish Hospital (2008-2015), the most stable cluster arrangement occurred with the formation of 4 clusters. This clustering arrangement resulted in an approximately uniform distribution of the population: Cluster One "Surgical Outside Hospital Transfers" (21.5%), Cluster Two "Functional Immunocompromised Patients" (27.9%), Cluster Three "Women with Skin and Urinary Tract Infection" (28.7%) and Cluster Four "Acutely Sick Pneumonia" (21.8%). Staphylococcus aureus distributed primarily to Clusters Three (40%) and Four (25%), while nonfermenting Gram-negative bacteria grouped mainly in Clusters Two and Four (31% and 30%). More than half of the pneumonia cases occurred in Cluster Four. Clusters One and Two contained 33% and 31%, respectively, of the individuals receiving inappropriate antibiotic administration. Mortality was greatest for Cluster Four (33.8%, 27.4%, 19.2%, 44.6%; P < .001), while Cluster One patients were most likely to be discharged to a nursing home. Our results support the potential for machine learning methods to identify homogeneous groupings in infectious diseases that transcend old a priori classifications.
These methods may allow new clinical phenotypes to be identified potentially improving the severity staging and development of new treatments for complex infectious diseases.
Affiliation(s)
- Maria Cristina Vazquez Guilamet
- Division of Pulmonary, Critical Care, and Sleep Medicine
- Division of Infectious Diseases, University of New Mexico Health Sciences Center, Albuquerque, NM
- Michael Bernauer
- Division of Health Sciences Library and Informatics Center, University of New Mexico, Albuquerque, NM
- Scott T. Micek
- Department of Pharmacy Practice, St. Louis College of Pharmacy, St. Louis, MO
- Marin H. Kollef
- Division of Pulmonary and Critical Care Medicine, Washington University School of Medicine, St. Louis, MO
19
Abrams ZB, Zucker M, Wang M, Asiaee Taheri A, Abruzzo LV, Coombes KR. Thirty biologically interpretable clusters of transcription factors distinguish cancer type. BMC Genomics 2018; 19:738. [PMID: 30305013 PMCID: PMC6180590 DOI: 10.1186/s12864-018-5093-z] [Received: 04/25/2018] [Accepted: 09/19/2018] [Indexed: 12/27/2022]
Abstract
Background: Transcription factors are essential regulators of gene expression and play critical roles in development, differentiation, and in many cancers. To carry out their regulatory programs, they must cooperate in networks and bind simultaneously to sites in promoter or enhancer regions of genes. We hypothesize that the mRNA co-expression patterns of transcription factors can be used both to learn how they cooperate in networks and to distinguish between cancer types. Results: We recently developed a new algorithm, Thresher, that combines principal component analysis, outlier filtering, and von Mises-Fisher mixture models to cluster genes (in this case, transcription factors) based on expression, determining the optimal number of clusters in the process. We applied Thresher to the RNA-Seq expression data of 486 transcription factors from more than 10,000 samples of 33 kinds of cancer studied in The Cancer Genome Atlas (TCGA). We found that 30 clusters of transcription factors from a 29-dimensional principal component space were able to distinguish between most cancer types, and could separate tumor samples from normal controls. Moreover, each cluster of transcription factors could be either (i) linked to a tissue-specific expression pattern or (ii) associated with a fundamental biological process such as cell cycle, angiogenesis, apoptosis, or cytoskeleton. Clusters of the second type were also more likely to be associated with embryonically lethal mouse phenotypes. Conclusions: Using our approach, we have shown that the mRNA expression patterns of transcription factors contain most of the information needed to distinguish different cancer types. The Thresher method is capable of discovering biologically interpretable clusters of genes. It can potentially be applied to other gene sets, such as signaling pathways, to decompose them into simpler, yet biologically meaningful, components.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-5093-z) contains supplementary material, which is available to authorized users.
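The general shape of the Thresher pipeline summarized above (principal component analysis, outlier filtering, then clustering of genes by direction) can be illustrated with a much-simplified sketch. It assumes the per-gene loading vectors in principal component space have already been computed. It treats a gene as an outlier when its loading vector is shorter than a hypothetical `min_length` threshold. It also swaps the von Mises-Fisher mixture model for the simpler spherical k-means algorithm, which likewise groups unit vectors by direction, and it uses a fixed number of clusters `k`, whereas Thresher chooses the number of clusters itself. The function names and toy loadings are hypothetical and are not taken from the Thresher software.

```python
import math
import random


def normalize(v):
    """Scale a nonzero vector to unit length (project it onto the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def filter_outliers(loadings, min_length):
    """Drop genes whose loading vector is shorter than min_length.

    Assumption: a short loading vector means the gene is poorly explained by
    the retained principal components, so it is set aside before clustering.
    """
    return {
        gene: vec
        for gene, vec in loadings.items()
        if math.sqrt(sum(x * x for x in vec)) >= min_length
    }


def spherical_kmeans(vectors, k, iters=50, seed=0):
    """Cluster vectors by direction only (cosine similarity), k fixed in advance.

    Returns a dict mapping each gene name to a cluster index in range(k).
    """
    names = sorted(vectors)
    units = {g: normalize(vectors[g]) for g in names}
    rng = random.Random(seed)
    # Initialize the centers with k distinct genes chosen at random.
    centers = [units[g] for g in rng.sample(names, k)]
    assign = {}
    for _ in range(iters):
        # Assign each gene to the center with the largest dot product
        # (for unit vectors this equals the cosine similarity).
        new_assign = {}
        for g in names:
            sims = [sum(a * b for a, b in zip(units[g], c)) for c in centers]
            new_assign[g] = sims.index(max(sims))
        if new_assign == assign:
            break  # assignments did not change: converged
        assign = new_assign
        # Move each center to the normalized mean direction of its members.
        for c in range(k):
            members = [units[g] for g in names if assign[g] == c]
            if not members:
                continue  # keep the old center if a cluster empties
            total = [sum(col) for col in zip(*members)]
            if math.sqrt(sum(x * x for x in total)) > 0:
                centers[c] = normalize(total)
    return assign


if __name__ == "__main__":
    # Hypothetical 2-D loadings for five transcription factors.
    toy = {
        "TF_A": [0.9, 0.1],
        "TF_B": [0.8, 0.2],
        "TF_C": [0.1, 0.9],
        "TF_D": [0.2, 0.7],
        "TF_E": [0.01, 0.02],  # close to the origin, so it is filtered out
    }
    kept = filter_outliers(toy, min_length=0.1)
    print(spherical_kmeans(kept, k=2))
```

Because every kept vector is normalized, genes whose loadings point the same way in principal component space end up together regardless of magnitude. The length-based filter is what keeps near-zero vectors, whose direction is essentially noise, from being forced into a cluster.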
Affiliation(s)
- Zachary B Abrams
- Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA
- Mark Zucker
- Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA
- Min Wang
- Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA; Mathematical Biosciences Institute, The Ohio State University, 1735 Neil Avenue, Columbus, 43210, OH, USA
- Amir Asiaee Taheri
- Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA; Mathematical Biosciences Institute, The Ohio State University, 1735 Neil Avenue, Columbus, 43210, OH, USA
- Lynne V Abruzzo
- Department of Pathology, The Ohio State University, 129 Hamilton Hall, 1645 Neil Avenue, Columbus, 43210, OH, USA
- Kevin R Coombes
- Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA