1
Jiang J, Zheng P, Li L. Identification of Prognostic and Immune Characteristics of Two Lung Adenocarcinoma Subtypes Based on TRPV Channel Family Genes. J Membr Biol 2024; 257:115-129. [PMID: 38150051 DOI: 10.1007/s00232-023-00300-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 11/21/2023] [Indexed: 12/28/2023]
Abstract
Lung adenocarcinoma (LUAD) is one of the deadliest malignant tumors worldwide. Transient receptor potential vanilloid (TRPV) channels play pivotal roles in many cancers, but their impact on LUAD remains unexplored. In this study, LUAD samples were classified into two subtypes according to the expression of the TRPV1-6 genes, with cluster2 showing significantly higher survival rates than cluster1. Differentially expressed genes (DEGs) between cluster1 and cluster2 were then identified and found to be enriched in channel activity and Ca²⁺ signaling pathways. We built a protein-protein interaction network from the DEGs and constructed a LUAD prognostic model by Cox regression analysis of the genes corresponding to 170 protein nodes. The prognostic model showed good predictive ability for patient prognosis, with higher survival rates in the low-risk (LR) group. Cox regression analysis validated the risk score as an independent prognostic indicator, and a clinically applicable nomogram was plotted. Immunological analysis indicated that the LR and high-risk (HR) groups differed in their proportions of immune cell infiltration. Immunotherapy prediction indicated that LUAD patients in the LR group were more likely to benefit from immune checkpoint blockade therapy. Furthermore, we hypothesized that the expression patterns of the model's feature genes were related to sensitivity to the lung cancer drugs TAS-6417 and erlotinib. In summary, our LUAD prognostic model has clinical applicability for predicting prognosis and immunotherapy response.
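As a rough illustration of how a Cox-based prognostic model of this kind can assign patients to low-risk and high-risk groups, the sketch below computes a linear risk score and splits the cohort at the median. The gene names, coefficients, and expression values are hypothetical placeholders rather than values from the study, and the median cutoff is an assumed convention that the abstract does not confirm.

```python
from statistics import median

# Hypothetical Cox coefficients (log hazard ratios); not taken from the study.
COEFFICIENTS = {"GENE_A": 0.8, "GENE_B": -0.5}


def risk_score(expression):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(COEFFICIENTS[g] * expression[g] for g in COEFFICIENTS)


def assign_risk_groups(samples):
    """Label a sample 'HR' if its score exceeds the cohort median, else 'LR'."""
    scores = {sid: risk_score(expr) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    return {sid: ("HR" if s > cutoff else "LR") for sid, s in scores.items()}


# Made-up expression values for four patients.
samples = {
    "p1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "p2": {"GENE_A": 0.5, "GENE_B": 2.0},
    "p3": {"GENE_A": 1.0, "GENE_B": 0.0},
    "p4": {"GENE_A": 3.0, "GENE_B": 0.5},
}
groups = assign_risk_groups(samples)
```

In practice the coefficients would come from a fitted Cox model and the grouping would be checked against survival data; the sketch only shows the scoring arithmetic.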
Affiliation(s)
- Jianhua Jiang
  - Department of Cardiothoracic Surgery, Jingmen People's Hospital, No.39 Xiangshan Avenue, Jingmen City, 448000, Hubei Province, China
- Pengchao Zheng
  - Department of Cardiothoracic Surgery, Jingmen People's Hospital, No.39 Xiangshan Avenue, Jingmen City, 448000, Hubei Province, China
- Lei Li
  - Department of Cardiothoracic Surgery, Jingmen People's Hospital, No.39 Xiangshan Avenue, Jingmen City, 448000, Hubei Province, China
2
Crombé A, Lecomte JC, Seux M, Banaste N, Gorincour G. Using the Textual Content of Radiological Reports to Detect Emerging Diseases: A Proof-of-Concept Study of COVID-19. J Imaging Inform Med 2024; 37:620-632. [PMID: 38343242 PMCID: PMC11031522 DOI: 10.1007/s10278-023-00949-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 04/20/2024]
Abstract
Changes in the content of radiological reports at population level could detect emerging diseases. Herein, we developed a method to quantify similarities in consecutive temporal groupings of radiological reports using natural language processing, and we investigated whether the appearance of dissimilarities between consecutive periods correlated with the beginning of the COVID-19 pandemic in France. CT reports from 67,368 consecutive adults across 62 emergency departments throughout France between October 2019 and March 2020 were collected. Reports were vectorized using term frequency-inverse document frequency (TF-IDF) analysis on one-grams. For each successive 2-week period, we performed unsupervised clustering of the reports based on TF-IDF values and partition-around-medoids. Next, we assessed the similarities between this clustering and a clustering from two weeks before according to the average adjusted Rand index (AARI). Statistical analyses included (1) cross-correlation functions (CCFs) with the number of positive SARS-CoV-2 tests and the advanced sanitary index for flu syndromes (ASI-flu, from an open-source dataset), and (2) linear regressions of time series at different lags to understand the variations of AARI over time. Overall, 13,235 chest CT reports were analyzed. AARI was correlated with ASI-flu at lags of +1, +5, and +6 weeks (P = 0.0454, 0.0121, and 0.0042, respectively) and with SARS-CoV-2-positive tests at lags of -1 and 0 weeks (P = 0.0057 and 0.0001, respectively). In the best fit, AARI correlated with ASI-flu at a lag of 2 weeks (P = 0.0026), SARS-CoV-2-positive tests in the same week (P < 0.0001), and their interaction (P < 0.0001) (adjusted R² = 0.921). Thus, our method enables the automatic monitoring of changes in radiological reports and could help capture disease emergence.
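The two building blocks named in this abstract, unigram TF-IDF vectorization and the adjusted Rand index, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the whitespace tokenizer, the `log(N / df)` IDF variant, and the example strings are illustrative choices, the clustering step (partition-around-medoids in the study) is omitted, and nothing here reproduces the authors' pipeline.

```python
import math
from collections import Counter


def tfidf(docs):
    """Unigram TF-IDF vectors as dicts; idf = log(N / df) is an assumed variant."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items (len >= 2)."""
    def pairs(k):
        return k * (k - 1) // 2

    sum_ij = sum(pairs(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(pairs(c) for c in Counter(a).values())
    sum_b = sum(pairs(c) for c in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivially identical
    return (sum_ij - expected) / (max_index - expected)


# Illustrative report snippets, not data from the study.
vectors = tfidf(["ground glass opacity", "no ground fracture"])
```

An index of 1 means the two clusterings agree perfectly up to relabeling, and values near 0 mean agreement no better than chance, which is why a drop in the index between consecutive periods can signal a change in report content.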
Affiliation(s)
- Amandine Crombé
  - IMADIS, Lyon, France
  - SARCOTARGET Team, University of Bordeaux, Inserm, UMR1312, BRIC, BoRdeaux Institute of Oncology, 146 Rue Léo Saignat, Bordeaux, F-33076, France
  - Department of Radiology, Pellegrin University Hospital, CHU Bordeaux, Place Amélie Raba-Léon, Bordeaux, F-33076, France
- Jean-Christophe Lecomte
  - IMADIS, Lyon, France
  - Centre Aquitain d'Imagerie médicale, Mérignac, France
  - Centre Hospitalier de Saintes, Saintes, France
  - Clinique Mutualiste Bordeaux Pessac, Pessac, France
- Nathan Banaste
  - IMADIS, Lyon, France
  - Clinique Convert, Ramsay, Bourg en Bresse, France
3
Wang X, Rao J, Zhang L, Liu X, Zhang Y. Identification of circadian rhythm-related gene classification patterns and immune infiltration analysis in heart failure based on machine learning. Heliyon 2024; 10:e27049. [PMID: 38509983 PMCID: PMC10950509 DOI: 10.1016/j.heliyon.2024.e27049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 12/17/2023] [Accepted: 02/22/2024] [Indexed: 03/22/2024] Open
Abstract
Background Circadian rhythms play a key role in the failing heart, but the exact molecular mechanisms linking changes in the expression of circadian rhythm-related genes to heart failure (HF) remain unclear. Methods By intersecting differentially expressed genes (DEGs) between normal and HF samples in the Gene Expression Omnibus (GEO) database with circadian rhythm-related genes (CRGs), differentially expressed circadian rhythm-related genes (DE-CRGs) were obtained. Machine learning algorithms were used to screen for feature genes, and diagnostic models were constructed based on these feature genes. Subsequently, consensus clustering algorithms and non-negative matrix factorization (NMF) algorithms were used for clustering analysis of HF samples. On this basis, immune infiltration analysis was used to score the immune infiltration status between HF and normal samples as well as among different subclusters. Gene Set Variation Analysis (GSVA) evaluated the biological functional differences among subclusters. Results 13 CRGs showed differential expression between HF patients and normal samples. Nine feature genes were obtained through cross-referencing results from four distinct machine learning algorithms. Multivariate LASSO regression and external dataset validation were performed to select five key genes with diagnostic value, including NAMPT, SERPINA3, MAPK10, NPPA, and SLC2A1. Moreover, consensus clustering analysis could divide HF patients into two distinct clusters, which exhibited different biological functions and immune characteristics. Additionally, two subgroups were distinguished using the NMF algorithm based on circadian rhythm associated differentially expressed genes. Studies on immune infiltration showed marked variances in levels of immune infiltration between these subgroups. Subgroup A had higher immune scores and more widespread immune infiltration. 
Finally, Weighted Gene Co-expression Network Analysis (WGCNA) was used to identify the modules most closely associated with the two observed subgroups, and hub genes were pinpointed via protein-protein interaction (PPI) networks. GRIN2A, DLG1, ERBB4, LRRC7, and NRG1 were circadian rhythm-related hub genes closely associated with HF. Conclusion This study provides a valuable reference for further elucidating the pathogenesis of HF and offers insights for targeting circadian rhythm mechanisms to regulate immune responses and energy metabolism in HF treatment. The five diagnostic feature genes identified here could be potential therapeutic targets for HF.
Affiliation(s)
- Xuefu Wang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Jin Rao
  - Department of Cardiothoracic Surgery, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
- Li Zhang
  - Guangxi University, Nanning, China
- Yufeng Zhang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
  - Department of Cardiothoracic Surgery, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
4
Bushra AA, Kim D, Kan Y, Yi G. AutoSCAN: automatic detection of DBSCAN parameters and efficient clustering of data in overlapping density regions. PeerJ Comput Sci 2024; 10:e1921. [PMID: 38660211 PMCID: PMC11042006 DOI: 10.7717/peerj-cs.1921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 02/12/2024] [Indexed: 04/26/2024]
Abstract
Density-based clustering is considered a robust approach to unsupervised clustering because it can identify outliers, form clusters of irregular shapes, and automatically determine the number of clusters. These properties helped its pioneering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), become applicable to datasets where varying numbers of clusters of different shapes and sizes can be detected without much interference from the user. However, the original algorithm has limitations, especially its sensitivity to the user input parameters minPts and ɛ. Additionally, the algorithm assigns inconsistent cluster labels to data objects found in overlapping density regions of separate clusters, which lowers its accuracy. To alleviate these specific problems and increase clustering accuracy, we propose two methods that use statistics from a given dataset's k-nearest neighbor density distribution to determine the optimal ɛ values. Our approach removes this burden from users and automatically detects the clusters of a given dataset. Furthermore, a method to identify the accurate border objects of separate clusters is proposed and implemented to resolve the unpredictability of the original algorithm. Finally, our experiments show that our efficient re-implementation of the original algorithm, which automatically clusters datasets and improves the clustering quality of adjoining cluster members, increases clustering accuracy and runs faster than earlier approaches.
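To make the idea of deriving ɛ from k-nearest neighbor statistics concrete, the sketch below computes each point's distance to its k-th nearest neighbor and summarizes those distances. The "mean plus one standard deviation" rule in `estimate_eps` is a hypothetical heuristic chosen only for illustration; the abstract does not state which statistics AutoSCAN actually uses, and the example points are made up.

```python
import math
from statistics import mean, stdev


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps(points, k):
    """Hypothetical heuristic: mean k-distance plus one standard deviation."""
    kd = k_distances(points, k)
    return mean(kd) + stdev(kd)


# Four points on a unit square plus one distant outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
kd = k_distances(points, k=2)
eps = estimate_eps(points, k=2)
```

Points in dense regions have small k-distances while outliers have large ones, so the distribution of these values carries the density information that a parameter-selection method can exploit. The brute-force neighbor search here is quadratic and only suitable for small examples.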
Affiliation(s)
- Adil Abdu Bushra
  - Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
- Dongyeon Kim
  - Department of Artificial Intelligence, Dongguk University, Seoul, South Korea
- Yejin Kan
  - Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
- Gangman Yi
  - Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
  - Department of Artificial Intelligence, Dongguk University, Seoul, South Korea
  - Division of AI Software Convergence, Dongguk University, Seoul, South Korea
5
Li D, Li X, Lv J, Li S. Creation of signatures and identification of molecular subtypes based on disulfidptosis-related genes for glioblastoma patients' prognosis and immunological activity. Asian J Surg 2024:S1015-9584(24)00299-9. [PMID: 38462406 DOI: 10.1016/j.asjsur.2024.02.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 12/23/2023] [Accepted: 02/02/2024] [Indexed: 03/12/2024] Open
Abstract
BACKGROUND Disulfidptosis, a recently described form of cell death, has attracted attention because of its impact on prognosis, tumor progression, and treatment response. Nevertheless, the significance of disulfidptosis-related genes (DisRGs) in glioblastoma (GBM) remains unclear. METHODS The GEO and TCGA databases provided transcriptional and clinical data on tumor samples, while the GTEx database provided data on healthy tissues. DisRGs were obtained from previous studies. The expression profile of DisRGs was first examined in patients diagnosed with GBM, and their prognostic value was then explored. Through consensus clustering, we constructed DisRG-related clusters and gene subtypes. The DisRG-related clusters had differentially expressed genes, which were used to build a DisulfidptosisScore model with positive prognostic value. RESULTS The differential expression profile of 24 DisRGs between GBM samples and healthy samples was obtained. Through consensus cluster analysis, two distinct disulfidptosis subtypes, DisRGcluster A and DisRGcluster B, were identified. A DisulfidptosisScore model comprising 4 characteristic genes was then constructed. Notably, patients with GBM who had lower scores showed considerably longer overall survival (OS) than those with higher scores. CONCLUSION We have devised a prognostic model associated with disulfidptosis that provides independent prognostic predictions for patients with GBM. These findings add to the current understanding of disulfidptosis and offer new theoretical support for the development of improved treatment strategies.
Affiliation(s)
- Dongjun Li
  - Department of Neurosurgery, Shengjing Hospital of China Medical University, No.39 Huaxiang Road, Tiexi District, Shenyang, 110000, Liaoning, People's Republic of China
- Xiaodong Li
  - Department of Neurosurgery, Shengjing Hospital of China Medical University, No.39 Huaxiang Road, Tiexi District, Shenyang, 110000, Liaoning, People's Republic of China
- Jianfeng Lv
  - Department of Neurosurgery, Shengjing Hospital of China Medical University, No.39 Huaxiang Road, Tiexi District, Shenyang, 110000, Liaoning, People's Republic of China
- Shaoyi Li
  - Department of Neurosurgery, Shengjing Hospital of China Medical University, No.39 Huaxiang Road, Tiexi District, Shenyang, 110000, Liaoning, People's Republic of China
6
Zhuang X, Moshi MA, Quinones O, Trenholm RA, Chang CL, Cordes D, Vanderford BJ, Vo V, Gerrity D, Oh EC. Spatial and Temporal Drug Usage Patterns in Wastewater Correlate with Socioeconomic and Demographic Indicators in Southern Nevada. medRxiv 2024:2024.02.02.24302241. [PMID: 38352613 PMCID: PMC10863018 DOI: 10.1101/2024.02.02.24302241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Evaluating drug use within populations in the United States poses significant challenges due to various social, ethical, and legal constraints, often impeding the collection of accurate and timely data. Here, we aimed to overcome these barriers by conducting a comprehensive analysis of drug consumption trends and measuring their association with socioeconomic and demographic factors. From May 2022 to April 2023, we analyzed 208 wastewater samples from eight sampling locations across six wastewater treatment plants in Southern Nevada, covering a population of 2.4 million residents with 50 million annual tourists. Using bi-weekly influent wastewater samples, we employed mass spectrometry to detect 39 analytes, including pharmaceuticals and personal care products (PPCPs) and high-risk substances (HRS). Our results revealed a significant increase over time in the level of stimulants such as cocaine (pFDR = 1.40×10⁻¹⁰) and opioids, particularly norfentanyl (pFDR = 1.66×10⁻¹²), while PPCPs exhibited seasonal variation such as peak usage of DEET, an active ingredient in insect repellents, during the summer (pFDR = 0.05). Wastewater from socioeconomically disadvantaged or rural areas, as determined by Area Deprivation Index (ADI) and Rural-Urban Commuting Area Codes (RUCA) scores, demonstrated distinct overall usage patterns, such as higher usage/concentration of HRS, including cocaine (p = 0.05) and norfentanyl (p = 1.64×10⁻⁵). Our approach offers a near real-time, comprehensive tool to assess drug consumption and personal care product usage at a community level, linking wastewater patterns to socioeconomic and demographic factors. This approach has the potential to significantly enhance public health monitoring strategies in the United States.
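The pFDR values reported above are p-values adjusted for multiple testing across many analytes. The abstract does not say which correction was applied; the sketch below assumes the widely used Benjamini-Hochberg procedure purely for illustration, and the input p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical analytes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw p-value is scaled by the number of tests divided by its rank, so small p-values among many tests are penalized more heavily than they would be in a single test.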
Affiliation(s)
- Xiaowei Zhuang
  - Laboratory of Neurogenetics and Precision Medicine, College of Sciences, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
  - Neuroscience Interdisciplinary Ph.D. program, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV
- Michael A. Moshi
  - Laboratory of Neurogenetics and Precision Medicine, College of Sciences, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
  - Neuroscience Interdisciplinary Ph.D. program, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
- Oscar Quinones
  - Applied Research and Development Center, Southern Nevada Water Authority, P.O. Box 99954, Las Vegas NV, 89193, USA
- Rebecca A. Trenholm
  - Applied Research and Development Center, Southern Nevada Water Authority, P.O. Box 99954, Las Vegas NV, 89193, USA
- Ching-Lan Chang
  - Laboratory of Neurogenetics and Precision Medicine, College of Sciences, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
  - Neuroscience Interdisciplinary Ph.D. program, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
- Dietmar Cordes
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV
- Brett J. Vanderford
  - Applied Research and Development Center, Southern Nevada Water Authority, P.O. Box 99954, Las Vegas NV, 89193, USA
- Van Vo
  - Laboratory of Neurogenetics and Precision Medicine, College of Sciences, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
- Daniel Gerrity
  - Applied Research and Development Center, Southern Nevada Water Authority, P.O. Box 99954, Las Vegas NV, 89193, USA
- Edwin C. Oh
  - Laboratory of Neurogenetics and Precision Medicine, College of Sciences, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
  - Neuroscience Interdisciplinary Ph.D. program, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
  - Department of Brain Health, Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
  - Department of Internal Medicine, Kirk Kerkorian School of Medicine at UNLV, University of Nevada Las Vegas, Las Vegas, NV 89154
7
Chang H, Ashlock DA, Graether SP, Keller SM. Anchor Clustering for million-scale immune repertoire sequencing data. BMC Bioinformatics 2024; 25:42. [PMID: 38273275 PMCID: PMC10809746 DOI: 10.1186/s12859-024-05659-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 01/16/2024] [Indexed: 01/27/2024] Open
Abstract
BACKGROUND The clustering of immune repertoire data is challenging due to the computational cost associated with a very large number of pairwise sequence comparisons. To overcome this limitation, we developed Anchor Clustering, an unsupervised clustering method designed to identify similar sequences from millions of antigen receptor gene sequences. First, a Point Packing algorithm is used to identify a set of maximally spaced anchor sequences. Then, the genetic distance of the remaining sequences to all anchor sequences is calculated and transformed into distance vectors. Finally, distance vectors are clustered using unsupervised clustering. This process is repeated iteratively until the resulting clusters are small enough so that pairwise distance comparisons can be performed. RESULTS Our results demonstrate that Anchor Clustering is faster than existing pairwise comparison clustering methods while providing similar clustering quality. With its flexible, memory-saving strategy, Anchor Clustering is capable of clustering millions of antigen receptor gene sequences in just a few minutes. CONCLUSIONS This method enables the meta-analysis of immune-repertoire data from different studies and could contribute to a more comprehensive understanding of the immune repertoire data space.
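The first two steps described in this abstract, choosing well-spaced anchors and re-expressing every sequence as a vector of distances to those anchors, can be sketched as follows. This is an illustration under stated assumptions: greedy farthest-point selection stands in for the paper's Point Packing algorithm, whose details the abstract does not give, and Hamming distance on equal-length toy strings stands in for the genetic distance.

```python
def hamming(a, b):
    """Number of mismatched positions; assumes equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pick_anchors(seqs, n_anchors):
    """Greedy farthest-point selection (a stand-in for Point Packing):
    each new anchor maximizes its minimum distance to the anchors so far."""
    anchors = [seqs[0]]
    while len(anchors) < n_anchors:
        best = max(seqs, key=lambda s: min(hamming(s, a) for a in anchors))
        anchors.append(best)
    return anchors


def distance_vectors(seqs, anchors):
    """Represent each sequence by its distances to all anchors."""
    return [[hamming(s, a) for a in anchors] for s in seqs]


# Toy sequences, not real receptor data.
seqs = ["AAAA", "AAAT", "TTTT", "TTTA"]
anchors = pick_anchors(seqs, 2)
vectors = distance_vectors(seqs, anchors)
```

With a small number of anchors, each sequence needs only that many distance computations instead of one per other sequence, which is the source of the savings over all-pairs comparison; the resulting vectors can then be passed to any standard clustering method.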
Affiliation(s)
- Haiyang Chang
  - Department of Mathematics and Statistics, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada
- Daniel A Ashlock
  - Department of Mathematics and Statistics, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada
- Steffen P Graether
  - Department of Molecular and Cellular Biology, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada
- Stefan M Keller
  - Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA
8
Chen S, Li X, Ao W. Prognostic and immune infiltration features of disulfidptosis-related subtypes in breast cancer. BMC Womens Health 2024; 24:6. [PMID: 38166898 PMCID: PMC10763228 DOI: 10.1186/s12905-023-02823-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 12/01/2023] [Indexed: 01/05/2024] Open
Abstract
Breast cancer (BC) is a prominent cause of cancer incidence and mortality around the world. Disulfidptosis, a type of cell death, can induce tumor cell death. The purpose of this study was to analyze the potential impact of disulfidptosis-related genes (DRGs) on the prognosis and immune infiltration features of BC. Based on DRGs, we conducted an unsupervised clustering analysis on gene expression data of BC in TCGA-BRCA dataset and identified two BC subtypes, cluster1 and cluster2, with cluster1 showing a higher likelihood of favorable survival. Through immune analysis, we found that cluster1 had lower proportions of infiltration in immune-related cells, including aDCs, DCs, NK_cells, Th2_cells, and Treg. Based on the immunophenoscore (IPS) results, we inferred that cluster1 might benefit more from immune checkpoint inhibitors targeting CTLA-4 and PD1. Targeted small molecule prediction results showed that patients with cluster2 BC might respond better to antagonistic small molecule compounds, including clofazimine, lenalidomide, and epigallocatechin. Differentially expressed genes between the two subtypes were found to be enriched in signaling pathways related to steroid hormone biosynthesis, ovarian steroidogenesis, and neutrophil extracellular trap formation, according to enrichment analyses. In conclusion, this study identified BC subtypes based on DRGs so as to help predict patient prognosis and provide valuable tools for guiding clinical management and precise treatment of BC patients.
Affiliation(s)
- Sheng Chen
  - Oncology Department III, The Central Hospital of Xiaogan, No.6, Guangchang Road, Xiaogan City, 432000, Hubei Province, China
- Xiangrong Li
  - Oncology Department III, The Central Hospital of Xiaogan, No.6, Guangchang Road, Xiaogan City, 432000, Hubei Province, China
- Wen Ao
  - Oncology Department III, The Central Hospital of Xiaogan, No.6, Guangchang Road, Xiaogan City, 432000, Hubei Province, China
9
Gharbi-Meliani A, Husson F, Vandendriessche H, Bayen E, Yaffe K, Bachoud-Lévi AC, Cleret de Langavant L. Identification of high likelihood of dementia in population-based surveys using unsupervised clustering: a longitudinal analysis. Alzheimers Res Ther 2023; 15:209. [PMID: 38031083 PMCID: PMC10688099 DOI: 10.1186/s13195-023-01357-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/23/2023] [Accepted: 11/21/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Dementia is defined as a cognitive decline that affects functional status. Longitudinal ageing surveys often lack a clinical diagnosis of dementia but do measure cognition and daily function over time. We used unsupervised machine learning and longitudinal data to identify transition to probable dementia. METHODS Multiple Factor Analysis was applied to longitudinal function and cognitive data of 15,278 baseline participants (aged 50 years and more) from the Survey of Health, Ageing, and Retirement in Europe (SHARE) (waves 1, 2 and 4-7, between 2004 and 2017). Hierarchical Clustering on Principal Components discriminated three clusters at each wave. We estimated probable or "Likely Dementia" prevalence by sex and age, and assessed whether dementia risk factors increased the risk of being assigned probable dementia status using multistate models. Next, we compared the "Likely Dementia" cluster with self-reported dementia status and replicated our findings in the English Longitudinal Study of Ageing (ELSA) cohort (waves 1-9, between 2002 and 2019, 7840 participants at baseline). RESULTS Our algorithm identified a higher number of probable dementia cases compared with self-reported cases and showed good discriminative power across all waves (AUC ranged from 0.754 [0.722-0.787] to 0.830 [0.800-0.861]). "Likely Dementia" status was more prevalent in older people, displayed a 2:1 female/male ratio, and was associated with nine factors that increased risk of transition to dementia: low education, hearing loss, hypertension, drinking, smoking, depression, social isolation, physical inactivity, diabetes, and obesity. Results were replicated in the ELSA cohort with good accuracy. CONCLUSIONS Machine learning clustering can be used to study dementia determinants and outcomes in longitudinal population ageing surveys in which a clinical diagnosis of dementia is lacking.
Affiliation(s)
- Amin Gharbi-Meliani
  - Neuropsychologie Interventionnelle, U955 E01, Institut Mondor de Recherche Biomédicale & Département d'études Cognitives, INSERM, Ecole Normale Supérieure, Université PSL, Université Paris-Est Créteil, Creteil, 94000, France
- François Husson
  - Institut Agro, Univ Rennes1, CNRS, IRMAR, Rennes, 35000, France
- Henri Vandendriessche
  - Laboratoire de Neurosciences Cognitives et Computationnelles, Département d'études Cognitives, Ecole Normale Supérieure, Université PSL, INSERM, Paris, 75005, France
- Eleonore Bayen
  - Département de Rééducation Neurologique, Sorbonne Université, Hôpital Pitié-Salpêtrière-Assistance Publique Hôpitaux de Paris, Paris, 75013, France
  - Global Brain Health Institute, University of California, San Francisco, CA, 94143, USA
- Kristine Yaffe
  - Global Brain Health Institute, University of California, San Francisco, CA, 94143, USA
  - Departments of Psychiatry, Neurology and Epidemiology and Biostatistics, University of California, San Francisco, CA, 94143, USA
- Anne-Catherine Bachoud-Lévi
  - Neuropsychologie Interventionnelle, U955 E01, Institut Mondor de Recherche Biomédicale & Département d'études Cognitives, INSERM, Ecole Normale Supérieure, Université PSL, Université Paris-Est Créteil, Creteil, 94000, France
  - Service de Neurologie, Centre de référence maladie de Huntington, Hôpital Henri Mondor, Assistance Publique Hôpitaux de Paris, 1 rue Gustave Eiffel, Creteil, 94000, France
- Laurent Cleret de Langavant
  - Neuropsychologie Interventionnelle, U955 E01, Institut Mondor de Recherche Biomédicale & Département d'études Cognitives, INSERM, Ecole Normale Supérieure, Université PSL, Université Paris-Est Créteil, Creteil, 94000, France
  - Global Brain Health Institute, University of California, San Francisco, CA, 94143, USA
  - Service de Neurologie, Centre de référence maladie de Huntington, Hôpital Henri Mondor, Assistance Publique Hôpitaux de Paris, 1 rue Gustave Eiffel, Creteil, 94000, France
10
Liu L, Han L, Dong L, He Z, Gao K, Chen X, Guo JC, Zhao Y. The hypoxia-associated genes in immune infiltration and treatment options of lung adenocarcinoma. PeerJ 2023; 11:e15621. [PMID: 37576511 PMCID: PMC10414028 DOI: 10.7717/peerj.15621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 06/01/2023] [Indexed: 08/15/2023] Open
Abstract
Background Lung adenocarcinoma (LUAD) is a common lung cancer with a poor prognosis under standard chemotherapy. Hypoxia is a crucial factor in the development of solid tumors, and hypoxia-related genes (HRGs) are closely associated with the proliferation of LUAD cells. Methods In this study, LUAD HRGs were screened, and bioinformatics analysis and experimental validation were conducted. The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) databases were used to gather LUAD RNA-seq data and accompanying clinical information. LUAD subtypes were identified by unsupervised cluster analysis, and immune infiltration analysis of subtypes was conducted by GSVA and ssGSEA. Cox regression and LASSO regression analyses were used to obtain prognosis-related HRGs. Prognostic analysis was used to evaluate HRGs. Differences in enrichment pathways and immunotherapy were observed between risk groups based on GSEA and the TIDE method. Finally, RT-PCR and in vitro experiments were used to confirm prognosis-related HRG expression in LUAD cells. Results Two hypoxia-associated subtypes of LUAD were distinguished, demonstrating significant differences in prognostic analysis and immunological characteristics between subtypes. A prognostic model based on six HRGs (HK1, PDK3, PFKL, SLC2A1, STC1, and XPNPEP1) was developed for LUAD. HK1, SLC2A1, STC1, and XPNPEP1 were found to be risk factors for LUAD. PDK3 and PFKL were protective factors in LUAD patients. Conclusion This study demonstrates the effect of hypoxia-associated genes on immune infiltration in LUAD and provides options for immunotherapy and therapeutic strategies in LUAD.
Affiliation(s)
- Liu Liu
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Lina Han
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Lei Dong
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Zihao He
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Kai Gao
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Xu Chen
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Jin-Cheng Guo
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Yi Zhao
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- The Research Center for Ubiquitous Computing Systems (CUbiCS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
11
Pan W, Long F, Pan J. ScInfoVAE: interpretable dimensional reduction of single cell transcription data with variational autoencoders and extended mutual information regularization. BioData Min 2023; 16:17. [PMID: 37301826 DOI: 10.1186/s13040-023-00333-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 06/05/2023] [Indexed: 06/12/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) data can serve as a good indicator of cell-to-cell heterogeneity and can aid in the study of cell growth by identifying cell types. Recent advances in variational autoencoders (VAEs) have demonstrated their ability to learn robust feature representations for scRNA-seq data. However, VAEs tend to ignore the latent variables when combined with a decoding distribution that is too flexible. In this paper, we introduce ScInfoVAE, a dimensional reduction method based on the mutual information variational autoencoder (InfoVAE), which can more effectively identify various cell types in scRNA-seq data of complex tissues. ScInfoVAE combines an InfoVAE deep model with a zero-inflated negative binomial distribution model and reformulates the objective function to denoise scRNA-seq data and learn an efficient low-dimensional representation of them. We use ScInfoVAE to analyze clustering performance on 15 real scRNA-seq datasets and demonstrate that our method provides high clustering performance. In addition, we use simulated data to investigate the interpretability of feature extraction, and visualization results show that the low-dimensional representation learned by ScInfoVAE retains the local and global neighborhood structure of the data well. Our model can also significantly improve the quality of the variational posterior.
Affiliation(s)
- Weiquan Pan
- School of Mathematics and Statistics, Yulin Normal University, Yulin, China
- Faning Long
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Jian Pan
- School of Mathematics and Statistics, Yulin Normal University, Yulin, China
12
Tang L, Lei X, Hu H, Li Z, Zhu H, Zhan W, Zhang T. Investigation of fatty acid metabolism-related genes in breast cancer: Implications for Immunotherapy and clinical significance. Transl Oncol 2023; 34:101700. [PMID: 37247503 DOI: 10.1016/j.tranon.2023.101700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 05/21/2023] [Indexed: 05/31/2023]
Abstract
Breast cancer (BRCA) is a major global health issue, characterized by high mortality and low early diagnosis rates. The tumor immune microenvironment (TME) of BRCA is closely linked to fatty acid metabolism (FAM). This study aimed to identify FAM-related subtypes in BRCA based on gene expression and clinical data from the Cancer Genome Atlas (TCGA) database. The study found two distinct FAM-related subtypes, each with unique immune characteristics and prognostic implications. A FAM-related risk score prognostic model was developed and validated using TCGA and International Cancer Genome Consortium (GEO) cohorts, showing potential clinical applications for chemotherapy and immunotherapy. Additionally, a nomogram was established to facilitate clinical use of the risk score. These results highlight the significant correlation between FAM genes and TME in BRCA, and demonstrate the potential clinical utility of the FAM-related risk score in informing treatment decisions for BRCA patients.
Affiliation(s)
- Liyang Tang
- School of Pharmacy, Hengyang Medical College, University of South China, 28 Western Changsheng Road, Hengyang, Hunan 421001, China; The First Affiliated Hospital, Department of Pharmacy, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China; The First Affiliated Hospital, Chinese Traditional Medicine(TCM) research platform of major Epidemic Treatment base, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China
- Xiaoyong Lei
- School of Pharmacy, Hengyang Medical College, University of South China, 28 Western Changsheng Road, Hengyang, Hunan 421001, China
- Haihong Hu
- School of Pharmacy, Hengyang Medical College, University of South China, 28 Western Changsheng Road, Hengyang, Hunan 421001, China; The First Affiliated Hospital, Department of Pharmacy, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China; The First Affiliated Hospital, Chinese Traditional Medicine(TCM) research platform of major Epidemic Treatment base, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China
- Zhuo Li
- School of Pharmacy, Hengyang Medical College, University of South China, 28 Western Changsheng Road, Hengyang, Hunan 421001, China; The First Affiliated Hospital, Department of Pharmacy, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China; The First Affiliated Hospital, Chinese Traditional Medicine(TCM) research platform of major Epidemic Treatment base, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China
- Hongxia Zhu
- School of Pharmacy, Hengyang Medical College, University of South China, 28 Western Changsheng Road, Hengyang, Hunan 421001, China; The First Affiliated Hospital, Department of Pharmacy, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China; The First Affiliated Hospital, Chinese Traditional Medicine(TCM) research platform of major Epidemic Treatment base, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China
- Wendi Zhan
- School of Pharmacy, Hengyang Medical College, University of South China, 28 Western Changsheng Road, Hengyang, Hunan 421001, China; The First Affiliated Hospital, Department of Pharmacy, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China; The First Affiliated Hospital, Chinese Traditional Medicine(TCM) research platform of major Epidemic Treatment base, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China
- Taolan Zhang
- School of Pharmacy, Hengyang Medical College, University of South China, 28 Western Changsheng Road, Hengyang, Hunan 421001, China; The First Affiliated Hospital, Department of Pharmacy, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China; The First Affiliated Hospital, Chinese Traditional Medicine(TCM) research platform of major Epidemic Treatment base, Hengyang Medical School, University of South China, 69 Chuanshan Road, Hengyang, Hunan, 421001, China.
13
Kyodo A, Kanaoka K, Keshi A, Nogi M, Nogi K, Ishihara S, Kamon D, Hashimoto Y, Nakada Y, Ueda T, Seno A, Nishida T, Onoue K, Soeda T, Kawakami R, Watanabe M, Nagai T, Anzai T, Saito Y. Heart failure with preserved ejection fraction phenogroup classification using machine learning. ESC Heart Fail 2023; 10:2019-2030. [PMID: 37051638 DOI: 10.1002/ehf2.14368] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/09/2022] [Revised: 01/05/2023] [Accepted: 03/13/2023] [Indexed: 04/14/2023]
Abstract
AIMS Heart failure (HF) with preserved ejection fraction (HFpEF) is a complex syndrome with a poor prognosis. Phenotyping is required to identify subtype-dependent treatment strategies. The phenotypes of Japanese HFpEF patients, who are much less obese than Western patients, are not fully elucidated. This study aimed to perform model-based phenomapping of HFpEF in Japanese patients using unsupervised machine learning (ML). METHODS AND RESULTS We studied 365 patients with HFpEF (left ventricular ejection fraction >50%) as a derivation cohort from the Nara Registry and Analyses for Heart Failure (NARA-HF), which registered patients hospitalized for acute decompensated HF. We used unsupervised ML with a variational Bayesian-Gaussian mixture model (VBGMM) with common clinical variables. We also performed hierarchical clustering on the derivation cohort. We adopted 230 patients in the Japanese Heart Failure Syndrome with Preserved Ejection Fraction Registry as the validation cohort for VBGMM. The primary endpoint was defined as all-cause death and HF readmission within 5 years. Supervised ML was performed on the composite cohort of derivation and validation. The optimal number of clusters was three because of the probable distribution of VBGMM and the minimum Bayesian information criterion, and we stratified HFpEF into three phenogroups. Phenogroup 1 (n = 125) was older (mean age 78.9 ± 9.1 years) and predominantly male (57.6%), with the worst kidney function (mean estimated glomerular filtration rate 28.5 ± 9.7 mL/min/1.73 m²) and a high incidence of atherosclerotic factors. Phenogroup 2 (n = 200) had older individuals (mean age 78.8 ± 9.7 years), the lowest body mass index (BMI; 22.78 ± 3.94), and the highest incidence of women (57.5%) and atrial fibrillation (56.5%). Phenogroup 3 (n = 40) was the youngest (mean age 63.5 ± 11.2) and predominantly male (63.5 ± 11.2), with the highest BMI (27.46 ± 5.85) and a high incidence of left ventricular hypertrophy. 
We characterized these three phenogroups as atherosclerosis and chronic kidney disease, atrial fibrillation, and younger and left ventricular hypertrophy groups, respectively. At the primary endpoint, Phenogroup 1 demonstrated the worst prognosis (Phenogroups 1-3: 72.0% vs. 58.5% vs. 45%, P = 0.0036). We also successfully classified a derivation cohort into three similar phenogroups using VBGMM. Hierarchical and supervised clustering successfully showed the reproducibility of the three phenogroups. CONCLUSIONS ML could successfully stratify Japanese HFpEF patients into three phenogroups (atherosclerosis and chronic kidney disease, atrial fibrillation, and younger and left ventricular hypertrophy groups).
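The event rates reported for the three phenogroups can be converted back into counts, which allows a rough consistency check of the reported group difference. The sketch below is only an illustration: the abstract does not say which test produced P = 0.0036 (a time-to-event test is plausible), so a Pearson chi-square test on the 3 × 2 table of events is swapped in here as an assumption. The variable names are hypothetical, and the event counts are reconstructed by rounding rate × group size.

```python
import math

# Group sizes and 5-year primary-endpoint rates as stated in the abstract.
group_sizes = [125, 200, 40]
event_rates = [0.720, 0.585, 0.45]

# Assumption: event counts are reconstructed by rounding rate * size.
events = [round(rate * n) for rate, n in zip(event_rates, group_sizes)]
non_events = [n - e for n, e in zip(group_sizes, events)]

# Pooled event rate under the null hypothesis of no group difference.
pooled_rate = sum(events) / sum(group_sizes)

# Pearson chi-square statistic over the 3 x 2 table (event / no event).
chi2 = 0.0
for n, e, ne in zip(group_sizes, events, non_events):
    expected_e = n * pooled_rate
    expected_ne = n * (1 - pooled_rate)
    chi2 += (e - expected_e) ** 2 / expected_e
    chi2 += (ne - expected_ne) ** 2 / expected_ne

# Three groups and two outcomes give (3-1)*(2-1) = 2 degrees of freedom;
# for 2 degrees of freedom the chi-square upper tail is exactly exp(-x/2).
p_value = math.exp(-chi2 / 2)

print(events, round(chi2, 2), round(p_value, 4))
```

Under these assumptions the result lands in the same range as the reported P value, but it is not a reproduction of the authors' analysis.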
Affiliation(s)
- Atsushi Kyodo
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Koshiro Kanaoka
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Ayaka Keshi
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Maki Nogi
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Kazutaka Nogi
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Satomi Ishihara
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Daisuke Kamon
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Yukihiro Hashimoto
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Yasuki Nakada
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Tomoya Ueda
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Ayako Seno
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Taku Nishida
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Kenji Onoue
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Tsuneari Soeda
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Rika Kawakami
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Makoto Watanabe
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
- Toshiyuki Nagai
- Department of Cardiovascular Medicine, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Toshihisa Anzai
- Department of Cardiovascular Medicine, Hokkaido University Graduate School of Medicine, Sapporo, Japan
- Yoshihiko Saito
- Department of Cardiovascular Medicine, Nara Medical University, Kashihara, Japan
14
Dashtban A, Mizani MA, Pasea L, Denaxas S, Corbett R, Mamza JB, Gao H, Morris T, Hemingway H, Banerjee A. Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals. EBioMedicine 2023; 89:104489. [PMID: 36857859 PMCID: PMC9989643 DOI: 10.1016/j.ebiom.2023.104489] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2022] [Revised: 01/31/2023] [Accepted: 02/06/2023] [Indexed: 03/01/2023]
Abstract
BACKGROUND Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter). FINDINGS After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (Sensitivity: 0.81-0.98, F1 score:0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. 
The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). MEDICATIONS Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with the highest medication burden in the cardiometabolic and metabolic subtypes, and a higher burden in prevalent than in incident CKD. INTERPRETATION In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to the study of aetiology, therapeutics and risk prediction. FUNDING AstraZeneca UK Ltd, Health Data Research UK.
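Per-subtype sensitivity and F1 score, the metrics quoted for the subtype classifier, can be computed from true and predicted labels as in the minimal sketch below. The labels are made-up toy data that reuse three of the subtype names; they are not the study's data, and the function name `per_class_metrics` is hypothetical.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return sensitivity (recall), precision and F1 for every label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # a true member of `truth` was missed
            fp[pred] += 1   # a sample was wrongly assigned to `pred`
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        positives = tp[label] + fn[label]
        predicted = tp[label] + fp[label]
        sens = tp[label] / positives if positives else 0.0
        prec = tp[label] / predicted if predicted else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        metrics[label] = {"sensitivity": sens, "precision": prec, "f1": f1}
    return metrics

# Toy labels (hypothetical, for illustration only).
y_true = ["early", "early", "late", "late", "late", "cancer"]
y_pred = ["early", "late", "late", "late", "late", "cancer"]

metrics = per_class_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label, values in metrics.items():
    print(label, values)
print("accuracy", accuracy)
```

Reporting a range across classes, as the abstract does, corresponds to taking the minimum and maximum of these per-label values.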
Affiliation(s)
- Ashkan Dashtban
- Institute of Health Informatics, University College London, London, UK
- Mehrdad A Mizani
- Institute of Health Informatics, University College London, London, UK; British Heart Foundation Data Science Centre, Health Data Research UK, London, UK
- Laura Pasea
- Institute of Health Informatics, University College London, London, UK
- Spiros Denaxas
- Institute of Health Informatics, University College London, London, UK
- Jil B Mamza
- Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
- He Gao
- Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
- Tamsin Morris
- Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
- Harry Hemingway
- Institute of Health Informatics, University College London, London, UK; Health Data Research UK, University College London, London, UK
- Amitava Banerjee
- Institute of Health Informatics, University College London, London, UK; Barts Health NHS Trust, London, UK; University College London Hospitals NHS Trust, London, UK
15
Ma C, Tu D, Xu Q, Wu Y, Song X, Guo Z, Zhao X. Identification of m7G regulator-mediated RNA methylation modification patterns and related immune microenvironment regulation characteristics in heart failure. Clin Epigenetics 2023; 15:22. [PMID: 36782329 PMCID: PMC9926673 DOI: 10.1186/s13148-023-01439-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/05/2023] [Indexed: 02/15/2023]
Abstract
BACKGROUND N7-methylguanosine (m7G) modification has been reported to regulate RNA expression in multiple pathophysiological processes. However, little is known about its role and its association with the immune microenvironment in heart failure (HF). RESULTS One hundred twenty-four HF patients and 135 nonfailing donors (NFDs) from six microarray datasets in the Gene Expression Omnibus (GEO) database were included to evaluate the expression profiles of m7G regulators. Results revealed that 14 m7G regulators were differentially expressed between heart tissues from HF patients and NFDs. Furthermore, a five-gene m7G regulator diagnostic signature (NUDT16, NUDT4, CYFIP1, LARP1, and DCP2), which can readily distinguish HF patients from NFDs, was established by cross-combination of three machine learning methods: best subset regression, regularization techniques, and the random forest algorithm. The diagnostic value of the five-gene m7G regulator signature was further validated in human samples through quantitative reverse-transcription polymerase chain reaction (qRT-PCR). In addition, consensus clustering algorithms were used to categorize HF patients into distinct molecular subtypes. We identified two distinct m7G subtypes of HF with unique m7G modification patterns, functional enrichment, and immune characteristics. Additionally, two gene subgroups based on m7G subtype-related genes were further discovered. Single-sample gene-set enrichment analysis (ssGSEA) was utilized to assess the alterations of the immune microenvironment. Finally, utilizing protein-protein interaction network analysis and weighted gene co-expression network analysis (WGCNA), we identified UQCRC1, NDUFB6, and NDUFA13 as m7G methylation-associated hub genes with significant clinical relevance to cardiac functions. CONCLUSIONS Our study discovered for the first time that m7G RNA modification and the immune microenvironment are closely correlated in HF development. 
A five-gene m7G regulator diagnostic signature for HF (NUDT16, NUDT4, CYFIP1, LARP1, and DCP2) and three m7G methylation-associated hub genes (UQCRC1, NDUFB6, and NDUFA13) were identified, providing new insights into the underlying mechanisms and effective treatments of HF.
Affiliation(s)
- Chaoqun Ma
- Cardiovascular Research Institute and Department of Cardiology, General Hospital of Northern Theater Command, Shenyang, 110000, Liaoning, China
- Dingyuan Tu
- Cardiovascular Research Institute and Department of Cardiology, General Hospital of Northern Theater Command, Shenyang, 110000, Liaoning, China
- Department of Cardiology, Changhai Hospital, Naval Medical University, 168 Changhai Rd, Shanghai, 200433, China
- Qiang Xu
- Department of Cardiology, Navy 905 Hospital, Naval Medical University, Shanghai, 200052, China
- Yan Wu
- Department of Cardiology, Navy 905 Hospital, Naval Medical University, Shanghai, 200052, China
- Xiaowei Song
- Department of Cardiology, Changhai Hospital, Naval Medical University, 168 Changhai Rd, Shanghai, 200433, China
- Zhifu Guo
- Department of Cardiology, Changhai Hospital, Naval Medical University, 168 Changhai Rd, Shanghai, 200433, China
- Xianxian Zhao
- Department of Cardiology, Changhai Hospital, Naval Medical University, 168 Changhai Rd, Shanghai, 200433, China
16
Chang MJ, Hao JW, Qiao J, Chen MR, Wang Q, Wang Q, Zhang SX, Yu Q, He PF. A compendium of mucosal molecular characteristics provides novel perspectives on the treatment of ulcerative colitis. J Crohns Colitis 2023:6995436. [PMID: 36682023 DOI: 10.1093/ecco-jcc/jjad011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Indexed: 01/23/2023]
Abstract
BACKGROUND AND AIMS Ulcerative colitis [UC] is a complex heterogeneous disease. This study aims to reveal the underlying molecular features of UC using genome-scale transcriptomes of patients with UC and develop and validate a novel stratification scheme. METHODS A normalized compendium was created using colon tissue samples [455 patients with UC and 147 healthy controls [HCs]], covering genes from 10 microarray datasets. Up-regulated differentially expressed genes [DEGs] were subjected to functional network analysis, wherein samples were grouped using unsupervised clustering. Additionally, the robustness of subclustering was further assessed by two RNA sequencing datasets [100 patients with UC and 16 HCs]. Finally, the Xgboost classifier was applied to the independent datasets to evaluate the efficacy of different biologics in patients with UC. RESULTS Based on 267 up-regulated DEGs of the transcript profiles, UC patients were classified into three subtypes [subtype A-C] with distinct molecular and cellular signatures. Epithelial activation-related pathways were significantly enriched in subtype A [named epithelial proliferation], whereas subtype C was characterized as the immune activation subtype with prominent immune cells and proinflammatory signatures. Subtype B [named mixed] was modestly activated in all the signalling pathways. Notably, subtype A showed a stronger association with the superior response of biologics such as golimumab, infliximab, vedolizumab and ustekinumab compared to subtype C. CONCLUSIONS We conducted a deep stratification of mucosal tissue using the most comprehensive microarray and RNA sequencing data, providing critical insights into pathophysiological features of UC, which could serve as a template for stratified treatment approaches.
Affiliation(s)
- Min-Jing Chang
- Shanxi Key Laboratory of Big Data for Clinical Decision, Shanxi Medical University, Taiyuan, China; Ministry of Education, Key Laboratory of Cellular Physiology at Shanxi Medical University, Taiyuan, China; School of Management, Shanxi Medical University, Taiyuan, China
- Jia-Wei Hao
- Ministry of Education, Key Laboratory of Cellular Physiology at Shanxi Medical University, Taiyuan, China
- Jun Qiao
- Ministry of Education, Key Laboratory of Cellular Physiology at Shanxi Medical University, Taiyuan, China; Department of Rheumatology, Second Hospital of Shanxi Medical University, Taiyuan, China
- Miao-Ran Chen
- Ministry of Education, Key Laboratory of Cellular Physiology at Shanxi Medical University, Taiyuan, China
- Qian Wang
- Ministry of Education, Key Laboratory of Cellular Physiology at Shanxi Medical University, Taiyuan, China
- Qi Wang
- Shanxi Key Laboratory of Big Data for Clinical Decision, Shanxi Medical University, Taiyuan, China; School of Basic Medical Sciences, Shanxi Medical University, Taiyuan, China
- Sheng-Xiao Zhang
- Ministry of Education, Key Laboratory of Cellular Physiology at Shanxi Medical University, Taiyuan, China; Department of Rheumatology, Second Hospital of Shanxi Medical University, Taiyuan, China
- Qi Yu
- School of Management, Shanxi Medical University, Taiyuan, China
- Pei-Feng He
- Shanxi Key Laboratory of Big Data for Clinical Decision, Shanxi Medical University, Taiyuan, China; Medical Data Sciences, Shanxi Medical University, China
17
Zhao W, Ma J, Liu Q, Song J, Tysklind M, Liu C, Wang D, Qu Y, Wu Y, Wu F. Comparison and application of SOFM, fuzzy c-means and k-means clustering algorithms for natural soil environment regionalization in China. Environ Res 2023; 216:114519. [PMID: 36252833 DOI: 10.1016/j.envres.2022.114519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 09/28/2022] [Accepted: 10/04/2022] [Indexed: 06/16/2023]
Abstract
Soil attributes and their environmental drivers exhibit different patterns in different geographical directions, along with distinct regional characteristics, which may have important effects on the migration and transformation of substances such as organic matter and soil elements, or on the environmental impacts of pollutants. Therefore, regional soil characteristics should be considered in the process of regionalization for environmental management. However, no comprehensive evaluation or systematic classification of the natural soil environment has been established for China. Here, we established an index system for natural soil environmental regionalization (NSER) by combining literature data obtained based on bibliometrics with the analytic hierarchy process (AHP). Based on the index system, we collected spatial distribution data for 14 indexes at the national scale. In addition, three clustering algorithms, self-organizing feature mapping (SOFM), fuzzy c-means (FCM) and k-means (KM), were used to classify and define the natural soil environment. We used four cluster validity indexes (CVIs) to evaluate the models: the Davies-Bouldin index (DB), Silhouette index (Sil) and Calinski-Harabasz index (CH) for FCM and KM, and the clustering quality index (CQI) for SOFM. Analysis and comparison of the results showed that when the number of clusters was 13, the FCM clustering algorithm achieved the optimal clustering results (DB = 1.16, Sil = 0.78, CH = 6.77 × 10^6), allowing the natural soil environment of China to be divided into 12 regions with distinct characteristics. Our study provides a set of comprehensive scientific research methods for regionalization research based on spatial data and has important reference value for improving soil environmental management based on local conditions in China.
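The three cluster validity indexes named for FCM and KM can be illustrated with a small pure-Python sketch. It uses the standard textbook definitions with Euclidean distance and hard cluster labels; the function names and the toy points are hypothetical, and this is not the authors' implementation or data. Lower DB and higher Sil and CH indicate a better partition.

```python
import math
from statistics import fmean

def group_by_label(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters

def centroid(points):
    return tuple(fmean(axis) for axis in zip(*points))

def silhouette_index(points, labels):
    """Mean of (b - a) / max(a, b) over all points; needs >= 2 clusters."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(fmean([math.dist(point, q) for q in members])
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(scores)

def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation."""
    clusters = group_by_label(points, labels)
    centers = {k: centroid(v) for k, v in clusters.items()}
    scatter = {k: fmean([math.dist(p, centers[k]) for p in v])
               for k, v in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in clusters if j != i)
             for i in clusters]
    return fmean(worst)

def calinski_harabasz_index(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    clusters = group_by_label(points, labels)
    overall = centroid(points)
    between = sum(len(v) * math.dist(centroid(v), overall) ** 2
                  for v in clusters.values())
    within = sum(math.dist(p, centroid(v)) ** 2
                 for v in clusters.values() for p in v)
    n, k = len(points), len(clusters)
    return (between / (k - 1)) / (within / (n - k))

# Toy data: two tight, well-separated squares of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # matches the visible structure
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # deliberately mixes the squares

for name, labels in (("good", good_labels), ("bad", bad_labels)):
    print(name,
          round(silhouette_index(points, labels), 3),
          round(davies_bouldin_index(points, labels), 3),
          round(calinski_harabasz_index(points, labels), 3))
```

Comparing candidate cluster counts, as the abstract describes, amounts to computing such indexes for each candidate partition and choosing the one that scores best.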
Affiliation(s)
- Wenhao Zhao
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Jin Ma
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Qiyuan Liu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Jing Song
- State Key Laboratory of Soil Environment and Pollution Remediation, Institute of Soil Science, Chinese Academy of Sciences, Nanjing, 210008, China
- Mats Tysklind
- Department of Chemistry, Umeå University, Umeå, 90187, Sweden
- Chengshuai Liu
- State Key Laboratory of Environmental Geochemistry, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang, 550081, China
- Dong Wang
- Department of Chemistry, Umeå University, Umeå, 90187, Sweden
- Yajing Qu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Yihang Wu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Fengchang Wu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
18
Moreno G, Ruiz-Botella M, Martín-Loeches I, Gómez Álvarez J, Jiménez Herrera M, Bodí M, Armestar F, Marques Parra A, Estella Á, Trefler S, Jorge García R, Murcia Paya J, Vidal Cortes P, Díaz E, Ferrer R, Albaya-Moreno A, Socias-Crespi L, Bonell Goytisolo J, Sancho Chinesta S, Loza A, Forcelledo Espina L, Pozo Laderas J, deAlba-Aparicio M, Sánchez Montori L, Vallverdú Perapoch I, Hidalgo V, Fraile Gutiérrez V, Casamitjana Ortega A, Martín Serrano F, Nieto M, Blasco Cortes M, Marín-Corral J, Solé-Violán J, Rodríguez A. A differential therapeutic consideration for use of corticosteroids according to established COVID-19 clinical phenotypes in critically ill patients. Med Intensiva 2023; 47:23-33. [PMID: 36272908 PMCID: PMC9579897 DOI: 10.1016/j.medine.2021.10.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2021] [Accepted: 10/02/2021] [Indexed: 11/06/2022]
Abstract
OBJECTIVE To determine if the use of corticosteroids was associated with Intensive Care Unit (ICU) mortality among whole population and pre-specified clinical phenotypes. DESIGN A secondary analysis derived from multicenter, observational study. SETTING Critical Care Units. PATIENTS Adult critically ill patients with confirmed COVID-19 disease admitted to 63 ICUs in Spain. INTERVENTIONS Corticosteroids vs. no corticosteroids. MAIN VARIABLES OF INTEREST Three phenotypes were derived by non-supervised clustering analysis from whole population and classified as (A: severe, B: critical and C: life-threatening). We performed a multivariate analysis after propensity optimal full matching (PS) for whole population and weighted Cox regression (HR) and Fine-Gray analysis (sHR) to assess the impact of corticosteroids on ICU mortality according to the whole population and distinctive patient clinical phenotypes. RESULTS A total of 2017 patients were analyzed, 1171 (58%) with corticosteroids. After PS, corticosteroids were shown not to be associated with ICU mortality (OR: 1.0; 95% CI: 0.98-1.15). Corticosteroids were administered in 298/537 (55.5%) patients of "A" phenotype and their use was not associated with ICU mortality (HR=0.85 [0.55-1.33]). A total of 338/623 (54.2%) patients in "B" phenotype received corticosteroids. No effect of corticosteroids on ICU mortality was observed when HR was performed (0.72 [0.49-1.05]). Finally, 535/857 (62.4%) patients in "C" phenotype received corticosteroids. In this phenotype HR (0.75 [0.58-0.98]) and sHR (0.79 [0.63-0.98]) suggest a protective effect of corticosteroids on ICU mortality. CONCLUSION Our finding warns against the widespread use of corticosteroids in all critically ill patients with COVID-19 at moderate dose. Only patients with the highest inflammatory levels could benefit from steroid treatment.
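The phenotype-specific conclusions follow a common reading rule for ratio estimates: a 95% confidence interval that excludes 1 is treated as statistically significant at the 5% level. The sketch below applies that rule to the hazard ratios quoted in the abstract; the dictionary layout and the `interpret` function are hypothetical names used only for illustration.

```python
# Point estimate, lower and upper 95% CI bound, as quoted in the abstract.
estimates = {
    "A (HR)": (0.85, 0.55, 1.33),
    "B (HR)": (0.72, 0.49, 1.05),
    "C (HR)": (0.75, 0.58, 0.98),
    "C (sHR)": (0.79, 0.63, 0.98),
}

def interpret(point, lower, upper):
    """Classify a ratio estimate by whether its interval excludes 1."""
    if upper < 1.0:
        return "protective"    # whole interval below 1
    if lower > 1.0:
        return "harmful"       # whole interval above 1
    return "inconclusive"      # interval contains 1

readings = {name: interpret(*values) for name, values in estimates.items()}

for name, reading in readings.items():
    print(name, reading)
```

Only the phenotype C intervals lie entirely below 1, which matches the abstract's statement that a protective effect was suggested in that phenotype alone.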
Affiliation(s)
- G. Moreno
- ICU, Hospital Universitario Joan XXIII/URV/IISPV, Tarragona, Spain
- M. Ruiz-Botella
- Tarragona Health Data Research Working Group (THeDaR) – ICU Hospital Universitario Joan XXIII, Tarragona, Spain
- I. Martín-Loeches
- Department of Intensive Care Medicine, Multidisciplinary Intensive Care Research Organization (MICRO), St. James's Hospital, Dublin, Ireland
- J. Gómez Álvarez
- Tarragona Health Data Research Working Group (THeDaR) – ICU Hospital Universitario Joan XXIII, Tarragona, Spain
- M. Bodí
- ICU, Hospital Universitario Joan XXIII/URV/IISPV, Tarragona, Spain; CIBERES/CIBERESUCICOVID
- F. Armestar
- ICU, Hospital Universitario Germans Trias i Pujol, Badalona, Spain
- Á. Estella
- ICU, Hospital Universitario de Jerez, Jerez de la Frontera, Spain
- S. Trefler
- ICU, Hospital Universitario Joan XXIII/URV/IISPV, Tarragona, Spain
- P. Vidal Cortes
- ICU, Complejo Hospitalario Universitario de Ourense, Orense, Spain
- E. Díaz
- ICU, Hospital Parc Taulí/UAB/CIBERES, Barcelona, Spain
- R. Ferrer
- ICU, Hospital Universitario Vall d’Hebron, Barcelona, Spain
- L. Socias-Crespi
- ICU, Hospital Universitario Son Llátzer, Palma de Mallorca, Spain
- A. Loza
- ICU, Hospital Universitario Nuestra Señora de Valme, Sevilla, Spain
- L. Forcelledo Espina
- ICU, Hospital Central de Asturias, Grupo de Investigación de Microbiología Traslacional del ISPA, Oviedo, Spain
- V. Hidalgo
- ICU, Hospital Complejo Asistencial de Segovia, Segovia, Spain
- A.M. Casamitjana Ortega
- ICU, Complejo Hospitalario Universitario Insular – Materno Infantil, Las Palmas de Gran Canaria, Spain
- M. Nieto
- ICU, Hospital Clínico San Carlos, Madrid, Spain
- J. Marín-Corral
- ICU, Hospital del Mar/GREPAC – IMIM, Barcelona, Spain; Division of Pulmonary Diseases & Critical Care Medicine, UTH San Antonio, San Antonio, TX, USA
- J. Solé-Violán
- ICU, Hospital Universitario Dr. Negrín, Las Palmas de Gran Canaria, Spain
- A. Rodríguez
- ICU, Hospital Universitario Joan XXIII/URV/IISPV, Tarragona, Spain; CIBERES/CIBERESUCICOVID (corresponding author)
|
19
|
Moreno G, Ruiz-Botella M, Martín-Loeches I, Gómez Álvarez J, Jiménez Herrera M, Bodí M, Armestar F, Marques Parra A, Estella Á, Trefler S, Jorge García R, Murcia Paya J, Vidal Cortes P, Díaz E, Ferrer R, Albaya-Moreno A, Socias-Crespi L, Bonell Goytisolo JM, Sancho Chinesta S, Loza A, Forcelledo Espina L, Pozo Laderas JC, deAlba-Aparicio M, Sánchez Montori L, Vallverdú Perapoch I, Hidalgo V, Fraile Gutiérrez V, Casamitjana Ortega AM, Martín Serrano F, Nieto M, Blasco Cortes M, Marín-Corral J, Solé-Violán J, Rodríguez A; on behalf COVID-19 SEMICYUC Working Group. A differential therapeutic consideration for use of corticosteroids according to established COVID-19 clinical phenotypes in critically ill patients. Med Intensiva 2023; 47:23-33. [PMID: 34720310 DOI: 10.1016/j.medin.2021.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2021] [Accepted: 10/02/2021] [Indexed: 01/04/2023]
Abstract
Objective To determine if the use of corticosteroids was associated with Intensive Care Unit (ICU) mortality among whole population and pre-specified clinical phenotypes. Design A secondary analysis derived from multicenter, observational study. Setting Critical Care Units. Patients Adult critically ill patients with confirmed COVID-19 disease admitted to 63 ICUs in Spain. Interventions Corticosteroids vs. no corticosteroids. Main variables of interest Three phenotypes were derived by non-supervised clustering analysis from whole population and classified as (A: severe, B: critical and C: life-threatening). We performed a multivariate analysis after propensity optimal full matching (PS) for whole population and weighted Cox regression (HR) and Fine-Gray analysis (sHR) to assess the impact of corticosteroids on ICU mortality according to the whole population and distinctive patient clinical phenotypes. Results A total of 2017 patients were analyzed, 1171 (58%) with corticosteroids. After PS, corticosteroids were shown not to be associated with ICU mortality (OR: 1.0; 95% CI: 0.98-1.15). Corticosteroids were administered in 298/537 (55.5%) patients of "A" phenotype and their use was not associated with ICU mortality (HR = 0.85 [0.55-1.33]). A total of 338/623 (54.2%) patients in "B" phenotype received corticosteroids. No effect of corticosteroids on ICU mortality was observed when HR was performed (0.72 [0.49-1.05]). Finally, 535/857 (62.4%) patients in "C" phenotype received corticosteroids. In this phenotype HR (0.75 [0.58-0.98]) and sHR (0.79 [0.63-0.98]) suggest a protective effect of corticosteroids on ICU mortality. Conclusion Our finding warns against the widespread use of corticosteroids in all critically ill patients with COVID-19 at moderate dose. Only patients with the highest inflammatory levels could benefit from steroid treatment.
|
20
|
Beccuti M, Calogero RA. Single-Cell RNAseq Clustering. Methods Mol Biol 2022; 2584:241-250. [PMID: 36495454 DOI: 10.1007/978-1-0716-2756-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) allows the creation of large collections of individual cells transcriptome. Unsupervised clustering is an essential element for the analysis of these data, and it represents the initial step for the identification of different cell types to investigate the cell subpopulation organization of a sample. In this chapter, we describe how to approach the clustering of single-cell RNAseq transcriptomics data using various clustering tools, and we provide some information on the limitations affecting the clustering procedure.
Affiliation(s)
- Marco Beccuti
- Department of Computer Science, University of Torino, Turin, Italy.
|
21
|
Mrukwa G, Polanska J. DiviK: divisive intelligent K-means for hands-free unsupervised clustering in big biological data. BMC Bioinformatics 2022; 23:538. [PMID: 36503372 PMCID: PMC9743550 DOI: 10.1186/s12859-022-05093-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2020] [Accepted: 12/01/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Investigating molecular heterogeneity provides insights into tumour origin and metabolomics. The increasing amount of data gathered makes manual analyses infeasible; therefore, automated unsupervised learning approaches are utilised for discovering tissue heterogeneity. However, automated analyses require experience setting the algorithms' hyperparameters and expert knowledge about the analysed biological processes. Moreover, feature engineering is needed to obtain valuable results because of the numerous features measured. RESULTS We propose DiviK: a scalable stepwise algorithm with local data-driven feature space adaptation for segmenting high-dimensional datasets. The algorithm is compared to the optional solutions (regular k-means, spatial and spectral approaches) combined with different feature engineering techniques (None, PCA, EXIMS, UMAP, Neural Ions). Three quality indices: Dice Index, Rand Index and EXIMS score, focusing on the overall composition of the clustering, coverage of the tumour region and spatial cluster consistency, are used to assess the quality of unsupervised analyses. Algorithms were validated on mass spectrometry imaging (MSI) datasets: 2D human cancer tissue samples and 3D mouse kidney images. DiviK algorithm performed the best among the four clustering algorithms compared (overall quality score 1.24, 0.58 and 162 for d(0, 0, 0), d(1, 1, 1) and the sum of ranks, respectively), with spectral clustering being mostly second. Feature engineering techniques impact the overall clustering results less than the algorithms themselves (partial [Formula: see text] effect size: 0.141 versus 0.345, Kendall's concordance index: 0.424 versus 0.138 for d(0, 0, 0)). CONCLUSIONS DiviK could be the default choice in the exploration of MSI data. Thanks to its unique, GMM-based local optimisation of the feature space and deglomerative schema, DiviK results do not strongly depend on the feature engineering technique applied and can reveal the hidden structure in a tissue sample. Additionally, DiviK shows high scalability, and it can process at once the big omics data with more than 1.5 million instances and a few thousand features. Finally, due to its simplicity, DiviK is easily generalisable to an even more flexible framework. Therefore, it is helpful for other -omics data (such as single-cell spatial transcriptomics) or tabular data in general (including medical images after appropriate embedding). A generic implementation is freely available under Apache 2.0 license at https://github.com/gmrukwa/divik.
Affiliation(s)
- Grzegorz Mrukwa
- Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland; Netguru, Małe Garbary 9, 61-756 Poznań, Poland
- Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
|
22
|
Jiang Z, Li X, Guo L. Binning Metagenomic Contigs Using Unsupervised Clustering and Reference Databases. Interdiscip Sci 2022; 14:795-803. [PMID: 35639335 DOI: 10.1007/s12539-022-00526-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 06/15/2023]
Abstract
Metagenomics can directly extract the genetic material of all microorganisms from the environment, and obtain metagenomic samples with a large number of unknown DNA sequences. Binning of metagenomic contigs is a hot topic in metagenomics research. There are two key challenges for the current unsupervised metagenomic clustering algorithms. First, unsupervised metagenomic clustering methods rarely use reference databases, causing a certain waste of resources. Second, unsupervised metagenomic clustering methods are restricted by the characteristics of the sequences and the clustering algorithms, and the binning effect is limited. Therefore, a new binning method for metagenomic contigs using unsupervised clustering methods and reference databases is proposed to address these challenges, to make full use of the advantages of unsupervised clustering methods and reference databases constructed by scientists to improve the overall binning effect. This method uses the integrated SVM classification model to further bin the unsupervised clustering parts that do not perform well. Our proposed method was tested on simulated datasets and a real dataset and compared with other state-of-the-art metagenomic clustering methods including CONCOCT, Metabin2.0, Autometa, and MetaBAT. The results show that our method can achieve higher precision rate and improve the binning effect.
Affiliation(s)
- Zhongjun Jiang
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
- Xiaobo Li
- College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, 321004, China
- Lijun Guo
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
|
23
|
Chakraborty S, Mali K. SUFEMO: A superpixel based fuzzy image segmentation method for COVID-19 radiological image elucidation. Appl Soft Comput 2022; 129:109625. [PMID: 36124000 PMCID: PMC9474408 DOI: 10.1016/j.asoc.2022.109625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Revised: 08/15/2022] [Accepted: 09/05/2022] [Indexed: 11/27/2022]
Abstract
COVID-19 causes an ongoing worldwide pandemic situation. The non-discovery of specialized drugs and/or any other kind of medicines makes the situation worse. Early diagnosis of this disease will be certainly helpful to start the treatment early and also to bring down the dire spread of this highly infectious virus. This article describes the proposed novel unsupervised segmentation method to segment the radiological image samples of the chest area that are accumulated from the COVID-19 infected patients. The proposed approach is helpful for physicians, medical technologists, and other related experts in the quick and early diagnosis of COVID-19 infection. The proposed approach will be the SUFEMO (SUperpixel based Fuzzy Electromagnetism-like Optimization). This approach is developed depending on some well-known theories like the Electromagnetism-like optimization algorithm, the type-2 fuzzy logic, and the superpixels. The proposed approach brings down the processing burden that is required to deal with a considerably large amount of spatial information by assimilating the notion of the superpixel. In this work, the EMO approach is modified by utilizing the type 2 fuzzy framework. The EMO approach updates the cluster centers without using the cluster center updation equation. This approach is independent of the choice of the initial cluster centers. To decrease the related computational overhead of handling a lot of spatial data, a novel superpixel-based approach is proposed in which the noise-sensitiveness of the watershed-based superpixel formation approach is dealt with by computing the nearby minima from the gradient image. Also, to take advantage of the superpixels, the fuzzy objective function is modified. The proposed approach was evaluated using both qualitatively and quantitatively using 310 chest CT scan images that are gathered from various sources. Four standard cluster validity indices are taken into consideration to quantify the results. 
It is observed that the proposed approach gives better performance compared to some of the state-of-the-art approaches in terms of both qualitative and quantitative outcomes. On average, the proposed approach attains Davies-Bouldin index value of 1.812008792, Xie-Beni index value of 1.683281, Dunn index value 2.588595748, and β index value 3.142069236 for 5 clusters. Apart from this, the proposed approach is also found to be superior with regard to the rate of convergence. Rigorous experiments prove the effectiveness of the proposed approach and establish the real-life applicability of the proposed method for the initial filtering of the COVID-19 patients.
Affiliation(s)
- Shouvik Chakraborty
- Department of Computer Science and Engineering, University of Kalyani, India
- Kalyani Mali
- Department of Computer Science and Engineering, University of Kalyani, India
|
24
|
Åkerlund CAI, Holst A, Stocchetti N, Steyerberg EW, Menon DK, Ercole A, Nelson DW. Clustering identifies endotypes of traumatic brain injury in an intensive care cohort: a CENTER-TBI study. Crit Care 2022; 26:228. [PMID: 35897070 PMCID: PMC9327174 DOI: 10.1186/s13054-022-04079-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 02/20/2022] [Accepted: 07/02/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND While the Glasgow coma scale (GCS) is one of the strongest outcome predictors, the current classification of traumatic brain injury (TBI) as 'mild', 'moderate' or 'severe' based on this fails to capture enormous heterogeneity in pathophysiology and treatment response. We hypothesized that data-driven characterization of TBI could identify distinct endotypes and give mechanistic insights. METHODS We developed an unsupervised statistical clustering model based on a mixture of probabilistic graphs for presentation (< 24 h) demographic, clinical, physiological, laboratory and imaging data to identify subgroups of TBI patients admitted to the intensive care unit in the CENTER-TBI dataset (N = 1,728). A cluster similarity index was used for robust determination of optimal cluster number. Mutual information was used to quantify feature importance and for cluster interpretation. RESULTS Six stable endotypes were identified with distinct GCS and composite systemic metabolic stress profiles, distinguished by GCS, blood lactate, oxygen saturation, serum creatinine, glucose, base excess, pH, arterial partial pressure of carbon dioxide, and body temperature. Notably, a cluster with 'moderate' TBI (by traditional classification) and deranged metabolic profile, had a worse outcome than a cluster with 'severe' GCS and a normal metabolic profile. Addition of cluster labels significantly improved the prognostic precision of the IMPACT (International Mission for Prognosis and Analysis of Clinical trials in TBI) extended model, for prediction of both unfavourable outcome and mortality (both p < 0.001). CONCLUSIONS Six stable and clinically distinct TBI endotypes were identified by probabilistic unsupervised clustering. In addition to presenting neurology, a profile of biochemical derangement was found to be an important distinguishing feature that was both biologically plausible and associated with outcome. 
Our work motivates refining current TBI classifications with factors describing metabolic stress. Such data-driven clusters suggest TBI endotypes that merit investigation to identify bespoke treatment strategies to improve care. Trial registration: The core study was registered with ClinicalTrials.gov, number NCT02210221, registered on August 06, 2014, with Resource Identification Portal (RRID: SCR_015582).
Affiliation(s)
- Cecilia A I Åkerlund
- Section of Perioperative Medicine and Intensive Care, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Anders Holst
- School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Nino Stocchetti
- Neuroscience Intensive Care Unit, Department of Pathophysiology and Transplants, Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico, University of Milan, Milan, Italy
- Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- David K Menon
- Division of Anaesthesia, Department of Medicine, University of Cambridge, Cambridge, UK
- Ari Ercole
- Division of Anaesthesia, Department of Medicine, University of Cambridge, Cambridge, UK; Centre for Artificial Intelligence in Medicine, University of Cambridge, Cambridge, UK
- David W Nelson
- Section of Perioperative Medicine and Intensive Care, Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
|
25
|
Chen C, Luo J, Wang X. Identification of prostate cancer subtypes based on immune signature scores in bulk and single-cell transcriptomes. Med Oncol 2022; 39:123. [PMID: 35716212 DOI: 10.1007/s12032-022-01719-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 03/09/2022] [Indexed: 10/18/2022]
Abstract
Prostate cancer (PC) is heterogeneous in the tumor immune microenvironment (TIME). Subtyping of PC based on the TIME could provide new insights into intratumor heterogeneity and its correlates of clinical features. Based on the enrichment scores of 28 immune cell types in the TIME, we performed unsupervised clustering to identify immune-specific subtypes of PC. The clustering analysis was performed in ten different bulk tumor transcriptomic datasets and in a single-cell RNA-Seq (scRNA-seq) dataset, respectively. We identified two PC subtypes: PC immunity high (PC-ImH) and PC immunity low (PC-ImL), consistently in these datasets. Compared to PC-ImL, PC-ImH displayed stronger immune signatures, worse clinical outcomes, higher epithelial-mesenchymal transition (EMT) signature, tumor stemness, intratumor heterogeneity (ITH) and genomic instability, and lower incidence of TMPRSS2-ERG fusion. Tumor mutation burden (TMB) showed no significant difference between PC-ImH and PC-ImL, while copy number alteration (CNA) was more significant in PC-ImL than in PC-ImH. PC-ImH could be further divided into two subgroups, which had significantly different immune infiltration levels and clinical features. In conclusion, "hot" PCs have stronger anti-tumor immune response, while worse clinical outcomes versus "cold" PCs. CNA instead of TMB plays a crucial role in the regulation of TIME in PC. TMPRSS2-ERG fusion correlates with decreased anti-tumor immune response while better disease-free survival in PC. The identification of immune-specific subtypes has potential clinical implications for PC immunotherapy.
|
26
|
Polouliakh N, Hase T, Ghosh S, Kitano H. Toxicity Analysis of Pentachlorophenol Data with a Bioinformatics Tool Set. Methods Mol Biol 2022; 2486:105-125. [PMID: 35437721 DOI: 10.1007/978-1-0716-2265-0_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Rapid progress in technologies opened the new era of computer-leaded analytics, leaving humans more space for experimental design and decision making. Here we demonstrate the machine learning analysis workflow represented by spectral clustering, elucidation of evolutionary conserved transcription regulation, and network analysis using reverse engineering. Analysis of genes induced by the Pentachlorophenol toxic chemical revealed two subnetworks, one orchestrated by Interferon and another by Nuclear receptor factor 2 (NRF2) gene. Furthermore, network-inference based analysis identified a gene network module composed of genes associated with interferon signaling and their regulatory interaction with downstream genes, especially TRIM family proteins involved in responses of innate immune systems.
Affiliation(s)
- Natalia Polouliakh
- Sony Computer Science Laboratories Inc., Tokyo, Japan; Department of Ophthalmology and Visual Science, Yokohama City University, Yokohama, Japan; Systems Biology Institute, Tokyo, Japan
- Takeshi Hase
- Systems Biology Institute, Tokyo, Japan; Tokyo Medical and Dental University, Tokyo, Japan; Faculty of Pharmacy, Keio University, Tokyo, Japan
- Hiroaki Kitano
- Sony Computer Science Laboratories Inc., Tokyo, Japan; Systems Biology Institute, Tokyo, Japan; Faculty of Pharmacy, Keio University, Tokyo, Japan; Okinawa Institute for Science and Technology Graduate School, Okinawa, Japan
|
27
|
Shi Y, Zhang L, Peterson CB, Do KA, Jenq RR. Performance determinants of unsupervised clustering methods for microbiome data. Microbiome 2022; 10:25. [PMID: 35120564 PMCID: PMC8817542 DOI: 10.1186/s40168-021-01199-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/13/2021] [Accepted: 11/15/2021] [Indexed: 05/04/2023]
Abstract
BACKGROUND In microbiome data analysis, unsupervised clustering is often used to identify naturally occurring clusters, which can then be assessed for associations with characteristics of interest. In this work, we systematically compared beta diversity and clustering methods commonly used in microbiome analyses. We applied these to four published datasets where highly distinct microbiome profiles could be seen between sample groups, as well a clinical dataset with less clear separation between groups. RESULTS Although no single method outperformed the others consistently, we did identify the key scenarios where certain methods can underperform. Specifically, the Bray Curtis (BC) metric resulted in poor clustering in a dataset where high-abundance OTUs were relatively rare. In contrast, the unweighted UniFrac (UU) metric clustered poorly on dataset with a high prevalence of low-abundance OTUs. To explore these hypotheses about BC and UU, we systematically modified the properties of the poorly performing datasets and found that this approach resulted in improved BC and UU performance. Based on these observations, we rationally combined BC and UU to generate a novel metric. We tested its performance while varying the relative contributions of each metric and also compared it with another combined metric, the generalized UniFrac distance. The proposed metric showed high performance across all datasets. CONCLUSIONS Our systematic evaluation of clustering performance in these five datasets demonstrates that there is no existing clustering method that universally performs best across all datasets. We propose a combined metric of BC and UU that capitalizes on the complementary strengths of the two metrics. Video abstract.
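As a rough illustration of the kind of metric combination this abstract describes, the sketch below computes the Bray-Curtis (BC) dissimilarity between two abundance vectors and blends it with an unweighted UniFrac (UU) value supplied by the caller. The weighted-sum form, the function names and the `weight` parameter are assumptions made for illustration only; the abstract does not state how the authors combine the two metrics, and UniFrac itself needs a phylogenetic tree that is not modeled here.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def combined_distance(bc, uu, weight=0.5):
    """Hypothetical convex combination of a BC value and a precomputed UU value.

    weight is the relative contribution of BC (assumed form, not the authors').
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * bc + (1.0 - weight) * uu


# Toy OTU counts for two samples (illustrative values only).
sample_a = [6, 7, 4]
sample_b = [10, 0, 6]
bc = bray_curtis(sample_a, sample_b)
blended = combined_distance(bc, uu=0.6, weight=0.5)
```

Varying `weight` between 0 and 1 mimics "varying the relative contributions of each metric" in the loosest sense; any real analysis would take the UU values from a dedicated phylogenetic tool.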
Affiliation(s)
- Yushu Shi
- Department of Statistics, The University of Missouri, Columbia, 209D Middlebush Hall, Columbia, 65201 MO USA
- Liangliang Zhang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106 OH USA
- Christine B. Peterson
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1400 Pressler St, 4th Floor, Houston, 77030 TX USA
- Kim-Anh Do
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1400 Pressler St, 4th Floor, Houston, 77030 TX USA
- Robert R. Jenq
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, 1881 East Road, 3SCR5.4102, Unit 1954, Houston, 77054 TX USA
- Department of Stem Cell Transplantation and Cellular Therapy, CPRIT Scholar in Cancer Research, Texas, USA
|
28
|
Jiang Z, Li X, Guo L. MetaCRS: unsupervised clustering of contigs with the recursive strategy of reducing metagenomic dataset's complexity. BMC Bioinformatics 2022; 22:315. [PMID: 35045830 PMCID: PMC8772042 DOI: 10.1186/s12859-021-04227-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/23/2021] [Accepted: 06/01/2021] [Indexed: 01/02/2023] Open
Abstract
Background Metagenomics technology can directly extract microbial genetic material from the environmental samples to obtain their sequencing reads, which can be further assembled into contigs through assembly tools. Clustering methods of contigs are subsequently applied to recover complete genomes from environmental samples. The main problems with current clustering methods are that they cannot recover more high-quality genes from complex environments. Firstly, there are multiple strains under the same species, resulting in assembly of chimeras. Secondly, different strains under the same species are difficult to be classified. Thirdly, it is difficult to determine the number of strains during the clustering process. Results In view of the shortcomings of current clustering methods, we propose an unsupervised clustering method which can improve the ability to recover genes from complex environments and a new method for selecting the number of sample’s strains in clustering process. The sequence composition characteristics (tetranucleotide frequency) and co-abundance are combined to train the probability model for clustering. A new recursive method that can continuously reduce the complexity of the samples is proposed to improve the ability to recover genes from complex environments. The new clustering method was tested on both simulated and real metagenomic datasets, and compared with five state-of-the-art methods including CONCOCT, Maxbin2.0, MetaBAT, MyCC and COCACOLA. In terms of the number and quality of recovered genes from metagenomic datasets, the results show that our proposed method is more effective. Conclusions A new contigs clustering method is proposed, which can recover more high-quality genes from complex environmental samples.
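The sequence composition feature named in this abstract, tetranucleotide frequency, can be sketched as follows. This is a minimal illustration, not the MetaCRS implementation: the function name is hypothetical, windows containing ambiguous bases are simply skipped, and reverse-complement tetramers are not merged, although binning tools often do merge them.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(contig):
    """Return the relative frequency of each of the 256 tetranucleotides."""
    seq = contig.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made purely of A, C, G, T contribute to the total.
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


# Toy contig (illustrative only): 5 windows, two of which are "ACGT".
freqs = tetranucleotide_frequencies("ACGTACGT")
```

Each contig then becomes a fixed-length vector that can be combined with co-abundance information before clustering, as the abstract outlines.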
Affiliation(s)
- Zhongjun Jiang
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
- Xiaobo Li
- College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, 321004, China; College of Engineering, Lishui University, Lishui, 323000, China
- Lijun Guo
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
|
29
|
Hong Y, Zhang L, Tian X, Xiang X, Yu Y, Zeng Z, Cao Y, Chen S, Sun A. Identification of immune subtypes of Ph-neg B-ALL with ferroptosis related genes and the potential implementation of Sorafenib. BMC Cancer 2021; 21:1331. [PMID: 34906116 PMCID: PMC8670244 DOI: 10.1186/s12885-021-09076-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/30/2021] [Accepted: 11/30/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The clinical outcome of Philadelphia chromosome-negative B cell acute lymphoblastic leukemia (Ph-neg B-ALL) varies considerably from one person to another after clinical treatment due to lack of targeted therapies and leukemia's heterogeneity. Ferroptosis is a recently discovered programmed cell death strongly correlated with cancers. Nevertheless, few related studies have reported its significance in acute lymphoblastic leukemia. METHODS Herein, we collected clinical data of 80 Ph-neg B-ALL patients diagnosed in our center and performed RNA-seq with their initial bone marrow fluid samples. Throughout unsupervised machine learning K-means clustering with 24 ferroptosis related genes (FRGs), the clustered patients were parted into three variant risk groups and were performed with bioinformatics analysis. RESULTS As a result, we discovered significant heterogeneity of both immune microenvironment and genomic variance. Furthermore, the immune check point inhibitors response and potential implementation of Sorafenib in Ph-neg B-ALL was also analyzed in our cohort. Lastly, one prognostic model based on 8 FRGs was developed to evaluate the risk of Ph-neg B-ALL patients. CONCLUSION Jointly, our study proved the crucial role of ferroptosis in Ph-neg B-ALL and Sorafenib is likely to improve the survival of high-risk Ph-neg B-ALL patients.
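The K-means step mentioned in this abstract can be sketched in plain Python as below. This is a generic illustration under stated assumptions, not the authors' pipeline: the toy two-feature profiles stand in for expression values of the 24 ferroptosis-related genes, and the deterministic farthest-point initialization is a choice made here so the example is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Farthest-point initialization (an assumption of this sketch)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Hypothetical patient profiles forming three well-separated groups.
profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
            [20, 0], [20, 1], [21, 0]]
labels, centers = kmeans(profiles, k=3)
```

With k = 3 the patients fall into three groups, mirroring the three risk groups reported in the abstract; in practice the number of clusters and the initialization would need to be justified on the real data.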
Affiliation(s)
- Yang Hong
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Ling Zhang
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Xiaopeng Tian
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Xin Xiang
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Yan Yu
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Zhao Zeng
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Yaqing Cao
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Suning Chen
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Aining Sun
- Department of Hematology, The First Affiliated Hospital of Soochow University, Jiangsu Institute of Hematology, National Clinical Research Center for Hematologic Diseases, Suzhou, China; Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China.
30
Kim SK, Jung SM, Park KS, Kim KJ. Integrative analysis of lung molecular signatures reveals key drivers of idiopathic pulmonary fibrosis. BMC Pulm Med 2021; 21:404. [PMID: 34876074 PMCID: PMC8650281 DOI: 10.1186/s12890-021-01749-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/02/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022] Open
Abstract
Background Idiopathic pulmonary fibrosis (IPF) is a devastating disease with a high clinical burden. The molecular signatures of IPF were analyzed to distinguish molecular subgroups and identify key driver genes and therapeutic targets. Methods Thirteen datasets of lung tissue transcriptomics including 585 IPF patients and 362 normal controls were obtained from the databases and subjected to filtration of differentially expressed genes (DEGs). A functional enrichment analysis, agglomerative hierarchical clustering, network-based key driver analysis, and diffusion scoring were performed, and the association of enriched pathways and clinical parameters was evaluated. Results A total of 2,967 upregulated DEGs was filtered during the comparison of gene expression profiles of lung tissues between IPF patients and healthy controls. The core molecular network of IPF featured the p53 signaling pathway and cellular senescence. IPF patients were classified into two molecular subgroups (C1, C2) via unsupervised clustering. C1 was more enriched in the p53 signaling pathway and ciliated cells and presented a worse prognostic score, while C2 was more enriched for cellular senescence, profibrosing pathways, and alveolar epithelial cells. The p53 signaling pathway was closely correlated with a decline in forced vital capacity and carbon monoxide diffusion capacity and with the activation of cellular senescence. CDK1/2, CKDNA1A, CSNK1A1, HDAC1/2, FN1, VCAM1, and ITGA4 were the key regulators, as evidenced by high diffusion scores in the disease module. Currently available and investigational drugs showed differential diffusion scores in terms of their target molecules. Conclusions An integrative molecular analysis of IPF lungs identified two molecular subgroups with distinct pathobiological characteristics and clinical prognostic scores. Inhibition of CDKs or HDACs showed great promise for controlling lung fibrosis.
This approach provided molecular insights to support the prediction of clinical outcomes and the selection of therapeutic targets in IPF patients. Supplementary Information The online version contains supplementary material available at 10.1186/s12890-021-01749-3.
Affiliation(s)
- Sung Kyoung Kim
- Division of Pulmonology, Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Seung Min Jung
- Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Kyung-Su Park
- Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Ki-Jo Kim
- Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea; Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, The Catholic University of Korea, 93 Jungbu-daero, Paldal-gu, Suwon, Gyeonggi-do, 16247, Republic of Korea.
31
Testa D, Jourde-Chiche N, Mancini J, Varriale P, Radoszycki L, Chiche L. Unsupervised clustering analysis of data from an online community to identify lupus patient profiles with regards to treatment preferences. Lupus 2021; 30:1837-1843. [PMID: 34313509 DOI: 10.1177/09612033211033977] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Lupus is a chronic complex autoimmune disease. Non-adherence to treatment can affect patient outcomes. Incorporating patients' preferences into medical decisions may increase acceptance of their medication. The PREFERLUP study used unsupervised clustering analysis to identify profiles of patients with similar treatment preferences in an online community of French lupus patients. METHODS An online survey was conducted in adult lupus patients from the Carenity community between August 2018 and April 2019. Multiple Correspondence Analysis (MCA) was used with three unsupervised clustering methods (hierarchical, k-means and partitioning around medoids). Several indicators (measure of connectivity, Dunn index and Silhouette width) were used to select the best clustering algorithm and choose the number of clusters. RESULTS The 268 participants were mostly female (96%), with a mean age of 44.3 years; 83% fulfilled the American College of Rheumatology (ACR) self-reported diagnostic criteria for systemic lupus erythematosus. Overall, the preferred route of administration was oral (62%) and the most important feature of an ideal drug was a low risk of side-effects (32%). Hierarchical clustering identified three clusters. Cluster 1 (59%) comprised patients with few comorbidities and a poor ability to identify oncoming flares; 84% of these patients desired oral treatments with limited side-effects. Cluster 2 (13%) comprised younger patients, who had already participated in a clinical trial, were willing to use implants and valued the compatibility of treatments with pregnancy. Cluster 3 (28%) comprised patients with a longer lupus duration, poorer control of the disease and more comorbidities; these patients mainly valued implants and injections and expected a reduction of corticosteroid intake. CONCLUSIONS Different profiles of lupus patients were identified according to their drug preferences.
These clusters could help physicians tailor their therapeutic proposals to take into account individual patient preferences, which could have a positive impact on treatment acceptance and then adherence. The study highlights the value of data acquired directly from patient communities.
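As an illustration of one of the indicators named in the abstract above, the following is a minimal sketch of the mean silhouette width, which can be compared across candidate partitions to choose a clustering. It assumes Euclidean distance on numeric coordinates (for example, MCA coordinates); the function name, the toy points, and the labels are hypothetical and are not taken from the PREFERLUP study.

```python
import math

def silhouette_width(points, labels):
    """Mean silhouette width of a labelled point set (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) == 1:
            scores.append(0.0)  # common convention for singletons or a single cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is included in the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
compact = silhouette_width(points, [0, 0, 1, 1])  # groups nearby points
mixed = silhouette_width(points, [0, 1, 0, 1])    # groups distant points
print(compact, mixed)
```

A partition that groups nearby points scores close to 1, while one that mixes distant points scores below 0, which is the basis for preferring one clustering over another.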
Affiliation(s)
- Noémie Jourde-Chiche
- Aix-Marseille Univ, C2VN, INSERM 1263, INRAE 1260, et AP-HM, Centre de Néphrologie et Transplantation Rénale, Hôpital de la Conception, Marseille, France
- Julien Mancini
- Aix-Marseille Univ, APHM, INSERM, IRD, SESSTIM, Public Health Department, 36900APHM, La Timone Hospital, BIOSTIC, Marseille, France
- Laurent Chiche
- Service de Médecine Interne, Hôpital Européen, Marseille, France
32
Holmberg-Thyden S, Grønbæk K, Gang AO, El Fassi D, Hadrup SR. A user's guide to multicolor flow cytometry panels for comprehensive immune profiling. Anal Biochem 2021; 627:114210. [PMID: 34033799 DOI: 10.1016/j.ab.2021.114210] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/20/2021] [Accepted: 04/13/2021] [Indexed: 12/12/2022]
Abstract
Multicolor flow cytometry is an essential tool for studying the immune system in health and disease, allowing users to extract longitudinal multiparametric data from patient samples. The process is complicated by substantial variation in performance between each flow cytometry instrument, and analytical errors are therefore common. Here, we present an approach to overcome such limitations by applying a systematic workflow for pairing colors to markers optimized for the equipment intended to run the experiments. The workflow is exemplified by the design of four comprehensive flow cytometry panels for patients with hematological cancer. Methods for quality control, titration of antibodies, compensation, and staining of cells for obtaining optimal results are also addressed. Finally, to handle the large amounts of data generated by multicolor flow cytometry, unsupervised clustering techniques are used to identify significant subpopulations not detected by conventional sequential gating.
33
Chakraborty S, Mali K. A morphology-based radiological image segmentation approach for efficient screening of COVID-19. Biomed Signal Process Control 2021; 69:102800. [PMID: 34031636 PMCID: PMC8133384 DOI: 10.1016/j.bspc.2021.102800] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/17/2021] [Revised: 05/09/2021] [Accepted: 05/15/2021] [Indexed: 12/22/2022]
Abstract
Computer-aided radiological image interpretation systems can help reshape the overall workflow of the COVID-19 diagnosis process. This article describes an unsupervised CT scan image segmentation approach. The approach begins with a morphological reconstruction operation that removes the effect of external disturbances on the infected regions and locates the regions of interest precisely. The optimal size of the structuring element is selected using the Edge Content-based contrast matrix approach. After the opening by morphological reconstruction, further noise is eliminated using closing by morphological reconstruction. The original pixel space is then restored, the resulting image is divided into non-overlapping smaller blocks, and the mean intensity of each block is computed and used as the local threshold for binarization. It is preferable to determine the range of the infected region manually. If a region is larger than the upper bound, it is treated as an exceptional region and processed separately. Three standard metrics, MSE, PSNR, and SSIM, are used to quantify the outcomes. Both quantitative and qualitative comparisons demonstrate the efficiency and real-life adaptability of this approach. Evaluated on 400 different images, the proposed approach achieves on average an MSE of 307.1888625, a PSNR of 23.7246505, and an SSIM of 0.831718459. Moreover, the comparative study shows that the proposed approach outperforms some of the standard methods, and the results obtained are encouraging for the battle against COVID-19.
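The block-wise thresholding step and two of the reported metrics can be sketched as follows. This is a simplified illustration, not the authors' implementation: it omits the morphological reconstruction steps entirely, represents an image as a nested list of grayscale values, and assumes an 8-bit peak value of 255; the function names and the `block` parameter are hypothetical.

```python
import math

def block_mean_binarize(image, block):
    """Binarize a 2-D grayscale image, using each block's mean as its local threshold."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Coordinates of one non-overlapping block (edge blocks may be smaller).
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = 1 if image[r][c] > mean else 0
    return out

def mse(a, b):
    """Mean squared error between two images of equal size."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

# Hypothetical 2x2 image treated as a single block.
print(block_mean_binarize([[10, 200], [30, 40]], block=2))
```

Using the block mean as a local threshold lets the binarization adapt to regional brightness differences, at the cost of sensitivity to the chosen block size.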
Affiliation(s)
- Shouvik Chakraborty
- Department of Computer Science and Engineering, University of Kalyani, India
- Kalyani Mali
- Department of Computer Science and Engineering, University of Kalyani, India
34
Chen Z, Yang Z, Yuan X, Zhang X, Hao P. scSensitiveGeneDefine: sensitive gene detection in single-cell RNA sequencing data by Shannon entropy. BMC Bioinformatics 2021; 22:211. [PMID: 33888056 PMCID: PMC8063398 DOI: 10.1186/s12859-021-04136-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/13/2021] [Accepted: 03/26/2021] [Indexed: 11/26/2022] Open
Abstract
Background Single-cell RNA sequencing (scRNA-seq) is the most widely used technique to obtain gene expression profiles from complex tissues. Cell subsets and developmental states are often identified via differential gene expression patterns. Most single-cell tools use highly variable genes to annotate cell subsets and states. However, we have discovered that a group of genes, which respond sensitively to environmental stimuli and have high coefficients of variation (CV), might exert an overwhelming influence on cell type annotation. Result In this research, we developed a method, based on the CV-rank and Shannon entropy, to identify these noise genes, and termed them “sensitive genes”. To validate the reliability of our method, we applied our tools to 11 single-cell data sets from different human tissues. The results showed that most of the sensitive genes were enriched in pathways related to cellular stress response. Furthermore, we noticed that the unsupervised clustering result was closer to the ground-truth cell labels after removing the sensitive genes detected by our tools. Conclusion Our study revealed the prevalence of stochastic gene expression patterns in most types of cells, compared the differences among cell marker genes, housekeeping genes (HK genes), and sensitive genes, demonstrated the similarities of functions of sensitive genes in various scRNA-seq data sets, and improved the results of unsupervised clustering towards the ground-truth labels. We hope our method will provide new insights into the reduction of data noise in scRNA-seq data analysis and contribute to the development of better scRNA-seq unsupervised clustering algorithms in the future. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04136-1.
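The two quantities the abstract above builds on can be sketched in a few lines. This is only an illustration of the coefficient of variation and Shannon entropy themselves; how scSensitiveGeneDefine ranks and combines them is not described in the abstract, so nothing here should be read as that tool's procedure. The function names and the toy inputs are hypothetical.

```python
import math
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (NaN if the mean is zero)."""
    m = mean(values)
    return pstdev(values) / m if m else float("nan")

def shannon_entropy(weights):
    """Shannon entropy in bits of non-negative weights, normalized to probabilities."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical expression of one gene summed over four cell clusters.
evenly_spread = [25, 25, 25, 25]   # spread evenly across clusters
cluster_specific = [100, 0, 0, 0]  # confined to one cluster
print(shannon_entropy(evenly_spread), shannon_entropy(cluster_specific))
print(coefficient_of_variation([1, 3]))
```

Intuitively, a gene with a high CV whose expression is spread evenly across clusters (high entropy) carries variability that does not separate cell types, which is the kind of signal the abstract describes as noise.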
Affiliation(s)
- Zechuan Chen
- College of Life Sciences, Shanghai University, Shanghai, China; Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, China
- Zeruo Yang
- Natural Medicine Institute of Zhejiang YangShengTang Co., Ltd., No. 181, Geyazhuang, Xihu District, Hangzhou, Zhejiang, China
- Xiaojun Yuan
- College of Life Sciences, Shanghai University, Shanghai, China
- Xiaoming Zhang
- Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, China.
- Pei Hao
- Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, China.
35
Corbi A, Burgos D. Connection between sleeping patterns and cognitive deterioration in women with Alzheimer's disease. Sleep Breath 2021. [PMID: 33792886 DOI: 10.1007/s11325-021-02327-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/03/2020] [Revised: 02/01/2021] [Accepted: 02/12/2021] [Indexed: 10/21/2022]
Abstract
BACKGROUND Alzheimer's disease (AD) causes symptoms such as dementia, memory loss, disorientation, and even aggressiveness, and is more common in women than in men. AD may also manifest itself in changes in sleep patterns. However, the relationship between AD (in all stages) and bedtime behavior has not been thoroughly investigated. METHODS In a prospective, cross-sectional survey, we evaluated 74 women categorized in two different stages of cognitive decline associated with AD (mild and severe) along with 37 women with no cognitive decline who served as controls. We obtained demographic and medical information such as age, health status, and medication, as well as psychiatrically confirmed staging of AD. We also collected actigraphy data for several consecutive nights with a medical-grade wristband using a 3-axis accelerometer and solid-state on-board memory. These data served as parameters for a clustering machine learning (ML) algorithm. RESULTS Without supervision, the ML process grouped 85% of the participants in agreement with their pre-assigned degree of dementia. When the clustering was carried out in a binary fashion (i.e., only taking into account healthy members vs. severely affected AD patients), it was possible to correctly classify 91% of the cases. CONCLUSIONS This study revealed a strong connection between the severity of the intellectual decline and the features distilled from actigraphically derived sleep parameters.
36
Russo ET, Laio A, Punta M. Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation. BMC Bioinformatics 2021; 22:121. [PMID: 33711918 PMCID: PMC7955657 DOI: 10.1186/s12859-021-04013-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/25/2020] [Accepted: 02/09/2021] [Indexed: 11/24/2022] Open
Abstract
Background The identification of protein families is of outstanding practical importance for in silico protein annotation and is at the basis of several bioinformatic resources. Pfam is possibly the most well known protein family database, built in many years of work by domain experts with extensive use of manual curation. This approach is generally very accurate, but it is quite time consuming and it may suffer from a bias generated from the hand-curation itself, which is often guided by the available experimental evidence. Results We introduce a procedure that aims to identify automatically putative protein families. The procedure is based on Density Peak Clustering and uses as input only local pairwise alignments between protein sequences. In the experiment we present here, we ran the algorithm on about 4000 full-length proteins with at least one domain classified by Pfam as belonging to the Pseudouridine synthase and Archaeosine transglycosylase (PUA) clan. We obtained 71 automatically-generated sequence clusters with at least 100 members. While our clusters were largely consistent with the Pfam classification, showing good overlap with either single or multi-domain Pfam family architectures, we also observed some inconsistencies. The latter were inspected using structural and sequence based evidence, which suggested that the automatic classification captured evolutionary signals reflecting non-trivial features of protein family architectures. Based on this analysis we identified a putative novel pre-PUA domain as well as alternative boundaries for a few PUA or PUA-associated families. As a first indication that our approach was unlikely to be clan-specific, we performed the same analysis on the P53 clan, obtaining comparable results. 
Conclusions The clustering procedure described in this work takes advantage of the information contained in a large set of pairwise alignments and successfully identifies a set of putative families and family architectures in an unsupervised manner. Comparison with the Pfam classification highlights significant overlap and points to interesting differences, suggesting that our new algorithm could have potential in applications related to automatic protein classification. Testing this hypothesis, however, will require further experiments on large and diverse sequence datasets. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04013-x.
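The core quantities of Density Peak Clustering can be sketched as follows, in the general form of the method: each item gets a local density (the number of neighbours within a cutoff) and the distance to its nearest item of higher density, and cluster centres are the items for which both values are large. This is a minimal sketch under stated assumptions and not the procedure of the paper above, which derives distances from local pairwise sequence alignments; the distance matrix built from 1-D coordinates, the cutoff value, and the function name are hypothetical.

```python
def density_peaks(dist, cutoff):
    """Return (rho, delta) for a symmetric distance matrix.

    rho[i]   = number of other items closer to i than `cutoff`
    delta[i] = distance from i to the nearest item with higher rho,
               or the largest distance from i if no such item exists
    """
    n = len(dist)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Hypothetical 1-D data forming two groups; absolute differences stand in
# for alignment-derived distances.
coords = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in coords] for a in coords]
rho, delta = density_peaks(dist, cutoff=1.5)
# Candidate centres: items whose rho and delta are both large.
centres = [i for i in range(len(coords)) if rho[i] * delta[i] == max(r * d for r, d in zip(rho, delta))]
print(rho, delta, centres)
```

Once centres are chosen, each remaining item is usually assigned to the cluster of its nearest higher-density neighbour; that assignment step is left out here for brevity.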
Affiliation(s)
- Marco Punta
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, SM2 5NG, UK; Center for Omics Sciences, IRCCS San Raffaele Hospital, 20132, Milan, Italy
37
Pothula KR, Geraets JA, Ferber II, Schröder GF. Clustering polymorphs of tau and IAPP fibrils with the CHEP algorithm. Prog Biophys Mol Biol 2021; 160:16-25. [PMID: 33556421 DOI: 10.1016/j.pbiomolbio.2020.11.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/07/2020] [Revised: 11/16/2020] [Accepted: 11/24/2020] [Indexed: 01/03/2023]
Abstract
Recent steps towards automation have improved the quality and efficiency of the entire cryo-electron microscopy workflow, from sample preparation to image processing. Most of the image processing steps are now quite automated, but there are still a few steps which need the specific intervention of researchers. One such step is the identification and separation of helical protein polymorphs at early stages of image processing. Here, we tested and evaluated our recent clustering approach on three datasets containing amyloid fibrils, demonstrating that the proposed unsupervised clustering method automatically and effectively identifies the polymorphs from cryo-EM images. As an automated polymorph separation method, it has the potential to complement automated helical picking, which typically cannot easily distinguish between polymorphs with subtle differences in morphology, and is therefore a useful tool for the image processing and structure determination of helical proteins.
Affiliation(s)
- Karunakar R Pothula
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany
- James A Geraets
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany
- Inda I Ferber
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany
- Gunnar F Schröder
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany; Physics Department, Heinrich-Heine-Universität Düsseldorf, 40225, Düsseldorf, Germany.
38
Zhang L, Zhang M, Chen X, He Y, Chen R, Zhang J, Huang J, Ouyang C, Shi G. Identification of the tubulointerstitial infiltrating immune cell landscape and immune marker related molecular patterns in lupus nephritis using bioinformatics analysis. Ann Transl Med 2021; 8:1596. [PMID: 33437795 PMCID: PMC7791250 DOI: 10.21037/atm-20-7507] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/04/2023]
Abstract
Background Systemic lupus erythematosus (SLE) is a multisystem autoimmune disease that commonly affects the kidneys. Research into markers that can predict the prognosis of tubulointerstitial lupus nephritis (LN) has been impeded by the lack of well-designed studies. Methods In this study, we selected and merged 3 sets of renal biopsy tubulointerstitial data from GSE32591, GSE69438, and GSE127797, including 95 LN and 15 living healthy donors. CIBERSORTx was utilized for differentially infiltrating immune cell (DIIC) analysis. Weighted Gene Co-Expression network analysis (WGCNA) was employed to explore differentially expressed gene (DEG) related modules. Combined WGCNA hub genes and protein-protein interaction (PPI) validation was used for immune marker identification. Lastly, unsupervised clustering was carried out to validate the correlation between these markers and clinical characteristics. Results Our findings unveiled TYROBP, C1QB, LAPTM5, CTSS, PTPRC as the 5 immune markers, which were negatively correlated with glomerular filtration rate (GFR). Specifically, the expression levels of TYROBP and C1QB were significantly different between proliferative LN (PLN) and membranous LN (MLN). Unsupervised clustering could aggregate LN by these immune marker expression spectrums. Conclusions This study is the first to identify infiltrating immune cells and associated molecular patterns in the tubulointerstitium of LN by utilizing bioinformatics methods. These findings contribute to a better understanding of the mechanisms behind LN, and promote more precise diagnosis.
Affiliation(s)
- Lu Zhang
- Department of Nephrology, The First Affiliated Hospital of Xiamen University, Xiamen, China; The Fifth Hospital of Xiamen, Xiang'an Branch, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Mengqin Zhang
- Department of Rheumatology, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Xing Chen
- Department of Nephrology, The First Affiliated Hospital of Xiamen University, Xiamen, China; The Fifth Hospital of Xiamen, Xiang'an Branch, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Yan He
- Department of Rheumatology, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Rongjuan Chen
- Department of Rheumatology, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Jun Zhang
- Department of Nephrology, The First Affiliated Hospital of Xiamen University, Xiamen, China; The Fifth Hospital of Xiamen, Xiang'an Branch, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Jiyi Huang
- Department of Nephrology, The First Affiliated Hospital of Xiamen University, Xiamen, China; The Fifth Hospital of Xiamen, Xiang'an Branch, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Chun Ouyang
- Department of Nephrology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Guixiu Shi
- Department of Rheumatology, The First Affiliated Hospital of Xiamen University, Xiamen, China
39
Abstract
K-mer based comparisons have emerged as powerful complements to BLAST-like alignment algorithms, particularly when the sequences being compared lack direct evolutionary relationships. In this chapter, we describe methods to compare k-mer content between groups of long noncoding RNAs (lncRNAs), to identify communities of lncRNAs with related k-mer contents, to identify the enrichment of protein-binding motifs in lncRNAs, and to scan for domains of related k-mer contents in lncRNAs. Our step-by-step instructions are complemented by Python code deposited in Github. Though our chapter focuses on lncRNAs, the methods we describe could be applied to any set of nucleic acid sequences.
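A basic form of the k-mer content comparison described above can be sketched as follows: count every k-mer in each sequence, normalize by the number of windows, and correlate the resulting profiles. This is a simplified illustration and not the code the chapter deposits on GitHub; the function names, the default `k`, and the example sequences are hypothetical, and k-mers containing characters outside the given alphabet are simply ignored.

```python
from collections import Counter
from itertools import product
import math

def kmer_profile(seq, k=2, alphabet="ACGT"):
    """Frequency of each possible k-mer in seq, in a fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    windows = max(len(seq) - k + 1, 1)
    return [counts["".join(p)] / windows for p in product(alphabet, repeat=k)]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (NaN if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

# Hypothetical sequences: the first two share dinucleotide content, the third does not.
a = kmer_profile("ACACACACGTGT")
b = kmer_profile("GTGTACACACAC")
c = kmer_profile("TTTTTTGGGGGG")
print(pearson(a, b), pearson(a, c))
```

Because the profiles ignore where a k-mer occurs, two sequences can score as similar without being alignable, which is the sense in which such comparisons complement alignment-based tools.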
Affiliation(s)
- Jessime M Kirk
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Invitae Corporation, San Francisco, CA, USA
- Daniel Sprague
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Curriculum in Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Flagship Pioneering, Boston, MA, USA
- J Mauro Calabrese
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Curriculum in Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
40
Li J, Jiang W, Han H, Liu J, Liu B, Wang Y. ScGSLC: An unsupervised graph similarity learning framework for single-cell RNA-seq data clustering. Comput Biol Chem 2020; 90:107415. [PMID: 33307360 DOI: 10.1016/j.compbiolchem.2020.107415] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/28/2020] [Revised: 09/30/2020] [Accepted: 10/06/2020] [Indexed: 01/18/2023]
Abstract
Accurate clustering of cells from single-cell RNA sequencing (scRNA-seq) data is an essential step for biological analysis such as putative cell type identification. However, scRNA-seq data are high-dimensional and highly sparse, which makes traditional clustering methods less effective at reflecting the similarity between cells. Since the genetic network fundamentally defines the functions of a cell and deep learning shows strong advantages in network representation learning, we propose a novel scRNA-seq clustering framework, ScGSLC, based on graph similarity learning. ScGSLC integrates scRNA-seq data and a protein-protein interaction network into a graph. A graph convolutional network is then employed by ScGSLC to embed the graphs and to cluster the cells by the calculated similarity between graphs. Unsupervised clustering results on nine public data sets demonstrate that ScGSLC performs better than the state-of-the-art methods.
Collapse
Affiliation(s)
- Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.
| | - Wei Jiang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Henry Han
- Department of Computer and Information Science, Fordham University, New York, NY 10023, USA; School of Computer Science, Qinghai Normal University, Xining 810008, China
| | - Jing Liu
- South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, Guangdong 510530, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China; Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
| |
|
41
|
Salmanpour MR, Shamsaei M, Saberi A, Hajianfar G, Soltanian-Zadeh H, Rahmim A. Robust identification of Parkinson's disease subtypes using radiomics and hybrid machine learning. Comput Biol Med 2021; 129:104142. [PMID: 33260101 DOI: 10.1016/j.compbiomed.2020.104142] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/08/2020] [Revised: 11/20/2020] [Accepted: 11/21/2020] [Indexed: 12/21/2022]
Abstract
OBJECTIVES It is important to subdivide Parkinson's disease (PD) into subtypes, enabling potentially earlier disease recognition and tailored treatment strategies. We aimed to identify reproducible PD subtypes robust to variations in the number of patients and features. METHODS We applied multiple feature-reduction and cluster-analysis methods to cross-sectional and timeless data, extracted from longitudinal datasets (years 0, 1, 2 & 4; Parkinson's Progression Markers Initiative; 885 PD/163 healthy-control visits; 35 datasets with combinations of non-imaging, conventional-imaging, and radiomics features from DAT-SPECT images). Hybrid machine-learning systems were constructed invoking 16 feature-reduction algorithms, 8 clustering algorithms, and 16 classifiers (C-index clustering evaluation used on each trajectory). We subsequently performed: i) identification of optimal subtypes, ii) multiple independent tests to assess reproducibility, iii) further confirmation by a statistical approach, iv) test of reproducibility to the size of the samples. RESULTS When using no radiomics features, the clusters were not robust to variations in features, whereas utilizing radiomics information enabled consistent generation of clusters through ensemble analysis of trajectories. We arrived at 3 distinct subtypes, confirmed using the training and testing process of k-means, as well as Hotelling's T² test. The 3 identified PD subtypes were 1) mild; 2) intermediate; and 3) severe, especially in terms of dopaminergic deficit (imaging), with some escalating motor and non-motor manifestations. CONCLUSION Appropriate hybrid systems and independent statistical tests enable robust identification of 3 distinct PD subtypes. This was assisted by utilizing radiomics features from SPECT images (segmented using MRI). The PD subtypes provided were robust to the number of subjects and features.
|
42
|
Abstract
BACKGROUND Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. RESULTS We present hypercluster, a Python package and Snakemake pipeline for flexible and parallelized clustering evaluation and selection. Users can efficiently evaluate a huge range of clustering results from multiple models and hyperparameters to identify an optimal model. CONCLUSIONS Hypercluster improves ease of use, robustness and reproducibility for unsupervised clustering applications in high-throughput biology. Hypercluster is available on pip and bioconda; installation, documentation and example workflows can be found at: https://github.com/ruggleslab/hypercluster .
Affiliation(s)
- Lili Blumenberg
- Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
- Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
| | - Kelly V. Ruggles
- Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
- Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
| |
|
43
|
Chung NC, Choi H, Wang D, Mirza B, Pelletier AR, Sigdel D, Wang W, Ping P. Identifying temporal molecular signatures underlying cardiovascular diseases: A data science platform. J Mol Cell Cardiol 2020; 145:54-58. [PMID: 32504647 PMCID: PMC7583079 DOI: 10.1016/j.yjmcc.2020.05.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/14/2020] [Revised: 05/18/2020] [Accepted: 05/31/2020] [Indexed: 11/26/2022]
Abstract
OBJECTIVE During cardiovascular disease progression, molecular systems of myocardium (e.g., a proteome) undergo diverse and distinct changes. Dynamic, temporally-regulated alterations of individual molecules underlie the collective response of the heart to pathological drivers and the ultimate development of pathogenesis. Advances in high-throughput omics technologies have enabled cost-effective, temporal profiling of targeted systems in animal models of human diseases. However, computational analysis of temporal patterns from omics data remains challenging. In particular, bioinformatic pipelines involving unsupervised statistical approaches to support cardiovascular investigations are lacking, which hinders one's ability to extract biomedical insights from these complex datasets. APPROACH AND RESULTS We developed a non-parametric data analysis platform to resolve computational challenges unique to temporal omics datasets. Our platform consists of three modules. Module I preprocesses the temporal data using either cubic splines or principal component analysis (PCA), and it simultaneously accomplishes the tasks on missing data imputation and denoising. Module II performs an unsupervised classification by K-means or hierarchical clustering. Module III evaluates and identifies biological entities (e.g., molecular events) that exhibit strong associations to specific temporal patterns. The jackstraw method for cluster membership has been applied to estimate p-values and posterior inclusion probabilities (PIPs), both of which guided feature selection. To demonstrate the utility of the analysis platform, we employed a temporal proteomics dataset that captured the proteome-wide dynamics of oxidative stress induced post-translational modifications (O-PTMs) in mouse hearts undergoing isoproterenol (ISO)-induced hypertrophy. CONCLUSION We have created a platform, CV.Signature.TCP, to identify distinct temporal clusters in omics datasets. We presented a cardiovascular use case to demonstrate its utility in unveiling biological insights underlying O-PTM regulations in cardiac remodeling. This platform is implemented in an open source R package (https://github.com/UCLA-BD2K/CV.Signature.TCP).
Affiliation(s)
- Neo Christopher Chung
- NHLBI Integrated Cardiovascular Data Science Training Program at University of California (UCLA), Los Angeles, USA; Departments of Physiology and Medicine (Cardiology) at UCLA School of Medicine, USA; Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics University of Warsaw, Warsaw, Poland.
| | - Howard Choi
- NHLBI Integrated Cardiovascular Data Science Training Program at University of California (UCLA), Los Angeles, USA; Departments of Physiology and Medicine (Cardiology) at UCLA School of Medicine, USA; Bioinformatics and Medical Informatics at UCLA School of Engineering, Los Angeles, CA 90095, USA; Scalable Analytics Institute (ScAi) at UCLA School of Engineering, Los Angeles, CA 90095, USA
| | - Ding Wang
- Departments of Physiology and Medicine (Cardiology) at UCLA School of Medicine, USA
| | - Bilal Mirza
- Departments of Physiology and Medicine (Cardiology) at UCLA School of Medicine, USA
| | - Alexander R Pelletier
- NHLBI Integrated Cardiovascular Data Science Training Program at University of California (UCLA), Los Angeles, USA; Bioinformatics and Medical Informatics at UCLA School of Engineering, Los Angeles, CA 90095, USA; Scalable Analytics Institute (ScAi) at UCLA School of Engineering, Los Angeles, CA 90095, USA
| | - Dibakar Sigdel
- NHLBI Integrated Cardiovascular Data Science Training Program at University of California (UCLA), Los Angeles, USA; Departments of Physiology and Medicine (Cardiology) at UCLA School of Medicine, USA
| | - Wei Wang
- NHLBI Integrated Cardiovascular Data Science Training Program at University of California (UCLA), Los Angeles, USA; Bioinformatics and Medical Informatics at UCLA School of Engineering, Los Angeles, CA 90095, USA; Scalable Analytics Institute (ScAi) at UCLA School of Engineering, Los Angeles, CA 90095, USA
| | - Peipei Ping
- NHLBI Integrated Cardiovascular Data Science Training Program at University of California (UCLA), Los Angeles, USA; Departments of Physiology and Medicine (Cardiology) at UCLA School of Medicine, USA; Bioinformatics and Medical Informatics at UCLA School of Engineering, Los Angeles, CA 90095, USA; Scalable Analytics Institute (ScAi) at UCLA School of Engineering, Los Angeles, CA 90095, USA.
| |
|
44
|
Sardaar S, Qi B, Dionne-Laporte A, Rouleau GA, Rabbany R, Trakadis YJ. Machine learning analysis of exome trios to contrast the genomic architecture of autism and schizophrenia. BMC Psychiatry 2020; 20:92. [PMID: 32111185 DOI: 10.1186/s12888-020-02503-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/18/2019] [Accepted: 02/17/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Machine learning (ML) algorithms and methods offer great tools to analyze large complex genomic datasets. Our goal was to compare the genomic architecture of schizophrenia (SCZ) and autism spectrum disorder (ASD) using ML. METHODS In this paper, we used regularized gradient boosted machines to analyze whole-exome sequencing (WES) data from individuals with SCZ and ASD in order to identify important distinguishing genetic features. We further demonstrated a method of gene clustering to highlight which subsets of genes identified by the ML algorithm are mutated concurrently in affected individuals and are central to each disease (i.e., ASD vs. SCZ "hub" genes). RESULTS In summary, after correcting for population structure, we found that SCZ and ASD cases could be successfully separated based on genetic information, with 86-88% accuracy on the testing dataset. Through bioinformatic analysis, we explored whether combinations of genes concurrently mutated in patients with the same condition ("hub" genes) belong to specific pathways. Several themes were found to be associated with ASD, including calcium ion transmembrane transport, immune system/inflammation, synapse organization, and retinoid metabolic process. Moreover, ion transmembrane transport, neurotransmitter transport, and microtubule/cytoskeleton processes were highlighted for SCZ. CONCLUSIONS Our manuscript introduces a novel comparative approach for studying the genetic architecture of genetically related diseases with complex inheritance and highlights genetic similarities and differences between ASD and SCZ.
|
45
|
Kaku H, Ozturk M, Viswanathan A, Shahed J, Sheth SA, Kumar S, Ince NF. Unsupervised clustering reveals spatially varying single neuronal firing patterns in the subthalamic nucleus of patients with Parkinson's disease. Clin Park Relat Disord 2020; 3:100032. [PMID: 34316618 DOI: 10.1016/j.prdoa.2019.100032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/12/2019] [Revised: 10/29/2019] [Accepted: 12/17/2019] [Indexed: 11/30/2022] Open
Abstract
Introduction Subthalamic nucleus (STN) is an effective target for deep brain stimulation (DBS) to reduce the motor symptoms of Parkinson's disease (PD). It is important to identify firing patterns within the structure for a better understanding of the electro-pathophysiology of the disease. Using recently established metrics, our study aims to autonomously identify the discharge patterns of individual cells and examine their spatial distribution within the STN. Methods We recorded single unit activity (SUA) from 12 awake PD patients undergoing a standard clinical DBS surgery. Three extracted features from raw SUA (local variation, bursting index and prominence of peak) were used with k-means clustering to achieve the aforementioned unsupervised grouping of firing patterns. Results 279 neurons were isolated and four distinct firing patterns were identified across patients: tonic (11%), irregular (55%), periodic (9%) and non-periodic bursts (25%). The mean firing rates for irregular discharges were significantly lower (p < 0.05) than the rest. Tonic firings were significantly ventral (p < 0.05) while periodic (p < 0.05) and non-periodic (p < 0.01) bursts were dorsal. The percentage of periodically bursting neurons in dorsal region and entire STN were significantly correlated with off state UPDRS tremor scores (r = 0.51, p = 0.04) and improvement in bradykinesia and rigidity (r = 0.57, p = 0.02) respectively. Conclusion Strengthening the application of unsupervised clustering for firing patterns of individual cells, this study shows a unique spatial affinity of tonic activity towards the ventral and bursting activity towards the dorsal region of STN in PD patients. This spatial preference, together with the correlation of clinical scores, can provide a clue towards understanding Parkinsonian symptom generation.
|
46
|
Min HK, Moon SJ, Park KS, Kim KJ. Integrated systems analysis of salivary gland transcriptomics reveals key molecular networks in Sjögren's syndrome. Arthritis Res Ther 2019; 21:294. [PMID: 31856901 PMCID: PMC6921432 DOI: 10.1186/s13075-019-2082-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/03/2019] [Accepted: 12/04/2019] [Indexed: 02/08/2023] Open
Abstract
Background Treatment of patients with Sjögren’s syndrome (SjS) is a clinical challenge with high unmet needs. Gene expression profiling and integrative network-based approaches to complex disease can offer an insight on molecular characteristics in the context of clinical setting. Methods An integrated dataset was created from salivary gland samples of 30 SjS patients. Pathway-driven enrichment profiles made by gene set enrichment analysis were categorized using hierarchical clustering. Differentially expressed genes (DEGs) were subjected to functional network analysis, where the elements of the core subnetwork were used for key driver analysis. Results We identified 310 upregulated DEGs, including nine known genetic risk factors and two potential biomarkers. The core subnetwork was enriched with the processes associated with B cell hyperactivity. Pathway-based subgrouping revealed two clusters with distinct molecular signatures for the relevant pathways and cell subsets. Cluster 2, with low-grade inflammation, showed a better response to rituximab therapy than cluster 1, with high-grade inflammation. Fourteen key driver genes appeared to be essential signaling mediators downstream of the B cell receptor (BCR) signaling pathway and to have a positive relationship with histopathology scores. Conclusion Integrative network-based approaches provide deep insights into the modules and pathways causally related to SjS and allow identification of key targets for disease. Intervention adjusted to the molecular traits of the disease would allow the achievement of better outcomes, and the BCR signaling pathway and its leading players are promising therapeutic targets.
Affiliation(s)
- Hong Ki Min
- Division of Rheumatology, Department of Internal Medicine, Konkuk University Medical Center, Seoul, Republic of Korea
| | - Su-Jin Moon
- Division of Rheumatology, Department of Internal Medicine, Uijeongbu St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - Kyung-Su Park
- Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - Ki-Jo Kim
- Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
| |
|
47
|
Green MJ, Girshkin L, Kremerskothen K, Watkeys O, Quidé Y. A Systematic Review of Studies Reporting Data-Driven Cognitive Subtypes across the Psychosis Spectrum. Neuropsychol Rev 2020; 30:446-60. [PMID: 31853717 DOI: 10.1007/s11065-019-09422-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/24/2018] [Accepted: 12/02/2019] [Indexed: 10/25/2022]
Abstract
The delineation of cognitive subtypes of schizophrenia and bipolar disorder may offer a means of determining shared genetic markers and neuropathology among individuals with these conditions. We systematically reviewed the evidence from published studies reporting the use of data-driven (i.e., unsupervised) clustering methods to delineate cognitive subtypes among adults diagnosed with schizophrenia, schizoaffective disorder, or bipolar disorder. We reviewed 24 studies in total, contributing data to 13 analyses of schizophrenia spectrum patients, 8 analyses of bipolar disorder, and 5 analyses of mixed samples of schizophrenia and bipolar disorder participants. Studies of bipolar disorder most consistently revealed a 3-cluster solution, comprising a subgroup with 'near-normal' (cognitively spared) cognition and two other subgroups demonstrating graded deficits across cognitive domains. In contrast, there was no clear consensus regarding the number of cognitive subtypes among studies of cognitive subtypes in schizophrenia, while four of the five studies of mixed diagnostic groups reported a 4-cluster solution. Common to all cluster solutions was a severe cognitive deficit subtype with cognitive impairments of moderate to large effect size relative to healthy controls. Our review highlights several key factors (e.g., symptom profile, sample size, statistical procedures, and cognitive domains examined) that may influence the results of data-driven clustering methods, and which were largely inconsistent across the studies reviewed. This synthesis of findings suggests caution should be exercised when interpreting the utility of particular cognitive subtypes for biological investigation, and demonstrates much heterogeneity among studies using unsupervised clustering approaches to cognitive subtyping within and across the psychosis spectrum.
|
48
|
Qian J, Comin M. MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage. BMC Bioinformatics 2019; 20:367. [PMID: 31757198 PMCID: PMC6873667 DOI: 10.1186/s12859-019-2904-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/30/2019] [Accepted: 05/15/2019] [Indexed: 11/30/2022] Open
Abstract
Motivation Sequencing technologies allow the sequencing of microbial communities directly from the environment without prior culturing. Because assembly typically produces only genome fragments, also known as contigs, it is crucial to group them into putative species for further taxonomic profiling and downstream functional analysis. Taxonomic analysis of microbial communities requires contig clustering, a process referred to as binning, that is still one of the most challenging tasks when analyzing metagenomic data. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, sequencing errors, and the limitations due to binning contigs of different lengths. Results In this context we present MetaCon, a novel tool for unsupervised metagenomic contig binning based on probabilistic k-mers statistics and coverage. MetaCon uses a signature based on k-mers statistics that accounts for the different probability of appearance of a k-mer in different species; in addition, contigs of different lengths are clustered in two separate phases. The effectiveness of MetaCon is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, MaxBin and MetaBAT. Electronic supplementary material The online version of this article (10.1186/s12859-019-2904-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jia Qian
- Department of Information Engineering, University of Padova, Via Giovanni Gradenigo 6, Padova, Italy
| | - Matteo Comin
- Department of Information Engineering, University of Padova, Via Giovanni Gradenigo 6, Padova, Italy.
| |
|
49
|
Zhang Y, Poler SM, Li J, Abedi V, Pendergrass SA, Williams MS, Lee MTM. Dissecting genetic factors affecting phenylephrine infusion rates during anesthesia: a genome-wide association study employing EHR data. BMC Med 2019; 17:168. [PMID: 31455332 PMCID: PMC6712853 DOI: 10.1186/s12916-019-1405-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/01/2019] [Accepted: 08/07/2019] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND The alpha-adrenergic agonist phenylephrine is often used to treat hypotension during anesthesia. In clinical situations, low blood pressure may require prompt intervention by intravenous bolus or infusion. Differences in responsiveness to phenylephrine treatment are commonly observed in clinical practice. Candidate gene studies indicate genetic variants may contribute to this variable response. METHODS Pharmacological and physiological data were retrospectively extracted from routine clinical anesthetic records. Response to phenylephrine boluses could not be reliably assessed, so infusion rates were used for analysis. Unsupervised k-means clustering was conducted on clean data containing 4130 patients based on phenylephrine infusion rate and blood pressure parameters, to identify potential phenotypic subtypes. Genome-wide association studies (GWAS) were performed against average infusion rates in two cohorts: phase I (n = 1205) and phase II (n = 329). Top genetic variants identified from the meta-analysis were further examined to see if they could differentiate subgroups identified by k-means clustering. RESULTS Three subgroups of patients with different response to phenylephrine were clustered and characterized: resistant (high infusion rate yet low mean systolic blood pressure (SBP)), intermediate (low infusion rate and low SBP), and sensitive (low infusion rate with high SBP). Differences among clusters were tabulated to assess for possible confounding influences. Comorbidity hierarchical clustering showed the resistant group had a higher prevalence of confounding factors than the intermediate and sensitive groups although overall prevalence is below 6%. Three loci with P < 1 × 10⁻⁶ were associated with phenylephrine infusion rate. Only rs11572377 with P = 6.09 × 10⁻⁷, a 3'UTR variant of EDN2, encoding a secretory vasoconstricting peptide, could significantly differentiate resistant from sensitive groups (P = 0.015 and 0.018 for phase I and phase II) or resistant from pooled sensitive and intermediate groups (P = 0.047 and 0.018). CONCLUSIONS Retrospective analysis of electronic anesthetic records data coupled with the genetic data identified genetic variants contributing to variable sensitivity to phenylephrine infusion during anesthesia. Although the identified top gene, EDN2, has robust biological relevance to vasoconstriction by binding to endothelin type A (ETA) receptors on arterial smooth muscle cells, further functional as well as replication studies are necessary to confirm this association.
Affiliation(s)
- Yanfei Zhang
- Genomic Medicine Institute, Geisinger, Danville, PA, 17822, USA
| | - S Mark Poler
- Department of Anesthesiology, Geisinger, Danville, PA, 17822, USA
| | - Jiang Li
- Biomedical Translational Informatics Institute, Geisinger, Danville, PA, 17822, USA
| | - Vida Abedi
- Biomedical Translational Informatics Institute, Geisinger, Danville, PA, 17822, USA
| | - Sarah A Pendergrass
- Biomedical Translational Informatics Institute, Geisinger, Bethesda, MD, USA
| | - Marc S Williams
- Genomic Medicine Institute, Geisinger, Danville, PA, 17822, USA
| | - Ming Ta Michael Lee
- Genomic Medicine Institute, Geisinger, Danville, PA, 17822, USA; Lab 218, Weis Center for Research, Geisinger, 100 North Academy Ave, Danville, 17822-2620, PA, USA.
| |
|
50
|
Estiri H, Klann JG, Murphy SN. A clustering approach for detecting implausible observation values in electronic health records data. BMC Med Inform Decis Mak 2019; 19:142. [PMID: 31337390 DOI: 10.1186/s12911-019-0852-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/02/2019] [Accepted: 06/26/2019] [Indexed: 12/03/2022] Open
Abstract
Background Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs. Methods The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data as an alternative algorithmic solution to the existing procedures. Our approach is built upon two underlying hypotheses: (i) when there is a large number of observations, implausible records should be sparse, and therefore (ii) if these data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests from Partners HealthCare. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results from the proposed approach with conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance. Results We found that the clustering approach produced results with exceptional specificity and high sensitivity. Compared with the conventional anomaly detection approaches, our proposed clustering approach resulted in a significantly smaller number of false positive cases. Conclusion Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on the i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used in other studies for validation. The proposed algorithmic solution can augment human decisions to improve data quality. Therefore, a workflow is needed to complement the algorithm’s job and initiate necessary actions that need to be taken in order to improve the quality of data. Electronic supplementary material The online version of this article (10.1186/s12911-019-0852-6) contains supplementary material, which is available to authorized users.
|