1
Woodman RJ, Bryant K, Sorich MJ, Thompson CH, Russell P, Pilotto A, Mangoni AA. Phenotyping to predict 12-month health outcomes of older general medicine patients. Aging Clin Exp Res 2025; 37:42. [PMID: 39985621 PMCID: PMC11846751 DOI: 10.1007/s40520-024-02924-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Accepted: 12/30/2024] [Indexed: 02/24/2025]
Abstract
BACKGROUND: A variety of unsupervised learning algorithms have been used to phenotype older patients, enabling directed care and personalised treatment plans. However, the ability of the resulting clusters to discriminate accurately between older patients at different levels of risk may vary depending on the methods employed. AIMS: To compare seven clustering algorithms in their ability to develop patient phenotypes that accurately predict health outcomes. METHODS: Data were collected for N = 737 older medical inpatients during their hospital stay across five types of medical data (ICD-10 codes, ATC drug codes, laboratory, clinic and frailty data). We trialled five unsupervised learning algorithms (K-means, K-modes, hierarchical clustering, latent class analysis (LCA), and DBSCAN) and two graph-based approaches to create separate clusters for each method and data type. These were used as input for a random forest classifier to predict eleven health outcomes: mortality at one, three, six and 12 months, in-hospital falls and delirium, length-of-stay, outpatient visits, and readmissions at one, three and six months. RESULTS: The overall median area-under-the-curve (AUC) values across the eleven outcomes for the seven methods were (from highest to lowest) 0.758 (hierarchical), 0.739 (K-means), 0.722 (KG-Louvain), 0.704 (KNN-Louvain), 0.698 (LCA), 0.694 (DBSCAN) and 0.656 (K-modes). Overall, frailty data was the most important data type for predicting mortality, ICD-10 disease codes for predicting readmissions, and laboratory data for predicting falls. CONCLUSIONS: Clusters created using hierarchical, K-means and Louvain community detection algorithms identified well-separated patient phenotypes that were consistently associated with age-related adverse health outcomes. Frailty data was the most valuable data type for predicting most health outcomes.
Affiliation(s)
- Richard John Woodman
  - Discipline of Biostatistics, College of Medicine and Public Health, Flinders University, Adelaide, Australia.
- Kimberly Bryant
  - College of Medicine and Public Health, Flinders University and Flinders Medical Centre, Adelaide, Australia
- Michael J Sorich
  - Discipline of Clinical Pharmacology, College of Medicine and Public Health, Flinders University, Adelaide, Australia
- Campbell H Thompson
  - General Medicine, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
- Patrick Russell
  - Internal Medicine, Royal Adelaide Hospital, Adelaide, Australia
- Alberto Pilotto
  - Department of Interdisciplinary Medicine, University of Bari, Bari, Italy
  - Department of Geriatric Care, Neurology and Rehabilitation, Galliera Hospitals, Genova, Italy
- Aleksander A Mangoni
  - Discipline of Clinical Pharmacology, College of Medicine and Public Health, Flinders University, Adelaide, Australia
  - Department of Clinical Pharmacology, Southern Adelaide Local Health Network, Adelaide, Australia
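The AUC figures quoted in the Woodman et al. abstract above summarise how well predicted risks rank patients who experienced an outcome above those who did not, with a median then taken across outcomes. The following is only a minimal sketch of that arithmetic, assuming invented labels and scores (the outcome names and numbers are hypothetical, not the study's data or code):

```python
from statistics import median

def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: (true labels, predicted risk scores) per outcome.
toy = {
    "outcome_a": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    "outcome_b": ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
    "outcome_c": ([0, 1, 0, 1], [0.5, 0.5, 0.4, 0.9]),
}
aucs = {name: auc(y, s) for name, (y, s) in toy.items()}
median_auc = median(aucs.values())  # one summary figure across outcomes
```

The pairwise form is quadratic in the number of patients, which is acceptable for a sketch; a rank-based computation would be the usual choice for larger cohorts.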
2
Nelson A, Somerville P, Patel S, Matta J. The Etiology of Autism Spectrum Disorder and Gender Dysphoria. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. [PMID: 40039933 DOI: 10.1109/embc53108.2024.10781615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]
Abstract
This paper investigates the genetic correlations between Autism Spectrum Disorder (ASD) and Gender Dysphoria (GD) using network science techniques applied to data from the National Institutes of Health's All of Us research program. Despite extensive research on the genetic etiology of ASD and the phenotypic overlaps between ASD and GD, a genetic component linking the two has not been explored thoroughly. Our study addresses this gap by integrating phenotypic data and genetic variations, specifically single nucleotide polymorphisms (SNPs), to construct a network graph that reveals potential genetic intersections between these conditions. We identify a single gene linking ASD and GD, indicating a potential genetic overlap. This finding is significant as the gene has been associated with several psychiatric conditions, including ASD. Clinical relevance: The study's methodology extends existing research by combining phenotypic and genetic data to analyze the comorbidity of two different conditions. Our results not only provide insights into the genetic correlations between ASD and GD but also demonstrate the utility of network science in medical research. The approach used here could be generalized to other conditions, offering a new way to understand genetic relationships in neurodevelopmental and psychiatric disorders.
3
Prince N, Chu SH, Chen Y, Mendez KM, Hanson E, Green-Snyder L, Brooks E, Korrick S, Lasky-Su JA, Kelly RS. Phenotypically driven subgroups of ASD display distinct metabolomic profiles. Brain Behav Immun 2023; 111:21-29. [PMID: 37004757 PMCID: PMC11099628 DOI: 10.1016/j.bbi.2023.03.026] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2022] [Revised: 03/08/2023] [Accepted: 03/28/2023] [Indexed: 04/04/2023]
Abstract
Autism Spectrum Disorder (ASD) is a heterogeneous condition that includes a broad range of characteristics and associated comorbidities; however, the biology underlying the variability in phenotypes is not well understood. As ASD impacts approximately 1 in 100 children globally, there is an urgent need to better understand the biological mechanisms that contribute to features of ASD. In this study, we leveraged rich phenotypic and diagnostic information related to ASD in 2001 individuals aged 4 to 17 years from the Simons Simplex Collection to derive phenotypically driven subgroups and investigate their respective metabolomes. We performed hierarchical clustering on 40 phenotypes spanning four ASD clinical domains, resulting in three subgroups with distinct phenotype patterns. Using global plasma metabolomic profiling generated by ultrahigh-performance liquid chromatography mass spectrometry, we characterized the metabolome of individuals in each subgroup to interrogate underlying biology related to the subgroups. Subgroup 1 included children with the least maladaptive behavioral traits (N = 862); global decreases in lipid metabolites and concomitant increases in amino acid and nucleotide pathways were observed for children in this subgroup. Subgroup 2 included children with the highest degree of challenges across all phenotype domains (N = 631), and their metabolome profiles demonstrated aberrant metabolism of membrane lipids and increases in lipid oxidation products. Subgroup 3 included children with maladaptive behaviors and co-occurring conditions that showed the highest IQ scores (N = 508); these individuals had increases in sphingolipid metabolites and fatty acid byproducts. Overall, these findings indicated distinct metabolic patterns within ASD subgroups, which may reflect the biological mechanisms giving rise to specific patterns of ASD characteristics. 
Our results may have important clinical applications relevant to personalized medicine approaches towards managing ASD symptoms.
Affiliation(s)
- Nicole Prince
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Su H Chu
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Yulu Chen
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Kevin M Mendez
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Ellen Hanson
  - Divisions of Neurology and Developmental Medicine, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Susan Korrick
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jessica A Lasky-Su
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Rachel S Kelly
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
4
Matta J, Singh V, Auten T, Sanjel P. Inferred networks, machine learning, and health data. PLoS One 2023; 18:e0280910. [PMID: 36689443 PMCID: PMC9870174 DOI: 10.1371/journal.pone.0280910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2022] [Accepted: 01/09/2023] [Indexed: 01/24/2023]
Abstract
This paper presents a network science approach to investigate a health information dataset, the Sexual Acquisition and Transmission of HIV Cooperative Agreement Program (SATHCAP), to uncover hidden relationships that can be used to suggest targeted health interventions. From the data, four key target variables are chosen: HIV status, injecting drug use, homelessness, and insurance status. These target variables are converted to a graph format using four separate graph inference techniques: graphical lasso, Meinshausen-Bühlmann (MB), k-Nearest Neighbors (kNN), and correlation thresholding (CT). The graphs are then clustered using four clustering methods: Louvain, Leiden, NBR-Clust with VAT, and NBR-Clust with integrity. Promising clusters are chosen using internal evaluation measures and are visualized and analyzed to identify marker attributes and key relationships. The kNN and CT inference methods are shown to give useful results when combined with NBR-Clust clustering. Examples of cluster analysis indicate that the methodology produces results that will be relevant to the public health community.
Affiliation(s)
- John Matta
  - Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, Illinois, United States of America
- Virender Singh
  - Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, Illinois, United States of America
- Trevor Auten
  - Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, Illinois, United States of America
- Prashant Sanjel
  - Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, Illinois, United States of America
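Correlation thresholding, one of the graph inference techniques named in the Matta et al. (2023) abstract above, can be illustrated roughly as follows. This is a sketch under stated assumptions rather than the paper's procedure: the profile names, the values, the 0.9 cut-off, and the choice to link only positively correlated pairs are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_threshold_graph(profiles, threshold):
    """Link two nodes when the correlation of their profiles reaches the threshold."""
    names = sorted(profiles)
    return {
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(profiles[a], profiles[b]) >= threshold
    }

# Hypothetical profiles: p1 and p2 move together, p3 does not.
profiles = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [2.0, 4.0, 6.0, 8.0],
    "p3": [1.0, -1.0, 1.0, -1.0],
}
ct_edges = correlation_threshold_graph(profiles, threshold=0.9)
```

The resulting edge set is the kind of inferred graph that a community detection method could then be run on.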
5
Howlett-Prieto Q, Oommen C, Carrithers MD, Wunsch DC, Hier DB. Subtypes of relapsing-remitting multiple sclerosis identified by network analysis. Front Digit Health 2023; 4:1063264. [PMID: 36714613 PMCID: PMC9874946 DOI: 10.3389/fdgth.2022.1063264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023]
Abstract
We used network analysis to identify subtypes of relapsing-remitting multiple sclerosis subjects based on their cumulative signs and symptoms. The electronic medical records of 113 subjects with relapsing-remitting multiple sclerosis were reviewed, signs and symptoms were mapped to classes in a neuro-ontology, and classes were collapsed into sixteen superclasses by subsumption. After normalization and vectorization of the data, bipartite (subject-feature) and unipartite (subject-subject) network graphs were created using NetworkX and visualized in Gephi. Degree and weighted degree were calculated for each node. Graphs were partitioned into communities using the modularity score. Feature maps visualized differences in features by community. Network analysis of the unipartite graph yielded a higher modularity score (0.49) than the bipartite graph (0.25). The bipartite network was partitioned into five communities which were named fatigue, behavioral, hypertonia/weakness, abnormal gait/sphincter, and sensory, based on feature characteristics. The unipartite network was partitioned into five communities which were named fatigue, pain, cognitive, sensory, and gait/weakness/hypertonia based on features. Although we did not identify pure subtypes (e.g., pure motor, pure sensory, etc.) in this cohort of multiple sclerosis subjects, we demonstrated that network analysis could partition these subjects into different subtype communities. Larger datasets and additional partitioning algorithms are needed to confirm these findings and elucidate their significance. This study contributes to the literature investigating subtypes of multiple sclerosis by combining feature reduction by subsumption with network analysis.
Affiliation(s)
- Quentin Howlett-Prieto
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Chelsea Oommen
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Michael D. Carrithers
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Donald C. Wunsch
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States
- Daniel B. Hier
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States
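The modularity score mentioned in the Howlett-Prieto et al. abstract above compares the share of edges that fall inside communities with the share expected from node degrees alone. The sketch below computes the standard Newman modularity for an unweighted, undirected graph in plain Python; the toy graph (two triangles joined by one edge) is a hypothetical example, and the study itself reports using NetworkX and Gephi rather than this code.

```python
def modularity(edges, communities):
    """Newman modularity Q = sum over communities of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    label = {node: c for c, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # L_c: edges with both ends in community c
    degree_sum = [0] * len(communities)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )

# Hypothetical toy graph: two triangles bridged by the edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q_split = modularity(toy_edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(toy_edges, [{0, 1, 2, 3, 4, 5}])
```

Splitting the toy graph at the bridge gives a higher score than leaving it as one community, which is the sense in which a higher modularity indicates a cleaner partition.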
6
Matta J, Dobrino D, Yeboah D, Howard S, EL-Manzalawy Y, Obafemi-Ajayi T. Connecting phenotype to genotype: PheWAS-inspired analysis of autism spectrum disorder. Front Hum Neurosci 2022; 16:960991. [PMID: 36310845 PMCID: PMC9605200 DOI: 10.3389/fnhum.2022.960991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2022] [Accepted: 09/14/2022] [Indexed: 04/13/2024]
Abstract
Autism Spectrum Disorder (ASD) is extremely heterogeneous clinically and genetically. There is a pressing need for a better understanding of the heterogeneity of ASD based on scientifically rigorous approaches centered on systematic evaluation of the clinical and research utility of both phenotype and genotype markers. This paper presents a holistic PheWAS-inspired method to identify meaningful associations between ASD phenotypes and genotypes. We generate two types of phenotype-phenotype (p-p) graphs: a direct graph that utilizes only phenotype data, and an indirect graph that incorporates genotype as well as phenotype data. We introduce a novel methodology for fusing the direct and indirect p-p networks in which the genotype data is incorporated into the phenotype data in varying degrees. The hypothesis is that the heterogeneity of ASD can be distinguished by clustering the p-p graph. The obtained graphs are clustered using network-oriented clustering techniques, and results are evaluated. The most promising clusterings are subsequently analyzed for biological and domain-based relevance. Clusters obtained delineated different aspects of ASD, including differentiating ASD-specific symptoms, cognitive, adaptive, language and communication functions, and behavioral problems. Some of the important genes associated with the clusters have previously known associations with ASD. We found that clusters based on integrated genetic and phenotype data were more effective at identifying relevant genes than clusters constructed from phenotype information alone. These genes included five with suggestive evidence of ASD association and one known to be a strong candidate.
Affiliation(s)
- John Matta
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Daniel Dobrino
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Dacosta Yeboah
  - Department of Computer Science, Missouri State University, Springfield, MO, United States
- Swade Howard
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Yasser EL-Manzalawy
  - Department of Translational Data Science and Informatics, Geisinger, Danville, PA, United States
- Tayo Obafemi-Ajayi
  - Engineering Program, Missouri State University, Springfield, MO, United States
7
Applications of Unsupervised Machine Learning in Autism Spectrum Disorder Research: a Review. Rev J Autism Dev Disord 2022. [DOI: 10.1007/s40489-021-00299-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Large amounts of autism spectrum disorder (ASD) data are created through hospitals, therapy centers, and mobile applications; however, much of this rich data does not have pre-existing classes or labels. Large amounts of data, both genetic and behavioral, that are collected as part of scientific studies or as part of treatment can provide a deeper, more nuanced insight into both diagnosis and treatment of ASD. This paper reviews 43 papers using unsupervised machine learning in ASD, including k-means clustering, hierarchical clustering, model-based clustering, and self-organizing maps. The aim of this review is to provide a survey of the current uses of unsupervised machine learning in ASD research and provide insight into the types of questions being answered with these methods.
8
Matta J, Dobrino D, Howard S, Yeboah D, Kopel J, El-Manzalawy Y, Obafemi-Ajayi T. A PheWAS Model of Autism Spectrum Disorder. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:2110-2114. [PMID: 34891705 DOI: 10.1109/embc46164.2021.9629533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/17/2023]
Abstract
Children with Autism Spectrum Disorder (ASD) exhibit a wide diversity in type, number, and severity of social deficits as well as communicative and cognitive difficulties. It is a challenge to categorize the phenotypes of a particular ASD patient with their unique genetic variants. There is a need for a better understanding of the connections between genotype information and the phenotypes to sort out the heterogeneity of ASD. In this study, single nucleotide polymorphism (SNP) and phenotype data obtained from a simplex ASD sample are combined using a PheWAS-inspired approach to construct a phenotype-phenotype network. The network is clustered, yielding groups of etiologically related phenotypes. These clusters are analyzed to identify relevant genes associated with each set of phenotypes. The results identified multiple discriminant SNPs associated with varied phenotype clusters such as ASD aberrant behavior (self-injury, compulsiveness and hyperactivity), as well as IQ and language skills. Overall, these SNPs were linked to 22 significant genes. An extensive literature search revealed that eight of these are known to have strong evidence of association with ASD. The others have been linked to related disorders such as mental conditions, cognition, and social functioning. Clinical relevance: This study further informs on connections between certain groups of ASD phenotypes and their unique genetic variants. Such insight regarding the heterogeneity of ASD would support clinicians to advance more tailored interventions and improve outcomes for ASD patients.
9
Agelink van Rentergem JA, Deserno MK, Geurts HM. Validation strategies for subtypes in psychiatry: A systematic review of research on autism spectrum disorder. Clin Psychol Rev 2021; 87:102033. [PMID: 33962352 DOI: 10.1016/j.cpr.2021.102033] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 06/19/2020] [Revised: 02/14/2021] [Accepted: 04/14/2021] [Indexed: 12/11/2022]
Abstract
Heterogeneity within autism spectrum disorder (ASD) is recognized as a challenge to both biological and psychological research, as well as clinical practice. To reduce unexplained heterogeneity, subtyping techniques are often used to establish more homogeneous subtypes based on metrics of similarity and dissimilarity between people. We review the ASD literature to create a systematic overview of the subtyping procedures and subtype validation techniques that are used in this field. We conducted a systematic review of 156 articles (2001-June 2020) that subtyped participants (range N of studies = 17-20,658), of which some or all had an ASD diagnosis. We found a large diversity in (parametric and non-parametric) methods and (biological, psychological, demographic) variables used to establish subtypes. The majority of studies validated their subtype results using variables that were measured concurrently, but were not included in the subtyping procedure. Other investigations into subtypes' validity were rarer. In order to advance clinical research and the theoretical and clinical usefulness of identified subtypes, we propose a structured approach and present the SUbtyping VAlidation Checklist (SUVAC), a checklist for validating subtyping results.
Affiliation(s)
- Joost A Agelink van Rentergem
  - Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands; Dutch Autism & ADHD Research Center, the Netherlands.
- Marie K Deserno
  - Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands; Dutch Autism & ADHD Research Center, the Netherlands
- Hilde M Geurts
  - Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands; Dutch Autism & ADHD Research Center, the Netherlands; Dr. Leo Kannerhuis, the Netherlands
10
Kramer J, Boone L, Clifford T, Bruce J, Matta J. Analysis of Medical Data Using Community Detection on Inferred Networks. IEEE J Biomed Health Inform 2020; 24:3136-3143. [PMID: 32749973 DOI: 10.1109/jbhi.2020.3003827] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
Performing network-based analysis on medical and biological data makes a wide variety of machine learning tools available. Clustering, which can be used for classification, presents opportunities for identifying hard-to-reach groups for the development of customized health interventions. Due to a desire to convert abundant DNA gene co-expression data into networks, many graph inference methods have been developed. Likewise there are many clustering and classification tools. This paper presents a comparison of techniques for graph inference and clustering, using different numbers of features, in order to select the best tuple of graph inference method, clustering method, and number of features according to a particular phenotype. An extensive machine learning based analysis of the REGARDS dataset is conducted, evaluating the CoNet and K-Nearest Neighbors (KNN) network inference methods, along with the Louvain, Leiden and NBR-Clust clustering techniques. Results from analysis involving five internal cluster evaluation indices show the traditional KNN inference method and NBR-Clust and Louvain clustering produce the most promising clusters with medical phenotype data. It is also shown that visualization can aid in interpreting the clusters, and that the clusters produced can identify meaningful groups indicating customized interventions.
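The K-Nearest Neighbors (KNN) network inference step named in the Kramer et al. abstract above turns a table of feature vectors into a graph by linking each record to its closest neighbours. A minimal sketch follows, assuming Euclidean distance, an undirected edge whenever either endpoint selects the other, and invented two-feature rows; none of these choices is taken from the paper.

```python
from math import dist

def knn_graph(points, k):
    """Return undirected edges linking each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(points):
        # Rank all other points by distance; break ties by index for determinism.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical rows: two well-separated groups of three records each.
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
knn_edges = knn_graph(rows, k=2)
```

With these rows the graph falls into two disconnected triangles, so any community detection method applied afterwards would recover the two groups; real phenotype data would typically need feature scaling before distances are meaningful.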
11
Nakashima S, Nacher JC, Song J, Akutsu T. An Overview of Bioinformatics Methods for Analyzing Autism Spectrum Disorders. Curr Pharm Des 2020; 25:4552-4559. [PMID: 31713477 DOI: 10.2174/1381612825666191111154837] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/27/2019] [Accepted: 11/07/2019] [Indexed: 02/06/2023]
Abstract
Autism Spectrum Disorders (ASD) are a group of neurodevelopmental disorders and are well recognized to be biologically heterogeneous in which various factors are associated, including genetic, metabolic, and environmental ones. Despite its high prevalence, only a few drugs have been approved for the treatment of ASD. Therefore, extensive studies have been conducted to identify ASD risk genes and novel drug targets. Since many genes and many other factors are associated with ASD, various bioinformatics methods have also been developed for the analysis of ASD. In this paper, we review bioinformatics methods for analyzing ASD data with the focus on computational aspects. We classify existing methods into two categories: (i) methods based on genomic variants and gene expression data, and (ii) methods using biological networks, which include gene co-expression networks and protein-protein interaction networks. Next, for each method, we provide an overall flow and elaborate on the computational techniques used. We also briefly review other approaches and discuss possible future directions and strategies for developing bioinformatics approaches to analyze ASD.
Affiliation(s)
- Shogo Nakashima
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Jose C Nacher
  - Department of Information Science, Faculty of Science, Toho University, Kyoto, Japan
- Jiangning Song
  - Monash Biomedicine Discovery Institute, Monash University, Clayton VIC 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
12
Matta J, Zhao J, Ercal G, Obafemi-Ajayi T. Applications of node-based resilience graph theoretic framework to clustering autism spectrum disorders phenotypes. Appl Netw Sci 2018; 3:38. [PMID: 30839816 PMCID: PMC6214326 DOI: 10.1007/s41109-018-0093-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/03/2018] [Accepted: 08/08/2018] [Indexed: 06/09/2023]
Abstract
With the growing ubiquity of data in network form, clustering in the context of a network, represented as a graph, has become increasingly important. Clustering is a very useful data exploratory machine learning tool that allows us to make better sense of heterogeneous data by grouping data with similar attributes based on some criteria. This paper investigates the application of a novel graph theoretic clustering method, Node-Based Resilience clustering (NBR-Clust), to address the heterogeneity of Autism Spectrum Disorder (ASD) and identify meaningful subgroups. The hypothesis is that analysis of these subgroups would reveal relevant biomarkers that would provide a better understanding of ASD phenotypic heterogeneity useful for further ASD studies. We address appropriate graph constructions suited for representing the ASD phenotype data. The sample population is drawn from a very large rigorous dataset: Simons Simplex Collection (SSC). Analysis of the results performed using graph quality measures, internal cluster validation measures, and clinical analysis outcome demonstrate the potential usefulness of resilience measure clustering for biomedical datasets. We also conduct feature extraction analysis to characterize relevant biomarkers that delineate the resulting subgroups. The optimal results obtained favored predominantly a 5-cluster configuration.
Affiliation(s)
- John Matta
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL USA
- Junya Zhao
  - Department of Computer Science, Missouri State University, Springfield, MO USA
- Gunes Ercal
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL USA