1
Zadok N, Ast G, Sharan R. A network-based method for associating genes with autism spectrum disorder. Frontiers in Bioinformatics 2024; 4:1295600. [PMID: 38525240 PMCID: PMC10960359 DOI: 10.3389/fbinf.2024.1295600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 02/26/2024] [Indexed: 03/26/2024] Open
Abstract
Autism spectrum disorder (ASD) is a highly heritable complex disease that affects 1% of the population, yet its underlying molecular mechanisms are largely unknown. Here we study the problem of predicting causal genes for ASD by combining genome-scale data with a network propagation approach. We construct a predictor that integrates multiple omic data sets that assess genomic, transcriptomic, proteomic, and phosphoproteomic associations with ASD. In cross-validation, our predictor yields a mean area under the ROC curve of 0.87 and an area under the precision-recall curve of 0.89. We further show that it outperforms previous gene-level predictors of autism association. Finally, we show that we can use the model to predict genes associated with schizophrenia, which is known to share genetic components with ASD.
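The abstract above mentions a network propagation approach without giving details. As a rough illustration only, the sketch below runs a generic random walk with restart on a toy graph. The graph, the seed set, the `alpha` value and the function name `propagate` are all assumptions made for this example and are not taken from the paper, whose predictor also integrates several omic data sets.

```python
def propagate(adj, seeds, alpha=0.8, tol=1e-9, max_iter=1000):
    """Generic random walk with restart (illustrative, not the authors' method).

    adj   : dict mapping node -> set of neighbouring nodes (toy input)
    seeds : set of nodes with prior disease evidence
    alpha : fraction of score that flows along edges at each step
    """
    nodes = sorted(adj)
    degree = {n: len(adj[n]) for n in nodes}
    # The prior puts equal mass on every seed gene and zero elsewhere.
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(prior)
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            # Each neighbour m passes on an equal share of its current score.
            inflow = sum(score[m] / degree[m] for m in adj[n])
            new[n] = alpha * inflow + (1 - alpha) * prior[n]
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical five-gene network; "A" plays the role of a known disease gene.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = propagate(toy_network, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy run the seed gene ranks first and the most distant gene ranks last, which is the intuition behind ranking candidate genes by their propagated score.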
Affiliation(s)
- Neta Zadok
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Gil Ast
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Roded Sharan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
2
Hazuga MA, Grant SFA. Awakening new sleep biology with machine learning. Sleep 2023; 46:zsac284. [PMID: 36422063 PMCID: PMC9905772 DOI: 10.1093/sleep/zsac284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Indexed: 11/27/2022] Open
Affiliation(s)
- Mary Ann Hazuga
  - Division of Human Genetics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
  - Center for Spatial and Functional Genomics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Struan F A Grant
  - Division of Human Genetics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
  - Center for Spatial and Functional Genomics, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Diabetes and Endocrinology, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
  - Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
  - Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
  - Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
3
Lee YY, Endale M, Wu G, Ruben MD, Francey LJ, Morris AR, Choo NY, Anafi RC, Smith DF, Liu AC, Hogenesch JB. Integration of genome-scale data identifies candidate sleep regulators. Sleep 2023; 46:zsac279. [PMID: 36462188 PMCID: PMC9905783 DOI: 10.1093/sleep/zsac279] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/21/2022] [Revised: 09/02/2022] [Indexed: 12/05/2022] Open
Abstract
STUDY OBJECTIVES: Genetics impacts sleep, yet the molecular mechanisms underlying sleep regulation remain elusive. In this study, we built machine learning models to predict sleep genes based on their similarity to genes that are known to regulate sleep. METHODS: We trained a prediction model on thousands of published datasets, representing circadian, immune, sleep deprivation, and many other processes, using a manually curated list of 109 sleep genes. RESULTS: Our predictions fit with prior knowledge of sleep regulation and identified key genes and pathways to pursue in follow-up studies. As an example, we focused on the NF-κB pathway and showed that chronic activation of NF-κB in a genetic mouse model impacted the sleep-wake patterns. CONCLUSION: Our study highlights the power of machine learning in integrating prior knowledge and genome-wide data to study genetic regulation of complex behaviors such as sleep.
Affiliation(s)
- Yin Yeng Lee
  - Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
  - Department of Pharmacology and Systems Physiology, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Mehari Endale
  - Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
- Gang Wu
  - Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
- Marc D Ruben
  - Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
- Lauren J Francey
  - Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
- Andrew R Morris
  - Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
- Natalie Y Choo
  - Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Ron C Anafi
  - Department of Medicine, Chronobiology and Sleep Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- David F Smith
  - Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
  - Division of Pulmonary Medicine and the Sleep Center, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
  - Center for Circadian Medicine, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
  - Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Andrew C Liu
  - Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
- John B Hogenesch
  - Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
  - Center for Circadian Medicine, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
4
Matta J, Dobrino D, Yeboah D, Howard S, EL-Manzalawy Y, Obafemi-Ajayi T. Connecting phenotype to genotype: PheWAS-inspired analysis of autism spectrum disorder. Front Hum Neurosci 2022; 16:960991. [PMID: 36310845 PMCID: PMC9605200 DOI: 10.3389/fnhum.2022.960991] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/03/2022] [Accepted: 09/14/2022] [Indexed: 04/13/2024] Open
Abstract
Autism Spectrum Disorder (ASD) is extremely heterogeneous clinically and genetically. There is a pressing need for a better understanding of the heterogeneity of ASD based on scientifically rigorous approaches centered on systematic evaluation of the clinical and research utility of both phenotype and genotype markers. This paper presents a holistic PheWAS-inspired method to identify meaningful associations between ASD phenotypes and genotypes. We generate two types of phenotype-phenotype (p-p) graphs: a direct graph that utilizes only phenotype data, and an indirect graph that incorporates genotype as well as phenotype data. We introduce a novel methodology for fusing the direct and indirect p-p networks in which the genotype data is incorporated into the phenotype data in varying degrees. The hypothesis is that the heterogeneity of ASD can be distinguished by clustering the p-p graph. The obtained graphs are clustered using network-oriented clustering techniques, and the results are evaluated. The most promising clusterings are subsequently analyzed for biological and domain-based relevance. The clusters obtained delineated different aspects of ASD, including differentiating ASD-specific symptoms, cognitive, adaptive, language and communication functions, and behavioral problems. Some of the important genes associated with the clusters have previously known associations with ASD. We found that clusters based on integrated genetic and phenotype data were more effective at identifying relevant genes than clusters constructed from phenotype information alone. These genes included five with suggestive evidence of ASD association and one known to be a strong candidate.
Affiliation(s)
- John Matta
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Daniel Dobrino
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Dacosta Yeboah
  - Department of Computer Science, Missouri State University, Springfield, MO, United States
- Swade Howard
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Yasser EL-Manzalawy
  - Department of Translational Data Science and Informatics, Geisinger, Danville, PA, United States
- Tayo Obafemi-Ajayi
  - Engineering Program, Missouri State University, Springfield, MO, United States
5
Zhou X, Feliciano P, Shu C, Wang T, Astrovskaya I, Hall JB, Obiajulu JU, Wright JR, Murali SC, Xu SX, Brueggeman L, Thomas TR, Marchenko O, Fleisch C, Barns SD, Snyder LG, Han B, Chang TS, Turner TN, Harvey WT, Nishida A, O'Roak BJ, Geschwind DH, Michaelson JJ, Volfovsky N, Eichler EE, Shen Y, Chung WK. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat Genet 2022; 54:1305-1319. [PMID: 35982159 PMCID: PMC9470534 DOI: 10.1038/s41588-022-01148-2] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Received: 09/26/2021] [Accepted: 06/28/2022] [Indexed: 12/16/2022]
Abstract
To capture the full spectrum of genetic risk for autism, we performed a two-stage analysis of rare de novo and inherited coding variants in 42,607 autism cases, including 35,130 new cases recruited online by SPARK. We identified 60 genes with exome-wide significance (P < 2.5 × 10⁻⁶), including five new risk genes (NAV3, ITSN1, MARK2, SCAF1 and HNRNPUL2). The association of NAV3 with autism risk is primarily driven by rare inherited loss-of-function (LoF) variants, with an estimated relative risk of 4, consistent with moderate effect. Autistic individuals with LoF variants in the four moderate-risk genes (NAV3, ITSN1, SCAF1 and HNRNPUL2; n = 95) have less cognitive impairment than 129 autistic individuals with LoF variants in highly penetrant genes (CHD8, SCN2A, ADNP, FOXP1 and SHANK3) (59% vs 88%, P = 1.9 × 10⁻⁶). Power calculations suggest that much larger numbers of autism cases are needed to identify additional moderate-risk genes.
Affiliation(s)
- Xueya Zhou
  - Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Chang Shu
  - Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Tianyun Wang
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Medical Genetics, Center for Medical Genetics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
  - Neuroscience Research Institute, Department of Neurobiology, School of Basic Medical Sciences, Peking University Health Science Center; Key Laboratory for Neuroscience, Ministry of Education of China & National Health Commission of China, Beijing, China
- Joseph U Obiajulu
  - Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Shwetha C Murali
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Leo Brueggeman
  - Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Taylor R Thomas
  - Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Bing Han
  - Simons Foundation, New York, NY, USA
- Timothy S Chang
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Tychele N Turner
  - Department of Genetics, Washington University, St. Louis, MO, USA
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Andrew Nishida
  - Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Brian J O'Roak
  - Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Daniel H Geschwind
  - Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jacob J Michaelson
  - Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- Wendy K Chung
  - Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
  - Simons Foundation, New York, NY, USA
  - Department of Medicine, Columbia University Medical Center, New York, NY, USA
6
Joudar SS, Albahri AS, Hamid RA. Triage and priority-based healthcare diagnosis using artificial intelligence for autism spectrum disorder and gene contribution: A systematic review. Comput Biol Med 2022; 146:105553. [PMID: 35561591 DOI: 10.1016/j.compbiomed.2022.105553] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/05/2022] [Revised: 04/03/2022] [Accepted: 04/20/2022] [Indexed: 11/03/2022]
Abstract
The exact nature, harmful effects and aetiology of autism spectrum disorder (ASD) have caused widespread confusion. Artificial intelligence (AI) science helps solve challenging diagnostic problems in the medical field through extensive experiments. Disease severity is closely related to triage decisions and prioritisation contexts in medicine because both have been widely used to diagnose various diseases via AI, machine learning and automated decision-making techniques. Recently, the use of high-performance AI algorithms has achieved success in diagnosing and predicting risks from clinical and biological data. In contrast, less progress has been made with ASD, for reasons that remain unclear. According to the academic literature, ASD diagnosis works from a specific perspective, and much of the confusion arises from how AI techniques are currently integrated with the diagnosis of ASD with respect to triage and priority strategies and gene contributions. To this end, this study presents a systematic review of the literature to assess the respective AI methods using the available datasets, highlight the tools and strategies used for diagnosing ASD and investigate how AI trends contribute to distinguishing triage and priority for ASD and gene contributions. Accordingly, this study checked the Science Direct, IEEE Xplore Digital Library, Web of Science (WoS), PubMed, and Scopus databases. A set of 363 articles from 2017 to 2022 was collected to reveal a clear picture and a better understanding of all the academic literature through a final set of 18 articles. The retrieved articles were filtered according to the defined inclusion and exclusion criteria and classified into three categories. The first category includes 'Triage patients based on diagnosis methods', which accounts for 16.66% (n = 3/18). The second category includes 'Prioritisation for Risky Genes', which accounts for 66.6% (n = 12/18) and is classified into two subcategories: 'Mutations observation based' and 'Biomarkers and toxic chemical observations'. The third category includes 'E-triage using telehealth', which accounts for 16.66% (n = 3/18). This multidisciplinary systematic review revealed the taxonomy, motivations, recommendations and challenges of ASD research that need synergistic attention. Thus, this systematic review performs a comprehensive science mapping analysis and discusses the open issues that help perform and improve the recommended solution of ASD research direction. In addition, this study critically reviews the literature, attempts to address the current research gaps in knowledge and highlights weaknesses that require further research. Finally, a newly developed methodology is suggested as future work for triaging and prioritising ASD patients according to their severity levels by using decision-making techniques.
Affiliation(s)
- Shahad Sabbar Joudar
  - Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq; University of Technology, Baghdad, Iraq
- A S Albahri
  - Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq
- Rula A Hamid
  - Informatics Institute for Postgraduate Studies (IIPS), Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq; College of Business Informatics, University of Information Technology and Communications (UOITC), Baghdad, Iraq
7
Arpi MNT, Simpson TI. SFARI genes and where to find them; modelling Autism Spectrum Disorder specific gene expression dysregulation with RNA-seq data. Sci Rep 2022; 12:10158. [PMID: 35710789 PMCID: PMC9203566 DOI: 10.1038/s41598-022-14077-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/12/2021] [Accepted: 06/01/2022] [Indexed: 11/09/2022] Open
Abstract
Autism Spectrum Disorders (ASD) have a strong, yet heterogeneous, genetic component. Among the various methods that are being developed to help reveal the underlying molecular aetiology of the disease, one approach that is gaining popularity is the combination of gene expression and clinical genetic data, often using the SFARI-gene database, which comprises lists of curated genes considered to have causative roles in ASD when mutated in patients. We build a gene co-expression network to study the relationship between ASD-specific transcriptomic data and SFARI genes and then analyse it at different levels of granularity. No significant evidence is found of association between SFARI genes and differential gene expression patterns when comparing ASD samples to a control group, nor of statistical enrichment of SFARI genes in gene co-expression network modules that have a strong correlation with ASD diagnosis. However, classification models that incorporate topological information from the whole ASD-specific gene co-expression network can predict novel SFARI candidate genes that share features of existing SFARI genes and have support for roles in ASD in the literature. A statistically significant association is also found between the absolute level of gene expression and SFARI genes and their scores, which can confound the analysis if uncorrected. We propose a novel approach to correct for this that is general enough to be applied to other problems affected by continuous sources of bias. It was found that only co-expression network analyses that integrate information from the whole network are able to reveal signatures linked to ASD diagnosis and novel candidate genes for the study of ASD, which individual gene or module analyses fail to do. It was also found that the influence of SFARI genes permeates not only other ASD scoring systems, but also lists of genes believed to be involved in other neurodevelopmental disorders.
Affiliation(s)
- T Ian Simpson
  - School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK
  - Simons Initiative for the Developing Brain (SIDB), Centre for Brain Discovery Sciences, University of Edinburgh, Edinburgh, UK
8
Davis KW, Bilancia CG, Martin M, Vanzo R, Rimmasch M, Hom Y, Uddin M, Serrano MA. NeuroSCORE is a genome-wide omics-based model that identifies candidate disease genes of the central nervous system. Sci Rep 2022; 12:5427. [PMID: 35361823 PMCID: PMC8971396 DOI: 10.1038/s41598-022-08938-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/06/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023] Open
Abstract
To identify candidate disease genes of central nervous system (CNS) phenotypes, we created the Neurogenetic Systematic Correlation of Omics-Related Evidence (NeuroSCORE). We identified five genome-wide metrics highly associated with CNS phenotypes to score 19,601 protein-coding genes. Genes scored one point per metric (range: 0-5), identifying 8298 scored genes (scores ≥ 1) and 1601 "high scoring" genes (scores ≥ 3). Using logistic regression, we determined the odds ratio that genes with a NeuroSCORE from 1 to 5 would be associated with known CNS-related phenotypes compared to genes that scored zero. We tested NeuroSCORE using microarray copy number variants (CNVs) in case-control cohorts and aggregate mouse model data. High scoring genes are associated with CNS phenotypes (OR = 5.5, p < 2E-16) and are enriched in case CNVs and in mouse ortholog genes that cause behavioral and nervous system abnormalities. We identified 1058 high scoring genes with no disease association in OMIM. Transforming the logistic regression results indicates high scoring genes have an 84-92% chance of being associated with a CNS phenotype. Top scoring genes include GRIA1, MAP4K4, SF1, TNPO2, and ZSWIM8. Finally, we interrogated CNVs in the Clinical Genome Resource, finding the majority of clinically significant CNVs contain high scoring genes. These findings can direct future research and improve molecular diagnostics.
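The one-point-per-metric scoring described in the abstract can be sketched as below. This is a minimal illustration, not the published implementation: the abstract does not name the five metrics, so `METRICS` uses placeholder names, and the helper names `neuroscore` and `tier` are hypothetical. Only the 0-5 range and the thresholds (scores ≥ 1 and scores ≥ 3) come from the abstract.

```python
# Placeholder metric names: the abstract says five metrics are used
# but does not list them, so these identifiers are assumptions.
METRICS = ("metric_a", "metric_b", "metric_c", "metric_d", "metric_e")


def neuroscore(gene_flags):
    """Count how many of the five metrics a gene satisfies (range 0-5)."""
    return sum(1 for m in METRICS if gene_flags.get(m, False))


def tier(score):
    """Label a score with the thresholds quoted in the abstract.

    Note that "high scoring" genes (>= 3) are also "scored" genes (>= 1)
    in the abstract's counts; this helper returns only the top label.
    """
    if score >= 3:
        return "high scoring"
    if score >= 1:
        return "scored"
    return "unscored"


# Hypothetical gene that satisfies three of the five metrics.
example_gene = {"metric_a": True, "metric_c": True, "metric_e": True}
example_score = neuroscore(example_gene)
example_tier = tier(example_score)
```

An additive count like this is easy to audit, since each point traces back to exactly one metric, though it weights all metrics equally.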
Affiliation(s)
- Kyle W Davis
  - Bionano Genomics, Lineagen Division, Inc., 9540 Towne Center, Dr. #100, San Diego, CA, 92121, USA
- Colleen G Bilancia
  - Bionano Genomics, Lineagen Division, Inc., 9540 Towne Center, Dr. #100, San Diego, CA, 92121, USA
- Megan Martin
  - Bionano Genomics, Lineagen Division, Inc., 9540 Towne Center, Dr. #100, San Diego, CA, 92121, USA
- Rena Vanzo
  - Bionano Genomics, Lineagen Division, Inc., 9540 Towne Center, Dr. #100, San Diego, CA, 92121, USA
- Megan Rimmasch
  - Bionano Genomics, Lineagen Division, Inc., 9540 Towne Center, Dr. #100, San Diego, CA, 92121, USA
- Yolanda Hom
  - Bionano Genomics, Lineagen Division, Inc., 9540 Towne Center, Dr. #100, San Diego, CA, 92121, USA
- Mohammed Uddin
  - College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
  - Cellular Intelligence (Ci) Lab, GenomeArc Inc., Toronto, ON, Canada
- Moises A Serrano
  - Bionano Genomics, Lineagen Division, Inc., 9540 Towne Center, Dr. #100, San Diego, CA, 92121, USA
9
Jiang Y, Urresti J, Pagel KA, Pramod AB, Iakoucheva LM, Radivojac P. Prioritizing de novo autism risk variants with calibrated gene- and variant-scoring models. Hum Genet 2021; 141:1595-1613. [PMID: 34549350 DOI: 10.1007/s00439-021-02356-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2021] [Accepted: 08/26/2021] [Indexed: 12/17/2022]
Abstract
Whole-exome and whole-genome sequencing studies in autism spectrum disorder (ASD) have identified hundreds of thousands of exonic variants. Only a handful of them, primarily loss-of-function variants, have been shown to increase the risk for ASD, while the contributory roles of other variants, including most missense variants, remain unknown. New approaches that combine tissue-specific molecular profiles with patients' genetic data can thus play an important role in elucidating the functional impact of exonic variation and improve understanding of ASD pathogenesis. Here, we integrate spatio-temporal gene co-expression networks from the developing human brain and protein-protein interaction networks to first reach accurate prioritization of ASD risk genes based on their connectivity patterns with previously known high-confidence ASD risk genes. We subsequently integrate these gene scores with variant pathogenicity predictions to further prioritize individual exonic variants based on the positive-unlabeled learning framework with gene- and variant-score calibration. We demonstrate that this approach discriminates among variants between cases and controls at the high end of the prediction range. Finally, we experimentally validate our top-scoring de novo mutation NP_001243143.1:p.Phe309Ser in the sodium/potassium-transporting ATPase ATP1A3 to disrupt protein binding with different partners.
Affiliation(s)
- Yuxiang Jiang
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
- Jorge Urresti
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Kymberleigh A Pagel
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
  - Institute for Computational Medicine, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Akula Bala Pramod
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Lilia M Iakoucheva
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
10
Gunning M, Pavlidis P. "Guilt by association" is not competitive with genetic association for identifying autism risk genes. Sci Rep 2021; 11:15950. [PMID: 34354131 PMCID: PMC8342445 DOI: 10.1038/s41598-021-95321-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 07/16/2021] [Indexed: 12/25/2022] Open
Abstract
Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.
Affiliation(s)
- Margot Gunning
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Paul Pavlidis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
11
Wang J, Wu X, Li M. Microcanonical and Canonical Ensembles for fMRI Brain Networks in Alzheimer's Disease. Entropy (Basel) 2021; 23:216. [PMID: 33579012 PMCID: PMC7916760 DOI: 10.3390/e23020216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/14/2020] [Revised: 02/03/2021] [Accepted: 02/08/2021] [Indexed: 12/22/2022]
Abstract
This paper seeks to advance the state of the art in analysing fMRI data to detect the onset of Alzheimer's disease and identify stages in the disease progression. We employ methods of network neuroscience to represent correlation across fMRI data arrays, and introduce novel techniques for network construction and analysis. In network construction, we vary thresholds in establishing BOLD time series correlation between nodes, yielding variations in topological and other network characteristics. For network analysis, we employ methods developed for modelling statistical ensembles of virtual particles in thermal systems. The microcanonical ensemble and the canonical ensemble are analogous to two different fMRI network representations. In the former case, there is zero variance in the number of edges in each network, while in the latter case the set of networks has a variance in the number of edges. Ensemble methods describe the macroscopic properties of a network by considering the underlying microscopic characterisations, which are in turn closely related to the degree configuration and network entropy. When applied to fMRI data in populations of Alzheimer's patients and controls, our methods demonstrated levels of sensitivity adequate for clinical purposes both in identifying brain regions undergoing pathological changes and in revealing the dynamics of such changes.
Collapse
Affiliation(s)
- Jianjia Wang
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China
- Xichen Wu
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Mingrui Li
- Department of Computer Science, University of York, York YO10 5GH, UK
12
Emberti Gialloreti L, Enea R, Di Micco V, Di Giovanni D, Curatolo P. Clustering Analysis Supports the Detection of Biological Processes Related to Autism Spectrum Disorder. Genes (Basel) 2020; 11:genes11121476. [PMID: 33316975 PMCID: PMC7763205 DOI: 10.3390/genes11121476] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/02/2020] [Revised: 11/27/2020] [Accepted: 12/07/2020] [Indexed: 12/27/2022] Open
Abstract
Genome sequencing has identified a large number of putative autism spectrum disorder (ASD) risk genes, revealing possible disrupted biological pathways; however, the genetic and environmental underpinnings of ASD remain largely unresolved. The presented methodology aimed to identify genetically related clusters of ASD individuals. By using the VariCarta dataset, which contains data retrieved from 13,069 people with ASD, we compared patients pairwise to build “patient similarity matrices”. Hierarchical agglomerative clustering and heatmapping were performed, followed by enrichment analysis (EA). We analyzed whole-genome sequencing data retrieved from 2062 individuals, and isolated 11,609 genetic variants shared by at least two people. The analysis yielded three clusters, composed, respectively, of 574 (27.8%), 507 (24.6%), and 650 (31.5%) individuals. Overall, 4187 variants (36.1%) were common to the three clusters. The EA revealed that the biological processes related to the shared genetic variants were mainly involved in neuron projection guidance and morphogenesis, cell junctions, synapse assembly, and in observational, imitative, and vocal learning. The study highlighted genetic networks that were more frequent in a sample of people with ASD than in the overall population. We suggest that itemizing not only single variants, but also gene networks, might support ASD etiopathology research. Future work on larger databases will have to ascertain the reproducibility of this methodology.
Affiliation(s)
- Leonardo Emberti Gialloreti
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
- Correspondence:
- Roberto Enea
- IMME Research Centre, Via Giotto 43, 81100 Caserta, Italy
- Valentina Di Micco
- Child Neurology and Psychiatry Unit, Systems Medicine Department, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
- Daniele Di Giovanni
- Department of Industrial Engineering, University of Rome Tor Vergata, Via del Politecnico 1, 00133 Rome, Italy
- Paolo Curatolo
- Child Neurology and Psychiatry Unit, Systems Medicine Department, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
13
Lin Y, Afshar S, Rajadhyaksha AM, Potash JB, Han S. A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates. Front Genet 2020; 11:500064. [PMID: 33133139 PMCID: PMC7513695 DOI: 10.3389/fgene.2020.500064] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/23/2019] [Accepted: 08/13/2020] [Indexed: 11/17/2022] Open
Abstract
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition with a strong genetic basis. The role of de novo mutations in ASD has been well established, but the set of genes implicated to date is still far from complete. The current study employs a machine learning-based approach to predict ASD risk genes using features from spatiotemporal gene expression patterns in the human brain, gene-level constraint metrics, and other gene variation features. The genes identified through our prediction model were enriched for independent sets of ASD risk genes, and tended to be down-expressed in ASD brains, especially in the frontal and parietal cortex. The highest-ranked genes not only included those with strong prior evidence for involvement in ASD (for example, NBEA, HERC1, and TCF20), but also indicated potentially novel candidates, such as MYCBP2 and CAND1, which are involved in protein ubiquitination. We also showed that our method outperformed state-of-the-art scoring systems for ranking curated ASD candidate genes. Gene ontology enrichment analysis of our predicted risk genes revealed biological processes clearly relevant to ASD, including neuronal signaling, neurogenesis, and chromatin remodeling, but also highlighted other potential mechanisms that might underlie ASD, such as regulation of RNA alternative splicing and the ubiquitination pathway related to protein degradation. Our study demonstrates that human brain spatiotemporal gene expression patterns and gene-level constraint metrics can help predict ASD risk genes. Our gene ranking system provides a useful resource for prioritizing ASD candidate genes.
Affiliation(s)
- Ying Lin
- Department of Industrial Engineering, University of Houston, Houston, TX, United States
- Shiva Afshar
- Department of Industrial Engineering, University of Houston, Houston, TX, United States
- Anjali M Rajadhyaksha
- Division of Pediatric Neurology, Department of Pediatrics, Weill Cornell Medicine, New York, NY, United States
- Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, New York, NY, United States
- Weill Cornell Autism Research Program, Weill Cornell Medicine, New York, NY, United States
- James B Potash
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, United States
- Shizhong Han
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, United States
- Lieber Institute for Brain Development, Baltimore, MD, United States