Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhou J, Park CY, Theesfeld CL, Wong AK, Yuan Y, Scheckel C, Fak JJ, Funk J, Yao K, Tajima Y, Packer A, Darnell RB, Troyanskaya OG. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat Genet 2019;51:973-980. [PMID: 31133750 PMCID: PMC6758908 DOI: 10.1038/s41588-019-0420-0] [Citation(s) in RCA: 146] [Impact Index Per Article: 29.2] [Received: 08/03/2018] [Accepted: 04/12/2019] [Indexed: 12/19/2022]


Number	Cited by Other Article(s)
1	Noncoding de novo mutations in SCN2A are associated with autism spectrum disorders. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.05.24306908. [PMID: 38766206 PMCID: PMC11100849 DOI: 10.1101/2024.05.05.24306908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024] Abstract: Coding de novo mutations (DNMs) contribute to the risk for autism spectrum disorders (ASD), but the contribution of noncoding DNMs remains relatively unexplored. Here we use whole genome sequencing (WGS) data of 12,411 individuals (including 3,508 probands and 2,218 unaffected siblings) from 3,357 families collected in Simons Foundation Powering Autism Research for Knowledge (SPARK) to detect DNMs associated with ASD, while examining Simons Simplex Collection (SSC) with 6,383 individuals from 2,274 families to replicate the results. For coding DNMs, SCN2A reached exome-wide significance (p=2.06×10⁻¹¹) in SPARK. The 618 known dominant ASD genes as a group are strongly enriched for coding DNMs in cases compared with sibling controls (fold change=1.51, p=1.13×10⁻⁵ for SPARK; fold change=1.86, p=2.06×10⁻⁹ for SSC). For noncoding DNMs, we used two methods to assess statistical significance: a point-based test that analyzes sites with a Combined Annotation Dependent Depletion (CADD) score ≥15, and a segment-based test that analyzes 1kb genomic segments with segment-specific background mutation rates (inferred from expected rare mutations in Gnocchi genome constraint scores). The point-based test identified SCN2A as marginally significant (p=6.12×10⁻⁴) in SPARK, yet the segment-based test identified CSMD1, RBFOX1 and CHD13 as exome-wide significant. We did not identify significant enrichment of noncoding DNMs (in all 1kb segments or those with Gnocchi>4) in the 618 known ASD genes as a group in cases compared with sibling controls.
When combining evidence from both coding and noncoding DNMs, we found that SCN2A with 11 coding and 5 noncoding DNMs exhibited the strongest significance (p=4.15×10⁻¹³). In summary, we identified both coding and noncoding DNMs in SCN2A associated with ASD, while nominating additional candidates for further examination in future studies.
2	From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024;25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Abstract: Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to unearth information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by the provision of vast amounts of whole-genome sequences and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from this massive dataset has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still rely on extensive efforts for functional verification. However, early bioinformatics algorithms and software primarily employed shallow learning techniques; thus, the ability to characterize data and learn features was limited. With the widespread adoption of RNA-Seq technology, scientists from the biological community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation.
In this context, we reviewed both conventional methods and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation, underscoring the dynamic nature of this evolving scientific landscape. [Key Words: RNA-Seq technology bioinformatic deep learning genome annotation genome sequence] [MESH Headings: Humans Deep Learning Genome Algorithms Software Computational Biology/methods Molecular Sequence Annotation] [Grants: 2021YFF1000900 National Key Research and Development Program of China RCYX20210706092103024 Shenzhen Science and Technology Program 32222019 National Natural Science Foundation of China]
3	Fucosyltransferase 8 regulates adult neurogenesis and cognition of mice by modulating the Itga6-PI3K/Akt signaling pathway. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-023-2510-0. [PMID: 38523237 DOI: 10.1007/s11427-023-2510-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 11/14/2023] [Indexed: 03/26/2024] Abstract: Fucosyltransferase 8 (Fut8) and core fucosylation play critical roles in regulating various biological processes, including immune response, signal transduction, proteasomal degradation, and energy metabolism. However, the function and underlying mechanism of Fut8 and core fucosylation in regulating adult neurogenesis remain unknown. We have shown that Fut8 and core fucosylation display dynamic features during the differentiation of adult neural stem/progenitor cells (aNSPCs) and postnatal brain development. Fut8 depletion reduces the proliferation of aNSPCs and inhibits neuronal differentiation of aNSPCs in vitro and in vivo, respectively. Additionally, Fut8 deficiency impairs learning and memory in mice. Mechanistically, Fut8 directly interacts with integrin α6 (Itga6), an upstream regulator of the PI3K-Akt signaling pathway, and catalyzes core fucosylation of Itga6. Deletion of Fut8 enhances the ubiquitination of Itga6 by promoting the binding of ubiquitin ligase Trim21 to Itga6. Low levels of Itga6 inhibit the activity of the PI3K/Akt signaling pathway. Moreover, the Akt agonist SC79 can rescue neurogenic and behavioral deficits caused by Fut8 deficiency. In summary, our study uncovers an essential function of Fut8 and core fucosylation in regulating adult neurogenesis and sheds light on the underlying mechanisms. [Key Words: Fut8 Itga6 PI3K/Akt adult neural stem/progenitor cells neurogenesis]
4	Identifying associations of de novo noncoding variants with autism through integration of gene expression, sequence and sex information. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.20.585624. [PMID: 38562739 PMCID: PMC10983996 DOI: 10.1101/2024.03.20.585624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024] Abstract: Whole-genome sequencing (WGS) data is facilitating genome-wide identification of rare noncoding variants, while elucidating their roles in disease remains challenging. Towards this end, we first revisit a reported significant brain-related association signal of autism spectrum disorder (ASD) detected from de novo noncoding variants attributed to deep-learning and show that local GC content can capture similar association signals. We further show that the association signal appears driven by variants from male proband-female sibling pairs that are upstream of assigned genes. We then develop Expression Neighborhood Sequence Association Study (ENSAS), which utilizes gene expression correlations and sequence information, to more systematically identify phenotype-associated variant sets. Applying ENSAS to the same set of de novo variants, we identify gene expression-based neighborhoods showing significant ASD association signal, enriched for synapse-related gene ontology terms. For these top neighborhoods, we also identify chromatin states annotations of variants that are predictive of the proband-sibling local GC content differences. Our work provides new insights into associations of non-coding de novo mutations in ASD and presents an analytical framework applicable to other phenotypes.
[Grants: U01 HG012079 NHGRI NIH HHS R01 MH110927 NIMH NIH HHS U01 MH105578 NIMH NIH HHS R01 MH109912 NIMH NIH HHS DP1 DA044371 NIDA NIH HHS U01 MH130995 NIMH NIH HHS]
5	Classification of Neisseria meningitidis genomes with a bag-of-words approach and machine learning. iScience 2024;27:109257. [PMID: 38439962 PMCID: PMC10910294 DOI: 10.1016/j.isci.2024.109257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 12/13/2023] [Accepted: 02/13/2024] [Indexed: 03/06/2024] Abstract: Whole genome sequencing of bacteria is important to enable strain classification. Using entire genomes as an input to machine learning (ML) models would allow rapid classification of strains while using information from multiple genetic elements. We developed a "bag-of-words" approach to encode, using SentencePiece or k-mer tokenization, entire bacterial genomes and analyze these with ML. Initial model selection identified SentencePiece with 8,000 and 32,000 words as the best approach for genome tokenization. We then classified in Neisseria meningitidis genomes the capsule B group genotype with 99.6% accuracy and the multifactor invasive phenotype with 90.2% accuracy, in an independent test set. Subsequently, in silico knockouts of 2,808 genes confirmed that the ML model predictions aligned with our current understanding of the underlying biology. To our knowledge, this is the first ML method using entire bacterial genomes to classify strains and identify genes considered relevant by the classifier. [Key Words: Classification of bioinformatical subject Machine learning Microbial genomics]
6	Tapioca: a platform for predicting de novo protein-protein interactions in dynamic contexts. Nat Methods 2024;21:488-500. [PMID: 38361019 DOI: 10.1038/s41592-024-02179-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 01/12/2024] [Indexed: 02/17/2024] Abstract: Protein-protein interactions (PPIs) drive cellular processes and responses to environmental cues, reflecting the cellular state. Here we develop Tapioca, an ensemble machine learning framework for studying global PPIs in dynamic contexts. Tapioca predicts de novo interactions by integrating mass spectrometry interactome data from thermal/ion denaturation or cofractionation workflows with protein properties and tissue-specific functional networks. Focusing on the thermal proximity coaggregation method, we improved the experimental workflow. Finely tuned thermal denaturation afforded increased throughput, while cell lysis optimization enhanced protein detection from different subcellular compartments. The Tapioca workflow was next leveraged to investigate viral infection dynamics. Temporal PPIs were characterized during the reactivation from latency of the oncogenic Kaposi's sarcoma-associated herpesvirus. Together with functional assays, NUCKS was identified as a proviral hub protein, and a broader role was uncovered by integrating PPI networks from alpha- and betaherpesvirus infections. Altogether, Tapioca provides a web-accessible platform for predicting PPIs in dynamic contexts. [MESH Headings: Sarcoma, Kaposi/metabolism Viral Proteins/metabolism Manihot/metabolism Virus Latency Herpesvirus 8, Human/metabolism]
[Grants: R01GM114141 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS) T32GM007388 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS) R01GM071966 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS) AI174515 U.S. Department of Health & Human Services | NIH | National Institute of Allergy and Infectious Diseases (NIAID) 3.1416 EIF | Stand Up To Cancer (SU2C) DGE-2039656 National Science Foundation (NSF) COCR23PRF019 State of New Jersey Department of Health (NJ Health) R01HG005998 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI) 395506 Simons Foundation]
7	Proteome-Wide Assessment of Clustering of Missense Variants in Neurodevelopmental Disorders Versus Cancer. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.02.24302238. [PMID: 38352539 PMCID: PMC10863034 DOI: 10.1101/2024.02.02.24302238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024] Abstract: Missense de novo variants (DNVs) and missense somatic variants contribute to neurodevelopmental disorders (NDDs) and cancer, respectively. Proteins with statistical enrichment based on analyses of these variants exhibit convergence in the differing NDD and cancer phenotypes. Herein, the question of why some of the same proteins are identified in both phenotypes is examined through investigation of clustering of missense variation at the protein level. Our hypothesis is that missense variation is present in different protein locations in the two phenotypes leading to the distinct phenotypic outcomes. We tested this hypothesis in 1D protein space using our software CLUMP. Furthermore, we newly developed 3D-CLUMP that uses 3D protein structures to spatially test clustering of missense variation for proteome-wide significance. We examined missense DNVs in 39,883 parent-child sequenced trios with NDDs and missense somatic variants from 10,543 sequenced tumors covering five TCGA cancer types and two COSMIC pan-cancer aggregates of tissue types. There were 57 proteins with proteome-wide significant missense variation clustering in NDDs when compared to cancers and 79 proteins with proteome-wide significant missense clustering in cancers compared to NDDs. While our main objective was to identify differences in patterns of missense variation, we also identified a novel NDD protein BLTP2.
Overall, our study is innovative, provides new insights into differential missense variation in NDDs and cancer at the protein-level, and contributes necessary information toward building a framework for thinking about prognostic and therapeutic aspects of these proteins. [Grants: P50 HD103525 NICHD NIH HHS R00 MH117165 NIMH NIH HHS R01 MH126933 NIMH NIH HHS U24 CA258393 NCI NIH HHS]
8	Strategies for dissecting the complexity of neurodevelopmental disorders. Trends Genet 2024;40:187-202. [PMID: 37949722 PMCID: PMC10872993 DOI: 10.1016/j.tig.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 09/20/2023] [Accepted: 10/16/2023] [Indexed: 11/12/2023] Abstract: Neurodevelopmental disorders (NDDs) are associated with a wide range of clinical features, affecting multiple pathways involved in brain development and function. Recent advances in high-throughput sequencing have unveiled numerous genetic variants associated with NDDs, which further contribute to disease complexity and make it challenging to infer disease causation and underlying mechanisms. Herein, we review current strategies for dissecting the complexity of NDDs using model organisms, induced pluripotent stem cells, single-cell sequencing technologies, and massively parallel reporter assays. We further highlight single-cell CRISPR-based screening techniques that allow genomic investigation of cellular transcriptomes with high efficiency, accuracy, and throughput. Overall, we provide an integrated review of experimental approaches that can be applicable for investigating a broad range of complex disorders. [Key Words: CRISPR MPRA autism genomics iPSC model organism] [MESH Headings: Humans Neurodevelopmental Disorders/genetics Genomics Genome] [Grants: R01 GM121907 NIGMS NIH HHS R21 NS122398 NINDS NIH HHS]
9	Reanalysis of Trio Whole-Genome Sequencing Data Doubles the Yield in Autism Spectrum Disorder: De Novo Variants Present in Half. Int J Mol Sci 2024;25:1192. [PMID: 38256266 PMCID: PMC10816071 DOI: 10.3390/ijms25021192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 01/14/2024] [Accepted: 01/16/2024] [Indexed: 01/24/2024] Abstract: Autism spectrum disorder (ASD) is a common condition with lifelong implications. The last decade has seen dramatic improvements in DNA sequencing and related bioinformatics and databases. We analyzed the raw DNA sequencing files on the Variantyx® bioinformatics platform for the last 50 ASD patients evaluated with trio whole-genome sequencing (trio-WGS). "Qualified" variants were defined as coding, rare, and evolutionarily conserved. Primary Diagnostic Variants (PDV), additionally, were present in genes directly linked to ASD and matched clinical correlation. A PDV was identified in 34/50 (68%) of cases, including 25 (50%) cases with heterozygous de novo and 10 (20%) with inherited variants. De novo variants in genes directly associated with ASD were far more likely to be Qualifying than non-Qualifying versus a control group of genes (p = 0.0002), validating that most are indeed disease related. Sequence reanalysis increased diagnostic yield from 28% to 68%, mostly through inclusion of de novo PDVs in genes not yet reported as ASD associated. Thirty-three subjects (66%) had treatment recommendation(s) based on DNA analyses. Our results demonstrate a high yield of trio-WGS for revealing molecular diagnoses in ASD, which is greatly enhanced by reanalyzing DNA sequencing files. In contrast to previous reports, de novo variants dominate the findings, mostly representing novel conditions.
This has implications for the cause and rising prevalence of autism. [Key Words: DNA sequencing autism diagnostic yield novel disorders] [MESH Headings: Humans Autism Spectrum Disorder/genetics Whole Genome Sequencing Sequence Analysis, DNA Autistic Disorder Computational Biology] [Grants: none]
10	Quantifying negative selection in human 3' UTRs uncovers constrained targets of RNA-binding proteins. Nat Commun 2024;15:85. [PMID: 38168060 PMCID: PMC10762232 DOI: 10.1038/s41467-023-44456-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/13/2022] [Accepted: 12/14/2023] [Indexed: 01/05/2024] Abstract: Many non-coding variants associated with phenotypes occur in 3' untranslated regions (3' UTRs), and may affect interactions with RNA-binding proteins (RBPs) to regulate gene expression post-transcriptionally. However, identifying functional 3' UTR variants has proven difficult. We use allele frequencies from the Genome Aggregation Database (gnomAD) to identify classes of 3' UTR variants under strong negative selection in humans. We develop intergenic mutability-adjusted proportion singleton (iMAPS), a generalized measure related to MAPS, to quantify negative selection in non-coding regions. This approach, in conjunction with in vitro and in vivo binding data, identifies precise RBP binding sites, miRNA target sites, and polyadenylation signals (PASs) under strong selection. For each class of sites, we identify thousands of gnomAD variants under selection comparable to missense coding variants, and find that sites in core 3' UTR regions upstream of the most-used PAS are under strongest selection. Together, this work improves our understanding of selection on human genes and validates approaches for interpreting genetic variants in human 3' UTRs.
[Key Words: genetic variation molecular biology molecular evolution] [MESH Headings: Humans 3' Untranslated Regions/genetics MicroRNAs/genetics MicroRNAs/metabolism Binding Sites/genetics Polyadenylation RNA-Binding Proteins/genetics RNA-Binding Proteins/metabolism] [Grants: R01 GM085319 NIGMS NIH HHS R01 HG002439 NHGRI NIH HHS Foundation for the National Institutes of Health (Foundation for the National Institutes of Health, Inc.) Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (NSERC Canadian Network for Research and Innovation in Machining Technology)]
11	Systematic bibliometric and visualized analysis of research hotspots and trends in artificial intelligence in autism spectrum disorder. Front Neuroinform 2023;17:1310400. [PMID: 38125308 PMCID: PMC10731312 DOI: 10.3389/fninf.2023.1310400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023] Abstract: Background: Artificial intelligence (AI) has been the subject of studies in autism spectrum disorder (ASD) and may affect its identification, diagnosis, intervention, and other medical practices in the future. Although previous studies have used bibliometric techniques to analyze and investigate AI, there has been little research on the adoption of AI in ASD. This study aimed to explore the broad applications and research frontiers of AI used in ASD. Methods: Citation data were retrieved from the Web of Science Core Collection (WoSCC) database to assess the extent to which AI is used in ASD. CiteSpace 5.8.R3 and VOSviewer, two online tools for literature metrology analysis, were used to analyze the data. Results: A total of 776 publications from 291 countries and regions were analyzed; of these, 256 publications were from the United States and 173 publications were from China, and England had the largest centrality of 0.33; Stanford University had the highest H-index of 17; and the largest cluster label of co-cited references was machine learning. In addition, keywords with a high number of occurrences in this field were autism spectrum disorder (295), children (255), classification (156) and diagnosis (77). The burst keywords from 2021 to 2023 were infants and feature selection, and from 2022 to 2023, the burst keyword was corpus callosum.
Conclusion: This research provides a systematic analysis of the literature concerning AI used in ASD, presenting an overall demonstration in this field. In this area, the United States and China have the largest number of publications, England has the greatest influence, and Stanford University is the most influential. In addition, the research on AI used in ASD mostly focuses on classification and diagnosis, and infants, feature selection, and corpus callosum are at the forefront, providing directions for future research. However, the use of AI technologies to identify ASD will require further research. [Key Words: CiteSpace VOSviewer artificial intelligence autism spectrum disorder bibliometric data visualization]
12	Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. Nat Genet 2023;55:2060-2064. [PMID: 38036778 DOI: 10.1038/s41588-023-01524-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 09/08/2023] [Indexed: 12/02/2023] Abstract: Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired whole genome sequencing and gene expression from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learned sequence motif grammar and suggest new model training strategies to improve performance. [MESH Headings: Humans Benchmarking Base Sequence Neural Networks, Computer DNA Gene Expression] [Grants: P30 AG072975 NIA NIH HHS]
13	Single-cell long-read sequencing in human cerebral organoids uncovers cell-type-specific and autism-associated exons. Cell Rep 2023;42:113335. [PMID: 37889749 PMCID: PMC10842930 DOI: 10.1016/j.celrep.2023.113335] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/03/2023] [Revised: 09/12/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023] Abstract: Dysregulation of alternative splicing has been repeatedly associated with neurodevelopmental disorders, but the extent of cell-type-specific splicing in human neural development remains largely uncharted. Here, single-cell long-read sequencing in induced pluripotent stem cell (iPSC)-derived cerebral organoids identifies over 31,000 uncatalogued isoforms and 4,531 cell-type-specific splicing events. Long reads uncover coordinated splicing and cell-type-specific intron retention events, which are challenging to study with short reads. Retained neuronal introns are enriched in RNA splicing regulators, showing shorter lengths, higher GC contents, and weaker 5' splice sites. We use this dataset to explore the biological processes underlying neurological disorders, focusing on autism. In comparison with prior transcriptomic data, we find that the splicing program in autistic brains is closer to the progenitor state than differentiated neurons. Furthermore, cell-type-specific exons harbor significantly more de novo mutations in autism probands than in siblings. Overall, these results highlight the importance of cell-type-specific splicing in autism and neuronal gene regulation.
[Key Words: CP: Molecular biology CP: Neuroscience RNA-binding protein alternative splicing coordinated splicing de novo mutation full-length transcripts retained intron scIso-seq scRNA-seq splice isoform uncatalogued isoform] [MESH Headings: Humans Autistic Disorder/genetics Alternative Splicing/genetics RNA Splicing/genetics Protein Isoforms/genetics Exons/genetics Introns/genetics RNA Splice Sites] [Grants: DP2 GM137423 NIGMS NIH HHS R01 MH130594 NIMH NIH HHS]
14	Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.04.560808. [PMID: 37873116 PMCID: PMC10592962 DOI: 10.1101/2023.10.04.560808] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/25/2023] Abstract: Ectopic expression of OCT4, SOX2, KLF4 and MYC (OSKM) transforms differentiated cells into induced pluripotent stem cells. To refine our mechanistic understanding of reprogramming, especially during the early stages, we profiled chromatin accessibility and gene expression at single-cell resolution across a densely sampled time course of human fibroblast reprogramming. Using neural networks that map DNA sequence to ATAC-seq profiles at base-resolution, we annotated cell-state-specific predictive transcription factor (TF) motif syntax in regulatory elements, inferred affinity- and concentration-dependent dynamics of Tn5-bias corrected TF footprints, linked peaks to putative target genes, and elucidated rewiring of TF-to-gene cis-regulatory networks. Our models reveal that early in reprogramming, OSK, at supraphysiological concentrations, rapidly open transient regulatory elements by occupying non-canonical low-affinity binding sites. As OSK concentration falls, the accessibility of these transient elements decays as a function of motif affinity. We find that these OSK-dependent transient elements sequester the somatic TF AP-1. This redistribution is strongly associated with the silencing of fibroblast-specific genes within individual nuclei.
Together, our integrated single-cell resource and models reveal insights into the cis-regulatory code of reprogramming at unprecedented resolution, connect TF stoichiometry and motif syntax to diversification of cell fate trajectories, and provide new perspectives on the dynamics and role of transient regulatory elements in somatic silencing. [Grants: R01 GM136737 NIGMS NIH HHS R01 HG009674 NHGRI NIH HHS R61 AR076815 NIAMS NIH HHS]
15	Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.16.532969. [PMID: 36993652 PMCID: PMC10055057 DOI: 10.1101/2023.03.16.532969] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/31/2023] Abstract: Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired whole genome sequencing and gene expression from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learnt sequence motif grammar, and suggest new model training strategies to improve performance. [Grants: P30 AG072975 NIA NIH HHS U01 AG046152 NIA NIH HHS U01 AG061356 NIA NIH HHS R01 AG017917 NIA NIH HHS R01 AG057911 NIA NIH HHS P30 AG010161 NIA NIH HHS R01 AG036836 NIA NIH HHS R01 AG015819 NIA NIH HHS]
16	Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023;24:bbad284. [PMID: 37580177 PMCID: PMC10516351 DOI: 10.1093/bib/bbad284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 08/16/2023] Open Abstract Genomic variants affecting pre-messenger RNA splicing and its regulation are known to underlie many rare genetic diseases. However, common workflows for genetic diagnosis and clinical variant interpretation frequently overlook splice-altering variants. To better serve patient populations and advance biomedical knowledge, it has become increasingly important to develop and refine approaches for detecting and interpreting pathogenic splicing variants. In this review, we will summarize a few recent developments and challenges in using RNA sequencing technologies for rare disease investigation. Moreover, we will discuss how recent computational splicing prediction tools have emerged as complementary approaches for revealing disease-causing variants underlying splicing defects. We speculate that continuous improvements to sequencing technologies and predictive modeling will not only expand our understanding of splicing regulation but also bring us closer to filling the diagnostic gap for rare disease patients. Key Words: RNA sequencing diagnostics machine learning rare disease splicing variant interpretation. MESH Headings: Humans Rare Diseases/diagnosis Rare Diseases/genetics Transcriptome RNA Splicing Proteins Machine Learning Mutation. Grants: T32 HG000046 NHGRI NIH HHS R01GM088342 NIH HHS R56 HG012310 NHGRI NIH HHS R01 GM121827 NIGMS NIH HHS U54 NS115198 NINDS NIH HHS K08 NS118119 NINDS NIH HHS.
17	A systematic benchmark of machine learning methods for protein-RNA interaction prediction. Brief Bioinform 2023;24:bbad307. [PMID: 37635383 PMCID: PMC10516373 DOI: 10.1093/bib/bbad307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2023] [Revised: 06/15/2023] [Accepted: 07/18/2023] [Indexed: 08/29/2023] Open Abstract RNA-binding proteins (RBPs) are central actors of RNA post-transcriptional regulation. Experiments to profile-binding sites of RBPs in vivo are limited to transcripts expressed in the experimental cell type, creating the need for computational methods to infer missing binding information. While numerous machine-learning based methods have been developed for this task, their use of heterogeneous training and evaluation datasets across different sets of RBPs and CLIP-seq protocols makes a direct comparison of their performance difficult. Here, we compile a set of 37 machine learning (primarily deep learning) methods for in vivo RBP-RNA interaction prediction and systematically benchmark a subset of 11 representative methods across hundreds of CLIP-seq datasets and RBPs. Using homogenized sample pre-processing and two negative-class sample generation strategies, we evaluate methods in terms of predictive performance and assess the impact of neural network architectures and input modalities on model performance. We believe that this study will not only enable researchers to choose the optimal prediction method for their tasks at hand, but also aid method developers in developing novel, high-performing methods by introducing a standardized framework for their evaluation. 
Key Words: RNA biology RNA-binding proteins benchmark deep learning. MESH Headings: Benchmarking Binding Sites Chromatin Immunoprecipitation Sequencing Machine Learning RNA/genetics. Grants: Helmholtz Association Munich School for Data Science Deutsche Forschungsgemeinschaft.
18	Artificial intelligence in psychiatry research, diagnosis, and therapy. Asian J Psychiatr 2023;87:103705. [PMID: 37506575 DOI: 10.1016/j.ajp.2023.103705] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Revised: 07/16/2023] [Accepted: 07/20/2023] [Indexed: 07/30/2023] Abstract Psychiatric disorders are now responsible for the largest proportion of the global burden of disease, and even more challenges have been seen during the COVID-19 pandemic. Artificial intelligence (AI) is commonly used to facilitate the early detection of disease, understand disease progression, and discover new treatments in the fields of both physical and mental health. The present review provides a broad overview of AI methodology and its applications in data acquisition and processing, feature extraction and characterization, psychiatric disorder classification, potential biomarker detection, real-time monitoring, and interventions in psychiatric disorders. We also comprehensively summarize AI applications with regard to the early warning, diagnosis, prognosis, and treatment of specific psychiatric disorders, including depression, schizophrenia, autism spectrum disorder, attention-deficit/hyperactivity disorder, addiction, sleep disorders, and Alzheimer's disease. The advantages and disadvantages of AI in psychiatry are clarified. We foresee a new wave of research opportunities to facilitate and improve AI technology and its long-term implications in psychiatry during and after the COVID-19 era. 
Key Words: Artificial intelligence Diagnosis Prognosis Psychiatric disorders Treatment. MESH Headings: Humans Artificial Intelligence Pandemics Autism Spectrum Disorder/diagnosis Autism Spectrum Disorder/therapy COVID-19 Psychiatry COVID-19 Testing.
19	The contributions of rare inherited and polygenic risk to ASD in multiplex families. Proc Natl Acad Sci U S A 2023;120:e2215632120. [PMID: 37506195 PMCID: PMC10400943 DOI: 10.1073/pnas.2215632120] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Accepted: 06/13/2023] [Indexed: 07/30/2023] Open Abstract Autism spectrum disorder (ASD) has a complex genetic architecture involving contributions from both de novo and inherited variation. Few studies have been designed to address the role of rare inherited variation or its interaction with common polygenic risk in ASD. Here, we performed whole-genome sequencing of the largest cohort of multiplex families to date, consisting of 4,551 individuals in 1,004 families having two or more autistic children. Using this study design, we identify seven previously unrecognized ASD risk genes supported by a majority of rare inherited variants, finding support for a total of 74 genes in our cohort and a total of 152 genes after combined analysis with other studies. Autistic children from multiplex families demonstrate an increased burden of rare inherited protein-truncating variants in known ASD risk genes. We also find that ASD polygenic score (PGS) is overtransmitted from nonautistic parents to autistic children who also harbor rare inherited variants, consistent with combinatorial effects in the offspring, which may explain the reduced penetrance of these rare variants in parents. We also observe that in addition to social dysfunction, language delay is associated with ASD PGS overtransmission. These results are consistent with an additive complex genetic risk architecture of ASD involving rare and common variation and further suggest that language delay is a core biological feature of ASD. 
Key Words: autism spectrum disorder (ASD) genetics inherited multiplex families polygenic score (PGS). MESH Headings: Child Humans Autism Spectrum Disorder/genetics Multifactorial Inheritance/genetics Parents Whole Genome Sequencing Language Development Disorders Genetic Predisposition to Disease. Grants: U24 MH081810 NIMH NIH HHS K08 AG065519 NIA NIH HHS R01 MH100027 NIMH NIH HHS UM1 HG008901 NHGRI NIH HHS S10 OD011939 NIH HHS P50 HD055784 NICHD NIH HHS R01 MH064547 NIMH NIH HHS P50 DC018006 NIDCD NIH HHS HHS | National Institutes of Health (NIH).
20	Challenges in screening for de novo noncoding variants contributing to genetically complex phenotypes. HGG ADVANCES 2023;4:100210. [PMID: 37305558 PMCID: PMC10248550 DOI: 10.1016/j.xhgg.2023.100210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2022] [Accepted: 05/15/2023] [Indexed: 06/13/2023] Open Abstract Understanding the genetic basis for complex, heterogeneous disorders, such as autism spectrum disorder (ASD), is a persistent challenge in human medicine. Owing to their phenotypic complexity, the genetic mechanisms underlying these disorders may be highly variable across individual patients. Furthermore, much of their heritability is unexplained by known regulatory or coding variants. Indeed, there is evidence that much of the causal genetic variation stems from rare and de novo variants arising from ongoing mutation. These variants occur mostly in noncoding regions, likely affecting regulatory processes for genes linked to the phenotype of interest. However, because there is no uniform code for assessing regulatory function, it is difficult to separate these mutations into likely functional and nonfunctional subsets. This makes finding associations between complex diseases and potentially causal de novo single-nucleotide variants (dnSNVs) a difficult task. To date, most published studies have struggled to find any significant associations between dnSNVs from ASD patients and any class of known regulatory elements. We sought to identify the underlying reasons for this and present strategies for overcoming these challenges. 
We show that, contrary to previous claims, the main reason for failure to find robust statistical enrichments is not only the number of families sampled, but also the quality and relevance to ASD of the annotations used to prioritize dnSNVs, and the reliability of the set of dnSNVs itself. We present a list of recommendations for designing future studies of this sort that will help researchers avoid common pitfalls. Key Words: autism genetics de novo mutation noncoding variation. MESH Headings: Humans Autism Spectrum Disorder/diagnosis Reproducibility of Results Cell Movement Medicine Phenotype. Grants: U24 HG009293 NHGRI NIH HHS U41 HG009293 NHGRI NIH HHS T32 GM070449 NIGMS NIH HHS.
21	DeepASDPred: a CNN-LSTM-based deep learning method for Autism spectrum disorders risk RNA identification. BMC Bioinformatics 2023;24:261. [PMID: 37349705 DOI: 10.1186/s12859-023-05378-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2023] [Accepted: 06/06/2023] [Indexed: 06/24/2023] Open Abstract BACKGROUND Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders characterized by difficulty communicating with society and others, behavioral difficulties, and a brain that processes information differently than normal. Genetics has a strong impact on ASD associated with early onset and distinctive signs. Currently, all known ASD risk genes are able to encode proteins, and some de novo mutations disrupting protein-coding genes have been demonstrated to cause ASD. Next-generation sequencing technology enables high-throughput identification of ASD risk RNAs. However, these efforts are time-consuming and expensive, so an efficient computational model for ASD risk gene prediction is necessary. RESULTS In this study, we propose DeepASDPred, a predictor for ASD risk RNA based on deep learning. First, we use K-mer encoding to extract features from the RNA transcript sequences, and then fuse them with the corresponding gene expression values to construct a feature matrix. After combining the chi-square test and logistic regression to select the best feature subset, we input the selected features into a binary classification model built from a convolutional neural network and long short-term memory for training and classification. The results of tenfold cross-validation showed that our method outperformed the state-of-the-art methods. The dataset and source code are freely available at https://github.com/Onebear-X/DeepASDPred.
CONCLUSIONS Our experimental results show that DeepASDPred has outstanding performance in identifying ASD risk RNA genes. Key Words: ASD risk RNA Deep learning DeepASDPred K-mer feature extraction.
22	Multi-omics Integration and Epilepsy: Towards a Better Understanding of Biological Mechanisms. Prog Neurobiol 2023:102480. [PMID: 37286031 DOI: 10.1016/j.pneurobio.2023.102480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/09/2023] Abstract The epilepsies are a group of complex neurological disorders characterised by recurrent seizures. Approximately 30% of patients fail to respond to anti-seizure medications, despite the recent introduction of many new drugs. The molecular processes underlying epilepsy development are not well understood and this knowledge gap impedes efforts to identify effective targets and develop novel therapies against epilepsy. Omics studies allow a comprehensive characterisation of a class of molecules. Omics-based biomarkers have led to clinically validated diagnostic and prognostic tests for personalised oncology, and more recently for non-cancer diseases. We believe that, in epilepsy, the full potential of multi-omics research is yet to be realised and we envisage that this review will serve as a guide to researchers planning to undertake omics-based mechanistic studies. Key Words: Metabolomics Multi-omics Network medicine Proteomics Temporal lobe epilepsy Transcriptomics.
23	Integrative identification of non-coding regulatory regions driving metastatic prostate cancer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.14.535921. [PMID: 37398273 PMCID: PMC10312451 DOI: 10.1101/2023.04.14.535921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/04/2023] Abstract Large-scale sequencing efforts of thousands of tumor samples have been undertaken to understand the mutational landscape of the coding genome. However, the vast majority of germline and somatic variants occur within non-coding portions of the genome. These genomic regions do not directly encode for specific proteins, but can play key roles in cancer progression, for example by driving aberrant gene expression control. Here, we designed an integrative computational and experimental framework to identify recurrently mutated non-coding regulatory regions that drive tumor progression. Application of this approach to whole-genome sequencing (WGS) data from a large cohort of metastatic castration-resistant prostate cancer (mCRPC) revealed a large set of recurrently mutated regions. We used (i) in silico prioritization of functional non-coding mutations, (ii) massively parallel reporter assays, and (iii) in vivo CRISPR-interference (CRISPRi) screens in xenografted mice to systematically identify and validate driver regulatory regions that drive mCRPC. We discovered that one of these enhancer regions, GH22I030351, acts on a bidirectional promoter to simultaneously modulate expression of U2-associated splicing factor SF3A1 and chromosomal protein CCDC157. We found that both SF3A1 and CCDC157 are promoters of tumor growth in xenograft models of prostate cancer. We nominated a number of transcription factors, including SOX6, to be responsible for higher expression of SF3A1 and CCDC157. 
Collectively, we have established and confirmed an integrative computational and experimental approach that enables the systematic detection of non-coding regulatory regions that drive the progression of human cancers. Grants: DP2 CA239597 NCI NIH HHS R01 CA240984 NCI NIH HHS R01 CA244634 NCI NIH HHS S10 OD028511 NIH HHS.
24	Harnessing deep learning into hidden mutations of neurological disorders for therapeutic challenges. Arch Pharm Res 2023:10.1007/s12272-023-01450-5. [PMID: 37261600 DOI: 10.1007/s12272-023-01450-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Accepted: 05/26/2023] [Indexed: 06/02/2023] Abstract The relevant study of transcriptome-wide variations and neurological disorders in the evolved field of genomic data science is on the rise. Deep learning has been highlighted for applying algorithms to massive amounts of data in a human-like manner, and is expected to predict the dependency or druggability of hidden mutations within the genome. An enormous number of mutational variants in coding and noncoding transcripts have been discovered along the genome so far, despite the fine-tuned genetic proofreading machinery. These variants could be capable of inducing various pathological conditions, including neurological disorders, which require lifelong care. Several limitations and questions emerge, including the use of conventional processes via limited patient-driven sequence acquisitions and decoding-based inferences as well as how rare variants can be deduced as a population-specific etiology. These puzzles require harnessing of advanced systems for precise disease prediction, drug development and drug applications. In this review, we summarize the pathophysiological discoveries of pathogenic variants in both coding and noncoding transcripts in neurological disorders, and the current advantage of deep learning applications. In addition, we discuss the challenges encountered and how to overcome them with advancing interpretation.
Key Words: Deep learning Druggable target Neurological disorders Rare mutations Sequencing Transcriptome. Grants: 2022R1A2C1002984 Ministry of Science, ICT and Future Planning 202200000000904 Hanyang University.
25	Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023;21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023] Abstract Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods are developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinical relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar with the area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033 and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). 
We also compared the prediction performance of 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants. Key Words: Functional prediction Non-coding variant Pathogenicity estimation Performance assessment Prediction model. MESH Headings: Humans Genome-Wide Association Study Virulence Autism Spectrum Disorder Polymorphism, Single Nucleotide Phenotype.
26	Correcting gradient-based interpretations of deep neural networks for genomics. Genome Biol 2023;24:109. [PMID: 37161475 PMCID: PMC10169356 DOI: 10.1186/s13059-023-02956-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2022] [Accepted: 04/28/2023] [Indexed: 05/11/2023] Open Abstract Post hoc attribution methods can provide insights into the learned patterns from deep neural networks (DNNs) trained on high-throughput functional genomics data. However, in practice, their resultant attribution maps can be challenging to interpret due to spurious importance scores for seemingly arbitrary nucleotides. Here, we identify a previously overlooked attribution noise source that arises from how DNNs handle one-hot encoded DNA. We demonstrate this noise is pervasive across various genomic DNNs and introduce a statistical correction that effectively reduces it, leading to more reliable attribution maps. Our approach represents a promising step towards gaining meaningful insights from DNNs in regulatory genomics. Key Words: Attribution methods Deep learning Explainable AI Model interpretability Regulatory genomics. Grants: R01HG012131 NHGRI NIH HHS.
27	On the Decision Boundaries of Neural Networks: A Tropical Geometry Perspective. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023;45:5027-5037. [PMID: 36001517 DOI: 10.1109/tpami.2022.3201490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023] Abstract This work tackles the problem of characterizing and understanding the decision boundaries of neural networks with piecewise linear non-linearity activations. We use tropical geometry, a new development in the area of algebraic geometry, to characterize the decision boundaries of a simple network of the form (Affine, ReLU, Affine). Our main finding is that the decision boundaries are a subset of a tropical hypersurface, which is intimately related to a polytope formed by the convex hull of two zonotopes. The generators of these zonotopes are functions of the network parameters. This geometric characterization provides new perspectives to three tasks. (i) We propose a new tropical perspective to the lottery ticket hypothesis, where we view the effect of different initializations on the tropical geometric representation of a network's decision boundaries. (ii) Moreover, we propose new tropical based optimization reformulations that directly influence the decision boundaries of the network for the task of network pruning. (iii) At last, we discuss the reformulation of the generation of adversarial attacks in a tropical sense. We demonstrate that one can construct adversaries in a new tropical setting by perturbing a specific set of decision boundaries by perturbing a set of parameters in the network.
28	The role of microRNAs in the molecular link between circadian rhythm and autism spectrum disorder. Anim Cells Syst (Seoul) 2023;27:38-52. [PMID: 36860270 PMCID: PMC9970207 DOI: 10.1080/19768354.2023.2180535] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/03/2023] Open Abstract Circadian rhythm regulates physiological cycles of awareness and sleepiness. Melatonin production is primarily regulated by circadian regulation of gene expression and is involved in sleep homeostasis. If the circadian rhythm is abnormal, sleep disorders, such as insomnia and several other diseases, can occur. The term 'autism spectrum disorder (ASD)' is used to characterize people who exhibit a certain set of repetitive behaviors, severely constrained interests, social deficits, and/or sensory behaviors that start very early in life. Because many patients with ASD suffer from sleep disorders, sleep disorders and melatonin dysregulation are attracting attention for their potential roles in ASD. ASD is caused by abnormalities during the neurodevelopmental processes owing to various genetic or environmental factors. Recently, the role of microRNAs (miRNAs) in circadian rhythm and ASD has gained attention. We hypothesized that the relationship between circadian rhythm and ASD could be explained by miRNAs that can regulate or be regulated by either or both. In this study, we introduced a possible molecular link between circadian rhythm and ASD. We performed a thorough literature review to understand their complexity. Key Words: Circadian rhythm MicroRNA autism spectrum disorder.
29	De novo human brain enhancers created by single-nucleotide mutations. SCIENCE ADVANCES 2023;9:eadd2911. [PMID: 36791193 PMCID: PMC9931207 DOI: 10.1126/sciadv.add2911] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/01/2022] [Accepted: 01/12/2023] [Indexed: 05/30/2023] Abstract Advanced human cognition is attributed to increased neocortex size and complexity, but the underlying evolutionary and regulatory mechanisms are largely unknown. Using human and macaque embryonic neocortical H3K27ac data coupled with a deep learning model of enhancers, we identified ~4000 enhancer gains in humans, which, per our model, can often be attributed to single-nucleotide essential mutations. Our analyses suggest that functional gains in embryonic brain development are associated with de novo enhancers whose putative target genes exhibit increased expression in progenitor cells and interneurons and partake in critical neural developmental processes. Essential mutations alter enhancer activity through altered binding of key transcription factors (TFs) of embryonic neocortex, including ISL1, POU3F2, PITX1/2, and several SOX TFs, and are associated with central nervous system disorders. Overall, our results suggest that essential mutations lead to gain of embryonic neocortex enhancers, which orchestrate expression of genes involved in critical developmental processes associated with human cognition. MESH Headings: Humans Enhancer Elements, Genetic Nucleotides Transcription Factors/genetics Brain Mutation Gene Expression Regulation, Developmental. Grants: ZIA BC011979 Intramural NIH HHS ZIA LM200881 Intramural NIH HHS National Institutes of Health National Cancer Institute.
30	Applications of deep learning in understanding gene regulation. CELL REPORTS METHODS 2023;3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023] Abstract Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area. Key Words: deep learning gene regulation neural network omics. MESH Headings: Deep Learning Computational Biology/methods.
31	Machine Learning-Based Blood RNA Signature for Diagnosis of Autism Spectrum Disorder. Int J Mol Sci 2023;24:ijms24032082. [PMID: 36768401 PMCID: PMC9916487 DOI: 10.3390/ijms24032082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 01/15/2023] [Accepted: 01/17/2023] [Indexed: 01/21/2023] Open Abstract Early diagnosis of autism spectrum disorder (ASD) is crucial for providing appropriate treatments and parental guidance from an early age. Yet, ASD diagnosis is a lengthy process, in part due to the lack of reliable biomarkers. We recently applied RNA-sequencing of peripheral blood samples from 73 American and Israeli children with ASD and 26 neurotypically developing (NT) children to identify 10 genes with dysregulated blood expression levels in children with ASD. Machine learning (ML) analyzes data by computerized analytical model building and may be applied to building diagnostic tools based on the optimization of large datasets. Here, we present several ML-generated models, based on RNA expression datasets collected during our recently published RNA-seq study, as tentative tools for ASD diagnosis. Using the random forest classifier, two of our proposed models yield an accuracy of 82% in distinguishing between children with ASD and NT children. Our proof-of-concept study requires refinement and independent validation by studies with far larger cohorts of children with ASD and NT children and should thus be perceived as a starting point for building more accurate ML-based tools. Eventually, such tools may potentially provide an unbiased means to support the early diagnosis of ASD. Key Words: RNA biomarkers autism spectrum disorder (ASD) blood RNA-sequencing machine learning.
32	Deep learning predicts the impact of regulatory variants on cell-type-specific enhancers in the brain. BIOINFORMATICS ADVANCES 2023;3:vbad002. [PMID: 36726730 PMCID: PMC9887460 DOI: 10.1093/bioadv/vbad002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 11/11/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023] Abstract Motivation Previous studies have shown that the heritability of multiple brain-related traits and disorders is highly enriched in transcriptional enhancer regions. However, these regions often contain many individual variants, while only a subset of them are likely to causally contribute to a trait. Statistical fine-mapping techniques can identify putative causal variants, but their resolution is often limited, especially in regions with multiple variants in high linkage disequilibrium. In these cases, alternative computational methods to estimate the impact of individual variants can aid in variant prioritization. Results Here, we develop a deep learning pipeline to predict cell-type-specific enhancer activity directly from genomic sequences and quantify the impact of individual genetic variants in these regions. We show that the variants highlighted by our deep learning models are targeted by purifying selection in the human population, likely indicating a functional role. We integrate our deep learning predictions with statistical fine-mapping results for 8 brain-related traits, identifying 63 distinct candidate causal variants predicted to contribute to these traits by modulating enhancer activity, representing 6% of all genome-wide association study signals analyzed. Overall, our study provides a valuable computational method that can prioritize individual variants based on their estimated regulatory impact, but also highlights the limitations of existing methods for variant prioritization and fine-mapping.
Availability and implementation The data underlying this article, nucleotide-level importance scores, and code for running the deep learning pipeline are available at https://github.com/Pandaman-Ryan/AgentBind-brain. Contact mgymrek@ucsd.edu. Supplementary information Supplementary data are available at Bioinformatics Advances online.
33	Trends and features of autism spectrum disorder research using artificial intelligence techniques: a bibliometric approach. CURRENT PSYCHOLOGY 2022. [DOI: 10.1007/s12144-022-03977-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
34	Evaluating deep learning for predicting epigenomic profiles. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00570-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
35	Deep learning approaches for noncoding variant prioritization in neurodegenerative diseases. Front Aging Neurosci 2022;14:1027224. [PMID: 36466610 PMCID: PMC9716280 DOI: 10.3389/fnagi.2022.1027224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 10/24/2022] [Indexed: 11/19/2022] Open Abstract Determining how noncoding genetic variants contribute to neurodegenerative dementias is fundamental to understanding disease pathogenesis, improving patient prognostication, and developing new clinical treatments. Next generation sequencing technologies have produced vast amounts of genomic data on cell type-specific transcription factor binding, gene expression, and three-dimensional chromatin interactions, with the promise of providing key insights into the biological mechanisms underlying disease. However, this data is highly complex, making it challenging for researchers to interpret, assimilate, and dissect. To this end, deep learning has emerged as a powerful tool for genome analysis that can capture the intricate patterns and dependencies within these large datasets. In this review, we organize and discuss the many unique model architectures, development philosophies, and interpretation methods that have emerged in the last few years with a focus on using deep learning to predict the impact of genetic variants on disease pathogenesis. We highlight both broadly-applicable genomic deep learning methods that can be fine-tuned to disease-specific contexts as well as existing neurodegenerative disease research, with an emphasis on Alzheimer's-specific literature. We conclude with an overview of the future of the field at the intersection of neurodegeneration, genomics, and deep learning.
Key Words: gene regulation; genomics; machine learning; neurodegeneration; noncoding genetic variation. Grants: P01 AG073082 NIA NIH HHS; Gladstone Institutes.
36	Dissecting the multifaceted contribution of the mitochondrial genome to autism spectrum disorder. Front Genet 2022;13:953762. [PMID: 36419830 PMCID: PMC9676943 DOI: 10.3389/fgene.2022.953762] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Accepted: 10/12/2022] [Indexed: 11/15/2023] Open Abstract Autism spectrum disorder (ASD) is a clinically heterogeneous class of neurodevelopmental conditions with a strong, albeit complex, genetic basis. The genetic architecture of ASD includes different genetic models, from monogenic transmission at one end, to polygenic risk given by thousands of common variants with small effects at the other end. The mitochondrial DNA (mtDNA) was also proposed as a genetic modifier for ASD, mostly focusing on maternal mtDNA, since the paternal mitogenome is not transmitted to offspring. We extensively studied the potential contribution of mtDNA in ASD pathogenesis and risk through deep next generation sequencing and quantitative PCR in a cohort of 98 families. While the maternally-inherited mtDNA did not seem to predispose to ASD, neither for haplogroups nor for the presence of pathogenic mutations, an unexpected influence of paternal mtDNA, apparently centered on haplogroup U, came from the Italian families extrapolated from the test cohort (n = 74) when compared to the control population. However, this result was not replicated in an independent Italian cohort of 127 families and it is likely due to the elevated paternal age at time of conception. In addition, ASD probands showed a reduced mtDNA content when compared to their unaffected siblings.
Multivariable regression analyses indicated that variants with 15%-5% heteroplasmy in probands are associated to a greater severity of ASD based on ADOS-2 criteria, whereas paternal super-haplogroups H and JT were associated with milder phenotypes. In conclusion, our results suggest that the mtDNA impacts on ASD, significantly modifying the phenotypic expression in the Italian population. The unexpected finding of protection induced by paternal mitogenome in term of severity may derive from a role of mtDNA in influencing the accumulation of nuclear de novo mutations or epigenetic alterations in fathers' germinal cells, affecting the neurodevelopment in the offspring. This result remains preliminary and needs further confirmation in independent cohorts of larger size. If confirmed, it potentially opens a different perspective on how paternal non-inherited mtDNA may predispose or modulate other complex diseases. Key Words: autism risk; autism spectrum disorder; mitochondrial DNA; mitochondrial haplogroups; universal heteroplasmy. Grants: Ministero della Salute; Ministero dell'Università e della Ricerca; National Human Genome Research Institute.
37	Statistical and functional convergence of common and rare genetic influences on autism at chromosome 16p. Nat Genet 2022;54:1630-1639. [PMID: 36280734 PMCID: PMC9649437 DOI: 10.1038/s41588-022-01203-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/11/2022] [Accepted: 09/15/2022] [Indexed: 12/14/2022] Abstract The canonical paradigm for converting genetic association to mechanism involves iteratively mapping individual associations to the proximal genes through which they act. In contrast, in the present study we demonstrate the feasibility of extracting biological insights from a very large region of the genome and leverage this strategy to study the genetic influences on autism. Using a new statistical approach, we identified the 33-Mb p-arm of chromosome 16 (16p) as harboring the greatest excess of autism's common polygenic influences. The region also includes the mechanistically cryptic and autism-associated 16p11.2 copy number variant. Analysis of RNA-sequencing data revealed that both the common polygenic influences within 16p and the 16p11.2 deletion were associated with decreased average gene expression across 16p. The transcriptional effects of the rare deletion and diffuse common variation were correlated at the level of individual genes and analysis of Hi-C data revealed patterns of chromatin contact that may explain this transcriptional convergence. These results reflect a new approach for extracting biological insight from genetic association data and suggest convergence of common and rare genetic influences on autism at 16p.
Key Words: autism spectrum disorders; gene expression; genome-wide association studies; epigenomics. MeSH Headings: Humans; Autistic Disorder/genetics; DNA Copy Number Variations; Chromosomes; Chromosome Deletion; Chromosomes, Human, Pair 16/genetics. Grants: R01 MH099134 NIMH NIH HHS; T15 LM007092 NLM NIH HHS; R01 MH100027 NIMH NIH HHS; T32 GM144273 NIGMS NIH HHS; R01 MH069359 NIMH NIH HHS; R01 MH122412 NIMH NIH HHS; R01 MH111813 NIMH NIH HHS; F30 MH129009 NIMH NIH HHS; T32 HG002295 NHGRI NIH HHS; T32 GM007753 NIGMS NIH HHS; R01 NS093200 NINDS NIH HHS; R01 MH094400 NIMH NIH HHS; R01 MH124851 NIMH NIH HHS; R01 MH123619 NIMH NIH HHS; R01 HD096326 NICHD NIH HHS; U.S. Department of Health & Human Services | NIH | National Institute of Mental Health (NIMH); U.S. Department of Health & Human Services | NIH | U.S. National Library of Medicine (NLM); U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS); Simons Foundation Autism Research Initiative (704413); U.S. Department of Health & Human Services | NIH | Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD); U.S. Department of Health & Human Services | NIH | National Institute of Neurological Disorders and Stroke (NINDS).
38	Non-coding de novo mutations in chromatin interactions are implicated in autism spectrum disorder. Mol Psychiatry 2022;27:4680-4694. [PMID: 35840799 DOI: 10.1038/s41380-022-01697-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/12/2021] [Revised: 06/27/2022] [Accepted: 07/01/2022] [Indexed: 12/14/2022] Abstract Three-dimensional chromatin interactions regulate gene expressions. The significance of de novo mutations (DNMs) in chromatin interactions remains poorly understood for autism spectrum disorder (ASD). We generated 813 whole-genome sequences from 242 Korean simplex families to detect DNMs, and identified target genes which were putatively affected by non-coding DNMs in chromatin interactions. Non-coding DNMs in chromatin interactions were significantly involved in transcriptional dysregulations related to ASD risk. Correspondingly, target genes showed spatiotemporal expressions relevant to ASD in developing brains and enrichment in biological pathways implicated in ASD, such as histone modification. Regarding clinical features of ASD, non-coding DNMs in chromatin interactions particularly contributed to low intelligence quotient levels in ASD probands. We further validated our findings using two replication cohorts, Simons Simplex Collection (SSC) and MSSNG, and showed the consistent enrichment of non-coding DNM-disrupted chromatin interactions in ASD probands. Generating human induced pluripotent stem cells in two ASD families, we were able to demonstrate that non-coding DNMs in chromatin interactions alter the expression of target genes at the stage of early neural development. Taken together, our findings indicate that non-coding DNMs in ASD probands lead to early neurodevelopmental disruption implicated in ASD risk via chromatin interactions.
39	Genome-wide rare variant score associates with morphological subtypes of autism spectrum disorder. Nat Commun 2022;13:6463. [PMID: 36309498 PMCID: PMC9617891 DOI: 10.1038/s41467-022-34112-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/01/2021] [Accepted: 10/13/2022] [Indexed: 02/06/2023] Open Abstract Defining different genetic subtypes of autism spectrum disorder (ASD) can enable the prediction of developmental outcomes. Based on minor physical and major congenital anomalies, we categorize 325 Canadian children with ASD into dysmorphic and nondysmorphic subgroups. We develop a method for calculating a patient-level, genome-wide rare variant score (GRVS) from whole-genome sequencing (WGS) data. GRVS is a sum of the number of variants in morphology-associated coding and non-coding regions, weighted by their effect sizes. Probands with dysmorphic ASD have a significantly higher GRVS compared to those with nondysmorphic ASD (P = 0.03). Using the polygenic transmission disequilibrium test, we observe an over-transmission of ASD-associated common variants in nondysmorphic ASD probands (P = 2.9 × 10^-3). These findings replicate using WGS data from 442 ASD probands with accompanying morphology data from the Simons Simplex Collection. Our results provide support for an alternative genomic classification of ASD subgroups using morphology data, which may inform intervention protocols. Key Words: genetics research; medical genetics; autism spectrum disorders.
40	Evolution and dysfunction of human cognitive and social traits: A transcriptional regulation perspective. EVOLUTIONARY HUMAN SCIENCES 2022;4:e43. [PMID: 37588924 PMCID: PMC10426018 DOI: 10.1017/ehs.2022.42] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 08/11/2022] [Accepted: 09/11/2022] [Indexed: 11/07/2022] Open Abstract Evolutionary changes in brain and craniofacial development have endowed humans with unique cognitive and social skills, but also predisposed us to debilitating disorders in which these traits are disrupted. What are the developmental genetic underpinnings that connect the adaptive evolution of our cognition and sociality with the persistence of mental disorders with severe negative fitness effects? We argue that loss of function of genes involved in transcriptional regulation represents a crucial link between the evolution and dysfunction of human cognitive and social traits. The argument is based on the haploinsufficiency of many transcriptional regulator genes, which makes them particularly sensitive to loss-of-function mutations. We discuss how human brain and craniofacial traits evolved through partial loss of function (i.e. reduced expression) of these genes, a perspective compatible with the idea of human self-domestication. Moreover, we explain why selection against loss-of-function variants supports the view that mutation-selection-drift, rather than balancing selection, underlies the persistence of psychiatric disorders. Finally, we discuss testable predictions. Key Words: haploinsufficiency; human self-domestication; loss of function; neurodevelopmental disorders; transcriptional regulation. Grants: Deutsche Forschungsgemeinschaft.
41	Genetic regulation of RNA splicing in human pancreatic islets. Genome Biol 2022;23:196. [PMID: 36109769 PMCID: PMC9479353 DOI: 10.1186/s13059-022-02757-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/24/2021] [Accepted: 08/23/2022] [Indexed: 12/30/2022] Open Abstract BACKGROUND Non-coding genetic variants that influence gene transcription in pancreatic islets play a major role in the susceptibility to type 2 diabetes (T2D), and likely also contribute to type 1 diabetes (T1D) risk. For many loci, however, the mechanisms through which non-coding variants influence diabetes susceptibility are unknown. RESULTS We examine splicing QTLs (sQTLs) in pancreatic islets from 399 human donors and observe that common genetic variation has a widespread influence on the splicing of genes with established roles in islet biology and diabetes. In parallel, we profile expression QTLs (eQTLs) and use transcriptome-wide association as well as genetic co-localization studies to assign islet sQTLs or eQTLs to T2D and T1D susceptibility signals, many of which lack candidate effector genes. This analysis reveals biologically plausible mechanisms, including the association of T2D with an sQTL that creates a nonsense isoform in ERO1B, a regulator of ER-stress and proinsulin biosynthesis. The expanded list of T2D risk effector genes reveals overrepresented pathways, including regulators of G-protein-mediated cAMP production. The analysis of sQTLs also reveals candidate effector genes for T1D susceptibility such as DCLRE1B, a senescence regulator, and lncRNA MEG3. CONCLUSIONS These data expose widespread effects of common genetic variants on RNA splicing in pancreatic islets.
The results support a role for splicing variation in diabetes susceptibility, and offer a new set of genetic targets with potential therapeutic benefit. Key Words: Beta cells; CTRB2; Diabetes pathophysiology; G-protein signaling; Pancreatic beta-cells; Pancreatic islets; Quantitative trait loci; RNA splicing; Senescence; TWAS; Type 1 diabetes; Type 2 diabetes. MeSH Headings: Diabetes Mellitus, Type 1/genetics; Diabetes Mellitus, Type 1/metabolism; Diabetes Mellitus, Type 2/genetics; Exodeoxyribonucleases/genetics; Exodeoxyribonucleases/metabolism; Humans; Islets of Langerhans/metabolism; Proinsulin/genetics; Proinsulin/metabolism; Protein Isoforms/genetics; RNA Splicing; RNA, Long Noncoding/metabolism. Grants: MC_PC_17228 Medical Research Council; WT101033 Wellcome Trust; MR/L02036X/1 Medical Research Council; Wellcome Trust; 200837/Z/16/Z Wellcome Trust; MC_QA137853 Medical Research Council; H2020 European Research Council; Horizon 2020 Framework Programme; Ministerio de Ciencia e Innovación; HORIZON EUROPE European Research Council; HORIZON EUROPE Framework Programme; H2020 Marie Skłodowska-Curie Actions.
42	Correlation Analysis between Higher Education Level and College Students’ Public Mental Health Driven by AI. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:4204500. [PMID: 36131903 PMCID: PMC9484950 DOI: 10.1155/2022/4204500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 08/22/2022] [Accepted: 08/27/2022] [Indexed: 11/17/2022] Abstract Generally, there is a certain correlation between the level of higher education and the public mental health of college students. Traditionally, questionnaires and literature research methods are used to analyze the correlation between mental health and higher education, but these methods are always limited by many factors, such as resource conditions, survey paths, theoretical framework, and technical means. In recent years, with the rapid development and application of artificial intelligence technology, a new direction of analyzing the correlation between higher education level and college students’ public mental health has been given. The artificial intelligence method makes the correlation analysis change from subjective to big data algorithm evaluation, which can make up for the shortcomings and inefficiency of traditional methods, truly analyze the degree of correlation, and put forward exact solutions, which is of great significance for further evaluating and monitoring the public mental health of college students in higher education. This study first analyzes different AI algorithms and determines to use convolution neural network and random forest algorithm to establish an AI correlation model. After testing and data analysis, the established model has an accuracy of 87.5% in the determination and analysis of correlation. Compared with support vector machine (SVM) and backpropagation (BP) neural network algorithm, it has a higher recognition accuracy.
43	The role of single-cell genomics in human genetics. J Med Genet 2022;59:827-839. [PMID: 35790352 PMCID: PMC9411920 DOI: 10.1136/jmedgenet-2022-108588] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/21/2022] [Accepted: 06/06/2022] [Indexed: 11/16/2022] Abstract Single-cell sequencing is a powerful approach that can detect genetic alterations and their phenotypic consequences in the context of human development, with cellular resolution. Humans start out as single-cell zygotes and undergo fission and differentiation to develop into multicellular organisms. Before fertilisation and during development, the cellular genome acquires hundreds of mutations that propagate down the cell lineage. Whether germline or somatic in nature, some of these mutations may have significant genotypic impact and lead to diseased cellular phenotypes, either systemically or confined to a tissue. Single-cell sequencing enables the detection and monitoring of the genotype and the consequent molecular phenotypes at a cellular resolution. It offers powerful tools to compare the cellular lineage between 'normal' and 'diseased' conditions and to establish genotype-phenotype relationships. By preserving cellular heterogeneity, single-cell sequencing, unlike bulk-sequencing, allows the detection of even small, diseased subpopulations of cells within an otherwise normal tissue. Indeed, the characterisation of biopsies with cellular resolution can provide a mechanistic view of the disease. While single-cell approaches are currently used mainly in basic research, it can be expected that applications of these technologies in the clinic may aid the detection, diagnosis and eventually the treatment of rare genetic diseases as well as cancer.
This review article provides an overview of the single-cell sequencing technologies in the context of human genetics, with an aim to empower clinicians to understand and interpret the single-cell sequencing data and analyses. We discuss the state-of-the-art experimental and analytical workflows and highlight current challenges/limitations. Notably, we focus on two prospective applications of the technology in human genetics, namely the annotation of the non-coding genome using single-cell functional genomics and the use of single-cell sequencing data for in silico variant prioritisation. Key Words: functional genomics; genetic variation; sequencing; single cell; variant annotation; variant prioritization. MeSH Headings: Genetic Variation; Genomics; Genotype; Human Genetics; Humans; Phenotype. Grants: DLR Deutsches Zentrum für Luft- und Raumfahrt; Max Planck Society; Deutsche Forschungsgemeinschaft (DFG).
44	Medical Care of Rare and Undiagnosed Diseases: Prospects and Challenges. FUNDAMENTAL RESEARCH 2022. [DOI: 10.1016/j.fmre.2022.08.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
45	Translational pediatrics: clinical perspective for Phelan-McDermid syndrome and autism research. Pediatr Res 2022;92:373-377. [PMID: 34702975 DOI: 10.1038/s41390-021-01806-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/29/2021] [Accepted: 10/05/2021] [Indexed: 02/08/2023] Abstract Phelan-McDermid syndrome (PMS) is a rare genetic disorder presenting with developmental delay, epilepsy, and autism spectrum disorder (ASD). The segmental deletion of chromosome 22q13.3 affects the copy number of SHANK3, the gene encoding a scaffolding protein at the postsynaptic density. Biological studies indicate that SHANK3 plays crucial roles in the development of synaptic functions in the postnatal brain. Notably, induced pluripotent stem (iPS) cells have enabled researchers to develop brain organoids and microglia from patients and to explore the pathophysiology of neurodevelopmental disorders in human cells. Single-cell RNA sequencing of these cells revealed that human-specific genes are uniquely expressed during cortical development. Thus, patient-derived disease models are expected to identify as-yet-unidentified functions of SHANK3 in the development of human brain. These efforts may help establish a new style of translational research in pediatrics, which is expected to provide therapeutic insight for children with PMS and broader categories of disease. IMPACT: Phelan-McDermid syndrome is a prototypic model for molecular studies of autism spectrum disorder. Brain organoids are expected to provide therapeutic insight. Single-cell RNA sequencing of microglia may uncover the functional roles of human-specific genes.
46	Triage and priority-based healthcare diagnosis using artificial intelligence for autism spectrum disorder and gene contribution: A systematic review. Comput Biol Med 2022;146:105553. [PMID: 35561591 DOI: 10.1016/j.compbiomed.2022.105553] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/05/2022] [Revised: 04/03/2022] [Accepted: 04/20/2022] [Indexed: 11/03/2022] Abstract The exact nature, harmful effects and aetiology of autism spectrum disorder (ASD) have caused widespread confusion. Artificial intelligence (AI) science helps solve challenging diagnostic problems in the medical field through extensive experiments. Disease severity is closely related to triage decisions and prioritisation contexts in medicine because both have been widely used to diagnose various diseases via AI, machine learning and automated decision-making techniques. Recently, taking advantage of high-performance AI algorithms has achieved accessible success in diagnosing and predicting risks from clinical and biological data. In contrast, less progress has been made with ASD because of obscure reasons. According to academic literature, ASD diagnosis works from a specific perspective, and much of the confusion arises from the fact that how AI techniques are currently integrated with the diagnosis of ASD concerning the triage and priority strategies and gene contributions. To this end, this study sought to describe a systematic review of the literature to assess the respective AI methods using the available datasets, highlight the tools and strategies used for diagnosing ASD and investigate how AI trends contribute in distinguishing triage and priority for ASD and gene contributions. Accordingly, this study checked the Science Direct, IEEE Xplore Digital Library, Web of Science (WoS), PubMed, and Scopus databases.
A set of 363 articles from 2017 to 2022 is collected to reveal a clear picture and a better understanding of all the academic literature through a final set of 18 articles. The retrieved articles were filtered according to the defined inclusion and exclusion criteria and classified into three categories. The first category includes 'Triage patients based on diagnosis methods' which accounts for 16.66% (n = 3/18). The second category includes 'Prioritisation for Risky Genes' which accounts for 66.6% (n = 12/18) and is classified into two subcategories: 'Mutations observation based', 'Biomarkers and toxic chemical observations'. The third category includes 'E-triage using telehealth' which accounts for 16.66% (n = 3/18). This multidisciplinary systematic review revealed the taxonomy, motivations, recommendations and challenges of ASD research that need synergistic attention. Thus, this systematic review performs a comprehensive science mapping analysis and discusses the open issues that help perform and improve the recommended solution of ASD research direction. In addition, this study critically reviews the literature and attempts to address the current research gaps in knowledge and highlights weaknesses that require further research. Finally, a new developed methodology has been suggested as future work for triaging and prioritising ASD patients according to their severity levels by using decision-making techniques. Key Words: ASD; Artificial intelligence; Autism; Diagnosis; Machine learning; Telemedicine; Triage and priority.
47	Association of mitochondrial DNA content, heteroplasmies and inter-generational transmission with autism. Nat Commun 2022;13:3790. [PMID: 35778412 PMCID: PMC9249801 DOI: 10.1038/s41467-022-30805-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/26/2021] [Accepted: 05/19/2022] [Indexed: 12/30/2022] Open Abstract Mitochondria are essential for brain development. While previous studies linked dysfunctional mitochondria with autism spectrum disorder (ASD), the role of the mitochondrial genome (mtDNA) in ASD risk is largely unexplored. This study investigates the association of mtDNA heteroplasmies (co-existence of mutated and unmutated mtDNA) and content with ASD, as well as its inter-generational transmission and sex differences among two independent samples: a family-based study (n = 1,938 families with parents, probands and sibling controls) and a prospective birth cohort (n = 997 mother-child pairs). In both samples, predicted pathogenic (PP) heteroplasmies in children are associated with ASD risk (Meta-OR = 1.56, P = 0.00068). Inter-generational transmission of mtDNA reveals attenuated effects of purifying selection on maternal heteroplasmies in children with ASD relative to controls, particularly among males. Among children with ASD and PP heteroplasmies, increased mtDNA content shows benefits for cognition, communication, and behaviors (P ≤ 0.02). These results underscore the value of exploring maternal and newborn mtDNA in ASD. Most genetic studies of autism spectrum disorder (ASD) have focused on the nuclear genome. Here, the authors show that variations in mitochondrial DNA, detectable at birth, are also associated with risk of ASD.
48	Germline mosaicism of a missense variant in KCNC2 in a multiplex family with autism and epilepsy characterized by long-read sequencing. Am J Med Genet A 2022;188:2071-2081. [PMID: 35366058 PMCID: PMC9197999 DOI: 10.1002/ajmg.a.62743] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/27/2021] [Revised: 02/04/2022] [Accepted: 02/18/2022] [Indexed: 02/06/2023] Abstract Currently, protein-coding de novo variants and large copy number variants have been identified as important for ~30% of individuals with autism. One approach to identify relevant variation in individuals who lack these types of events is by utilizing newer genomic technologies. In this study, highly accurate PacBio HiFi long-read sequencing was applied to a family with autism, epileptic encephalopathy, cognitive impairment, and mild dysmorphic features (two affected female siblings, unaffected parents, and one unaffected male sibling) with no known clinical variant. From our long-read sequencing data, a de novo missense variant in the KCNC2 gene (encodes Kv3.2) was identified in both affected children. This variant was phased to the paternal chromosome of origin and is likely a germline mosaic. In silico assessment revealed the variant was not in controls, highly conserved, and predicted damaging. This specific missense variant (Val473Ala) has been shown in both an ortholog and paralog of Kv3.2 to accelerate current decay, shift the voltage dependence of activation, and prevent the channel from entering a long-lasting open state. Seven additional missense variants have been identified in other individuals with neurodevelopmental disorders (p = 1.03 × 10^-5). KCNC2 is most highly expressed in the brain; in particular, in the thalamus and is enriched in GABAergic neurons.
Long-read sequencing was useful in discovering the relevant variant in this family with autism that had remained a mystery for several years and will potentially have great benefits in the clinic once it is widely available. Key Words: autism; channel; epilepsy; genetics; genomics; long-read sequencing
49	Genomics enters the deep learning era. PeerJ 2022;10:e13613. [PMID: 35769139 PMCID: PMC9235815 DOI: 10.7717/peerj.13613] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2022] [Accepted: 05/30/2022] [Indexed: 01/17/2023] Abstract The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences. Key Words: Bioinformatics; Deep learning; Epigenomics; Genetics; Genomics; Metagenomics; Neural networks; Personalized medicine; Review; Synthetic genomes. MESH Headings: Deep Learning; Neural Networks, Computer; Genomics; Computational Biology
50	The first complete human genome. Nature 2022;606:468-469. [PMID: 35606432 DOI: 10.1038/d41586-022-01368-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022] Key Words: Genetics; Genomics