1
Li K, Chen Y, Sheng Y, Tang D, Cao Y, He X. Defects in mRNA splicing and implications for infertility: a comprehensive review and in silico analysis. Hum Reprod Update 2025; 31:218-239. [PMID: 39953708 DOI: 10.1093/humupd/dmae037] [Received: 09/24/2024] [Revised: 11/25/2024] [Indexed: 02/17/2025]
Abstract
BACKGROUND mRNA splicing is a fundamental process in the reproductive system, playing a pivotal role in reproductive development and endocrine function, and ensuring the proper execution of meiosis, mitosis, and gamete function. Trans-acting factors and cis-acting elements are key players in mRNA splicing whose dysfunction can potentially lead to male and female infertility. Although hundreds of trans-acting factors have been implicated in mRNA splicing, the mechanisms by which these factors influence reproductive processes are fully understood for only a subset. Furthermore, the clinical impact of variations in cis-acting elements on human infertility has not been comprehensively characterized, leading to probable omissions of pathogenic variants in standard genetic analyses. OBJECTIVE AND RATIONALE This review aimed to summarize our current understanding of the factors involved in mRNA splicing regulation and their association with infertility disorders. We introduced methods for prioritizing and functionally validating splicing variants associated with human infertility. Additionally, we explored corresponding abnormal splicing therapies that could potentially provide insight into treating human infertility. SEARCH METHODS Systematic literature searches of human and model organisms were performed in the PubMed database between May 1977 and July 2024. To identify mRNA splicing-related genes and pathogenic variants in infertility, the search terms 'splice', 'splicing', 'variant', and 'mutation' were combined with azoospermia, oligozoospermia, asthenozoospermia, multiple morphological abnormalities of the sperm flagella, acephalic spermatozoa, disorders of sex development, early embryonic arrest, reproductive endocrine disorders, oocyte maturation arrest, premature ovarian failure, primary ovarian insufficiency, zona pellucida, fertilization defects, infertile, fertile, infertility, fertility, reproduction, and reproductive. 
OUTCOMES Our search identified 5014 publications, of which 291 were included in the final analysis. This review provided a comprehensive overview of the biological mechanisms of mRNA splicing, with a focus on the roles of trans-acting factors and cis-acting elements. We highlighted the disruption of 52 trans-acting proteins involved in spliceosome assembly and catalytic activity and recognized splicing regulatory regions and epigenetic regulation associated with infertility. A total of 73 functionally validated splicing variants in the cis-acting elements of 54 genes have been reported across 20 types of human infertility; 27 of them were located outside the canonical splice sites and may be overlooked in standard genetic analysis because they are classified as likely benign or of uncertain significance. In silico prediction of splicing can prioritize potential splicing abnormalities that may be true pathogenic mechanisms. We also summarized the methods for prioritizing splicing variants and strategies for functional validation and reviewed splicing therapy approaches for other diseases, providing a reference for abnormal reproduction treatment. WIDER IMPLICATIONS Our comprehensive review of trans-acting factors and cis-acting elements in mRNA splicing will further promote a more thorough understanding of reproductive regulatory processes, leading to improved pathogenic variant identification and potential treatments for human infertility. REGISTRATION NUMBER N/A.
Affiliation(s)
- Kuokuo Li
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Engineering Research Center of Biopreservation and Artificial Organs, Ministry of Education, Hefei, Anhui, China
- Yuge Chen
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Engineering Research Center of Biopreservation and Artificial Organs, Ministry of Education, Hefei, Anhui, China
- Yuying Sheng
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Engineering Research Center of Biopreservation and Artificial Organs, Ministry of Education, Hefei, Anhui, China
- Dongdong Tang
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Engineering Research Center of Biopreservation and Artificial Organs, Ministry of Education, Hefei, Anhui, China
- Yunxia Cao
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
- NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei, Anhui, China
- Engineering Research Center of Biopreservation and Artificial Organs, Ministry of Education, Hefei, Anhui, China
- Xiaojin He
- Department of Obstetrics and Gynecology, Reproductive Medicine Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
2
Charles M, Gaiani N, Sanchez MP, Boussaha M, Hozé C, Boichard D, Rocha D, Boulling A. Functional impact of splicing variants in the elaboration of complex traits in cattle. Nat Commun 2025; 16:3893. [PMID: 40274775 PMCID: PMC12022281 DOI: 10.1038/s41467-025-58970-5] [Received: 04/16/2024] [Accepted: 04/04/2025] [Indexed: 04/26/2025]
Abstract
GWAS conducted directly on imputed whole genome sequence have led to the identification of numerous genetic variants associated with agronomic traits in cattle. However, such variants are often simply markers in linkage disequilibrium with the actual causal variants, which is a limiting factor for the development of accurate genomic predictions. It is possible to identify causal variants by integrating information on how variants impact gene expression into GWAS output. RNA splicing plays a major role in regulating gene expression. Thus, assessing the effect of variants on RNA splicing may explain their function. Here, we use a high-throughput strategy to functionally analyse putative splice-disrupting variants in the bovine genome. Using GWAS, massively parallel reporter assay and deep learning algorithms designed to predict splice-disrupting variants, we identify 38 splice-disrupting variants associated with complex traits in cattle, three of which could be classified as causal. Our results indicate that splice-disrupting variants are widely found in the quantitative trait loci related to these phenotypes. Using our combined approach, we also assess the validity of splicing predictors originally developed to analyse human variants in the context of the bovine genome.
Affiliation(s)
- Mathieu Charles
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- INRAE, SIGENAE, 78350, Jouy-en-Josas, France
- Nicolas Gaiani
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Marie-Pierre Sanchez
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Mekki Boussaha
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Chris Hozé
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- ELIANCE, 75012, Paris, France
- Didier Boichard
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Dominique Rocha
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Arnaud Boulling
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
3
Sullivan PJ, Quinn JMW, Ajuyah P, Pinese M, Davis RL, Cowley MJ. Data-driven insights to inform splice-altering variant assessment. Am J Hum Genet 2025; 112:764-778. [PMID: 40056912 DOI: 10.1016/j.ajhg.2025.02.012] [Received: 03/11/2024] [Revised: 02/10/2025] [Accepted: 02/11/2025] [Indexed: 04/06/2025]
Abstract
Disease-causing genetic variants often disrupt mRNA splicing, an intricate process that is incompletely understood. Thus, accurate inference of which genetic variants will affect splicing and what their functional consequences will be is challenging, particularly for variants outside of the essential splice sites. Here, we describe a set of data-driven heuristics that inform the interpretation of human splice-altering variants (SAVs) based on the analysis of annotated exons, experimentally validated SAVs, and the currently understood principles of splicing biology. We defined requisite splicing criteria by examining around 202,000 canonical protein-coding exons and 19,000 experimentally validated splicing branchpoints. This analysis defined the sequence, spacing, and motif strength required for splicing, with 95.9% of the exons examined meeting these criteria. By considering over 12,000 experimentally validated variants from the SpliceVarDB, we defined a set of heuristics that inform the evaluation of putative SAVs. To ensure the applicability of each heuristic, only those supported by at least 10 experimentally validated variants were considered. This allowed us to establish a measure of spliceogenicity: the proportion of variants at a location (or motif site) that affected splicing in a given context. This study makes considerable advances toward bridging the gap between computational predictions and the biological process of splicing, offering an evidence-based approach to identifying SAVs and evaluating their impact. Our splicing heuristics enhance the current framework for genetic variant evaluation with a robust, detailed, and comprehensible analysis by adding valuable context over traditional binary prediction tools.
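The spliceogenicity measure described in this abstract (the proportion of validated variants at a site that affected splicing, reported only where at least 10 validated variants exist) can be illustrated with a minimal sketch. The record layout, the site labels, and the function name below are hypothetical and chosen for illustration; they are not SpliceVarDB's schema or the authors' code, and the data are made up.

```python
from collections import defaultdict

# Threshold taken from the abstract: a heuristic needs >= 10 validated variants.
MIN_SUPPORT = 10


def spliceogenicity(records, min_support=MIN_SUPPORT):
    """Return {site: proportion of variants that were splice-altering}.

    `records` is an iterable of (site, is_splice_altering) pairs.
    Sites with fewer than `min_support` variants are omitted.
    The site labels are hypothetical placeholders.
    """
    totals = defaultdict(int)
    altering = defaultdict(int)
    for site, is_altering in records:
        totals[site] += 1
        if is_altering:
            altering[site] += 1
    return {
        site: altering[site] / n
        for site, n in totals.items()
        if n >= min_support
    }


# Made-up toy data: 12 variants at one site, 3 at another.
toy = (
    [("donor+5", True)] * 8
    + [("donor+5", False)] * 4
    + [("donor+20", True)] * 3
)
scores = spliceogenicity(toy)
print(scores)
```

In this toy input, only the site with 12 variants passes the support threshold; the site with 3 variants is dropped rather than given a proportion based on too little evidence.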
Affiliation(s)
- Patricia J Sullivan
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia; University of New South Wales Centre for Childhood Cancer Research, UNSW Sydney, Sydney, NSW, Australia
- Julian M W Quinn
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Pamela Ajuyah
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark Pinese
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia
- Ryan L Davis
- Neurogenetics Research Group, Kolling Institute, University of Sydney and Northern Sydney Local Health District, St. Leonards, NSW, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
- Mark J Cowley
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia
4
Zhang K, Wang Y, Jiang S, Li Y, Xiang P, Zhang Y, Chen Y, Chen M, Su W, Liu L, Li S. dsDAP: An efficient method for high-abundance DNA-encoded library construction in mammalian cells. Int J Biol Macromol 2025; 298:140089. [PMID: 39842606 DOI: 10.1016/j.ijbiomac.2025.140089] [Received: 10/18/2024] [Revised: 01/14/2025] [Accepted: 01/17/2025] [Indexed: 01/24/2025]
Abstract
DNA-encoded libraries are invaluable tools for high-throughput screening and functional genomics studies. However, constructing high-abundance libraries in mammalian cells remains challenging. Here, we present dsDNA-assembly-PCR (dsDAP), a novel Gibson-assembly-PCR strategy for creating DNA-encoded libraries, offering improved flexibility and efficiency over previous methods. We demonstrated this approach by investigating the impact of translation initiation sequences (TIS) on protein expression in HEK293T cells. Both CRISPR-Cas9 and piggyBac systems were employed for genomic integration, allowing comparison of different integration methods. Our results confirmed the importance of specific nucleotides in the TIS region, particularly the preference for adenine at the -3 position in high-expression sequences. We also explored the effects of library dilution on genotype-phenotype correlations. This Gibson-assembly-PCR strategy overcomes limitations of existing methods, such as restriction enzyme dependencies, and provides a versatile tool for constructing high-abundance libraries in mammalian cells. Our approach has broad applications in functional genomics, drug discovery, and the study of gene regulation.
Affiliation(s)
- Kaili Zhang
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yi Wang
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Shuze Jiang
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yifan Li
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Pan Xiang
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yuxuan Zhang
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Yongzi Chen
- Department of Tumor Cell Biology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Min Chen
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Weijun Su
- School of Medicine, Nankai University, Tianjin 300071, China
- Liren Liu
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
- Shuai Li
- Department of Molecular Pharmacology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China
5
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2025; 26:171-190. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Jernej Ule
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- National Institute of Chemistry, Ljubljana, Slovenia
6
Ramaker ME, Abdulrahim JW, Corey KM, Ramaker RC, Kwee LC, Kraus WE, Shah SH. Cardiovascular Disease Pathogenicity Predictor (CVD-PP): A Tissue-Specific In Silico Tool for Discriminating Pathogenicity of Variants of Unknown Significance in Cardiovascular Disease Genes. Circ Genom Precis Med 2024; 17:e004464. [PMID: 39469763 DOI: 10.1161/circgen.123.004464] [Received: 10/24/2023] [Accepted: 09/05/2024] [Indexed: 10/30/2024]
Abstract
BACKGROUND Interpretation of variants of uncertain significance (VUSs) remains a challenge in the care of patients with inherited cardiovascular diseases (CVDs); 56% of variants within CVD risk genes are VUS, and machine learning algorithms trained upon large data resources can stratify VUS into higher versus lower probability of contributing to a CVD phenotype. METHODS We used ClinVar pathogenic/likely pathogenic and benign/likely benign variants from 47 CVD genes to build a predictive model of variant pathogenicity utilizing measures of evolutionary constraint, deleteriousness, splicogenicity, local pathogenicity, cardiac-specific expression, and population allele frequency. Performance was validated using variants for which the ClinVar pathogenicity assignment changed. Functional validation was assessed using prior studies in >900 identified VUS. The model utility was demonstrated using the Catheterization Genetics (CATHGEN) cohort. RESULTS We identified a top-ranked model that accurately prioritized variants for which ClinVar clinical significance had changed (n=663; precision-recall area under the curve, 0.97) and performed well compared with conventional in silico methods. This model (CVD pathogenicity predictor [CVD-PP]) also had high accuracy in prioritizing VUS with functional effects in vivo (precision-recall area under the curve, 0.58). In CATHGEN, there was a greater burden of higher CVD-PP scored VUS in individuals with dilated cardiomyopathy compared with controls (P=8.2×10⁻¹⁵). Of individuals in CATHGEN who harbored highly ranked CVD pathogenicity predictor VUS meeting clinical pathogenicity criteria, 27.6% had clinical evidence of disease. Variant prioritization using this model increased genetic diagnosis in CATHGEN participants with a known clinical diagnosis of hypertrophic cardiomyopathy (from 7.8% to 27.2%).
CONCLUSIONS We present a cardiac-specific model for prioritizing variants underlying CVD syndromes with high performance in discriminating the pathogenicity of VUS in CVD genes. Variant review and phenotyping of individuals carrying VUS of pathogenic interest support the clinical utility of this model. This model could also have utility in filtering variants as part of large-scale genomic sequencing studies.
Affiliation(s)
- Megan E Ramaker
- Duke Molecular Physiology Institute, Duke University, Durham, NC (M.E.R., J.W.A., K.M.C., L.C.K., W.E.K., S.H.S.)
- Duke Center for Precision Health, Duke Clinical and Translational Science Institute (M.E.R., L.C.K., S.H.S.)
- Jawan W Abdulrahim
- Duke Molecular Physiology Institute, Duke University, Durham, NC (M.E.R., J.W.A., K.M.C., L.C.K., W.E.K., S.H.S.)
- Kristin M Corey
- Duke Molecular Physiology Institute, Duke University, Durham, NC (M.E.R., J.W.A., K.M.C., L.C.K., W.E.K., S.H.S.)
- Ryne C Ramaker
- Division of Medical Oncology, Department of Medicine, Duke Cancer Institute, Duke University, Durham, NC (R.C.R.)
- Lydia Coulter Kwee
- Duke Molecular Physiology Institute, Duke University, Durham, NC (M.E.R., J.W.A., K.M.C., L.C.K., W.E.K., S.H.S.)
- Duke Center for Precision Health, Duke Clinical and Translational Science Institute (M.E.R., L.C.K., S.H.S.)
- William E Kraus
- Duke Molecular Physiology Institute, Duke University, Durham, NC (M.E.R., J.W.A., K.M.C., L.C.K., W.E.K., S.H.S.)
- Division of Cardiology, Department of Medicine (W.E.K., S.H.S.)
- Svati H Shah
- Duke Molecular Physiology Institute, Duke University, Durham, NC (M.E.R., J.W.A., K.M.C., L.C.K., W.E.K., S.H.S.)
- Division of Cardiology, Department of Medicine (W.E.K., S.H.S.)
- Duke Center for Precision Health, Duke Clinical and Translational Science Institute (M.E.R., L.C.K., S.H.S.)
7
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
8
Sullivan PJ, Quinn JMW, Wu W, Pinese M, Cowley MJ. SpliceVarDB: A comprehensive database of experimentally validated human splicing variants. Am J Hum Genet 2024; 111:2164-2175. [PMID: 39226898 PMCID: PMC11480807 DOI: 10.1016/j.ajhg.2024.08.002] [Received: 10/09/2023] [Revised: 08/03/2024] [Accepted: 08/06/2024] [Indexed: 09/05/2024]
Abstract
Variants that alter gene splicing are estimated to comprise up to a third of all disease-causing variants, yet they are hard to predict from DNA sequencing data alone. To overcome this, many groups are incorporating RNA-based analyses, which are resource intensive, particularly for diagnostic laboratories. There are thousands of functionally validated variants that induce mis-splicing; however, this information is not consolidated, and they are under-represented in ClinVar, which presents a barrier to variant interpretation and can result in duplication of validation efforts. To address this issue, we developed SpliceVarDB, an online database consolidating over 50,000 variants assayed for their effects on splicing in over 8,000 human genes. We evaluated over 500 published data sources and established a spliceogenicity scale to standardize, harmonize, and consolidate variant validation data generated by a range of experimental protocols. According to the strength of their supporting evidence, variants were classified as "splice-altering" (∼25%), "not splice-altering" (∼25%), and "low-frequency splice-altering" (∼50%), the last of which corresponds to weak or indeterminate evidence of spliceogenicity. Importantly, 55% of the splice-altering variants in SpliceVarDB are outside the canonical splice sites (5.6% are deep intronic). These variants can support the variant curation diagnostic pathway and can be used to provide the high-quality data necessary to develop more accurate in silico splicing predictors. The variants are accessible through an online platform, SpliceVarDB, with additional features for visualization, variant information, in silico predictions, and validation metrics. SpliceVarDB is a very large collection of splice-altering variants and is available at https://splicevardb.org.
Affiliation(s)
- Patricia J Sullivan
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia; UNSW Centre for Childhood Cancer Research, UNSW Sydney, Sydney, NSW, Australia
- Julian M W Quinn
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Weilin Wu
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Mark Pinese
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia
- Mark J Cowley
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
9
Aucouturier C, Soirat N, Castéra L, Bertrand D, Atkinson A, Lavolé T, Goardon N, Quesnelle C, Levilly J, Barbachou S, Legros A, Caron O, Crivelli L, Denizeau P, Berthet P, Ricou A, Boulouard F, Vaur D, Krieger S, Leman R. Fine mapping of RNA isoform diversity using an innovative targeted long-read RNA sequencing protocol with novel dedicated bioinformatics pipeline. BMC Genomics 2024; 25:909. [PMID: 39350015 PMCID: PMC11440762 DOI: 10.1186/s12864-024-10741-0] [Received: 04/30/2024] [Accepted: 08/28/2024] [Indexed: 10/04/2024]
Abstract
BACKGROUND Solving the structure of mRNA transcripts is a major challenge for both research and molecular diagnostic purposes. Current approaches based on short-read RNA sequencing and RT-PCR techniques cannot fully explore the complexity of transcript structure. The emergence of third-generation long-read sequencing addresses this problem by resolving the sequence directly. However, genes with low expression levels are difficult to study with the whole transcriptome sequencing approach. To address this technical limitation, we propose a novel method to capture transcripts of a gene panel using a targeted enrichment approach suitable for Pacific Biosciences and Oxford Nanopore Technologies platforms. RESULTS We designed a set of probes to capture transcripts of a panel of genes involved in hereditary breast and ovarian cancer syndrome. We present SOSTAR (iSofOrmS annoTAtoR), a versatile pipeline to assemble, quantify and annotate isoforms from long read sequencing using a new tool specially designed for this application. The significant enrichment of transcripts by our capture protocol, together with the SOSTAR annotation, allowed the identification of 1,231 unique transcripts within the gene panel from the eight patients sequenced. The structure of these transcripts was annotated with a resolution of one base relative to a reference transcript. All major alternative splicing events of the BRCA1 and BRCA2 genes described in the literature were found. Complex splicing events such as pseudoexons were correctly annotated. SOSTAR enabled the identification of abnormal transcripts in the positive controls. In addition, a case of unexplained inheritance in a family with a history of breast and ovarian cancer was solved by identifying an SVA retrotransposon in intron 13 of the BRCA1 gene. CONCLUSIONS We have validated a new protocol for the enrichment of transcripts of interest using probes adapted to the ONT and PacBio platforms. 
This protocol allows a complete description of the alternative structures of transcripts, the estimation of their expression and the identification of aberrant transcripts in a single experiment. This proof-of-concept opens new possibilities for RNA structure exploration in both research and molecular diagnostics.
Affiliation(s)
- Camille Aucouturier
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
  - Normandie Univ, UNICAEN, Caen, 14000, France
- Nicolas Soirat
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
  - SeqOne Genomics, Montpellier, 34000, France
- Laurent Castéra
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
- Alexandre Atkinson
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
- Thibaut Lavolé
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
- Nicolas Goardon
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
- Céline Quesnelle
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
- Julien Levilly
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
- Sosthène Barbachou
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
- Angelina Legros
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
- Olivier Caron
  - Département Médecine Oncologique, Institut Gustave Roussy, Villejuif, France
- Louise Crivelli
  - Service d'Oncogénétique, Centre Eugène Marquis, Rennes, France
- Philippe Denizeau
  - Service de génétique clinique, Centre Hospitalier Universitaire Rennes, Rennes, France
- Pascaline Berthet
  - Service d'Oncogénétique, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
- Agathe Ricou
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
- Flavie Boulouard
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
- Dominique Vaur
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
- Sophie Krieger
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
  - Normandie Univ, UNICAEN, Caen, 14000, France
- Raphael Leman
  - Laboratoire de biologie et de génétique du cancer, Département de Biopathologie, Centre François Baclesse, Caen, 14000, France
  - Cancer and Brain Genomics, FHU G4 Genomics, Inserm U1245, Normandie University, Rouen, 76183, France
|
10
|
O'Neill MJ, Yang T, Laudeman J, Calandranis ME, Harvey ML, Solus JF, Roden DM, Glazer AM. ParSE-seq: a calibrated multiplexed assay to facilitate the clinical classification of putative splice-altering variants. Nat Commun 2024; 15:8320. [PMID: 39333091 PMCID: PMC11437130 DOI: 10.1038/s41467-024-52474-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 09/10/2024] [Indexed: 09/29/2024] Open
Abstract
Interpreting the clinical significance of putative splice-altering variants outside canonical splice sites remains difficult without time-intensive experimental studies. To address this, we introduce Parallel Splice Effect Sequencing (ParSE-seq), a multiplexed assay to quantify variant effects on RNA splicing. We first apply this technique to study hundreds of variants in the arrhythmia-associated gene SCN5A. Variants are studied in 'minigene' plasmids with molecular barcodes to allow pooled variant effect quantification. We perform experiments in two cell types, including disease-relevant induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs). The assay strongly separates known control variants from ClinVar, enabling quantitative calibration of the ParSE-seq assay. Using these evidence strengths and experimental data, we reclassify 29 of 34 variants with conflicting interpretations and 11 of 42 variants of uncertain significance. In addition to intronic variants, we show that many synonymous and missense variants disrupted RNA splicing. Two splice-altering variants in the assay also disrupt splicing and sodium current when introduced into iPSC-CMs by CRISPR-Cas9 editing. ParSE-seq provides high-throughput experimental data for RNA-splicing to support precision medicine efforts and can be readily adopted to study other loss-of-function genotype-phenotype relationships.
Affiliation(s)
- Tao Yang
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Julie Laudeman
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Maria E Calandranis
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- M Lorena Harvey
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Joseph F Solus
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Dan M Roden
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Andrew M Glazer
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
|
11
|
Xu C, Bao S, Wang Y, Li W, Chen H, Shen Y, Jiang T, Zhang C. Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences. Genome Res 2024; 34:1052-1065. [PMID: 39060028 PMCID: PMC11368187 DOI: 10.1101/gr.279044.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Accepted: 07/18/2024] [Indexed: 07/28/2024]
Abstract
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes, and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, the current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform "reference-informed prediction" by incorporating the known splice site usage of a reference gene sequence to improve its prediction on splicing-altering mutations. We benchmarked DeltaSplice and several other state-of-the-art methods on various prediction tasks, including evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and neurodevelopmental disorders, and demonstrated that DeltaSplice outperformed consistently. DeltaSplice predicted ∼15% of splicing quantitative trait loci (sQTLs) in the human brain as causal splicing-altering variants. It also predicted splicing-altering de novo mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders (NDDs), including 19 genes with recurrent splicing-altering mutations. Integration of splicing-altering mutations with other types of de novo mutation burdens allowed the prediction of eight novel NDD-risk genes. Our work expanded the capacity of in silico splicing models with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
Affiliation(s)
- Chencheng Xu
  - Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Suying Bao
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
- Ye Wang
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
- Wenxing Li
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
- Hao Chen
  - Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
- Tao Jiang
  - Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  - Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
- Chaolin Zhang
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
|
12
|
Yuan J, Shao Z, Lv M, Li K, Wei Z. Identification of deleterious variants in nine polycystic kidney disease affected families. Gene 2024; 919:148505. [PMID: 38670396 DOI: 10.1016/j.gene.2024.148505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/01/2024] [Accepted: 04/23/2024] [Indexed: 04/28/2024]
Abstract
Polycystic kidney disease (PKD) is a common genetic renal disorder. In the present study, we performed whole-exome sequencing (WES) to identify pathogenic variants in nine families comprising 26 patients with PKD and 19 unaffected members. Eight pathogenic variants were identified in known PKD-associated genes, including PKD1 (n = 6), PKD2 (n = 1), and OFD1 (n = 1), in eight families. There were one missense, one stopgain, two non-frameshift, two canonical splicing, and three frameshift variants and one potential non-canonical splicing variant (NCSV) in 8 families. Six variants were novel and not reported in the ClinVar database. In addition, compound heterozygous variants in PKHD1 were identified, comprising one frameshift variant (PKHD1: NM_138694.4, c.9841del, p.S3281Lfs*4) and one non-canonical splicing variant (PKHD1: NM_138694.4, c.6332+40A>G), which was defined as deleterious by four splicing prediction tools (CADD-splice, SpliceAI, Spliceogen, Squirl). We used the minigene method to validate whether the prioritized potential NCSVs disrupt the typical mRNA splicing process and found an abnormally larger PCR product from the minigene carrying the potential NCSV compared with the wild-type minigene. Sanger sequencing confirmed a 39-bp insertion from intron 38 between exon 38 and exon 39, which results in a non-frameshift insertion of 13 amino acids. In conclusion, our study expands the variant spectrum and highlights the important role of non-canonical splicing variants in PKD.
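The reading-frame arithmetic behind the reported 39-bp insertion can be checked with a short sketch. The function name `insertion_consequence` is hypothetical and not taken from the study; the sketch assumes the inserted sequence contains no stop codon and only asks whether the length is a multiple of three and how many whole codons it adds.

```python
def insertion_consequence(inserted_bp: int) -> tuple[bool, int]:
    """Return (is_in_frame, whole_codons_added) for an insertion into coding sequence.

    Illustrative helper only: it ignores sequence content (for example, a stop
    codon inside the insert) and looks purely at the insertion length.
    """
    if inserted_bp <= 0:
        raise ValueError("inserted_bp must be positive")
    # An insertion preserves the reading frame only if its length is a multiple of 3.
    return inserted_bp % 3 == 0, inserted_bp // 3


in_frame, codons_added = insertion_consequence(39)
print(in_frame, codons_added)
```

For the 39-bp insertion this gives an in-frame event adding 13 codons, consistent with the 13 inserted amino acids described in the abstract.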
Affiliation(s)
- Jing Yuan
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei 230032, Anhui, China
- Zhongmei Shao
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei 230032, Anhui, China
- Mingrong Lv
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei 230032, Anhui, China
- Kuokuo Li
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei 230032, Anhui, China
- Zhaolian Wei
  - Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei 230032, Anhui, China
|
13
|
Ishikawa T, Masuda T, Hachiya T, Dina C, Simonet F, Nagata Y, Tanck MWT, Sonehara K, Glinge C, Tadros R, Khongphatthanayothin A, Lu TP, Higuchi C, Nakajima T, Hayashi K, Aizawa Y, Nakano Y, Nogami A, Morita H, Ohno S, Aiba T, Krijger Juárez C, Mauleekoonphairoj J, Poovorawan Y, Gourraud JB, Shimizu W, Probst V, Horie M, Wilde AAM, Redon R, Juang JMJ, Nademanee K, Bezzina CR, Barc J, Tanaka T, Okada Y, Schott JJ, Makita N. Brugada syndrome in Japan and Europe: a genome-wide association study reveals shared genetic architecture and new risk loci. Eur Heart J 2024; 45:2320-2332. [PMID: 38747976 DOI: 10.1093/eurheartj/ehae251] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/31/2023] [Revised: 03/02/2024] [Accepted: 04/08/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND AND AIMS Brugada syndrome (BrS) is an inherited arrhythmia with a higher disease prevalence and more lethal arrhythmic events in Asians than in Europeans. Genome-wide association studies (GWAS) have revealed its polygenic architecture mainly in European populations. The aim of this study was to identify novel BrS-associated loci and to compare allelic effects across ancestries. METHODS A GWAS was conducted in Japanese participants, involving 940 cases and 1634 controls, followed by a cross-ancestry meta-analysis of Japanese and European GWAS (total of 3760 cases and 11 635 controls). The novel loci were characterized by fine-mapping, gene expression, and splicing quantitative trait associations in the human heart. RESULTS The Japanese-specific GWAS identified one novel locus near ZSCAN20 (P = 1.0 × 10⁻⁸), and the cross-ancestry meta-analysis identified 17 association signals, including six novel loci. The effect directions of the 17 lead variants were consistent (94.1%; P for sign test = 2.7 × 10⁻⁴), and their allelic effects were highly correlated across ancestries (Pearson's R = .91; P = 2.9 × 10⁻⁷). The genetic risk score derived from the BrS GWAS of European ancestry was significantly associated with the risk of BrS in the Japanese population [odds ratio 2.12 (95% confidence interval 1.94-2.31); P = 1.2 × 10⁻⁶¹], suggesting a shared genetic architecture across ancestries. Functional characterization revealed that a lead variant in CAMK2D promotes alternative splicing, resulting in an isoform switch of calmodulin kinase II-δ, favouring a pro-inflammatory/pro-death pathway. CONCLUSIONS This study demonstrates novel susceptibility loci implicating potentially novel pathogenesis underlying BrS. Despite differences in clinical expressivity and epidemiology, the polygenic architecture of BrS was substantially shared across ancestries.
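The sign-test figure quoted in the RESULTS can be reproduced with a short sketch. It rests on two assumptions that the abstract does not state: that 94.1% corresponds to 16 of the 17 lead variants having concordant effect directions, and that the P value comes from an exact two-sided binomial test against a 50% chance of agreement. The function name `sign_test_two_sided` is hypothetical.

```python
from math import comb


def sign_test_two_sided(n_consistent: int, n_total: int) -> float:
    """Exact two-sided sign test P value under H0: P(consistent direction) = 0.5."""
    # Use the more extreme of the two counts so the tail is always the upper one.
    k = max(n_consistent, n_total - n_consistent)
    # Upper-tail probability of observing k or more successes out of n_total.
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Assumed counts: 16 of 17 lead variants with the same effect direction.
p = sign_test_two_sided(16, 17)
print(f"{16 / 17:.1%} consistent, P = {p:.1e}")
```

Under these assumptions the tail contains 18 of the 2^17 equally likely sign patterns, so the two-sided P value is 36/131072, which rounds to the reported 2.7 × 10⁻⁴.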
Affiliation(s)
- Taisuke Ishikawa
  - Omics Research Center, National Cerebral and Cardiovascular Center, Suita, Japan
- Tatsuo Masuda
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - StemRIM Institute of Regeneration-Inducing Medicine, Osaka University Graduate School of Medicine, Suita, Japan
  - Department of Obstetrics and Gynecology, Osaka University Graduate School of Medicine, Suita, Japan
- Tsuyoshi Hachiya
  - Institute for Biomedical Sciences, Iwate Medical University, Iwate, Japan
- Christian Dina
  - L'institut du thorax, Nantes Université, CHU Nantes, CNRS, INSERM, Nantes, France
- Floriane Simonet
  - L'institut du thorax, Nantes Université, CHU Nantes, CNRS, INSERM, Nantes, France
- Yuki Nagata
  - Bioresource Research Center, Tokyo Medical and Dental University, Tokyo, Japan
  - Department of Human Genetics and Disease Diversity, Tokyo Medical and Dental University, Tokyo, Japan
- Michael W T Tanck
  - Epidemiology and Data Science, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands
- Kyuto Sonehara
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
- Charlotte Glinge
  - Department of Experimental Cardiology, Amsterdam UMC location University of Amsterdam, Heart Centre, Amsterdam, The Netherlands
  - Heart Failure & Arrhythmias, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
  - Department of Cardiology, The Heart Centre, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-HEART https://guardheart.ern-net.eu)
- Rafik Tadros
  - Department of Experimental Cardiology, Amsterdam UMC location University of Amsterdam, Heart Centre, Amsterdam, The Netherlands
  - Heart Failure & Arrhythmias, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
  - Montreal Heart Institute, Universite de Montreal, Cardiovascular Genetics Centre, Montreal, Quebec, Canada
- Apichai Khongphatthanayothin
  - Department of Medicine, Center of Excellence in Arrhythmia Research Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
  - Department of Pediatrics, Division of Cardiology Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
  - Department of Cardiology, Bangkok Hospital, Bangkok, Thailand
- Tzu-Pin Lu
  - Department of Public Health, Institute of Health Data Analytics and Statistics, National Taiwan University, Taipei, Taiwan
- Chihiro Higuchi
  - Bioresource Research Center, Tokyo Medical and Dental University, Tokyo, Japan
  - Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, Settsu, Japan
- Tadashi Nakajima
  - Department of Cardiovascular Medicine, Gunma University Graduate School of Medicine, Maebashi, Japan
- Kenshi Hayashi
  - Department of Cardiovascular Medicine, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan
- Yoshiyasu Aizawa
  - Department of Cardiovascular Medicine, International University of Health and Welfare, Narita, Japan
- Yukiko Nakano
  - Department of Cardiovascular Medicine, Hiroshima University Graduate School of Biomedical and Health Sciences, Hiroshima, Japan
- Akihiko Nogami
  - Department of Cardiology, University of Tsukuba, Tsukuba, Japan
- Hiroshi Morita
  - Department of Cardiovascular Therapeutics, Faculty of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, Okayama, Japan
- Seiko Ohno
  - Medical Genome Center, National Cerebral and Cardiovascular Center, Suita, Japan
  - Department of Cardiovascular Medicine, Shiga University of Medical Sciences, Otsu, Japan
- Takeshi Aiba
  - Department of Cardiovascular Medicine, National Cerebral and Cardiovascular Center, Suita, Japan
- Christian Krijger Juárez
  - Department of Experimental Cardiology, Amsterdam UMC location University of Amsterdam, Heart Centre, Amsterdam, The Netherlands
  - Heart Failure & Arrhythmias, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
- John Mauleekoonphairoj
  - Department of Medicine, Center of Excellence in Arrhythmia Research Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Yong Poovorawan
  - Center of Excellence in Clinical Virology Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Jean-Baptiste Gourraud
  - L'institut du thorax, Nantes Université, CHU Nantes, CNRS, INSERM, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-HEART https://guardheart.ern-net.eu)
- Wataru Shimizu
  - Department of Cardiovascular Medicine, Nippon Medical School, Tokyo, Japan
- Vincent Probst
  - L'institut du thorax, Nantes Université, CHU Nantes, CNRS, INSERM, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-HEART https://guardheart.ern-net.eu)
- Minoru Horie
  - Department of Cardiovascular Medicine, Shiga University of Medical Sciences, Otsu, Japan
- Arthur A M Wilde
  - Heart Failure & Arrhythmias, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-HEART https://guardheart.ern-net.eu)
  - Department of Cardiology, Amsterdam UMC location University of Amsterdam, Heart Centre, Amsterdam, The Netherlands
- Richard Redon
  - L'institut du thorax, Nantes Université, CHU Nantes, CNRS, INSERM, Nantes, France
- Jyh-Ming Jimmy Juang
  - Cardiovascular Center, Heart Failure Center and Department of Internal Medicine, Division of Cardiology, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Koonlawee Nademanee
  - Department of Medicine, Center of Excellence in Arrhythmia Research Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
  - Pacific Rim Electrophysiology Research Institute, Bumrungrad International Hospital, Bangkok, Thailand
- Connie R Bezzina
  - Department of Experimental Cardiology, Amsterdam UMC location University of Amsterdam, Heart Centre, Amsterdam, The Netherlands
  - Heart Failure & Arrhythmias, Amsterdam Cardiovascular Sciences, Amsterdam, The Netherlands
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-HEART https://guardheart.ern-net.eu)
- Julien Barc
  - L'institut du thorax, Nantes Université, CHU Nantes, CNRS, INSERM, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-HEART https://guardheart.ern-net.eu)
- Toshihiro Tanaka
  - Bioresource Research Center, Tokyo Medical and Dental University, Tokyo, Japan
  - Department of Human Genetics and Disease Diversity, Tokyo Medical and Dental University, Tokyo, Japan
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan
  - Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center, Osaka University, Suita, Japan
- Jean-Jacques Schott
  - L'institut du thorax, Nantes Université, CHU Nantes, CNRS, INSERM, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-HEART https://guardheart.ern-net.eu)
- Naomasa Makita
  - Department of Cardiology, Sapporo Teishinkai Hospital, N33, E1, Sapporo 065-0033, Japan
  - Department of Cell Biology, National Cerebral and Cardiovascular Center Research Institute, 6-1, Kishibe Shimmachi, 564-8565 Suita, Japan
|
14
|
Kiianitsa K, Lukes ME, Hayes BJ, Brutman JN, Valdmanis PN, Bird TD, Raskind WH, Korvatska O. TREM2 variants that cause early dementia and increase Alzheimer's disease risk affect gene splicing. Brain 2024; 147:2368-2383. [PMID: 38226698 PMCID: PMC11224616 DOI: 10.1093/brain/awae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 01/02/2024] [Accepted: 01/09/2024] [Indexed: 01/17/2024] Open
Abstract
Loss-of-function variants in the triggering receptor expressed on myeloid cells 2 (TREM2) are responsible for a spectrum of neurodegenerative disorders. In the homozygous state, they cause severe pathologies with early onset dementia, such as Nasu-Hakola disease and behavioural variants of frontotemporal dementia (FTD), whereas heterozygous variants increase the risk of late-onset Alzheimer's disease (AD) and FTD. For over half of TREM2 variants found in families with recessive early onset dementia, the defect occurs at the transcript level via premature termination codons or aberrant splicing. The remaining variants are missense alterations thought to affect the protein; however, the underlying pathogenic mechanism is less clear. In this work, we tested whether these disease-associated TREM2 variants contribute to the pathology via altered splicing. Variants scored by SpliceAI algorithm were tested by a full-size TREM2 splicing reporter assay in different cell lines. The effect of variants was quantified by qRT-/RT-PCR and western blots. Nanostring nCounter was used to measure TREM2 RNA in the brains of NHD patients who carried spliceogenic variants. Exon skipping events were analysed from brain RNA-Seq datasets available through the Accelerating Medicines Partnership for Alzheimer's Disease Consortium. We found that for some Nasu-Hakola disease and early onset FTD-causing variants, splicing defects were the primary cause (D134G) or likely contributor to pathogenicity (V126G and K186N). Similar but milder effects on splicing of exons 2 and 3 were demonstrated for A130V, L133L and R136W enriched in patients with dementia. Moreover, the two most frequent missense variants associated with AD/FTD risk in European and African ancestries (R62H, 1% in Caucasians and T96K, 12% in Africans) had splicing defects via excessive skipping of exon 2 and overproduction of a potentially antagonistic TREM2 protein isoform. 
The effect of R62H on exon 2 skipping was confirmed in three independent brain RNA-Seq datasets. Our findings revealed an unanticipated complexity of pathogenic variation in TREM2, in which effects on post-transcriptional gene regulation and protein function often coexist. This necessitates the inclusion of computational and experimental analyses of splicing and mRNA processing for a better understanding of genetic variation in disease.
Affiliation(s)
- Kostantin Kiianitsa
  - Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA 98195, USA
- Maria E Lukes
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
- Brian J Hayes
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Julianna N Brutman
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
- Paul N Valdmanis
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
- Thomas D Bird
  - Department of Neurology, University of Washington, Seattle, WA 98195, USA
  - Geriatric Research, Education and Clinical Center (GRECC), VA Puget Sound Medical Center, Seattle, WA 98108, USA
- Wendy H Raskind
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
  - Mental Illness Research, Education and Clinical Center (MIRECC), VA Puget Sound Medical Center, Seattle, WA 98108, USA
- Olena Korvatska
  - Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA 98195, USA
|
15
|
Sun KY, Bai X, Chen S, Bao S, Zhang C, Kapoor M, Backman J, Joseph T, Maxwell E, Mitra G, Gorovits A, Mansfield A, Boutkov B, Gokhale S, Habegger L, Marcketta A, Locke AE, Ganel L, Hawes A, Kessler MD, Sharma D, Staples J, Bovijn J, Gelfman S, Di Gioia A, Rajagopal VM, Lopez A, Varela JR, Alegre-Díaz J, Berumen J, Tapia-Conyer R, Kuri-Morales P, Torres J, Emberson J, Collins R, Cantor M, Thornton T, Kang HM, Overton JD, Shuldiner AR, Cremona ML, Nafde M, Baras A, Abecasis G, Marchini J, Reid JG, Salerno W, Balasubramanian S. A deep catalogue of protein-coding variation in 983,578 individuals. Nature 2024; 631:583-592. [PMID: 38768635 PMCID: PMC11254753 DOI: 10.1038/s41586-024-07556-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 05/10/2024] [Indexed: 05/22/2024]
Abstract
Rare coding variants that substantially affect function provide insights into the biology of a gene [1-3]. However, ascertaining the frequency of such variants requires large sample sizes [4-8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
Affiliation(s)
- Siying Chen
- Regeneron Genetics Center, Tarrytown, NY, USA
- Suying Bao
- Regeneron Genetics Center, Tarrytown, NY, USA
- Liron Ganel
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jesús Alegre-Díaz
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Jaime Berumen
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Roberto Tapia-Conyer
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Pablo Kuri-Morales
- Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
- Jason Torres
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan Emberson
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Rory Collins
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Mona Nafde
- Regeneron Genetics Center, Tarrytown, NY, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
|
16
|
Quinones-Valdez G, Amoah K, Xiao X. Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing. bioRxiv 2024:2024.06.14.599101. [PMID: 38915585 PMCID: PMC11195283 DOI: 10.1101/2024.06.14.599101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Genetic regulation of alternative splicing constitutes an important link between genetic variation and disease. Nonetheless, RNA splicing is regulated by both cis-acting elements and trans-acting splicing factors. Determining which splicing events are directed primarily by cis- or trans-acting mechanisms will greatly inform our understanding of the genetic basis of disease. Here, we show that long-read RNA-seq, combined with our new method isoLASER, enables a clear segregation of cis- and trans-directed splicing events for individual samples. The genetic linkage of splicing is largely individual-specific, in stark contrast to the tissue-specific pattern of splicing profiles. Analysis of long-read RNA-seq data from human and mouse revealed thousands of cis-directed splicing events susceptible to genetic regulation. We highlight such events in the HLA genes, whose analysis was challenging with short-read data. We also highlight novel cis-directed splicing events in Alzheimer's disease-relevant genes such as MAPT and BIN1. Together, the clear demarcation of cis- and trans-directed splicing paves the way for future studies of the genetic basis of disease.
Affiliation(s)
- Giovanni Quinones-Valdez
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kofi Amoah
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xinshu Xiao
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
|
17
|
Chin IM, Gardell ZA, Corces MR. Decoding polygenic diseases: advances in noncoding variant prioritization and validation. Trends Cell Biol 2024; 34:465-483. [PMID: 38719704 DOI: 10.1016/j.tcb.2024.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 03/12/2024] [Accepted: 03/21/2024] [Indexed: 06/09/2024]
Abstract
Genome-wide association studies (GWASs) provide a key foundation for elucidating the genetic underpinnings of common polygenic diseases. However, these studies have limitations in their ability to assign causality to particular genetic variants, especially those residing in the noncoding genome. Over the past decade, technological and methodological advances in both analytical and empirical prioritization of noncoding variants have enabled the identification of causative variants by leveraging orthogonal functional evidence at increasing scale. In this review, we present an overview of these approaches and describe how this workflow provides the groundwork necessary to move beyond associations toward genetically informed studies on the molecular and cellular mechanisms of polygenic disease.
Affiliation(s)
- Iris M Chin
- Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Zachary A Gardell
- Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- M Ryan Corces
- Gladstone Institute of Neurological Disease, Gladstone Institutes, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA.
|
18
|
Fronk AD, Manzanares MA, Zheng P, Geier A, Anderson K, Stanton S, Zumrut H, Gera S, Munch R, Frederick V, Dhingra P, Arun G, Akerman M. Development and validation of AI/ML derived splice-switching oligonucleotides. Mol Syst Biol 2024; 20:676-701. [PMID: 38664594 PMCID: PMC11148135 DOI: 10.1038/s44320-024-00034-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 04/03/2024] [Accepted: 04/09/2024] [Indexed: 06/05/2024] Open
Abstract
Splice-switching oligonucleotides (SSOs) are antisense compounds that act directly on pre-mRNA to modulate alternative splicing (AS). This study demonstrates the value that artificial intelligence/machine learning (AI/ML) provides for the identification of functional, verifiable, and therapeutic SSOs. We trained XGBoost tree models using splicing factor (SF) pre-mRNA binding profiles and spliceosome assembly information to identify modulatory SSO binding sites on pre-mRNA. Using Shapley and out-of-bag analyses, we also predicted the identity of specific SFs whose binding to pre-mRNA is blocked by SSOs. This step adds considerable transparency to AI/ML-driven drug discovery and informs biological insights useful in further validation steps. We applied this approach to previously established functional SSOs to retrospectively identify the SFs likely to regulate those events. We then took a prospective validation approach using a novel target in triple negative breast cancer (TNBC), NEDD4L exon 13 (NEDD4Le13). Targeting NEDD4Le13 with an AI/ML-designed SSO decreased the proliferative and migratory behavior of TNBC cells via downregulation of the TGFβ pathway. Overall, this study illustrates the ability of AI/ML to extract actionable insights from RNA-seq data.
Affiliation(s)
- Paulina Zheng
- Envisagenics, Inc., Long Island City, NY, 11101, USA
- Adam Geier
- Envisagenics, Inc., Long Island City, NY, 11101, USA
- Hasan Zumrut
- Envisagenics, Inc., Long Island City, NY, 11101, USA
- Sakshi Gera
- Envisagenics, Inc., Long Island City, NY, 11101, USA
- Robin Munch
- Envisagenics, Inc., Long Island City, NY, 11101, USA
- Gayatri Arun
- Envisagenics, Inc., Long Island City, NY, 11101, USA
|
19
|
Ma K, Gauthier LO, Cheung F, Huang S, Lek M. High-throughput assays to assess variant effects on disease. Dis Model Mech 2024; 17:dmm050573. [PMID: 38940340 PMCID: PMC11225591 DOI: 10.1242/dmm.050573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
Interpreting the wealth of rare genetic variants discovered in population-scale sequencing efforts and deciphering their associations with human health and disease present a critical challenge due to the lack of sufficient clinical case reports. One promising avenue to overcome this problem is deep mutational scanning (DMS), a method of introducing and evaluating large-scale genetic variants in model cell lines. DMS allows unbiased investigation of variants, including those that are not found in clinical reports, thus improving rare disease diagnostics. Currently, the main obstacle limiting the full potential of DMS is the availability of functional assays that are specific to disease mechanisms. Thus, we explore high-throughput functional methodologies suitable to examine broad disease mechanisms. We specifically focus on methods that do not require robotics or automation but instead use well-designed molecular tools to transform biological mechanisms into easily detectable signals, such as cell survival rate, fluorescence or drug resistance. Here, we aim to bridge the gap between disease-relevant assays and their integration into the DMS framework.
Affiliation(s)
- Kaiyue Ma
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Logan O. Gauthier
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Frances Cheung
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Shushu Huang
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Monkol Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
|
20
|
Recinos Y, Ustianenko D, Yeh YT, Wang X, Jacko M, Yesantharao LV, Wu Q, Zhang C. CRISPR-dCas13d-based deep screening of proximal and distal splicing-regulatory elements. Nat Commun 2024; 15:3839. [PMID: 38714659 PMCID: PMC11076525 DOI: 10.1038/s41467-024-47140-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 03/16/2024] [Indexed: 05/10/2024] Open
Abstract
Pre-mRNA splicing, a key process in gene expression, can be therapeutically modulated using various drug modalities, including antisense oligonucleotides (ASOs). However, determining promising targets is hampered by the challenge of systematically mapping splicing-regulatory elements (SREs) in their native sequence context. Here, we use the catalytically inactive CRISPR-RfxCas13d RNA-targeting system (dCas13d/gRNA) as a programmable platform to bind SREs and modulate splicing by competing against endogenous splicing factors. SpliceRUSH, a high-throughput screening method, was developed to map SREs in any gene of interest using a lentivirus gRNA library that tiles the genetic region, including distal intronic sequences. When applied to SMN2, a therapeutic target for spinal muscular atrophy, SpliceRUSH robustly identifies not only known SREs but also a previously unknown distal intronic SRE, which can be targeted to alter exon 7 splicing using either dCas13d/gRNA or ASOs. This technology enables a deeper understanding of splicing regulation with applications for RNA-based drug discovery.
Affiliation(s)
- Yocelyn Recinos
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Dmytro Ustianenko
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Flagship Pioneering, Cambridge, MA, 02142, USA
- Yow-Tyng Yeh
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Xiaojian Wang
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Martin Jacko
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Aperture Therapeutics, Inc., San Carlos, CA, 94070, USA
- Lekha V Yesantharao
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Qiyang Wu
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Chaolin Zhang
- Department of Systems Biology, Columbia University, New York, NY, 10032, USA.
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA.
|
21
|
Xu C, Bao S, Chen H, Jiang T, Zhang C. Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences. bioRxiv 2024:2024.03.22.586363. [PMID: 38586002 PMCID: PMC10996483 DOI: 10.1101/2024.03.22.586363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes, and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, the current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform "reference-informed prediction" by incorporating the known splice site usage of a reference gene sequence to improve its prediction on splicing-altering mutations. We benchmarked DeltaSplice and several other state-of-the-art methods on various prediction tasks, including evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and neurodevelopmental disorders, and demonstrated that DeltaSplice consistently outperformed the other methods. DeltaSplice predicted ~15% of splicing quantitative trait loci (sQTLs) in the human brain as causal splicing-altering variants. It also predicted splicing-altering de novo mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders, including 19 genes with recurrent splicing-altering mutations. Among the new candidate disease risk genes, MFN1 is involved in mitochondrial fusion, which is frequently disrupted in autism patients. Our work expanded the capacity of in silico splicing models with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
Affiliation(s)
- Chencheng Xu
- Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Present address: Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Suying Bao
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Present address: Regeneron Pharmaceuticals, Tarrytown, NY 10591, USA
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Present address: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Tao Jiang
- Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Chaolin Zhang
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
|
22
|
Chen K, Zhou Y, Ding M, Wang Y, Ren Z, Yang Y. Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Brief Bioinform 2024; 25:bbae163. [PMID: 38605640 PMCID: PMC11009468 DOI: 10.1093/bib/bbae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/22/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] Open
Abstract
Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models have been developed for genomic sequences, and those were limited to a single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acid (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown to be effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.
Affiliation(s)
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yue Zhou
- Peng Cheng Laboratory, Shenzhen, China
- Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yu Wang
- Peng Cheng Laboratory, Shenzhen, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
|
23
|
Li K, Xiao J, Ling Z, Luo T, Xiong J, Chen Q, Dong L, Wang Y, Wang X, Jiang Z, Xia L, Yu Z, Hua R, Guo R, Tang D, Lv M, Lian A, Li B, Zhao G, He X, Xia K, Cao Y, Li J. Prioritizing de novo potential non-canonical splicing variants in neurodevelopmental disorders. EBioMedicine 2024; 99:104928. [PMID: 38113761 PMCID: PMC10767160 DOI: 10.1016/j.ebiom.2023.104928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 11/30/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND Genomic variants outside the canonical splice sites (±2) may cause abnormal mRNA splicing; such variants are defined as non-canonical splicing variants (NCSVs). However, the clinical interpretation of NCSVs in neurodevelopmental disorders (NDDs) is largely unknown. METHODS We investigated the contribution of NCSVs to NDDs from 345,787 de novo variants (DNVs) in 47,574 patients with NDDs. We performed functional enrichment and protein-protein interaction analysis to assess the association between genes carrying prioritised NCSVs and NDDs. Minigene assays were used to validate the impact of NCSVs on mRNA splicing. FINDINGS We observed significantly more NCSVs (p = 0.02, odds ratio [OR] = 2.05) among patients with NDD than in controls. Canonical splicing variants (CSVs) and NCSVs contributed to similar proportions of patients with NDD (0.76% vs. 0.82%). The candidate genes carrying NCSVs were associated with glutamatergic synapse and chromatin remodelling. Minigene assays validated 59 of 79 (74.68%) NCSVs as causing abnormal splicing in 40 candidate genes, and 9 of the genes (ARID1B, KAT6B, TCF4, SMARCA2, SHANK3, PDHA1, WDR45, SCN2A, SYNGAP1) harboured recurrent NCSVs with the same variant present in more than two unrelated patients with NDD. Moreover, 36 of 59 (61.02%) NCSVs are novel clinically relevant variants, comprising 34 NCSVs not reported in the ClinVar database and 2 recorded there with conflicting interpretations or uncertain significance. INTERPRETATION This study highlights the common pathology and clinical importance of NCSVs in unsolved patients with NDD. FUNDING The present study was funded by grants from the National Natural Science Foundation of China, China Postdoctoral Science Foundation, the Hunan Youth Science and Technology Innovation Talent Project, the Provincial Natural Science Foundation of Hunan, The Scientific Research Program of FuRong laboratory, and the Natural Science Project of the University of Anhui Province.
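The percentages quoted in the FINDINGS follow directly from the stated counts. The short Python sketch below only re-derives that arithmetic; the helper name `pct` is a hypothetical convenience and is not taken from the study.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

# Counts as stated in the abstract
validated = pct(59, 79)   # NCSVs confirmed by minigene out of those tested
novel = pct(36, 59)       # novel clinically relevant NCSVs among the confirmed ones
novel_total = 34 + 2      # the two subgroups that make up the 36 novel NCSVs

print(validated, novel, novel_total)
```

Both rounded values agree with the figures given in the abstract (74.68% and 61.02%), and the two subgroups sum to 36.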
Affiliation(s)
- Kuokuo Li
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Jifang Xiao
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China
- Zhengbao Ling
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Jingyu Xiong
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Qian Chen
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China
- Lijie Dong
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China
- Yijing Wang
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China
- Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China
- Zhaowei Jiang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Zhen Yu
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Rong Hua
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Rui Guo
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Dongdong Tang
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Mingrong Lv
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China
- Aojie Lian
- National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hunan Provincial Maternal and Child Health Care Hospital, Changsha, Hunan, China
- Bin Li
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China
- GuiHu Zhao
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China
- Xiaojin He
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China; Anhui Provincial Human Sperm Bank, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China.
- Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China.
- Yunxia Cao
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract (Anhui Medical University), No 81 Meishan Road, Hefei, 230032, Anhui, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, No 81 Meishan Road, Hefei, 230032, Anhui, China.
- Jinchen Li
- Bioinformatics Center, National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Bioinformatics Center, Furong Laboratory, Central South University, Changsha, Hunan, China.
|
24
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION SpliceAI and Pangolin show the best overall performance among predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
|
25
|
Maes S, Deploey N, Peelman F, Eyckerman S. Deep mutational scanning of proteins in mammalian cells. Cell Rep Methods 2023; 3:100641. [PMID: 37963462 PMCID: PMC10694495 DOI: 10.1016/j.crmeth.2023.100641] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/12/2023] [Revised: 07/06/2023] [Accepted: 10/20/2023] [Indexed: 11/16/2023]
Abstract
Protein mutagenesis is essential for unveiling the molecular mechanisms underlying protein function in health, disease, and evolution. In the past decade, deep mutational scanning methods have evolved to support the functional analysis of nearly all possible single-amino acid changes in a protein of interest. While historically these methods were developed in lower organisms such as E. coli and yeast, recent technological advancements have resulted in the increased use of mammalian cells, particularly for studying proteins involved in human disease. These advancements will aid significantly in the classification and interpretation of variants of unknown significance, which are being discovered at large scale due to the current surge in the use of whole-genome sequencing in clinical contexts. Here, we explore the experimental aspects of deep mutational scanning studies in mammalian cells and report the different methods used in each step of the workflow, ultimately providing a useful guide toward the design of such studies.
Affiliation(s)
- Stefanie Maes
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biochemistry and Microbiology, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Nick Deploey
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Frank Peelman
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Sven Eyckerman
- VIB Center for Medical Biotechnology (CMB), Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium.
|
26
|
Sun KY, Bai X, Chen S, Bao S, Kapoor M, Zhang C, Backman J, Joseph T, Maxwell E, Mitra G, Gorovits A, Mansfield A, Boutkov B, Gokhale S, Habegger L, Marcketta A, Locke A, Kessler MD, Sharma D, Staples J, Bovijn J, Gelfman S, Gioia AD, Rajagopal V, Lopez A, Varela JR, Alegre J, Berumen J, Tapia-Conyer R, Kuri-Morales P, Torres J, Emberson J, Collins R, Cantor M, Thornton T, Kang HM, Overton J, Shuldiner AR, Cremona ML, Nafde M, Baras A, Abecasis G, Marchini J, Reid JG, Salerno W, Balasubramanian S. A deep catalog of protein-coding variation in 985,830 individuals. bioRxiv 2023:2023.05.09.539329. [PMID: 37214792 PMCID: PMC10197621 DOI: 10.1101/2023.05.09.539329] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 05/24/2023]
Abstract
Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
Affiliation(s)
- Siying Chen
- Regeneron Genetics Center, Tarrytown, NY, USA
- Suying Bao
- Regeneron Genetics Center, Tarrytown, NY, USA
- Adam Locke
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jesus Alegre
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Jaime Berumen
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Roberto Tapia-Conyer
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Pablo Kuri-Morales
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
- Jason Torres
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan Emberson
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Rory Collins
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Mona Nafde
- Regeneron Genetics Center, Tarrytown, NY, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
|
27
|
Liao SE, Sudarshan M, Regev O. Deciphering RNA splicing logic with interpretable machine learning. Proc Natl Acad Sci U S A 2023; 120:e2221165120. [PMID: 37796983 PMCID: PMC10576025 DOI: 10.1073/pnas.2221165120] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/06/2023] [Accepted: 08/29/2023] [Indexed: 10/07/2023] Open
Abstract
Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: Despite their excellent accuracy, they cannot describe how they arrived at their predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed uncharacterized components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.
Affiliation(s)
- Susan E. Liao
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY10012
- Mukund Sudarshan
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY10012
- Oded Regev
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY10012
|
28
|
Wang R, Helbig I, Edmondson AC, Lin L, Xing Y. Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023; 24:bbad284. [PMID: 37580177 PMCID: PMC10516351 DOI: 10.1093/bib/bbad284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 08/16/2023] Open
Abstract
Genomic variants affecting pre-messenger RNA splicing and its regulation are known to underlie many rare genetic diseases. However, common workflows for genetic diagnosis and clinical variant interpretation frequently overlook splice-altering variants. To better serve patient populations and advance biomedical knowledge, it has become increasingly important to develop and refine approaches for detecting and interpreting pathogenic splicing variants. In this review, we will summarize a few recent developments and challenges in using RNA sequencing technologies for rare disease investigation. Moreover, we will discuss how recent computational splicing prediction tools have emerged as complementary approaches for revealing disease-causing variants underlying splicing defects. We speculate that continuous improvements to sequencing technologies and predictive modeling will not only expand our understanding of splicing regulation but also bring us closer to filling the diagnostic gap for rare disease patients.
Affiliation(s)
- Robert Wang
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew C Edmondson
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lan Lin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yi Xing
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
29
|
O'Neill MJ, Yang T, Laudeman J, Calandranis M, Solus J, Roden DM, Glazer AM. ParSE-seq: A Calibrated Multiplexed Assay to Facilitate the Clinical Classification of Putative Splice-altering Variants. medRxiv 2023:2023.09.04.23295019. [PMID: 37732247 PMCID: PMC10508793 DOI: 10.1101/2023.09.04.23295019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
Background Interpreting the clinical significance of putative splice-altering variants outside 2-base pair canonical splice sites remains difficult without functional studies. Methods We developed Parallel Splice Effect Sequencing (ParSE-seq), a multiplexed minigene-based assay, to test variant effects on RNA splicing quantified by high-throughput sequencing. We studied variants in SCN5A, an arrhythmia-associated gene which encodes the major cardiac voltage-gated sodium channel. We used the computational tool SpliceAI to prioritize exonic and intronic candidate splice variants, and ClinVar to select benign and pathogenic control variants. We generated a pool of 284 barcoded minigene plasmids, transfected them into Human Embryonic Kidney (HEK293) cells and induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs), sequenced the resulting pools of splicing products, and calibrated the assay to the American College of Medical Genetics and Genomics scheme. Variants were interpreted using the calibrated functional data, and experimental data were compared to SpliceAI predictions. We further studied some splice-altering missense variants by cDNA-based automated patch clamping (APC) in HEK cells and assessed splicing and sodium channel function in CRISPR-edited iPSC-CMs. Results ParSE-seq revealed the splicing effect of 224 SCN5A variants in iPSC-CMs and 244 variants in HEK293 cells. The scores between the cell types were highly correlated (R² = 0.84). In iPSCs, the assay had concordant scores for 21/22 benign/likely benign and 24/25 pathogenic/likely pathogenic control variants from ClinVar. 43/112 exonic variants and 35/70 intronic variants with determinate scores disrupted splicing. 11 of 42 variants of uncertain significance were reclassified, and 29 of 34 variants with conflicting interpretations were reclassified using the functional data. SpliceAI computational predictions correlated well with experimental data (AUC = 0.96).
We identified 20 unique SCN5A missense variants that disrupted splicing, and 2 clinically observed splice-altering missense variants of uncertain significance had normal function when tested with the cDNA-based APC assay. A splice-altering intronic variant detected by ParSE-seq, c.1891-5C>G, also disrupted splicing and sodium current when introduced into iPSC-CMs at the endogenous locus by CRISPR editing. Conclusions ParSE-seq is a calibrated, multiplexed, high-throughput assay to facilitate the classification of candidate splice-altering variants.
Affiliation(s)
- Tao Yang
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Julie Laudeman
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Maria Calandranis
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Joseph Solus
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Dan M Roden
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Andrew M Glazer
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
|
30
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA.
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA.
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA.
|
31
|
Recinos Y, Ustianenko D, Yeh YT, Wang X, Jacko M, Yesantharao LV, Wu Q, Zhang C. Deep screening of proximal and distal splicing-regulatory elements in a native sequence context. bioRxiv 2023:2023.08.21.554109. [PMID: 37662340 PMCID: PMC10473672 DOI: 10.1101/2023.08.21.554109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Pre-mRNA splicing, a key process in gene expression, can be therapeutically modulated using various drug modalities, including antisense oligonucleotides (ASOs). However, determining promising targets is impeded by the challenge of systematically mapping splicing-regulatory elements (SREs) in their native sequence context. Here, we use the catalytically dead CRISPR-RfxCas13d RNA-targeting system (dCas13d/gRNA) as a programmable platform to bind SREs and modulate splicing by competing against endogenous splicing factors. SpliceRUSH, a high-throughput screening method, was developed to map SREs in any gene of interest using a lentivirus gRNA library that tiles the genetic region, including distal intronic sequences. When applied to SMN2, a therapeutic target for spinal muscular atrophy, SpliceRUSH robustly identified not only known SREs, but also a novel distal intronic splicing enhancer, which can be targeted to alter exon 7 splicing using either dCas13d/gRNA or ASOs. This technology enables a deeper understanding of splicing regulation with applications for RNA-based drug discovery.
Affiliation(s)
- Yocelyn Recinos
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Dmytro Ustianenko
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Present address: Flagship Pioneering, Cambridge, MA 02142, USA
- Yow-Tyng Yeh
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Xiaojian Wang
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Martin Jacko
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Present address: Aperture Therapeutics, Inc., San Carlos, CA 94070, USA
- Lekha V. Yesantharao
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Present address: Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Qiyang Wu
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Chaolin Zhang
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
|
32
|
Gaulton KJ, Preissl S, Ren B. Interpreting non-coding disease-associated human variants using single-cell epigenomics. Nat Rev Genet 2023; 24:516-534. [PMID: 37161089 PMCID: PMC10629587 DOI: 10.1038/s41576-023-00598-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Accepted: 03/27/2023] [Indexed: 05/11/2023]
Abstract
Genome-wide association studies (GWAS) have linked hundreds of thousands of sequence variants in the human genome to common traits and diseases. However, translating this knowledge into a mechanistic understanding of disease-relevant biology remains challenging, largely because such variants are predominantly in non-protein-coding sequences that still lack functional annotation at cell-type resolution. Recent advances in single-cell epigenomics assays have enabled the generation of cell type-, subtype- and state-resolved maps of the epigenome in heterogeneous human tissues. These maps have facilitated cell type-specific annotation of candidate cis-regulatory elements and their gene targets in the human genome, enhancing our ability to interpret the genetic basis of common traits and diseases.
Affiliation(s)
- Kyle J Gaulton
- Department of Paediatrics, Paediatric Diabetes Research Center, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Sebastian Preissl
- Center for Epigenomics, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Institute of Experimental and Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
- Bing Ren
- Center for Epigenomics, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Department of Cellular and Molecular Medicine, University of California San Diego School of Medicine, La Jolla, CA, USA.
- Ludwig Institute for Cancer Research, La Jolla, CA, USA.
|
33
|
Walker LC, de la Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A, Zimmermann H, Byrne AB, Pesaran T, Karam R, Harrison SM, Spurdle AB. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet 2023; 110:1046-1067. [PMID: 37352859 PMCID: PMC10357475 DOI: 10.1016/j.ajhg.2023.06.002] [Citation(s) in RCA: 89] [Impact Index Per Article: 44.5] [Received: 02/20/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/25/2023] Open
Abstract
The American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1, PS3, PP3, BS3, BP4, and BP7. However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. We utilized empirically derived splicing evidence to (1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, (2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and (3) exemplify methodology to calibrate splice prediction tools. We propose repurposing the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely, BP7 may be used to capture RNA results demonstrating no splicing impact for intronic and synonymous variants. We propose that the PS3/BS3 codes are applied only for well-established assays that measure functional impact not directly captured by RNA-splicing assays. We recommend the application of PS1 based on similarity of predicted RNA-splicing effects for a variant under assessment in comparison with a known pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA-assay evidence described aim to help standardize variant pathogenicity classification processes when interpreting splicing-based evidence.
Affiliation(s)
- Logan C Walker
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
- Miguel de la Hoya
- Molecular Oncology Laboratory, CIBERONC, Hospital Clinico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
- George A R Wiggins
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
- Michael T Parsons
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Daffodil M Canson
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Alicia B Byrne
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven M Harrison
- Ambry Genetics, Aliso Viejo, CA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Amanda B Spurdle
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia; Faculty of Medicine, The University of Queensland, Brisbane, QLD, Australia
|
34
|
Fabo T, Khavari P. Functional characterization of human genomic variation linked to polygenic diseases. Trends Genet 2023; 39:462-490. [PMID: 36997428 PMCID: PMC11025698 DOI: 10.1016/j.tig.2023.02.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/01/2022] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 03/30/2023]
Abstract
The burden of human disease lies predominantly in polygenic diseases. Since the early 2000s, genome-wide association studies (GWAS) have identified genetic variants and loci associated with complex traits. These have ranged from variants in coding sequences to mutations in regulatory regions, such as promoters and enhancers, as well as mutations affecting mediators of mRNA stability and other downstream regulators, such as 5' and 3'-untranslated regions (UTRs), long noncoding RNA (lncRNA), and miRNA. Recent research advances in genetics have utilized a combination of computational techniques, high-throughput in vitro and in vivo screening modalities, and precise genome editing to impute the function of diverse classes of genetic variants identified through GWAS. In this review, we highlight the vastness of genomic variants associated with polygenic disease risk and address recent advances in how genetic tools can be used to functionally characterize them.
Affiliation(s)
- Tania Fabo
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Graduate Program in Genetics, Stanford University, Stanford, CA, USA; Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Paul Khavari
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Graduate Program in Genetics, Stanford University, Stanford, CA, USA; Stanford University School of Medicine, Stanford University, Stanford, CA, USA; Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
|
35
|
Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023; 120:e2218308120. [PMID: 37192163 PMCID: PMC10214146 DOI: 10.1073/pnas.2218308120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/18/2023] Open
Abstract
Humans coexisted and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here, we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,169 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 962 exonic splicing mutations that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than that in Neanderthals. Adaptively introgressed variants were enriched for moderate-effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a unique tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a unique Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide unique insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Affiliation(s)
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Christopher R. Neil
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Anastasia Welch
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Chaorui Duan
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Samantha Maguire
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Ijeoma C. Meremikwu
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Malcolm Meyerson
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Ben J. Evans
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI 02912
|
36
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. bioRxiv 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/21/2023]
Abstract
Background Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. Results We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. Conclusion SpliceAI and Pangolin showed the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
37
|
Rogalska ME, Vivori C, Valcárcel J. Regulation of pre-mRNA splicing: roles in physiology and disease, and therapeutic prospects. Nat Rev Genet 2023; 24:251-269. [PMID: 36526860 DOI: 10.1038/s41576-022-00556-8] [Citation(s) in RCA: 106] [Impact Index Per Article: 53.0] [Accepted: 11/10/2022] [Indexed: 12/23/2022]
Abstract
The removal of introns from mRNA precursors and its regulation by alternative splicing are key for eukaryotic gene expression and cellular function, as evidenced by the numerous pathologies induced or modified by splicing alterations. Major recent advances have been made in understanding the structures and functions of the splicing machinery, in the description and classification of physiological and pathological isoforms and in the development of the first therapies for genetic diseases based on modulation of splicing. Here, we review this progress and discuss important remaining challenges, including predicting splice sites from genomic sequences, understanding the variety of molecular mechanisms and logic of splicing regulation, and harnessing this knowledge for probing gene function and disease aetiology and for the design of novel therapeutic approaches.
Affiliation(s)
- Malgorzata Ewa Rogalska
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Claudia Vivori
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- The Francis Crick Institute, London, UK
- Juan Valcárcel
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
38
|
Walker LC, de la Hoya M, Wiggins GA, Lindy A, Vincent LM, Parsons M, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A, Zimmermann H, Byrne AB, Pesaran T, Karam R, Harrison SM, Spurdle AB. Application of the ACMG/AMP framework to capture evidence relevant to predicted and observed impact on splicing: recommendations from the ClinGen SVI Splicing Subgroup. medRxiv 2023:2023.02.24.23286431. [PMID: 36865205 PMCID: PMC9980257 DOI: 10.1101/2023.02.24.23286431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1 (null variant in a gene where loss-of-function is the mechanism of disease), PS3 (functional assays show damaging effect on splicing), PP3 (computational evidence supports a splicing effect), BS3 (functional assays show no damaging effect on splicing), BP4 (computational evidence suggests no splicing impact), and BP7 (silent change with no predicted impact on splicing). However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation (SVI) Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. Our study utilised empirically derived splicing evidence to: 1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, 2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and 3) exemplify methodology to calibrate bioinformatic splice prediction tools. We propose repurposing of the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely BP7 may be used to capture RNA results demonstrating no impact on splicing for both intronic and synonymous variants, and for missense variants if protein functional impact has been excluded. Furthermore, we propose that the PS3 and BS3 codes are applied only for well-established assays that measure functional impact that is not directly captured by RNA splicing assays. 
We recommend the application of PS1 based on similarity of predicted RNA splicing effects for a variant under assessment in comparison to a known Pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA assay evidence described aim to help standardise variant pathogenicity classification processes and result in greater consistency when interpreting splicing-based evidence.
|
39
|
Application of massive parallel reporter analysis in biotechnology and medicine. КЛИНИЧЕСКАЯ ПРАКТИКА 2023. [DOI: 10.17816/clinpract115063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
The development and functioning of an organism relies on tissue-specific gene programs. Genome regulatory elements play a key role in the regulation of such programs, and disruptions in their function can lead to the development of various pathologies, including cancers, malformations and autoimmune diseases. The emergence of high-throughput genomic studies has led to massively parallel reporter analysis (MPRA) methods, which allow the functional verification and identification of regulatory elements on a genome-wide scale. Initially MPRA was used as a tool to investigate fundamental aspects of epigenetics, but the approach also has great potential for clinical and practical biotechnology. Currently, MPRA is used for validation of clinically significant mutations, identification of tissue-specific regulatory elements, search for the most promising loci for transgene integration, and is an indispensable tool for creating highly efficient expression systems, the range of application of which extends from approaches for protein development and design of next-generation therapeutic antibody superproducers to gene therapy. In this review, the main principles and areas of practical application of high-throughput reporter assays will be discussed.
|
40
|
Barbosa P, Savisaar R, Carmo-Fonseca M, Fonseca A. Computational prediction of human deep intronic variation. Gigascience 2022; 12:giad085. [PMID: 37878682 PMCID: PMC10599398 DOI: 10.1093/gigascience/giad085] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/21/2023] [Revised: 06/07/2023] [Accepted: 09/20/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND The adoption of whole-genome sequencing in genetic screens has facilitated the detection of genetic variation in the intronic regions of genes, far from annotated splice sites. However, selecting an appropriate computational tool to discriminate functionally relevant genetic variants from those with no effect is challenging, particularly for deep intronic regions where independent benchmarks are scarce. RESULTS In this study, we have provided an overview of the computational methods available and the extent to which they can be used to analyze deep intronic variation. We leveraged diverse datasets to extensively evaluate tool performance across different intronic regions, distinguishing between variants that are expected to disrupt splicing through different molecular mechanisms. Notably, we compared the performance of SpliceAI, a widely used sequence-based deep learning model, with that of more recent methods that extend its original implementation. We observed considerable differences in tool performance depending on the region considered, with variants generating cryptic splice sites being better predicted than those that potentially affect splicing regulatory elements. Finally, we devised a novel quantitative assessment of tool interpretability and found that tools providing mechanistic explanations of their predictions are often correct with respect to the ground-truth information, but the use of these tools results in decreased predictive power when compared to black box methods. CONCLUSIONS Our findings translate into practical recommendations for tool usage and provide a reference framework for applying prediction tools in deep intronic regions, enabling more informed decision-making by practitioners.
Affiliation(s)
- Pedro Barbosa
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Maria Carmo-Fonseca
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
- Alcides Fonseca
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
|
41
|
Luo X, Maciaszek JL, Thompson BA, Leong HS, Dixon K, Sousa S, Anderson M, Roberts ME, Lee K, Spurdle AB, Mensenkamp AR, Brannan T, Pardo C, Zhang L, Pesaran T, Wei S, Fasaye GA, Kesserwan C, Shirts BH, Davis JL, Oliveira C, Plon SE, Schrader KA, Karam R. Optimising clinical care through CDH1-specific germline variant curation: improvement of clinical assertions and updated curation guidelines. J Med Genet 2022; 60:568-575. [PMID: 36600593 DOI: 10.1136/jmg-2022-108807] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/06/2022] [Accepted: 10/10/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND Germline pathogenic variants in CDH1 are associated with increased risk of diffuse gastric cancer and lobular breast cancer. Risk reduction strategies include consideration of prophylactic surgery, thereby making accurate interpretation of germline CDH1 variants critical for physicians deciding on these procedures. The Clinical Genome Resource (ClinGen) CDH1 Variant Curation Expert Panel (VCEP) developed specifications for CDH1 variant curation with a goal to resolve variants of uncertain significance (VUS) and with ClinVar conflicting interpretations and continues to update these specifications. METHODS CDH1 variant classification specifications were modified based on updated genetic testing clinical criteria, new recommendations from ClinGen and expert knowledge from ongoing CDH1 variant curations. The CDH1 VCEP reviewed 273 variants using updated CDH1 specifications and incorporated published and unpublished data provided by diagnostic laboratories. RESULTS Updated CDH1-specific interpretation guidelines include 11 major modifications since the initial specifications from 2018. Using the refined guidelines, 97% (36 of 37) of variants with ClinVar conflicting interpretations were resolved to benign, likely benign, likely pathogenic or pathogenic, and 35% (15 of 43) of VUS were resolved to benign or likely benign. Overall, 88% (239 of 273) of curated variants had non-VUS classifications. To date, variants classified as pathogenic are either nonsense, frameshift, splicing, or affecting the translation initiation codon, and the only missense variants classified as pathogenic or likely pathogenic have been shown to affect splicing. CONCLUSIONS The development and evolution of CDH1-specific criteria by the expert panel resulted in decreased uncertain and conflicting interpretations of variants in this clinically actionable gene, which can ultimately lead to more effective clinical management recommendations.
Affiliation(s)
- Xi Luo
- Department of Pediatrics/Hematology-Oncology, Baylor College of Medicine, Houston, Texas, USA
- Jamie L Maciaszek
- Department of Pathology, St Jude Children's Research Hospital, Memphis, Tennessee, USA
- Bryony A Thompson
- Department of Pathology, Royal Melbourne Hospital, Melbourne, Victoria, Australia
- Huei San Leong
- Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Katherine Dixon
- Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
- Sónia Sousa
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology (IPATIMUP), University of Porto, Porto, Portugal
- Kristy Lee
- Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Amanda B Spurdle
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Arjen R Mensenkamp
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, The Netherlands
- Liying Zhang
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, California, USA
- Sainan Wei
- Department of Pathology and Laboratory Medicine, University of Kentucky, Lexington, Kentucky, USA
- Grace-Ann Fasaye
- Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Brian H Shirts
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington, USA
- Jeremy L Davis
- Surgical Oncology Program, National Cancer Institute, Bethesda, Maryland, USA
- Carla Oliveira
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology (IPATIMUP), University of Porto, Porto, Portugal; Department of Pathology, University of Porto, Porto, Portugal
- Sharon E Plon
- Department of Pediatrics/Hematology-Oncology, Baylor College of Medicine, Houston, Texas, USA
- Kasmintan A Schrader
- Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada; Hereditary Cancer Program, BC Cancer, Vancouver, British Columbia, Canada
|
42
|
O’Neill MJ, Wada Y, Hall LD, Mitchell DW, Glazer AM, Roden DM. Functional Assays Reclassify Suspected Splice-Altering Variants of Uncertain Significance in Mendelian Channelopathies. Circulation: Genomic and Precision Medicine 2022; 15:e003782. [PMID: 36197721 PMCID: PMC9772980 DOI: 10.1161/circgen.122.003782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 07/12/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Rare protein-altering variants in SCN5A, KCNQ1, and KCNH2 are major causes of Brugada syndrome and the congenital long QT syndrome. While splice-altering variants lying outside 2-bp canonical splice sites can cause these diseases, their role remains poorly described. We implemented 2 functional assays to assess 12 recently reported putative splice-altering variants of uncertain significance and 1 likely pathogenic variant without functional data observed in Brugada syndrome and long QT syndrome probands. METHODS We deployed minigene assays to assess the splicing consequences of 10 variants. Three variants incompatible with the minigene approach were introduced into control induced pluripotent stem cells by CRISPR genome editing. We differentiated cells into induced pluripotent stem cell-derived cardiomyocytes and studied splicing outcomes by reverse transcription-polymerase chain reaction. We used the American College of Medical Genetics and Genomics functional assay criteria (PS3/BS3) to reclassify variants. RESULTS We identified aberrant splicing, with presumed disruption of protein sequence, in 8/10 variants studied using the minigene assay and 1/3 studied in induced pluripotent stem cell-derived cardiomyocytes. We reclassified 8 variants of uncertain significance to likely pathogenic, 1 variant of uncertain significance to likely benign, and 1 likely pathogenic variant to pathogenic. CONCLUSIONS Functional assays reclassified splice-altering variants outside canonical splice sites in Brugada Syndrome- and long QT syndrome-associated genes.
Affiliation(s)
- Matthew J. O’Neill
- Vanderbilt University School of Medicine, Medical Scientist Training Program, Vanderbilt University
- Yuko Wada
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
- Lynn D. Hall
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
- Devyn W. Mitchell
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
- Andrew M. Glazer
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine
- Dan M. Roden
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
|
43
|
Leman R, Parfait B, Vidaud D, Girodon E, Pacot L, Le Gac G, Ka C, Ferec C, Fichou Y, Quesnelle C, Aucouturier C, Muller E, Vaur D, Castera L, Boulouard F, Ricou A, Tubeuf H, Soukarieh O, Gaildrat P, Riant F, Guillaud-Bataille M, Caputo SM, Caux-Moncoutier V, Boutry-Kryza N, Bonnet-Dorion F, Schultz I, Rossing M, Quenez O, Goldenberg L, Harter V, Parsons MT, Spurdle AB, Frébourg T, Martins A, Houdayer C, Krieger S. SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing. Hum Mutat 2022; 43:2308-2323. [PMID: 36273432 PMCID: PMC10946553 DOI: 10.1002/humu.24491] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 01/04/2022] [Revised: 10/06/2022] [Accepted: 10/18/2022] [Indexed: 01/25/2023]
Abstract
Modeling splicing is essential for tackling the challenge of variant interpretation as each nucleotide variation can be pathogenic by affecting pre-mRNA splicing via disruption/creation of splicing motifs such as 5'/3' splice sites, branch sites, or splicing regulatory elements. Unfortunately, most in silico tools focus on a specific type of splicing motif, which is why we developed the Splicing Prediction Pipeline (SPiP) to perform, in one single bioinformatic analysis based on a machine learning approach, a comprehensive assessment of the variant effect on different splicing motifs. We gathered a curated set of 4616 variants scattered all along the sequence of 227 genes, with their corresponding splicing studies. The Bayesian analysis provided us with the number of control variants, that is, variants without impact on splicing, to mimic the deluge of variants from high-throughput sequencing data. Results show that SPiP can deal with the diversity of splicing alterations, with 83.13% sensitivity and 99% specificity to detect spliceogenic variants. Overall performance as measured by area under the receiving operator curve was 0.986, better than SpliceAI and SQUIRLS (0.965 and 0.766) for the same data set. SPiP lends itself to a unique suite for comprehensive prediction of spliceogenicity in the genomic medicine era. SPiP is available at: https://sourceforge.net/projects/splicing-prediction-pipeline/.
Affiliation(s)
- Raphaël Leman
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
- UNICAENNormandie UniversitéCaenFrance
| | - Béatrice Parfait
- Service de Génétique et Biologie Moléculaires, APHP, HUPCHôpital CochinParisFrance
| | - Dominique Vidaud
- Service de Génétique et Biologie Moléculaires, APHP, HUPCHôpital CochinParisFrance
| | - Emmanuelle Girodon
- Service de Génétique et Biologie Moléculaires, APHP, HUPCHôpital CochinParisFrance
| | - Laurence Pacot
- Service de Génétique et Biologie Moléculaires, APHP, HUPCHôpital CochinParisFrance
| | - Gérald Le Gac
- Inserm UMR1078, Genetics, Functional Genomics and BiotechnologyUniversité de Bretagne OccidentaleBrestFrance
| | - Chandran Ka
- Inserm UMR1078, Genetics, Functional Genomics and BiotechnologyUniversité de Bretagne OccidentaleBrestFrance
| | - Claude Ferec
- Inserm UMR1078, Genetics, Functional Genomics and BiotechnologyUniversité de Bretagne OccidentaleBrestFrance
| | - Yann Fichou
- Inserm UMR1078, Genetics, Functional Genomics and BiotechnologyUniversité de Bretagne OccidentaleBrestFrance
| | - Céline Quesnelle
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
| | - Camille Aucouturier
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Etienne Muller
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
| | - Dominique Vaur
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Laurent Castera
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Flavie Boulouard
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Agathe Ricou
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Hélène Tubeuf
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
- Integrative Biosoftware, Rouen, France
| | - Omar Soukarieh
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | | | - Florence Riant
- Laboratoire de Génétique, AP‐HP, GH Saint‐Louis‐Lariboisière‐Fernand Widal, Paris, France
| | | | - Sandrine M. Caputo
- Department of Genetics, Institut Curie, Paris Sciences Lettres Research University, Paris, France
| | | | - Nadia Boutry‐Kryza
- Unité Mixte de Génétique Constitutionnelle des Cancers Fréquents, Hospices Civils de Lyon, Lyon, France
| | - Françoise Bonnet‐Dorion
- Departement de Biopathologie Unité de Génétique Constitutionnelle, Institut Bergonie-INSERM U1218, Bordeaux, France
| | - Ines Schultz
- Laboratoire d'Oncogénétique, Centre Paul Strauss, Strasbourg, France
| | - Maria Rossing
- Centre for Genomic Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
| | - Olivier Quenez
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Louis Goldenberg
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Valentin Harter
- Department of Biostatistics, Baclesse Unicancer Center, Caen, France
| | - Michael T. Parsons
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Herston, Queensland, Australia
| | - Amanda B. Spurdle
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Herston, Queensland, Australia
| | - Thierry Frébourg
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
- Department of Genetics, Rouen University Hospital, Rouen, France
| | - Alexandra Martins
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Claude Houdayer
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
- Department of Genetics, Rouen University Hospital, Rouen, France
| | - Sophie Krieger
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
- UNICAEN, Normandie Université, Caen, France
| |
|
44
|
Mechanism and modeling of human disease-associated near-exon intronic variants that perturb RNA splicing. Nat Struct Mol Biol 2022; 29:1043-1055. [PMID: 36303034 DOI: 10.1038/s41594-022-00844-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/06/2021] [Accepted: 08/23/2022] [Indexed: 12/24/2022]
Abstract
It is estimated that 10%-30% of disease-associated genetic variants affect splicing. Splicing variants may generate deleteriously altered gene products and are potential therapeutic targets. However, systematic diagnosis or prediction of splicing variants is yet to be established, especially for the near-exon intronic splice region. The major challenge lies in the redundant and ill-defined branch sites and other splicing motifs therein. Here, we carried out unbiased massively parallel splicing assays on 5,307 disease-associated variants that overlapped with branch sites and collected 5,884 variants across the 5' splice region. We found that strong splice sites and exonic features preserve splicing from intronic sequence variation. Whereas the splice-altering mechanism of the 3' intronic variants is complex, that of the 5' variants is mainly splice-site destruction. Statistical learning combined with these molecular features allows precise prediction of altered splicing from an intronic variant. This statistical model provides the identity and ranking of biological features that determine splicing, which serves as transferable knowledge, and it outperforms the benchmarking predictive tool. Moreover, we demonstrated that intronic splicing variants may be associated with disease risks in the human population. Our study elucidates the mechanism of the splicing response to intronic variants, which helps classify disease-associated splicing variants in support of precision medicine.
|
45
|
Cooper YA, Guo Q, Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum Mol Genet 2022; 31:R84-R96. [PMID: 36057282 PMCID: PMC9585676 DOI: 10.1093/hmg/ddac194] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/02/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/14/2022] Open
Abstract
Linkage disequilibrium and the incomplete regulatory annotation of the noncoding genome complicate the identification of functional noncoding genetic variants and their causal association with disease. Current computational methods for variant prioritization have limited predictive value, necessitating the application of highly parallelized experimental assays to efficiently identify functional noncoding variation. Here, we summarize two distinct approaches, massively parallel reporter assays and CRISPR-based pooled screens, and describe their flexible implementation to characterize human noncoding genetic variation at unprecedented scale. Each approach has unique advantages and limitations, highlighting the importance of multimodal methodological integration. These multiplexed assays of variant effects are poised to play a key role in the experimental characterization of noncoding genetic risk, informing our understanding of the underlying mechanisms of disease-associated loci and the development of more robust predictive classification algorithms.
Affiliation(s)
- Yonatan A Cooper
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Medical Scientist Training Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
| | - Qiuyu Guo
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
| | - Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Program in Neurogenetics, Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
| |
|
46
|
Komarova ES, Dontsova OA, Pyshnyi DV, Kabilov MR, Sergiev PV. Flow-Seq Method: Features and Application in Bacterial Translation Studies. Acta Naturae 2022; 14:20-37. [PMID: 36694903 PMCID: PMC9844084 DOI: 10.32607/actanaturae.11820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 11/11/2022] [Indexed: 01/22/2023] Open
Abstract
The Flow-seq method is based on reporter construct libraries in which a given element regulating the expression of fluorescent reporter proteins is represented in many thousands of variants. Reporter construct libraries are introduced into cells, which are sorted according to their fluorescence level and then subjected to next-generation sequencing. This makes it possible to identify patterns that determine expression efficiency from tens to hundreds of thousands of reporter constructs in a single experiment. The method has become common for simultaneously evaluating the efficiency of protein synthesis from multiple mRNA variants. However, its potential is not confined to this area. This review presents a comparative analysis of the Flow-seq method and alternative approaches used to evaluate mRNA translation efficiency, and also considers the features of its application and the results obtained with Flow-seq.
Affiliation(s)
- E. S. Komarova
- Institute of Functional Genomics, Lomonosov Moscow State University, Moscow, 119234 Russia
| | - O. A. Dontsova
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119234 Russia
- Skolkovo Institute of Science and Technology, Moscow, 121205 Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119234 Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow 117437 Russia
| | - D. V. Pyshnyi
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090 Russia
| | - M. R. Kabilov
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090 Russia
| | - P. V. Sergiev
- Institute of Functional Genomics, Lomonosov Moscow State University, Moscow, 119234 Russia
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119234 Russia
- Skolkovo Institute of Science and Technology, Moscow, 121205 Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119234 Russia
| |
|
47
|
McAfee JC, Bell JL, Krupa O, Matoba N, Stein JL, Won H. Focus on your locus with a massively parallel reporter assay. J Neurodev Disord 2022; 14:50. [PMID: 36085003 PMCID: PMC9463819 DOI: 10.1186/s11689-022-09461-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/19/2021] [Accepted: 09/01/2022] [Indexed: 01/01/2023] Open
Abstract
A growing number of variants associated with risk for neurodevelopmental disorders have been identified by genome-wide association and whole genome sequencing studies. As common risk variants often fall within large haplotype blocks covering long stretches of the noncoding genome, the causal variants within an associated locus are often unknown. Similarly, the effect of rare noncoding risk variants identified by whole genome sequencing on molecular traits is seldom known without functional assays. A massively parallel reporter assay (MPRA) can functionally validate thousands of regulatory elements simultaneously using high-throughput sequencing and barcode technology. MPRA has been adapted to various experimental designs that measure gene regulatory effects of genetic variants within cis- and trans-regulatory elements as well as posttranscriptional processes. This review discusses different MPRA designs that have been or could be used in the future to experimentally validate genetic variants associated with neurodevelopmental disorders. Although MPRA has limitations, such as not modeling genomic context, this assay can help narrow down the underlying genetic causes of neurodevelopmental disorders by screening thousands of sequences in one experiment. We conclude by describing future directions of this technique, such as applications of MPRA to gene-by-environment interactions and pharmacogenetics.
Affiliation(s)
- Jessica C McAfee
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Jessica L Bell
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Oleh Krupa
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| |
|
48
|
Liu H, Dai J, Li K, Sun Y, Wei H, Wang H, Zhao C, Wang DW. Performance evaluation of computational methods for splice-disrupting variants and improving the performance using the machine learning-based framework. Brief Bioinform 2022; 23:6670557. [PMID: 35976049 DOI: 10.1093/bib/bbac334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/05/2022] [Revised: 07/14/2022] [Accepted: 07/21/2022] [Indexed: 01/07/2023] Open
Abstract
A critical challenge in genetic diagnostics is the assessment of genetic variants associated with diseases, specifically variants that fall outside canonical splice sites and act by altering splicing. Several computational methods have been developed to prioritize variants by their effect on splicing; however, performance evaluation of these methods is hampered by the lack of large-scale benchmark datasets. In this study, we employed a splicing-region-specific strategy to evaluate the performance of prediction methods based on eight independent datasets. Under most conditions, we found that dbscSNV-ADA performed better in the exonic region, S-CAP performed better in the core donor and acceptor regions, S-CAP and SpliceAI performed better in the extended acceptor region, and MMSplice performed better in identifying variants that caused exon skipping. However, the performance of the prediction methods varied widely across datasets and splicing regions, and no method showed the best overall performance on all datasets. To address this, we developed a new method, machine learning-based classification of splice site variants (MLCsplice), to predict the effect of variants on splicing based on the individual methods. We demonstrated that MLCsplice achieved stable and superior prediction performance compared with any individual method. To facilitate the identification of the splicing effect of variants, we provided precomputed MLCsplice scores for all possible splice site variants across human protein-coding genes (http://39.105.51.3:8090/MLCsplice/). We believe that the performance of the individual methods on the eight benchmark datasets will provide tentative guidance for selecting an appropriate method to prioritize candidate splice-disrupting variants, thereby increasing the genetic diagnostic yield.
Affiliation(s)
- Hao Liu
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| | - Jiaqi Dai
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| | - Ke Li
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| | - Yang Sun
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| | - Haoran Wei
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| | - Hong Wang
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| | - Chunxia Zhao
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| | - Dao Wen Wang
- Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology and Hubei Key Laboratory of Genetics and Molecular Mechanisms of Cardiological Disorders, Wuhan 430030, China
| |
|
49
|
Brooks-Warburton J, Modos D, Sudhakar P, Madgwick M, Thomas JP, Bohar B, Fazekas D, Zoufir A, Kapuy O, Szalay-Beko M, Verstockt B, Hall LJ, Watson A, Tremelling M, Parkes M, Vermeire S, Bender A, Carding SR, Korcsmaros T. A systems genomics approach to uncover patient-specific pathogenic pathways and proteins in ulcerative colitis. Nat Commun 2022; 13:2299. [PMID: 35484353 PMCID: PMC9051123 DOI: 10.1038/s41467-022-29998-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/20/2019] [Accepted: 04/06/2022] [Indexed: 12/11/2022] Open
Abstract
We describe a precision medicine workflow, the integrated single nucleotide polymorphism network platform (iSNP), designed to determine the mechanisms by which SNPs affect cellular regulatory networks, and how SNP co-occurrences contribute to disease pathogenesis in ulcerative colitis (UC). Using SNP profiles of 378 UC patients we map the regulatory effects of the SNPs to a human signalling network containing protein-protein, miRNA-mRNA and transcription factor binding interactions. With unsupervised clustering algorithms we group these patient-specific networks into four distinct clusters driven by PRKCB, HLA, SNAI1/CEBPB/PTPN1 and VEGFA/XPO5/POLH hubs. The pathway analysis identifies calcium homeostasis, wound healing and cell motility as key processes in UC pathogenesis. Using transcriptomic data from an independent patient cohort, with three complementary validation approaches focusing on the SNP-affected genes, the patient specific modules and affected functions, we confirm the regulatory impact of non-coding SNPs. iSNP identified regulatory effects for disease-associated non-coding SNPs, and by predicting the patient-specific pathogenic processes, we propose a systems-level way to stratify patients.
Affiliation(s)
- Johanne Brooks-Warburton
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Department of Clinical, Pharmaceutical and Biological Sciences, University of Hertfordshire, Hertford, UK
- Gastroenterology Department, Lister Hospital, Stevenage, UK
| | - Dezso Modos
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Padhmanand Sudhakar
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
| | - Matthew Madgwick
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
| | - John P Thomas
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
| | - Balazs Bohar
- Earlham Institute, Norwich Research Park, Norwich, UK
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary
| | - David Fazekas
- Earlham Institute, Norwich Research Park, Norwich, UK
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary
| | - Azedine Zoufir
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Orsolya Kapuy
- Department of Molecular Biology, Semmelweis University, Budapest, Hungary
| | | | - Bram Verstockt
- KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
- University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
| | - Lindsay J Hall
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Norwich Medical School, University of East Anglia, Norwich, UK
- School of Life Sciences, ZIEL - Institute for Food & Health, Technical University of Munich, 80333, Freising, Germany
| | - Alastair Watson
- Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - Mark Tremelling
- Department of Gastroenterology, Norfolk and Norwich University Hospitals, Norwich, UK
| | - Miles Parkes
- Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
| | - Severine Vermeire
- KU Leuven, Department of Chronic diseases, Metabolism and Ageing, Leuven, Belgium
- University Hospitals Leuven, Department of Gastroenterology and Hepatology, KU Leuven, Leuven, Belgium
| | - Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Simon R Carding
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - Tamas Korcsmaros
- Earlham Institute, Norwich Research Park, Norwich, UK
- Gut Microbes and Health Programme, The Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
| |
|
50
|
Zeng T, Li YI. Predicting RNA splicing from DNA sequence using Pangolin. Genome Biol 2022; 23:103. [PMID: 35449021 PMCID: PMC9022248 DOI: 10.1186/s13059-022-02664-4] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 07/15/2021] [Accepted: 04/04/2022] [Indexed: 11/26/2022] Open
Abstract
Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.
Affiliation(s)
- Tony Zeng
- The College, University of Chicago, Chicago, 60637, IL, USA
| | - Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, 60637, IL, USA
| |
|