51
|
Horste EL, Fansler MM, Cai T, Chen X, Mitschka S, Zhen G, Lee FCY, Ule J, Mayr C. Subcytoplasmic location of translation controls protein output. Mol Cell 2023; 83:4509-4523.e11. [PMID: 38134885 PMCID: PMC11146010 DOI: 10.1016/j.molcel.2023.11.025] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/30/2023] [Revised: 08/15/2023] [Accepted: 11/21/2023] [Indexed: 12/24/2023]
Abstract
The cytoplasm is highly compartmentalized, but the extent and consequences of subcytoplasmic mRNA localization in non-polarized cells are largely unknown. We determined mRNA enrichment in TIS granules (TGs) and the rough endoplasmic reticulum (ER) through particle sorting and isolated cytosolic mRNAs by digitonin extraction. When focusing on genes that encode non-membrane proteins, we observed that 52% have transcripts enriched in specific compartments. Compartment enrichment correlates with a combinatorial code based on mRNA length, exon length, and 3' UTR-bound RNA-binding proteins. Compartment-biased mRNAs differ in the functional classes of their encoded proteins: TG-enriched mRNAs encode low-abundance proteins with strong enrichment of transcription factors, whereas ER-enriched mRNAs encode large and highly expressed proteins. Compartment localization is an important determinant of mRNA and protein abundance, which is supported by reporter experiments showing that redirecting cytosolic mRNAs to the ER increases their protein expression. In summary, the cytoplasm is functionally compartmentalized by local translation environments.
Affiliation(s)
- Ellen L Horste
  - Gerstner Sloan Kettering Graduate School of Biomedical Sciences, New York, NY 10065, USA
  - Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
- Mervin M Fansler
  - Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, Weill-Cornell Graduate College, New York, NY 10021, USA
- Ting Cai
  - Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
- Xiuzhen Chen
  - Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
- Sibylle Mitschka
  - Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
- Gang Zhen
  - Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
- Flora C Y Lee
  - UK Dementia Research Institute, King's College London, London SE5 9NU, UK
  - The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
- Jernej Ule
  - UK Dementia Research Institute, King's College London, London SE5 9NU, UK
  - The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
- Christine Mayr
  - Gerstner Sloan Kettering Graduate School of Biomedical Sciences, New York, NY 10065, USA
  - Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, Weill-Cornell Graduate College, New York, NY 10021, USA
|
52
|
Ma JG, O’Neill MJ, Richardson E, Thomson KL, Ingles J, Muhammad A, Solus JF, Davogustto G, Anderson KC, Benjamin Shoemaker M, Stergachis AB, Floyd BJ, Dunn K, Parikh VN, Chubb H, Perrin MJ, Roden DM, Vandenberg JI, Ng CA, Glazer AM. Multi-site validation of a functional assay to adjudicate SCN5A Brugada Syndrome-associated variants. medRxiv 2023:2023.12.19.23299592. [PMID: 38196587 PMCID: PMC10775332 DOI: 10.1101/2023.12.19.23299592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
Brugada Syndrome (BrS) is an inheritable arrhythmia condition that is associated with rare, loss-of-function variants in the cardiac sodium channel gene, SCN5A. Interpreting the pathogenicity of SCN5A missense variants is challenging, and ~79% of SCN5A missense variants in ClinVar are currently classified as Variants of Uncertain Significance (VUS). An in vitro SCN5A-BrS automated patch clamp (APC) assay was generated for high-throughput functional studies of NaV1.5. The assay was independently studied at two separate research sites (Vanderbilt University Medical Center and Victor Chang Cardiac Research Institute), revealing strong correlations, including for peak INa density (R² = 0.86). The assay was calibrated according to ClinGen Sequence Variant Interpretation recommendations using high-confidence variant controls (n=49). Normal and abnormal ranges of function were established based on the distribution of benign variant assay results. The assay accurately distinguished benign controls (24/25) from pathogenic controls (23/24). Odds of Pathogenicity values derived from the experimental results yielded 0.042 for normal function (BS3 criterion) and 24.0 for abnormal function (PS3 criterion), resulting in up to strong evidence for both ACMG criteria. The calibrated assay was then used to study SCN5A VUS observed in four families with BrS and other arrhythmia phenotypes associated with SCN5A loss-of-function. The assay revealed loss-of-function for three of four variants, enabling reclassification to likely pathogenic. This validated APC assay provides clinical-grade functional evidence for the reclassification of current VUS and will aid future SCN5A-BrS variant classification.
Affiliation(s)
- Joanne G. Ma
  - Mark Cowley Lidwill Research Program in Cardiac Electrophysiology, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Clinical Medicine, UNSW Sydney, Darlinghurst, NSW, Australia
- Ebony Richardson
  - Clinical Genomics Laboratory, Centre for Population Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia and Murdoch Children's Research Institute, Melbourne, Australia
- Kate L. Thomson
  - Oxford Genetics Laboratories, Churchill Hospital, Oxford, UK
- Jodie Ingles
  - Clinical Genomics Laboratory, Centre for Population Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia and Murdoch Children's Research Institute, Melbourne, Australia
- Ayesha Muhammad
  - Vanderbilt University School of Medicine, Nashville, TN, USA
- Joseph F. Solus
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Giovanni Davogustto
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Katherine C. Anderson
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- M. Benjamin Shoemaker
  - Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Andrew B. Stergachis
  - University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
- Brendan J. Floyd
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
- Kyla Dunn
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
- Victoria N. Parikh
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
- Henry Chubb
  - Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
- Mark J. Perrin
  - Department of Genomic Medicine, Royal Melbourne Hospital, Victoria, Australia
- Dan M. Roden
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Jamie I. Vandenberg
  - Mark Cowley Lidwill Research Program in Cardiac Electrophysiology, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Clinical Medicine, UNSW Sydney, Darlinghurst, NSW, Australia
- Chai-Ann Ng
  - Mark Cowley Lidwill Research Program in Cardiac Electrophysiology, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
  - School of Clinical Medicine, UNSW Sydney, Darlinghurst, NSW, Australia
- Andrew M. Glazer
  - Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
|
53
|
McCarley SC, Murphy DA, Thompson J, Shovlin CL. Pharmacogenomic Considerations for Anticoagulant Prescription in Patients with Hereditary Haemorrhagic Telangiectasia. J Clin Med 2023; 12:7710. [PMID: 38137783 PMCID: PMC10744266 DOI: 10.3390/jcm12247710] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023] Open
Abstract
Hereditary haemorrhagic telangiectasia (HHT) is a vascular dysplasia that commonly results in bleeding but with frequent indications for therapeutic anticoagulation. Our aims were to advance the understanding of drug-specific intolerance and evaluate if there was an indication for pharmacogenomic testing. Genes encoding proteins involved in the absorption, distribution, metabolism, and excretion of warfarin, heparin, and direct oral anticoagulants (DOACs) apixaban, rivaroxaban, edoxaban, and dabigatran were identified and examined. Linkage disequilibrium with HHT genes was excluded, before variants within these genes were examined following whole genome sequencing of general and HHT populations. The 44 genes identified included 5/17 actionable pharmacogenes with guidelines. The 76,156 participants in the Genome Aggregation Database v3.1.2 had 28,446 variants, including 9668 missense substitutions and 1076 predicted loss-of-function (frameshift, nonsense, and consensus splice site) variants, i.e., approximately 1 in 7.9 individuals had a missense substitution, and 1 in 71 had a loss-of-function variant. Focusing on the 17 genes relevant to usually preferred DOACs, similar variant profiles were identified in HHT patients. With HHT patients at particular risk of haemorrhage when undergoing anticoagulant treatment, we explore how pre-emptive pharmacogenomic testing, alongside HHT gene testing, may prove beneficial in reducing the risk of bleeding and conclude that HHT patients are well placed to be at the vanguard of personalised prescribing.
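The "1 in 7.9" and "1 in 71" figures in this abstract can be reproduced from the counts it quotes. The following is a minimal sketch that assumes the figures are simply the number of participants divided by the number of distinct variants in each class; the abstract does not state the method, and this ratio ignores allele counts and individuals carrying more than one variant. The helper name `one_in` is illustrative only.

```python
# Counts quoted in the abstract (gnomAD v3.1.2, anticoagulant-related genes).
participants = 76_156
missense_variants = 9_668
lof_variants = 1_076


def one_in(n_people: int, n_variants: int) -> float:
    """Return x such that there is one distinct variant per x participants.

    Assumption: a plain ratio of participants to variants, not a carrier
    frequency derived from allele counts.
    """
    return n_people / n_variants


missense_ratio = one_in(participants, missense_variants)
lof_ratio = one_in(participants, lof_variants)

print(f"missense: about 1 in {missense_ratio:.1f}")
print(f"loss-of-function: about 1 in {lof_ratio:.0f}")
```

Rounded to the precision used in the abstract, these ratios agree with the reported values.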
Affiliation(s)
- Sarah C. McCarley
  - National Heart and Lung Institute, Imperial College London, London W12 0NN, UK
- Daniel A. Murphy
  - Pharmacy Department, Imperial College Healthcare NHS Trust, London W2 1NY, UK
  - Social, Genetic and Environmental Determinants of Health Theme, NIHR Imperial Biomedical Research Centre, London W2 1NY, UK
- Jack Thompson
  - National Heart and Lung Institute, Imperial College London, London W12 0NN, UK
- Claire L. Shovlin
  - National Heart and Lung Institute, Imperial College London, London W12 0NN, UK
  - Social, Genetic and Environmental Determinants of Health Theme, NIHR Imperial Biomedical Research Centre, London W2 1NY, UK
  - Specialist Medicine, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London W12 0HS, UK
|
54
|
Radford EJ, Tan HK, Andersson MHL, Stephenson JD, Gardner EJ, Ironfield H, Waters AJ, Gitterman D, Lindsay S, Abascal F, Martincorena I, Kolesnik-Taylor A, Ng-Cordell E, Firth HV, Baker K, Perry JRB, Adams DJ, Gerety SS, Hurles ME. Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation. Nat Commun 2023; 14:7702. [PMID: 38057330 PMCID: PMC10700591 DOI: 10.1038/s41467-023-43041-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/20/2022] [Accepted: 10/30/2023] [Indexed: 12/08/2023] Open
Abstract
Loss-of-function of DDX3X is a leading cause of neurodevelopmental disorders (NDD) in females. DDX3X is also a somatically mutated cancer driver gene proposed to have tumour promoting and suppressing effects. We perform saturation genome editing of DDX3X, testing in vitro the functional impact of 12,776 nucleotide variants. We identify 3432 functionally abnormal variants, in three distinct classes. We train a machine learning classifier to identify functionally abnormal variants of NDD-relevance. This classifier has at least 97% sensitivity and 99% specificity to detect variants pathogenic for NDD, substantially out-performing in silico predictors, and resolving up to 93% of variants of uncertain significance. Moreover, functionally-abnormal variants can account for almost all of the excess nonsynonymous DDX3X somatic mutations seen in DDX3X-driven cancers. Systematic maps of variant effects generated in experimentally tractable cell types have the potential to transform clinical interpretation of both germline and somatic disease-associated variation.
Affiliation(s)
- Elizabeth J Radford
  - Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
  - Department of Paediatrics, University of Cambridge, Level 8, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- Hong-Kee Tan
  - Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Eugene J Gardner
  - MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- Elise Ng-Cordell
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
  - Department of Psychology, University of British Columbia, Vancouver, Canada
- Helen V Firth
  - Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
  - Department of Medical Genetics, University of Cambridge, Cambridge, UK
- Kate Baker
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
  - Department of Medical Genetics, University of Cambridge, Cambridge, UK
- John R B Perry
  - MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
|
55
|
Zhang Q, Shao M. Transcript assembly and annotations: Bias and adjustment. PLoS Comput Biol 2023; 19:e1011734. [PMID: 38127855 PMCID: PMC10769104 DOI: 10.1371/journal.pcbi.1011734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 01/05/2024] [Accepted: 12/04/2023] [Indexed: 12/23/2023] Open
Abstract
Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations: assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks to evaluate the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. We investigate the impact of annotations on transcript assembly. Surprisingly, we observe that opposite conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance on selecting appropriate assembly tools for different application scenarios.
Affiliation(s)
- Qimin Zhang
  - Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Mingfu Shao
  - Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
|
56
|
Dondi A, Lischetti U, Jacob F, Singer F, Borgsmüller N, Coelho R, Heinzelmann-Schwarz V, Beisel C, Beerenwinkel N. Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer. Nat Commun 2023; 14:7780. [PMID: 38012143 PMCID: PMC10682465 DOI: 10.1038/s41467-023-43387-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023] Open
Abstract
Understanding the complex background of cancer requires genotype-phenotype information at single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq becoming increasingly relevant in oncology and personalized medicine.
Affiliation(s)
- Arthur Dondi
  - ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
- Ulrike Lischetti
  - ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland
  - University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland
- Francis Jacob
  - University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland
- Franziska Singer
  - SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
  - ETH Zurich, NEXUS Personalized Health Technologies, Wagistrasse 18, 8952, Schlieren, Switzerland
- Nico Borgsmüller
  - ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
- Ricardo Coelho
  - University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland
- Viola Heinzelmann-Schwarz
  - University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland
  - University Hospital Basel, Gynecological Cancer Center, Spitalstrasse 21, 4031, Basel, Switzerland
- Christian Beisel
  - ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland
- Niko Beerenwinkel
  - ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
|
57
|
Sanchez-Mete L, Mosciatti L, Casadio M, Vittori L, Martayan A, Stigliano V. MUTYH-associated polyposis: Is it time to change upper gastrointestinal surveillance? A single-center case series and a literature overview. World J Gastrointest Oncol 2023; 15:1891-1899. [DOI: 10.4251/wjgo.v15.i11.1891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 05/28/2023] [Accepted: 06/13/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND The presence of Spigelman stage (SS) IV duodenal polyposis is considered the most significant risk factor for duodenal cancer in patients with MUTYH-associated polyposis (MAP). However, advanced SS disease is rarely reported in MAP patients, and no clear recommendations on small bowel (SB) surveillance have been proposed in this patient setting.
AIM To investigate further, because case reports of duodenal cancers in MAP suggest that they may develop in the absence of advanced benign SS disease and often involve the distal portion of the duodenum.
METHODS We describe a series of MAP patients followed up at the Regina Elena National Cancer Institute of Rome (Italy). A literature overview on previously reported SB cancers in MAP is also provided.
RESULTS We identified two (6%) SB adenocarcinomas with no previous history of duodenal polyposis. Our observations, supported by literature evidence, suggest that the formula for staging duodenal polyposis and predicting the risk of distal duodenal and jejunal cancer may need to be adjusted to take this into account, rather than focusing solely on the presence or absence of SS IV disease.
CONCLUSION Our study emphasizes the need for further studies to define appropriate upper gastrointestinal surveillance programs in MAP patients.
Affiliation(s)
- Lupe Sanchez-Mete
  - Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
- Lorenzo Mosciatti
  - Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
- Marco Casadio
  - Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
- Luigi Vittori
  - Department of Radiological, Oncological and Pathological Sciences, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
- Aline Martayan
  - Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
- Vittoria Stigliano
  - Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
|
58
|
Zhang P, Chaldebas M, Ogishi M, Al Qureshah F, Ponsin K, Feng Y, Rinchai D, Milisavljevic B, Han JE, Moncada-Vélez M, Keles S, Schröder B, Stenson PD, Cooper DN, Cobat A, Boisson B, Zhang Q, Boisson-Dupuis S, Abel L, Casanova JL. Genome-wide detection of human intronic AG-gain variants located between splicing branchpoints and canonical splice acceptor sites. Proc Natl Acad Sci U S A 2023; 120:e2314225120. [PMID: 37931111 PMCID: PMC10655562 DOI: 10.1073/pnas.2314225120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 10/02/2023] [Indexed: 11/08/2023] Open
Abstract
Human genetic variants that introduce an AG into the intronic region between the branchpoint (BP) and the canonical splice acceptor site (ACC) of protein-coding genes can disrupt pre-mRNA splicing. Using our genome-wide BP database, we delineated the BP-ACC segments of all human introns and found extreme depletion of AG/YAG in the [BP+8, ACC-4] high-risk region. We developed AGAIN as a genome-wide computational approach to systematically and precisely pinpoint intronic AG-gain variants within the BP-ACC regions. AGAIN identified 350 AG-gain variants from the Human Gene Mutation Database, all of which alter splicing and cause disease. Among them, 74% created new acceptor sites, whereas 31% resulted in complete exon skipping. AGAIN also predicts the protein-level products resulting from these two consequences. We performed AGAIN on our exome/genomes database of patients with severe infectious diseases but without known genetic etiology and identified a private homozygous intronic AG-gain variant in the antimycobacterial gene SPPL2A in a patient with mycobacterial disease. AGAIN also predicts a retention of six intronic nucleotides that encode an in-frame stop codon, turning AG-gain into stop-gain. This allele was then confirmed experimentally to lead to loss of function by disrupting splicing. We further showed that AG-gain variants inside the high-risk region led to misspliced products, while those outside the region did not, by two case studies in genes STAT1 and IRF7. We finally evaluated AGAIN on our 14 paired exome-RNAseq samples and found that 82% of AG-gain variants in high-risk regions showed evidence of missplicing. AGAIN is publicly available from https://hgidsoft.rockefeller.edu/AGAIN and https://github.com/casanova-lab/AGAIN.
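The idea of an AG-gain variant inside the [BP+8, ACC-4] high-risk region can be illustrated with a toy check on a single intron segment. This is a minimal sketch and not the AGAIN implementation: the function names, the 0-based coordinates, the choice of `acc` as the index of the A in the canonical acceptor AG, and the decision to test the variant position (rather than the position of the new AG) against the window are all assumptions made for illustration.

```python
def creates_ag(ref_seq: str, pos: int, alt: str) -> bool:
    """Return True if substituting `alt` at 0-based `pos` creates an AG
    dinucleotide that is not present at the same place in the reference."""
    ref_seq = ref_seq.upper()
    alt_seq = ref_seq[:pos] + alt.upper() + ref_seq[pos + 1:]
    # A single substitution can only affect the two dinucleotides
    # that overlap the substituted base.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(ref_seq):
            continue
        if alt_seq[start:start + 2] == "AG" and ref_seq[start:start + 2] != "AG":
            return True
    return False


def classify_variant(ref_seq: str, bp: int, acc: int, pos: int, alt: str) -> str:
    """Classify a substitution relative to the [bp + 8, acc - 4] window.

    bp  : 0-based index of the branchpoint nucleotide (assumed convention)
    acc : 0-based index of the A of the canonical acceptor AG (assumed convention)
    """
    if not creates_ag(ref_seq, pos, alt):
        return "no AG-gain"
    if bp + 8 <= pos <= acc - 4:
        return "AG-gain in high-risk region"
    return "AG-gain outside high-risk region"


# Toy intron end: branchpoint A at index 3, canonical AG at indices 25-26.
toy_seq = "CTGAC" + "TTTTCTTTGCTTTTCTTTTC" + "AGGT"
print(classify_variant(toy_seq, bp=3, acc=25, pos=12, alt="A"))
print(classify_variant(toy_seq, bp=3, acc=25, pos=1, alt="A"))
print(classify_variant(toy_seq, bp=3, acc=25, pos=6, alt="A"))
```

In this toy sequence the T>A change at index 12 creates an AG inside the window, the change at index 1 creates an AG upstream of it, and the change at index 6 creates no AG at all. A real analysis would also need strand handling, branchpoint annotation, and the splicing-consequence prediction the abstract describes.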
Affiliation(s)
- Peng Zhang
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Matthieu Chaldebas
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Masato Ogishi
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Fahd Al Qureshah
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Khoren Ponsin
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Yi Feng
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Darawan Rinchai
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Baptiste Milisavljevic
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Ji Eun Han
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Marcela Moncada-Vélez
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
- Sevgi Keles
  - Division of Pediatric Allergy and Immunology, Necmettin Erbakan University, Meram Medical Faculty, Konya 42080, Turkey
- Bernd Schröder
  - Institute of Physiological Chemistry, Technische Universität Dresden, Dresden 01307, Germany
- Peter D. Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff CF14 4XN, United Kingdom
- David N. Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff CF14 4XN, United Kingdom
- Aurélie Cobat
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris 75015, France
  - Paris Cité University, Imagine Institute, Paris 75015, France
- Bertrand Boisson
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris 75015, France
  - Paris Cité University, Imagine Institute, Paris 75015, France
- Qian Zhang
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris 75015, France
  - Paris Cité University, Imagine Institute, Paris 75015, France
- Stéphanie Boisson-Dupuis
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris 75015, France
  - Paris Cité University, Imagine Institute, Paris 75015, France
- Laurent Abel
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris 75015, France
  - Paris Cité University, Imagine Institute, Paris 75015, France
- Jean-Laurent Casanova
  - St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065
  - Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris 75015, France
  - Paris Cité University, Imagine Institute, Paris 75015, France
  - Department of Pediatrics, Necker Hospital for Sick Children, Paris 75015, France
  - HHMI, New York, NY 10065
|
59
|
Shinder I, Hu R, Ji HJ, Chao KH, Pertea M. EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes. Nat Commun 2023; 14:7223. [PMID: 37940654 PMCID: PMC10632439 DOI: 10.1038/s41467-023-43017-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 10/30/2023] [Indexed: 11/10/2023] Open
Abstract
Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely-used genome annotation databases. To address this issue, we present EASTR (Emending Alignments of Spliced Transcript Reads), a software tool that detects and removes falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that EASTR's application to reference annotation databases can detect and correct likely cases of mis-annotated transcripts.
Affiliation(s)
- Ida Shinder
- Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
| | - Richard Hu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Hyun Joo Ji
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Kuan-Hao Chao
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
60
Sun KY, Bai X, Chen S, Bao S, Kapoor M, Zhang C, Backman J, Joseph T, Maxwell E, Mitra G, Gorovits A, Mansfield A, Boutkov B, Gokhale S, Habegger L, Marcketta A, Locke A, Kessler MD, Sharma D, Staples J, Bovijn J, Gelfman S, Gioia AD, Rajagopal V, Lopez A, Varela JR, Alegre J, Berumen J, Tapia-Conyer R, Kuri-Morales P, Torres J, Emberson J, Collins R, Cantor M, Thornton T, Kang HM, Overton J, Shuldiner AR, Cremona ML, Nafde M, Baras A, Abecasis G, Marchini J, Reid JG, Salerno W, Balasubramanian S. A deep catalog of protein-coding variation in 985,830 individuals. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.09.539329. [PMID: 37214792 PMCID: PMC10197621 DOI: 10.1101/2023.05.09.539329] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/24/2023]
Abstract
Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
Affiliation(s)
| | | | - Siying Chen
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Suying Bao
- Regeneron Genetics Center, Tarrytown, NY, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | - Adam Locke
- Regeneron Genetics Center, Tarrytown, NY, USA
| | | | | | | | | | | | | | | | | | | | - Jesus Alegre
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Jaime Berumen
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Roberto Tapia-Conyer
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Pablo Kuri-Morales
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Jason Torres
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Jonathan Emberson
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Rory Collins
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | | | | | | | | | | | | | | | | | - Mona Nafde
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
61
Cross NCP, Ernst T, Branford S, Cayuela JM, Deininger M, Fabarius A, Kim DDH, Machova Polakova K, Radich JP, Hehlmann R, Hochhaus A, Apperley JF, Soverini S. European LeukemiaNet laboratory recommendations for the diagnosis and management of chronic myeloid leukemia. Leukemia 2023; 37:2150-2167. [PMID: 37794101 PMCID: PMC10624636 DOI: 10.1038/s41375-023-02048-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 09/13/2023] [Accepted: 09/20/2023] [Indexed: 10/06/2023]
Abstract
From the laboratory perspective, effective management of patients with chronic myeloid leukemia (CML) requires accurate diagnosis, assessment of prognostic markers, sequential assessment of levels of residual disease and investigation of possible reasons for resistance, relapse or progression. Our scientific and clinical knowledge underpinning these requirements continues to evolve, as do laboratory methods and technologies. The European LeukemiaNet convened an expert panel to critically consider the current status of genetic laboratory approaches to help diagnose and manage CML patients. Our recommendations focus on current best practice and highlight the strengths and pitfalls of commonly used laboratory tests.
Affiliation(s)
| | - Thomas Ernst
- Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
| | - Susan Branford
- Centre for Cancer Biology and SA Pathology, Adelaide, SA, Australia
| | - Jean-Michel Cayuela
- Laboratory of Hematology, University Hospital Saint-Louis, AP-HP and EA3518, Université Paris Cité, Paris, France
| | | | - Alice Fabarius
- III. Medizinische Klinik, Medizinische Fakultät Mannheim, Universität Heidelberg, Mannheim, Germany
| | - Dennis Dong Hwan Kim
- Department of Medical Oncology and Hematology, Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, Canada
| | | | | | - Rüdiger Hehlmann
- III. Medizinische Klinik, Medizinische Fakultät Mannheim, Universität Heidelberg, Mannheim, Germany
- ELN Foundation, Weinheim, Germany
| | - Andreas Hochhaus
- Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
| | - Jane F Apperley
- Centre for Haematology, Imperial College London, London, UK
- Department of Clinical Haematology, Imperial College Healthcare NHS Trust, London, UK
| | - Simona Soverini
- Department of Medical and Surgical Sciences, Institute of Hematology "Lorenzo e Ariosto Seràgnoli", University of Bologna, Bologna, Italy
62
Varabyou A, Sommer MJ, Erdogdu B, Shinder I, Minkin I, Chao KH, Park S, Heinz J, Pockrandt C, Shumate A, Rincon N, Puiu D, Steinegger M, Salzberg SL, Pertea M. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol 2023; 24:249. [PMID: 37904256 PMCID: PMC10614308 DOI: 10.1186/s13059-023-03088-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/21/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess .
Affiliation(s)
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
| | - Markus J Sommer
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Beril Erdogdu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Ida Shinder
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Ilia Minkin
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Kuan-Hao Chao
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Sukhwan Park
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
| | - Jakob Heinz
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Christopher Pockrandt
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Alaina Shumate
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Natalia Rincon
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Daniela Puiu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
| | - Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
63
Ljungdahl A, Kohani S, Page NF, Wells ES, Wigdor EM, Dong S, Sanders SJ. AlphaMissense is better correlated with functional assays of missense impact than earlier prediction algorithms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.24.562294. [PMID: 37961354 PMCID: PMC10634779 DOI: 10.1101/2023.10.24.562294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Missense variants that alter a single amino acid in the encoded protein contribute to many human disorders but pose a substantial challenge in interpretation. Though these variants can be reliably identified through sequencing, distinguishing the clinically significant ones remains difficult, such that "Variants of Unknown Significance" outnumber those classified as "Pathogenic" or "Likely Pathogenic." Numerous in silico approaches have been developed to predict the functional impact of missense variants to inform clinical interpretation, the latest being AlphaMissense, which uses artificial intelligence methods trained on predicted protein structure. To independently assess the performance of AlphaMissense and 38 other predictors of missense severity, we compared predictions to data from multiplexed assays of variant effect (MAVE). MAVE experiments generate almost every possible individual amino acid change in a gene and measure their functional impact using a high-throughput assay. Assessing 17,696 variants across five genes (DDX3X, MSH2, PTEN, KCNQ4, and BRCA1), we find that AlphaMissense is consistently one of the top five algorithms based on correlation with functional impact and is the best-correlated algorithm for two genes. We conclude that AlphaMissense represents the current best-in-class predictor by this metric; however, the improvement over other algorithms is modest. We note that multiple missense predictors, including AlphaMissense, appear to overcall variants as pathogenic despite minimal functional impact and that substantially more high-quality training data, including consistently analyzed patient cohorts and MAVE analyses, are required to improve accuracy.
Affiliation(s)
- Alicia Ljungdahl
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Sayeh Kohani
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Nicholas F. Page
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Eloise S. Wells
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Emilie M. Wigdor
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Shan Dong
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Stephan J. Sanders
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- New York Genome Center, New York, NY 10013, USA
64
Kubota N, Takeda R, Kobayashi J, Hidaka E, Nishi E, Takano K, Wakui K. Reanalysis of Chromosomal Microarray Data Using a Smaller Copy Number Variant Call Threshold Identifies Four Cases with Heterozygous Multiexon Deletions of ARID1B, EHMT1, and FOXP1 Genes. Mol Syndromol 2023; 14:394-404. [PMID: 37901861 PMCID: PMC10601822 DOI: 10.1159/000530252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Accepted: 03/16/2023] [Indexed: 10/31/2023] Open
Abstract
Introduction: Chromosomal microarray (CMA) is a highly accurate and established method for detecting copy number variations (CNVs) in clinical genetic testing. CNVs are important etiological factors for disorders such as intellectual disability, developmental delay, and multiple congenital anomalies. Recently developed analytical methods have facilitated the identification of smaller CNVs. Therefore, reanalyzing CMA data using a smaller CNV calling threshold may yield useful information. However, this method was left to the discretion of each institution. Methods: We reanalyzed the CMA data of 131 patients using a smaller CNV call threshold: 50 kb 50 probes for gain and 25 kb 25 probes for loss. We interpreted the reanalyzed CNVs based on the most recently available information. In the reanalysis, we filtered the data using the Clinical Genome Resource dosage sensitivity gene list as an index to quickly and efficiently check morbid genes. Results: The number of copy number losses was approximately 20 times greater, and the number of copy number gains approximately three times greater, than in the previous analysis. We detected new likely pathogenic CNVs in four participants: a 236.5 kb loss within ARID1B, a 50.6 kb loss including EHMT1, a 46.5 kb loss including EHMT1, and an 89.1 kb loss within the FOXP1 gene. Conclusion: The method employed in this study is simple and effective for CMA data reanalysis using a smaller CNV call threshold. Thus, this method is efficient for both ongoing and repeated analyses. This study may stimulate further discussion of reanalysis methodology in clinical laboratories.
Affiliation(s)
- Noriko Kubota
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
| | - Ryojun Takeda
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
- Division of Medical Genetics, Nagano Children’s Hospital, Azumino, Japan
| | - Jun Kobayashi
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
| | - Eiko Hidaka
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
| | - Eriko Nishi
- Division of Medical Genetics, Nagano Children’s Hospital, Azumino, Japan
| | - Kyoko Takano
- Division of Medical Genetics, Nagano Children’s Hospital, Azumino, Japan
- Department of Medical Genetics, Shinshu University School of Medicine, Matsumoto, Japan
- Center for Medical Genetics, Shinshu University Hospital, Matsumoto, Japan
| | - Keiko Wakui
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
- Department of Medical Genetics, Shinshu University School of Medicine, Matsumoto, Japan
- Center for Medical Genetics, Shinshu University Hospital, Matsumoto, Japan
65
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. Nature 2023; 622:41-47. [PMID: 37794265 PMCID: PMC10575709 DOI: 10.1038/s41586-023-06490-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/09/2023] [Accepted: 07/27/2023] [Indexed: 10/06/2023]
Abstract
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.
Affiliation(s)
- Paulo Amaral
- INSPER Institute of Education and Research, Sao Paulo, Brazil
| | | | - Francisco M De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Tempus Labs, Chicago, IL, USA
| | | | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Thomas Gingeras
- Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Roderic Guigo
- Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Jennifer L Harrow
- Centre for Genomics Research, Discovery Sciences, AstraZeneca, Royston, UK
| | - Artemis G Hatzigeorgiou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Hellenic Pasteur Institute, Athens, Greece
| | - Rory Johnson
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
- Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research, University of Bern, Bern, Switzerland
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hazuki Takahashi
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Igor Ulitsky
- Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
| | - Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Christine A Wells
- Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
| | - Mark Yandell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
| | - Piero Carninci
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Human Technopole, Milan, Italy.
| | - Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
66
Lang M, Kazdal D, Mohr I, Anamaterou C. Differences and similarities of GTF2I mutated thymomas in different Eurasian ethnic groups. Transl Lung Cancer Res 2023; 12:1842-1844. [PMID: 37854159 PMCID: PMC10579828 DOI: 10.21037/tlcr-23-396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 09/06/2023] [Indexed: 10/20/2023]
Affiliation(s)
- Matthias Lang
- Department of General, Visceral, and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
| | - Daniel Kazdal
- Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Translational Lung Research Center (TLRC) Heidelberg, German Center for Lung Research (DZL), Heidelberg, Germany
| | - Isabelle Mohr
- Department of Internal Medicine IV, Department of Gastroenterology, University Hospital Heidelberg, Heidelberg, Germany
67
Martin-Geary AC, Blakes AJM, Dawes R, Findlay SD, Lord J, Walker S, Talbot-Martin J, Wieder N, D’Souza EN, Fernandes M, Hilton S, Lahiri N, Campbell C, Jenkinson S, DeGoede CGEL, Anderson ER, Burge CB, Sanders SJ, Ellingford J, Baralle D, Banka S, Whiffin N. Systematic identification of disease-causing promoter and untranslated region variants in 8,040 undiagnosed individuals with rare disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.09.12.23295416. [PMID: 37745552 PMCID: PMC10516070 DOI: 10.1101/2023.09.12.23295416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Background: Both promoters and untranslated regions (UTRs) have critical regulatory roles, yet variants in these regions are largely excluded from clinical genetic testing due to difficulty in interpreting pathogenicity. The extent to which these regions may harbour diagnoses for individuals with rare disease is currently unknown. Methods: We present a framework for the identification and annotation of potentially deleterious proximal promoter and UTR variants in known dominant disease genes. We use this framework to annotate de novo variants (DNVs) in 8,040 undiagnosed individuals in the Genomics England 100,000 Genomes Project, which were subject to strict region-based filtering, clinical review, and validation studies where possible. In addition, we performed region and variant annotation-based burden testing in 7,862 unrelated probands against matched unaffected controls. Results: We prioritised eleven DNVs and identified an additional variant overlapping one of the eleven. Ten of these twelve variants (82%) are in genes that are a strong match to the individual's phenotype and six had not previously been identified. Through burden testing, we did not observe a significant enrichment of potentially deleterious promoter and/or UTR variants in individuals with rare disease collectively across any of our region or variant annotations. Conclusions: Overall, we demonstrate the value of screening promoters and UTRs to uncover additional diagnoses for previously undiagnosed individuals with rare disease and provide a framework for doing so without dramatically increasing interpretation burden.
Affiliation(s)
- Alexandra C Martin-Geary
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Alexander J M Blakes
- Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Ruebena Dawes
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Scott D Findlay
- Department of Biology, Massachusetts Institute of Technology, Cambridge, USA
| | | | | | | | - Nechama Wieder
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Elston N D’Souza
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Maria Fernandes
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Sarah Hilton
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Nayana Lahiri
- St George’s, University of London & St George’s University Hospitals NHS Foundation Trust, Institute of Molecular and Clinical Sciences, London, SW17 0QT, UK
| | - Christopher Campbell
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Sarah Jenkinson
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Christian G E L DeGoede
- Department of Paediatric Neurology, Clinical Research Facility, Lancashire Teaching Hospitals NHS Trust
- Manchester Metropolitan University
| | - Emily R Anderson
- Liverpool Centre for Genomic Medicine, Liverpool Women’s Hospital, Liverpool, UK
| | | | - Stephan J Sanders
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- New York Genome Center, New York, NY, USA
| | - Jamie Ellingford
- Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Diana Baralle
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
| | - Siddharth Banka
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Nicola Whiffin
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| |
|
68
|
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | | | - Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Zhijian Li
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Manuel Tognon
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
| | - Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael I. Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Guillaume Lettre
- Montreal Heart Institute, Montréal, QC H1T 1C8, Canada
- Faculté de Médecine, Université de Montréal, Montréal, QC H3T 1J4, Canada
| | - David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Christopher A. Cassa
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Richard I. Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
| |
|
69
|
Bohn E, Lau TTY, Wagih O, Masud T, Merico D. A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction. Front Mol Biosci 2023; 10:1257550. [PMID: 37745687 PMCID: PMC10517338 DOI: 10.3389/fmolb.2023.1257550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Introduction: Variants in 5' and 3' untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high-confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effects. Methods: 3' and 5' UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting through the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants. Results: 295 3' and 188 5' UTR variants were obtained from ClinVar, of which 26 3' and 68 5' UTR variants were classified as P/LP. Predictions by DL models achieved statistically significant differences when comparing model-matched P/LP variants to both putative benign variants and model-mismatched P/LP variants, as well as when comparing all P/LP variants to putative benign variants. PhyloP conservation scores were significantly higher among P/LP compared to putative benign variants for both the 3' and 5' UTR. Discussion: In conclusion, we present a high-confidence set of P/LP 3' and 5' UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease.
Affiliation(s)
- Emma Bohn
- Deep Genomics Inc., Toronto, ON, Canada
| | | | | | | | - Daniele Merico
- Deep Genomics Inc., Toronto, ON, Canada
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, ON, Canada
| |
|
70
|
Kerimov N, Tambets R, Hayhurst JD, Rahu I, Kolberg P, Raudvere U, Kuzmin I, Chowdhary A, Vija A, Teras HJ, Kanai M, Ulirsch J, Ryten M, Hardy J, Guelfi S, Trabzuni D, Kim-Hellmuth S, Rayner W, Finucane H, Peterson H, Mosaku A, Parkinson H, Alasoo K. eQTL Catalogue 2023: New datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs. PLoS Genet 2023; 19:e1010932. [PMID: 37721944 PMCID: PMC10538656 DOI: 10.1371/journal.pgen.1010932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 09/28/2023] [Accepted: 08/22/2023] [Indexed: 09/20/2023] Open
Abstract
The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine-mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.
Affiliation(s)
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Ralf Tambets
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - James D. Hayhurst
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Ida Rahu
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Peep Kolberg
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Uku Raudvere
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Anshika Chowdhary
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Andreas Vija
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Hans J. Teras
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Jacob Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Mina Ryten
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - John Hardy
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Sebastian Guelfi
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Daniah Trabzuni
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Sarah Kim-Hellmuth
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Department of Pediatrics, Dr. von Hauner Children’s Hospital, University Hospital LMU Munich, Munich, Germany
| | - William Rayner
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Hilary Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Abayomi Mosaku
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Helen Parkinson
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| |
|
71
|
Hayesmoore JB, Bhuiyan ZA, Coviello DA, du Sart D, Edwards M, Iascone M, Morris-Rosendahl DJ, Sheils K, van Slegtenhorst M, Thomson KL. EMQN: Recommendations for genetic testing in inherited cardiomyopathies and arrhythmias. Eur J Hum Genet 2023; 31:1003-1009. [PMID: 37443332 PMCID: PMC10474043 DOI: 10.1038/s41431-023-01421-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/28/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/15/2023] Open
Abstract
Inherited cardiomyopathies and arrhythmias (ICAs) are a prevalent and clinically heterogeneous group of genetic disorders that are associated with increased risk of sudden cardiac death and heart failure. Making a genetic diagnosis can inform the management of patients and their at-risk relatives and, as such, molecular genetic testing is now considered an integral component of the clinical care pathway. However, ICAs are characterised by high genetic and allelic heterogeneity, incomplete / age-related penetrance, and variable expressivity. Therefore, despite our improved understanding of the genetic basis of these conditions, and significant technological advances over the past two decades, identifying and recognising the causative genotype remains challenging. As clinical genetic testing for ICAs becomes more widely available, it is increasingly important for clinical laboratories to consolidate existing knowledge and experience to inform and improve future practice. These recommendations have been compiled to help clinical laboratories navigate the challenges of ICAs and thereby facilitate best practice and consistency in genetic test provision for this group of disorders. General recommendations on internal and external quality control, referral, analysis, result interpretation, and reporting are described. Also included are appendices that provide specific information pertinent to genetic testing for hypertrophic, dilated, and arrhythmogenic right ventricular cardiomyopathies, long QT syndrome, Brugada syndrome, and catecholaminergic polymorphic ventricular tachycardia.
Affiliation(s)
- Jesse B Hayesmoore
- Oxford Regional Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Zahurul A Bhuiyan
- Division of Genetic Medicine, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
| | | | - Desirée du Sart
- Biological Sciences and Genomics, Monash University, Melbourne, VIC, Australia
| | - Matthew Edwards
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
| | - Maria Iascone
- Laboratorio di Genetica Medica, ASST Papa Giovanni XXIII, Bergamo, Italy
| | - Deborah J Morris-Rosendahl
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
| | | | | | - Kate L Thomson
- Oxford Regional Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| |
|
72
|
Lee H, Greer SU, Pavlichin DS, Zhou B, Urban AE, Weissman T, Ji HP. Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. CELL REPORTS METHODS 2023; 3:100543. [PMID: 37671027 PMCID: PMC10475782 DOI: 10.1016/j.crmeth.2023.100543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 04/14/2023] [Accepted: 07/06/2023] [Indexed: 09/07/2023]
Abstract
The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
Affiliation(s)
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stephanie U. Greer
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Dmitri S. Pavlichin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Alexander E. Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
| | - Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
| |
|
73
|
Korbecki J, Bosiacki M, Chlubek D, Baranowska-Bosiacka I. Bioinformatic Analysis of the CXCR2 Ligands in Cancer Processes. Int J Mol Sci 2023; 24:13287. [PMID: 37686093 PMCID: PMC10487711 DOI: 10.3390/ijms241713287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
Human CXCR2 has seven ligands, i.e., CXCL1, CXCL2, CXCL3, CXCL5, CXCL6, CXCL7, and CXCL8/IL-8, all chemokines with nearly identical properties. However, no available study has compared the contribution of all CXCR2 ligands to cancer progression. Therefore, in this study, we conducted a bioinformatic analysis using the GEPIA, UALCAN, and TIMER2.0 databases to investigate the role of CXCR2 ligands in 31 different types of cancer, including glioblastoma, melanoma, and colon, esophageal, gastric, kidney, liver, lung, ovarian, pancreatic, and prostate cancer. We focused on the differences in the regulation of expression (using the Tfsitescan and miRDB databases) and analyzed mutation types in CXCR2 ligand genes in cancers (using the cBioPortal). The data showed that the effect of CXCR2 ligands on prognosis depends on the type of cancer. CXCR2 ligands were associated with EMT, angiogenesis, recruiting neutrophils to the tumor microenvironment, and the count of M1 macrophages. The regulation of the expression of each CXCR2 ligand was different and, thus, each analyzed chemokine may have a different function in cancer processes. Our findings suggest that each type of cancer has a unique pattern of CXCR2 ligand involvement in cancer progression, with each ligand having a unique regulation of expression.
Affiliation(s)
- Jan Korbecki
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
- Department of Anatomy and Histology, Collegium Medicum, University of Zielona Góra, Zyty 28 St., 65-046 Zielona Góra, Poland
| | - Mateusz Bosiacki
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
- Department of Functional Diagnostics and Physical Medicine, Faculty of Health Sciences, Pomeranian Medical University in Szczecin, Żołnierska Str. 54, 71-210 Szczecin, Poland
| | - Dariusz Chlubek
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
| | - Irena Baranowska-Bosiacka
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland
| |
|
74
|
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
75
|
Foreman J, Perrett D, Mazaika E, Hunt SE, Ware JS, Firth HV. DECIPHER: Improving Genetic Diagnosis Through Dynamic Integration of Genomic and Clinical Data. Annu Rev Genomics Hum Genet 2023; 24:151-176. [PMID: 37285546 PMCID: PMC7615097 DOI: 10.1146/annurev-genom-102822-100509] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]
Abstract
DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) shares candidate diagnostic variants and phenotypic data from patients with genetic disorders to facilitate research and improve the diagnosis, management, and therapy of rare diseases. The platform sits at the boundary between genomic research and the clinical community. DECIPHER aims to ensure that the most up-to-date data are made rapidly available within its interpretation interfaces to improve clinical care. Newly integrated cardiac case-control data that provide evidence of gene-disease associations and inform variant interpretation exemplify this mission. New research resources are presented in a format optimized for use by a broad range of professionals supporting the delivery of genomic medicine. The interfaces within DECIPHER integrate and contextualize variant and phenotypic data, helping to determine a robust clinico-molecular diagnosis for rare-disease patients, which combines both variant classification and clinical fit. DECIPHER supports discovery research, connecting individuals within the rare-disease community to pursue hypothesis-driven research.
Affiliation(s)
- Julia Foreman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Daniel Perrett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Erica Mazaika
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - James S Ware
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom
- Royal Brompton and Harefield Hospitals, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, United Kingdom
- East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
| |
|
76
|
Brovkina MV, Chapman MA, Holding ML, Clowney EJ. Emergence and influence of sequence bias in evolutionarily malleable, mammalian tandem arrays. BMC Biol 2023; 21:179. [PMID: 37612705 PMCID: PMC10463633 DOI: 10.1186/s12915-023-01673-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 08/01/2023] [Indexed: 08/25/2023] Open
Abstract
BACKGROUND: The radiation of mammals at the extinction of the dinosaurs produced a plethora of new forms, as diverse as bats, dolphins, and elephants, in only 10-20 million years. Behind the scenes, adaptation to new niches is accompanied by extensive innovation in large families of genes that allow animals to contact the environment, including chemosensors, xenobiotic enzymes, and immune and barrier proteins. Genes in these "outward-looking" families are allelically diverse among humans and exhibit tissue-specific and sometimes stochastic expression. RESULTS: Here, we show that these tandem arrays of outward-looking genes occupy AT-biased isochores and comprise the "tissue-specific" gene class that lacks CpG islands in their promoters. Models of mammalian genome evolution have not incorporated the sharply different functions and transcriptional patterns of genes in AT- versus GC-biased regions. To examine the relationship between gene family expansion, sequence content, and allelic diversity, we use population genetic data and comparative analysis. First, we find that AT bias can emerge during evolutionary expansion of gene families in cis. Second, human genes in AT-biased isochores or with GC-poor promoters experience relatively low rates of de novo point mutation today but are enriched for non-synonymous variants. Finally, we find that isochores containing gene clusters exhibit low rates of recombination. CONCLUSIONS: Our analyses suggest that tolerance of non-synonymous variation and low recombination are two forces that have produced the depletion of GC bases in outward-facing gene arrays. In turn, high AT content exerts a profound effect on their chromatin organization and transcriptional regulation.
Affiliation(s)
- Margarita V Brovkina
- Graduate Program in Cellular and Molecular Biology, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Margaret A Chapman
- Neurosciences Graduate Program, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | - E Josephine Clowney
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
| |
|
77
|
Ahles A, Engelhardt S. Genetic Variants of Adrenoceptors. Handb Exp Pharmacol 2023. [PMID: 37578621 DOI: 10.1007/164_2023_676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Adrenoceptors are class A G-protein-coupled receptors grouped into three families (α1-, α2-, and β-adrenoceptors), each one including three members. All nine corresponding adrenoceptor genes display genetic variation in their coding and adjacent non-coding genomic region. Coding variants, i.e., nucleotide exchanges within the transcribed and translated receptor sequence, may result in a difference in amino acid sequence thus altering receptor function and signaling. Such variants have been intensely studied in vitro in overexpression systems and addressed in candidate-gene studies for distinct clinical parameters. In recent years, large cohorts were analyzed in genome-wide association studies (GWAS), where variants are detected as significant in context with specific traits. These studies identified two of the in-depth characterized 18 coding variants in adrenoceptors as repeatedly statistically significant genetic risk factors - p.Arg389Gly in the β1- and p.Thr164Ile in the β2-adrenoceptor, along with 56 variants in the non-coding regions adjacent to the adrenoceptor gene loci, the functional role of which is largely unknown at present. This chapter summarizes current knowledge on the two coding variants in adrenoceptors that have been consistently validated in GWAS and provides a prospective overview on the numerous non-coding variants more recently attributed to adrenoceptor gene loci.
Affiliation(s)
- Andrea Ahles
- Institute of Pharmacology and Toxicology, Technical University of Munich (TUM), Munich, Germany
| | - Stefan Engelhardt
- Institute of Pharmacology and Toxicology, Technical University of Munich (TUM), Munich, Germany.
- DZHK (German Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, Munich, Germany.
| |
|
78
|
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes, the Earth's code of life.
Affiliation(s)
- Roderic Guigó
- Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
| |
|
79
|
Wang L. Reference-guided search for open reading frames. Nat Comput Sci 2023; 3:667-668. [PMID: 38177317 DOI: 10.1038/s43588-023-00497-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Affiliation(s)
- Liguo Wang
- Division of Computational Biology, Mayo Clinic College of Medicine and Science, Rochester, MN, USA.
| |
|
80
|
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics: A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises of all of the proteins produced by the sequences translated from the human genome with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom.
| |
|
81
|
Varabyou A, Erdogdu B, Salzberg SL, Pertea M. Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage. Nat Comput Sci 2023; 3:700-708. [PMID: 38098813 PMCID: PMC10718564 DOI: 10.1038/s43588-023-00496-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/23/2023] [Accepted: 07/05/2023] [Indexed: 12/17/2023]
Abstract
ORFanage is a system designed to assign open reading frames (ORFs) to known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
Affiliation(s)
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
| | - Beril Erdogdu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Steven L. Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| |
|
82
|
Chao KH, Mao A, Salzberg SL, Pertea M. Splam: a deep-learning-based splice site predictor that improves spliced alignments. bioRxiv 2023:2023.07.27.550754. [PMID: 37546880 PMCID: PMC10402160 DOI: 10.1101/2023.07.27.550754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor pairs together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam's accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% at predicting human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant Arabidopsis thaliana. Finally, we demonstrate the use of Splam on a novel application: processing the spliced alignments of RNA-seq data to identify and eliminate errors. We show that when used in this manner, Splam yields substantial improvements in the accuracy of downstream transcriptome analysis of both poly(A) and ribo-depleted RNA-seq libraries. Overall, Splam offers a faster and more accurate approach to detecting splice junctions, while also providing a reliable and efficient solution for cleaning up erroneous spliced alignments.
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Alan Mao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
83
|
Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez JM, Hunt T, Lagarde J, Liang CE, Li H, Jerryd Meade M, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Hasan Çelik M, Chen Y, Du MR, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Liang Li J, Lienhard M, Mikheenko A, Mulligan D, Ming Nip K, Pertea M, Ritchie ME, Sim AD, Tang AD, Kei Wan Y, Wang C, Wong BY, Yang C, Barnes I, Berry A, Capella S, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Goetz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Laird Smith M, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Fai Au K, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. bioRxiv 2023:2023.07.25.550582. [PMID: 37546854 PMCID: PMC10402094 DOI: 10.1101/2023.07.25.550582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/08/2023]
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples are advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
Affiliation(s)
- Francisco J. Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- These authors contributed equally to this work
| | - Dingjie Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- These authors contributed equally to this work
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- These authors contributed equally to this work
| | - Jane E. Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Maite De María
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Matthew S. Adams
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Amit K. Behera
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Jose M. Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Flomics Biotech, Dr Aiguader 88, Barcelona 08003, Spain
- These authors contributed equally to this work
| | - Cindy E. Liang
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Haoran Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Marcus Jerryd Meade
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- These authors contributed equally to this work
| | - David A. Moraga Amador
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Andrey D. Prjibelski
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- These authors contributed equally to this work
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Hamed Bostan
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Ashley M. Brooks
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Muhammed Hasan Çelik
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ying Chen
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Mei R. M. Du
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Colette Felton
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Jonathan Göke
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
| | - Saber Hafezqorani
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Ralf Herwig
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Hideya Kawaji
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Joseph Lee
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Jian Liang Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Matthias Lienhard
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Alla Mikheenko
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Dennis Mulligan
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Matthew E. Ritchie
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Andre D. Sim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Alison D. Tang
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Changqing Wang
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Brandon Y. Wong
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Namrita Dhillon
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | | | - Luis Ferrández-Peral
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Natàlia Garcia-Reyero
- Environmental Laboratory, US Army Engineer Research & Development Center, Vicksburg, USA
| | | | | | | | | | | | | | - Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nedka G. Panayotova
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
| | - Alejandro Paniagua
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | | | - Eric Rouchka
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Brandon Saint-John
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Enrique Sapena
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Leon Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
| | - Melissa Laird Smith
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Hazuki Takahashi
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
| | | | - Piero Carninci
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
- Human Technopole, Milano, Italy
| | - Nancy D. Denslow
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, Department of Physiological Sciences, University of Florida, Gainesville, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Margaret E. Hunter
- U.S. Geological Survey, Wetland and Aquatic Research Center, Gainesville, USA
| | - Hagen U. Tilgner
- Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, USA
| | - Barbara J. Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
| | - Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- Center for Public Health Genomics
- UVA Cancer Center, University of Virginia, Charlottesville, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, USA
| | - Angela N. Brooks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| |
|
84
|
Sansbury SE, Serebrenik YV, Lapidot T, Burslem GM, Shalem O. Pooled tagging and hydrophobic targeting of endogenous proteins for unbiased mapping of unfolded protein responses. bioRxiv 2023:2023.07.13.548611. [PMID: 37503003 PMCID: PMC10370017 DOI: 10.1101/2023.07.13.548611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
System-level understanding of proteome organization and function requires methods for direct visualization and manipulation of proteins at scale. We developed an approach enabled by high-throughput gene tagging for the generation and analysis of complex cell pools with endogenously tagged proteins. Proteins are tagged with HaloTag to enable visualization or direct perturbation. Fluorescent labeling followed by in situ sequencing and deep learning-based image analysis identifies the localization pattern of each tag, providing a bird's-eye-view of cellular organization. Next, we use a hydrophobic HaloTag ligand to misfold tagged proteins, inducing spatially restricted proteotoxic stress that is read out by single cell RNA sequencing. By integrating optical and perturbation data, we map compartment-specific responses to protein misfolding, revealing inter-compartment organization and direct crosstalk, and assigning proteostasis functions to uncharacterized genes. Altogether, we present a powerful and efficient method for large-scale studies of proteome dynamics, function, and homeostasis.
|
85
|
Sweatt AJ, Griffiths CD, Paudel BB, Janes KA. Proteome-wide copy-number estimation from transcriptomics. bioRxiv 2023:2023.07.10.548432. [PMID: 37503057 PMCID: PMC10369941 DOI: 10.1101/2023.07.10.548432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Protein copy numbers constrain systems-level properties of regulatory networks, but absolute proteomic data remain scarce compared to transcriptomics obtained by RNA sequencing. We addressed this persistent gap by relating mRNA to protein statistically using best-available data from quantitative proteomics-transcriptomics for 4366 genes in 369 cell lines. The approach starts with a central estimate of protein copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model that links mRNAs to protein. For dozens of independent cell lines and primary prostate samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, and empirical protein-to-mRNA ratios. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein interaction complexes, suggesting mechanistic relationships are embedded. We use the method to estimate viral-receptor abundances of CD55-CXADR from human heart transcriptomes and build 1489 systems-biology models of coxsackievirus B3 infection susceptibility. When applied to 796 RNA sequencing profiles of breast cancer from The Cancer Genome Atlas, inferred copy-number estimates collectively reclassify 26% of Luminal A and 29% of Luminal B tumors. Protein-based reassignments strongly involve a pharmacologic target for luminal breast cancer (CDK4) and an α-catenin that is often undetectable at the mRNA level (CTTNA2). Thus, by adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility limits of contemporary proteomics. The collection of gene-specific models is assembled as a web tool for users seeking mRNA-guided predictions of absolute protein abundance (http://janeslab.shinyapps.io/Pinferna).
Affiliation(s)
- Andrew J. Sweatt
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
| | - Cameron D. Griffiths
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
| | - B. Bishal Paudel
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
| | - Kevin A. Janes
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908
- Department of Biochemistry & Molecular Genetics, University of Virginia, Charlottesville, VA, 22908
| |
|
86
|
Walker LC, Hoya MDL, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A, Zimmermann H, Byrne AB, Pesaran T, Karam R, Harrison SM, Spurdle AB. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet 2023; 110:1046-1067. [PMID: 37352859 PMCID: PMC10357475 DOI: 10.1016/j.ajhg.2023.06.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 02/20/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/25/2023] Open
Abstract
The American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1, PS3, PP3, BS3, BP4, and BP7. However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. We utilized empirically derived splicing evidence to (1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, (2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and (3) exemplify methodology to calibrate splice prediction tools. We propose repurposing the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely, BP7 may be used to capture RNA results demonstrating no splicing impact for intronic and synonymous variants. We propose that the PS3/BS3 codes are applied only for well-established assays that measure functional impact not directly captured by RNA-splicing assays. We recommend the application of PS1 based on similarity of predicted RNA-splicing effects for a variant under assessment in comparison with a known pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA-assay evidence described aim to help standardize variant pathogenicity classification processes when interpreting splicing-based evidence.
Affiliation(s)
- Logan C Walker
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
| | - Miguel de la Hoya
- Molecular Oncology Laboratory, CIBERONC, Hospital Clinico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - George A R Wiggins
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
| | | | | | - Michael T Parsons
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Daffodil M Canson
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | | | | | | | | | - Alicia B Byrne
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | | | - Steven M Harrison
- Ambry Genetics, Aliso Viejo, CA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Amanda B Spurdle
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia; Faculty of Medicine, The University of Queensland, Brisbane, QLD, Australia
|
87
|
Hamza A, El-Sissy C, Yousfi N, Martins PV, Rafat C, Masliah-Planchon J, Frémeaux-Bacchi V, Mesnard L. The absence of CFHR3 and CFHR1 genes from the T2T-CHM13 assembly can limit the molecular diagnosis of complement-related diseases. Eur J Hum Genet 2023; 31:730-732. [PMID: 37032353 PMCID: PMC10325998 DOI: 10.1038/s41431-023-01350-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Revised: 03/14/2023] [Accepted: 03/20/2023] [Indexed: 04/11/2023] Open
Affiliation(s)
- Abderaouf Hamza
- Department of Genetics, Institut Curie, PSL Research University, Paris, France
| | - Carine El-Sissy
- Department of Biological Immunology, Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Nadhir Yousfi
- Unité Mixte de Recherche S1155, Institut National de la Santé et de la Recherche Médicale (INSERM), Paris, France
| | - Paula Vieira Martins
- Department of Biological Immunology, Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Cédric Rafat
- Service de Soins Intensifs Néphrologiques et Rein Aigu (SINRA), French Intensive Renal Network, Hôpital Tenon, Assistance Publique-Hôpitaux de Paris, Paris, France
- Faculté de Médecine, Sorbonne Université, Paris, France
| | | | - Véronique Frémeaux-Bacchi
- Department of Biological Immunology, Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris, Paris, France
- Unité Mixte de Recherche S1138, Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Recherche des Cordeliers, Paris, France
| | - Laurent Mesnard
- Unité Mixte de Recherche S1155, Institut National de la Santé et de la Recherche Médicale (INSERM), Paris, France.
- Service de Soins Intensifs Néphrologiques et Rein Aigu (SINRA), French Intensive Renal Network, Hôpital Tenon, Assistance Publique-Hôpitaux de Paris, Paris, France.
- Faculté de Médecine, Sorbonne Université, Paris, France.
- Institut des Sciences du Calcul et des Données, Sorbonne Université, Paris, France.
|
88
|
Bucalo A, Conti G, Valentini V, Capalbo C, Bruselles A, Tartaglia M, Bonanni B, Calistri D, Coppa A, Cortesi L, Giannini G, Gismondi V, Manoukian S, Manzella L, Montagna M, Peterlongo P, Radice P, Russo A, Tibiletti MG, Turchetti D, Viel A, Zanna I, Palli D, Silvestri V, Ottini L. Male breast cancer risk associated with pathogenic variants in genes other than BRCA1/2: an Italian case-control study. Eur J Cancer 2023; 188:183-191. [PMID: 37262986 DOI: 10.1016/j.ejca.2023.04.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/24/2023] [Accepted: 04/26/2023] [Indexed: 06/03/2023]
Abstract
BACKGROUND Germline pathogenic variants (PVs) in BRCA1/2 genes are associated with breast cancer (BC) risk in both women and men. Multigene panel testing is being increasingly used for BC risk assessment, allowing the identification of PVs in genes other than BRCA1/2. While data on actionable PVs in other cancer susceptibility genes are now available in female BC, reliable data are still lacking in male BC (MBC). This study aimed to provide the patterns, prevalence and risk estimates associated with PVs in non-BRCA1/2 genes for MBC in order to improve BC prevention for male patients. METHODS We performed a large case-control study in the Italian population, including 767 BRCA1/2-negative MBCs and 1349 male controls, all screened using a custom 50 cancer gene panel. RESULTS PVs in genes other than BRCA1/2 were significantly more frequent in MBCs compared with controls (4.8% vs 1.8%, respectively) and associated with a threefold increased MBC risk (OR: 3.48, 95% CI: 1.88-6.44; p < 0.0001). PV carriers were more likely to have personal (p = 0.03) and family (p = 0.02) history of cancers, not limited to BC. PALB2 PVs were associated with a sevenfold increased MBC risk (OR: 7.28, 95% CI: 1.17-45.52; p = 0.034), and ATM PVs with a fivefold increased MBC risk (OR: 4.79, 95% CI: 1.12-20.56; p = 0.035). CONCLUSIONS This study highlights the role of PALB2 and ATM PVs in MBC susceptibility and provides risk estimates at population level. These data may help in the implementation of multigene panel testing in MBC patients and inform gender-specific BC risk management and decision making for patients and their families.
Affiliation(s)
- Agostino Bucalo
- Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
| | - Giulia Conti
- Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
| | - Virginia Valentini
- Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
| | - Carlo Capalbo
- Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy
| | - Alessandro Bruselles
- Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
| | - Marco Tartaglia
- Molecular Genetics and Functional Genomics Research Unit, Ospedale Pediatrico Bambino Gesù, IRCCS, Rome, Italy
| | - Bernardo Bonanni
- Division of Cancer Prevention and Genetics, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| | - Daniele Calistri
- Istituto Romagnolo per lo Studio dei Tumori "Dino Amadori"-IRST IRCCS, Meldola, Italy
| | - Anna Coppa
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Laura Cortesi
- Department of Oncology and Haematology, University of Modena and Reggio Emilia, Modena, Italy
| | - Giuseppe Giannini
- Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy; Istituto Pasteur-Fondazione Cenci Bolognetti, Rome, Italy
| | - Viviana Gismondi
- Hereditary Cancer Unit, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
| | - Siranoush Manoukian
- Unità di Genetica Medica, Dipartimento di Oncologia Medica ed Ematologia, Fondazione IRCCS Istituto Nazionale dei Tumori (INT), Milan, Italy
| | - Livia Manzella
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
| | - Marco Montagna
- Immunology and Molecular Oncology Unit, Veneto Institute of Oncology IOV - IRCCS, Padua, Italy
| | - Paolo Peterlongo
- Genome Diagnostics Program, IFOM ETS - The AIRC Institute of Molecular Oncology, Milan, Italy
| | - Paolo Radice
- Unit of Molecular Bases of Genetic Risk and Genetic Testing, Department of Research, Fondazione IRCCS Istituto Nazionale Dei Tumori (INT), Milan, Italy
| | - Antonio Russo
- Section of Medical Oncology, Department of Surgical and Oncological Sciences, University of Palermo, Palermo, Italy
| | - Maria Grazia Tibiletti
- Dipartimento di Patologia, ASST Settelaghi and Centro di Ricerca per lo studio dei tumori eredo-familiari, Università dell'Insubria, Varese, Italy
| | - Daniela Turchetti
- Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
| | - Alessandra Viel
- Unità di Oncogenetica e Oncogenomica Funzionale, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, Aviano, Italy
| | - Ines Zanna
- Cancer Risk Factors and Lifestyle Epidemiology Unit, Institute for Cancer Research, Prevention and Clinical Network (ISPRO), Florence, Italy
| | - Domenico Palli
- Cancer Risk Factors and Lifestyle Epidemiology Unit, Institute for Cancer Research, Prevention and Clinical Network (ISPRO), Florence, Italy
| | | | - Laura Ottini
- Department of Molecular Medicine, Sapienza University of Rome, Rome, Italy.
|
89
|
Kovačević M, Milićević O, Branković M, Janković M, Novaković I, Sokić D, Ristić A, Shamsani J, Vojvodić N. Novel variants in established epilepsy genes in focal epilepsy. Seizure 2023; 110:146-152. [PMID: 37390664 DOI: 10.1016/j.seizure.2023.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/30/2023] [Accepted: 06/06/2023] [Indexed: 07/02/2023] Open
Abstract
INTRODUCTION Next generation sequencing (NGS) has greatly expanded our understanding of genetic contributors in multiple epilepsy syndromes, including focal epilepsy. Describing the genetic architecture of common syndromes promises to facilitate the diagnostic process as well as aid in the identification of patients who stand to benefit from genetic testing, but most studies to date have been limited to examining children or adults with intellectual disability. Our aim was to determine the yield of targeted sequencing of 5 established epilepsy genes (DEPDC5, LGI1, SCN1A, GRIN2A, and PCDH19) in an extensively phenotyped cohort of focal epilepsy patients with normal intellectual function or mild intellectual disability, as well as describe novel variants and determine the characteristics of variant carriers. PATIENTS AND METHODS Targeted panel sequencing was performed on 96 patients with a strong clinical suspicion of genetic focal epilepsy. Patients had previously gone through a comprehensive diagnostic epilepsy evaluation in The Neurology Clinic, University Clinical Center of Serbia. Variants of interest (VOI) were classified using the American College of Medical Genetics and the Association for Molecular Pathology criteria. RESULTS Six VOI in eight (8/96, 8.3%) patients were found in our cohort. Four likely pathogenic VOI were determined in six (6/96, 6.2%) patients, two DEPDC5 variants in two patients, one SCN1A variant in two patients and one PCDH19 variant in two patients. One variant of unknown significance (VUS) was found in GRIN2A in one (1/96, 1.0%) patient. Only one VOI in GRIN2A was classified as likely benign. No VOI were detected in LGI1. CONCLUSION Sequencing of only five known epilepsy genes yielded a diagnostic result in 6.2% of our cohort and revealed multiple novel variants. Further research is necessary for a better understanding of the genetic basis in common epilepsy syndromes in patients with normal intellectual function or mild intellectual disability.
Affiliation(s)
- Maša Kovačević
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia.
| | | | | | - Milena Janković
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
| | - Ivana Novaković
- Faculty of Medicine, University of Belgrade, Belgrade, Serbia
| | - Dragoslav Sokić
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia
| | - Aleksandar Ristić
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia
| | | | - Nikola Vojvodić
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia; Faculty of Medicine, University of Belgrade, Belgrade, Serbia
|
90
|
Kraft F, Benet-Pagès A, Berner D, Teubert A, Eck S, Arnold N, Bauer P, Begemann M, Sturm M, Kleinle S, Haack TB, Eggermann T. Quality assurance within the context of genome diagnostics (a German perspective). Med Genet 2023; 35:91-104. [PMID: 38840862 PMCID: PMC10842579 DOI: 10.1515/medgen-2023-2028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
The rapid and dynamic implementation of Next-Generation Sequencing (NGS)-based assays has revolutionized genetic testing, and in the near future, nearly all molecular alterations of the human genome will be diagnosable via massive parallel sequencing. While this progress will further corroborate the central role of human genetics in the multidisciplinary management of patients with genetic disorders, it must be accompanied by quality assurance measures in order to allow the safe and optimal use of knowledge ascertained from genome diagnostics. To achieve this, several valuable tools and guidelines have been developed to support the quality of genome diagnostics. In this paper, authors with experience in diverse aspects of genomic analysis summarize the current status of quality assurance in genome diagnostics, with the aim of facilitating further standardization and quality improvement in one of the core competencies of the field.
Affiliation(s)
- Florian Kraft
- Medizinische Fakultät der RWTH Aachen, Institut für Humangenetik und Genommedizin, Aachen, Germany
| | - Anna Benet-Pagès
- Institut für Neurogenomik, Helmholtz Zentrum München, Neuherberg, Germany
| | | | | | | | - Norbert Arnold
- Universitätsklinikum Schleswig-Holstein, Zentrum für familiären Brust- und Eierstockkrebs; Klinik für Gynäkologie und Geburtshilfe, Kiel, Germany
| | | | - Matthias Begemann
- Medizinische Fakultät der RWTH Aachen, Institut für Humangenetik und Genommedizin, Aachen, Germany
| | - Marc Sturm
- Universität Tübingen, Institut für Medizinische Genetik und Angewandte Genomik, Tübingen, Germany
| | | | - Tobias B. Haack
- Universität Tübingen, Institut für Medizinische Genetik und Angewandte Genomik, Tübingen, Germany
| | - Thomas Eggermann
- Medizinische Fakultät der RWTH Aachen, Institut für Humangenetik und Genommedizin, Pauwelsstr. 30, 52074 Aachen, Germany
|
91
|
Ameratunga R, Edwards ESJ, Lehnert K, Leung E, Woon ST, Lea E, Allan C, Chan L, Steele R, Longhurst H, Bryant VL. The Rapidly Expanding Genetic Spectrum of Common Variable Immunodeficiency-Like Disorders. J Allergy Clin Immunol Pract 2023; 11:1646-1664. [PMID: 36796510 DOI: 10.1016/j.jaip.2023.01.048] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/21/2022] [Revised: 01/21/2023] [Accepted: 01/27/2023] [Indexed: 02/16/2023]
Abstract
The understanding of common variable immunodeficiency disorders (CVID) is in evolution. CVID was previously a diagnosis of exclusion. New diagnostic criteria have allowed the disorder to be identified with greater precision. With the advent of next-generation sequencing (NGS), it has become apparent that an increasing number of patients with a CVID phenotype have a causative genetic variant. If a pathogenic variant is identified, these patients are removed from the overarching diagnosis of CVID and are deemed to have a CVID-like disorder. In populations where consanguinity is more prevalent, the majority of patients with severe primary hypogammaglobulinemia will have an underlying inborn error of immunity, usually an early-onset autosomal recessive disorder. In nonconsanguineous societies, pathogenic variants are identified in approximately 20% to 30% of patients. These are often autosomal dominant mutations with variable penetrance and expressivity. To add to the complexity of CVID and CVID-like disorders, some genetic variants such as those in TNFRSF13B (transmembrane activator calcium modulator cyclophilin ligand interactor) predispose to, or enhance, disease severity. These variants are not causative but can have epistatic (synergistic) interactions with more deleterious mutations to worsen disease severity. This review is a description of the current understanding of genes associated with CVID and CVID-like disorders. This information will assist clinicians in interpreting NGS reports when investigating the genetic basis of disease in patients with a CVID phenotype.
Affiliation(s)
- Rohan Ameratunga
- Department of Clinical Immunology, Auckland Hospital, Auckland, New Zealand; Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand; Department of Molecular Medicine and Pathology, School of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand.
| | - Emily S J Edwards
- The Jeffrey Modell Diagnostic and Research Centre for Primary Immunodeficiencies, and Allergy and Clinical Immunology Laboratory, Department of Immunology, Monash University, Melbourne, VIC, Australia
| | - Klaus Lehnert
- Applied Translational Genetics Group, School of Biological Sciences, University of Auckland, Auckland, New Zealand; Maurice Wilkins Centre, School of Biological Sciences, University of Auckland, Auckland, New Zealand
| | - Euphemia Leung
- Auckland Cancer Society Research Centre, School of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
| | - See-Tarn Woon
- Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand
| | - Edward Lea
- Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand
| | - Caroline Allan
- Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand
| | - Lydia Chan
- Department of Clinical Immunology, Auckland Hospital, Auckland, New Zealand
| | - Richard Steele
- Department of Virology and Immunology, Auckland Hospital, Auckland, New Zealand; Department of Respiratory Medicine, Wellington Hospital, Wellington, New Zealand
| | - Hilary Longhurst
- Department of Medicine, School of Medicine, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
| | - Vanessa L Bryant
- Department of Immunology, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia; Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia; Department of Clinical Immunology and Allergy, Royal Melbourne Hospital, Parkville, VIC, Australia
|
92
|
Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D, Razavi-Mohseni M, Jiang Y, Borsari B, Morabito S, Liang HY, McGill CJ, Rahmanian S, Sakr J, Jiang S, Zeng W, Carvalho K, Weimer AK, Dionne LA, McShane A, Bedi K, Elhajjajy SI, Upchurch S, Jou J, Youngworth I, Gabdank I, Sud P, Jolanki O, Strattan JS, Kagda MS, Snyder MP, Hitz BC, Moore JE, Weng Z, Bennett D, Reinholdt L, Ljungman M, Beer MA, Gerstein MB, Pachter L, Guigó R, Wold BJ, Mortazavi A. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. bioRxiv 2023:2023.05.15.540865. [PMID: 37292896 PMCID: PMC10245583 DOI: 10.1101/2023.05.15.540865] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/10/2023]
Abstract
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
Affiliation(s)
- Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Muhammed Hasan Çelik
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Elisabeth Rebboah
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Narges Rezaie
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Milad Razavi-Mohseni
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
| | - Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Samuel Morabito
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Heidi Yahan Liang
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Cassandra J McGill
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Jasmine Sakr
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
| | - Shan Jiang
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Weihua Zeng
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Klebea Carvalho
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Annika K Weimer
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Louise A Dionne
- The Jackson Laboratory, Bar Harbor, USA
| | - Ariel McShane
- Cellular and Molecular Biology Program, University of Michigan, Ann Arbor, USA
- Department of Radiation Oncology, University of Michigan, Ann Arbor, USA
| | - Karan Bedi
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
| | - Shaimae I Elhajjajy
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - Sean Upchurch
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Jennifer Jou
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Ingrid Youngworth
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Ben C Hitz
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - David Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, USA
| | - Laura Reinholdt
- The Jackson Laboratory, Bar Harbor, USA
| | - Mats Ljungman
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
- Departments of Radiation Oncology and Environmental Health Sciences, University of Michigan, Ann Arbor, USA
| | - Michael A Beer
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, USA
- Department of Statistics and Data Science, Yale University, New Haven, USA
- Department of Computer Science, Yale University, New Haven, USA
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
| | - Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
|
93
|
Toomata Z, Leask M, Krishnan M, Cadzow M, Dalbeth N, Stamp LK, de Zoysa J, Merriman T, Wilcox P, Dewes O, Murphy R. Genetic testing for misclassified monogenic diabetes in Māori and Pacific peoples in Aōtearoa New Zealand with early-onset type 2 diabetes. Front Endocrinol (Lausanne) 2023; 14:1174699. [PMID: 37234800 PMCID: PMC10206310 DOI: 10.3389/fendo.2023.1174699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 04/20/2023] [Indexed: 05/28/2023] Open
Abstract
Aims Monogenic diabetes accounts for 1-2% of diabetes cases yet is often misdiagnosed as type 2 diabetes. The aim of this study was to examine, in Māori and Pacific adults clinically diagnosed with type 2 diabetes within 40 years of age, (a) the prevalence of monogenic diabetes in this population, (b) the prevalence of beta-cell autoantibodies, and (c) the pre-test probability of monogenic diabetes. Methods Targeted sequencing data of 38 known monogenic diabetes genes was analyzed in 199 Māori and Pacific peoples with BMI of 37.9 ± 8.6 kg/m² who had been diagnosed with type 2 diabetes between 3 and 40 years of age. A triple-screen combined autoantibody assay was used to test for GAD, IA-2, and ZnT8. MODY probability calculator score was generated in those with sufficient clinical information (55/199). Results No genetic variants curated as likely pathogenic or pathogenic were found. One individual (1/199) tested positive for GAD/IA-2/ZnT8 antibodies. The pre-test probability of monogenic diabetes was calculated in 55 individuals with 17/55 (31%) scoring above the 20% threshold considered for diagnostic testing referral. Discussion Our findings suggest that monogenic diabetes is rare in Māori and Pacific people with clinically diagnosed early-onset type 2 diabetes, and that the MODY probability calculator likely overestimates the likelihood of a monogenic cause for diabetes in this population.
Affiliation(s)
- Zanetta Toomata
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
| | - Megan Leask
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Mohanraj Krishnan
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
| | - Murray Cadzow
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
| | - Nicola Dalbeth
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
| | - Lisa K. Stamp
- Department of Medicine, University of Otago, Christchurch, Christchurch, New Zealand
| | - Janak de Zoysa
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
| | - Tony Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Phillip Wilcox
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
| | - Ofa Dewes
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- Langimalie Research Centre, Auckland, New Zealand
- Centre of Methods and Policy Application in the Social Sciences, The University of Auckland, Auckland, New Zealand
| | - Rinki Murphy
- Department of Medicine, Waipapa Taumata Rau, The University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
|
94
|
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold standard diagnostic methods like RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short read or long read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely-used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we saw that widely-used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci as well as further tool development that builds on the strengths of current algorithms. 
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3x faster than the original while producing the same output.
Affiliation(s)
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
|
95
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Background: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. Results: We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation. We suggest strategies for optimal splice effect prediction in the face of these issues. Conclusion: SpliceAI and Pangolin showed the best overall performance among the predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
96
|
Hofman DA, Ruiz-Orera J, Yannuzzi I, Murugesan R, Brown A, Clauser KR, Condurat AL, van Dinter JT, Engels SA, Goodale A, van der Lugt J, Abid T, Wang L, Zhou KN, Vogelzang J, Ligon KL, Phoenix TN, Roth JA, Root DE, Hubner N, Golub TR, Bandopadhayay P, van Heesch S, Prensner JR. Translation of non-canonical open reading frames as a cancer cell survival mechanism in childhood medulloblastoma. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539399. [PMID: 37205492 PMCID: PMC10187264 DOI: 10.1101/2023.05.04.539399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
A hallmark of high-risk childhood medulloblastoma is the dysregulation of RNA translation. Currently, it is unknown whether medulloblastoma dysregulates the translation of putatively oncogenic non-canonical open reading frames. To address this question, we performed ribosome profiling of 32 medulloblastoma tissues and cell lines and observed widespread non-canonical ORF translation. We then developed a step-wise approach to employ multiple CRISPR-Cas9 screens to elucidate functional non-canonical ORFs implicated in medulloblastoma cell survival. We determined that multiple lncRNA-ORFs and upstream open reading frames (uORFs) exhibited selective functionality independent of the main coding sequence. One of these, ASNSD1-uORF or ASDURF, was upregulated, associated with the MYC family oncogenes, and was required for medulloblastoma cell survival through engagement with the prefoldin-like chaperone complex. Our findings underscore the fundamental importance of non-canonical ORF translation in medulloblastoma and provide a rationale to include these ORFs in future cancer genomics studies seeking to define new cancer targets.
Affiliation(s)
- Damon A. Hofman
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
- These authors contributed equally
| | - Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- These authors contributed equally
| | - Ian Yannuzzi
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | | | - Adam Brown
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Current address: Arbor Biotechnologies, Cambridge, MA, 02140, USA
| | - Karl R. Clauser
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Alexandra L. Condurat
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Jip T. van Dinter
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - Sem A.G. Engels
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - Amy Goodale
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Jasper van der Lugt
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - Tanaz Abid
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Li Wang
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Kevin N. Zhou
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Current address: Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA, 91101, USA
| | - Jayne Vogelzang
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, 02215, USA
| | - Keith L. Ligon
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, 02215, USA
- Department of Pathology, Boston Children’s Hospital, Boston, MA, 02115, USA
| | - Timothy N. Phoenix
- Division of Pharmaceutical Sciences, James L. Winkle College of Pharmacy, University of Cincinnati, Cincinnati, OH, 45229, USA
| | | | - David E. Root
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Norbert Hubner
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
- Charité-Universitätsmedizin, 10117 Berlin, Germany
- German Centre for Cardiovascular Research, Partner Site Berlin, 13347 Berlin, Germany
| | - Todd R. Golub
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115, USA
| | - Pratiti Bandopadhayay
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115, USA
| | - Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - John R. Prensner
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115, USA
- Current address: Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
97
|
Pagni S, Custodio HM, Frankish A, Mudge JM, Mills JD, Sisodiya SM. SCN1A: bioinformatically informed revised boundaries for promoter and enhancer regions. Hum Mol Genet 2023; 32:1753-1763. [PMID: 36715146 PMCID: PMC10162429 DOI: 10.1093/hmg/ddad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 01/06/2023] [Accepted: 01/24/2023] [Indexed: 01/31/2023]
Abstract
Pathogenic variations in the sodium voltage-gated channel alpha subunit 1 (SCN1A) gene are responsible for multiple epilepsy phenotypes, including Dravet syndrome, febrile seizures (FS) and genetic epilepsy with FS plus. Phenotypic heterogeneity is a hallmark of SCN1A-related epilepsies, the causes of which are yet to be clarified. Genetic variation in the non-coding regulatory regions of SCN1A could be one potential causal factor. However, a comprehensive understanding of the SCN1A regulatory landscape is currently lacking. Here, we summarized the current state of knowledge of SCN1A regulation, providing details on its promoter and enhancer regions. We then integrated currently available data on SCN1A promoters by extracting information related to the SCN1A locus from genome-wide repositories and clearly defined the promoter and enhancer regions of SCN1A. Further, we explored the cellular specificity of differential SCN1A promoter usage. We also reviewed and integrated the available human brain-derived enhancer databases and mouse-derived data to provide a comprehensive computationally developed summary of SCN1A brain-active enhancers. By querying genome-wide data repositories, extracting SCN1A-specific data and integrating the different types of independent evidence, we created a comprehensive catalogue that better defines the regulatory landscape of SCN1A, which could be used to explore the role of SCN1A regulatory regions in disease.
Affiliation(s)
- Susanna Pagni
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
| | - Helena Martins Custodio
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - James D Mills
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
- Amsterdam UMC, Department of (Neuro) Pathology, Amsterdam Neuroscience, University of Amsterdam, Amsterdam, 1105 AZ, The Netherlands
| | - Sanjay M Sisodiya
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Bucks SL9 0RJ, UK
|
98
|
Zhang Q, Shao M. Transcript Assembly and Annotations: Bias and Adjustment. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.20.537700. [PMID: 37131680 PMCID: PMC10153229 DOI: 10.1101/2023.04.20.537700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Motivation: Transcript annotations play a critical role in gene expression analysis, as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations: assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks to evaluate the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. Results: We investigate the impact of annotations on transcript assembly. We observe that conflicting conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance on selecting appropriate assembly tools for different application scenarios.
Affiliation(s)
- Qimin Zhang
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
| | - Mingfu Shao
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
- Huck Institutes of the Life Sciences, The Pennsylvania State University
|
99
|
Kerimov N, Tambets R, Hayhurst JD, Rahu I, Kolberg P, Raudvere U, Kuzmin I, Chowdhary A, Vija A, Teras HJ, Kanai M, Ulirsch J, Ryten M, Hardy J, Guelfi S, Trabzuni D, Kim-Hellmuth S, Rayner W, Finucane H, Peterson H, Mosaku A, Parkinson H, Alasoo K. Systematic visualisation of molecular QTLs reveals variant mechanisms at GWAS loci. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535816. [PMID: 37066341 PMCID: PMC10104061 DOI: 10.1101/2023.04.06.535816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Splicing quantitative trait loci (QTLs) have been implicated as a common mechanism underlying complex trait associations. However, utilising splicing QTLs in target discovery and prioritisation has been challenging due to extensive data normalisation which often renders the direction of the genetic effect as well as its magnitude difficult to interpret. This is further complicated by the fact that strong expression QTLs often manifest as weak splicing QTLs and vice versa, making it difficult to uniquely identify the underlying molecular mechanism at each locus. We find that these ambiguities can be mitigated by visualising the association between the genotype and average RNA sequencing read coverage in the region. Here, we generate these QTL coverage plots for 1.7 million molecular QTL associations in the eQTL Catalogue identified with five quantification methods. We illustrate the utility of these QTL coverage plots by performing colocalisation between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. We find that while visually confirmed splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases. All our association summary statistics and QTL coverage plots are freely available at https://www.ebi.ac.uk/eqtl/.
Affiliation(s)
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ralf Tambets
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - James D Hayhurst
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ida Rahu
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Peep Kolberg
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Uku Raudvere
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Anshika Chowdhary
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Andreas Vija
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Hans J Teras
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jacob Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Mina Ryten
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - John Hardy
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Sebastian Guelfi
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Daniah Trabzuni
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London
| | - Sarah Kim-Hellmuth
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital LMU Munich, Munich, Germany
| | - Will Rayner
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Hilary Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
| | - Abayomi Mosaku
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Helen Parkinson
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, 51009, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
100
|
Omenn GS, Lane L, Overall CM, Pineau C, Packer NH, Cristea IM, Lindskog C, Weintraub ST, Orchard S, Roehrl MH, Nice E, Liu S, Bandeira N, Chen YJ, Guo T, Aebersold R, Moritz RL, Deutsch EW. The 2022 Report on the Human Proteome from the HUPO Human Proteome Project. J Proteome Res 2023; 22:1024-1042. [PMID: 36318223 PMCID: PMC10081950 DOI: 10.1021/acs.jproteome.2c00498] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Indexed: 11/07/2022]
Abstract
The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".
Affiliation(s)
- Gilbert S. Omenn
- University of Michigan, Ann Arbor, Michigan 48109, United States
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and University of Geneva, 1015 Lausanne, Switzerland
| | | | - Charles Pineau
- French Institute of Health and Medical Research, 35042 RENNES Cedex, France
| | - Nicolle H. Packer
- Macquarie University, Sydney, NSW 2109, Australia
- Griffith University’s Institute for Glycomics, Sydney, NSW 2109, Australia
| | | | | | - Susan T. Weintraub
- University of Texas Health Science Center-San Antonio, San Antonio, Texas 78229-3900, United States
| | - Sandra Orchard
- EMBL-EBI, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
| | - Michael H.A. Roehrl
- Memorial Sloan Kettering Cancer Center, New York, New York, 10065, United States
| | | | - Siqi Liu
- BGI Group, Shenzhen 518083, China
| | - Nuno Bandeira
- University of California, San Diego, La Jolla, California 92093, United States
| | - Yu-Ju Chen
- National Taiwan University, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - Tiannan Guo
- Westlake University Guomics Laboratory of Big Proteomic Data, Hangzhou 310024, Zhejiang Province, China
| | - Ruedi Aebersold
- Institute of Molecular Systems Biology in ETH Zurich, 8092 Zurich, Switzerland
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
|