51
Kelleher KJ, Sheils TK, Mathias SL, Yang JJ, Metzger V, Siramshetty V, Nguyen DT, Jensen LJ, Vidović D, Schürer S, Holmes J, Sharma K, Pillai A, Bologa C, Edwards J, Mathé E, Oprea T. Pharos 2023: an integrated resource for the understudied human proteome. Nucleic Acids Res 2023; 51:D1405-D1416. [PMID: 36624666 PMCID: PMC9825581 DOI: 10.1093/nar/gkac1033] [Received: 09/15/2022] [Revised: 10/12/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022]
Abstract
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products of the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
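The abstract does not say which statistic Pharos uses for its enrichment calculations. As a hedged illustration only, one common way to ask whether a user-selected subset of targets is over-represented for an annotation is a one-sided hypergeometric test; the sketch below assumes that formulation, and the function name `hypergeom_enrichment_p` and its parameters are hypothetical, not part of the Pharos or TCRD API.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k), for X ~ Hypergeometric.

    N: targets in the background list
    K: background targets carrying the annotation (e.g. a disease or pathway)
    n: targets in the user-selected subset
    k: subset targets carrying the annotation
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probabilities of drawing k or more annotated targets into the subset.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy numbers: all 4 selected targets share an annotation held by 5 of 20 background targets.
p = hypergeom_enrichment_p(k=4, n=4, K=5, N=20)
print(f"{p:.3e}")  # 1.032e-03
```

When many annotations are tested at once, p-values like this would normally also need a multiple-testing correction such as Benjamini-Hochberg.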
Affiliation(s)
- Keith J Kelleher
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Timothy K Sheils
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Stephen L Mathias
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Jeremy J Yang
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Vincent T Metzger
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Vishal B Siramshetty
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen 2200, Copenhagen, Denmark
- Dušica Vidović
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Stephan C Schürer
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Jayme Holmes
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Karlie R Sharma
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Ajay Pillai
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Cristian G Bologa
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Jeremy S Edwards
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Ewy A Mathé
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
52
Holm L, Laiho A, Törönen P, Salgado M. DALI shines a light on remote homologs: One hundred discoveries. Protein Sci 2023; 32:e4519. [PMID: 36419248 PMCID: PMC9793968 DOI: 10.1002/pro.4519] [Received: 07/18/2022] [Revised: 11/15/2022] [Accepted: 11/20/2022] [Indexed: 11/25/2022]
Abstract
Structural comparison reveals remote homology that sequence comparison often fails to detect. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we use DALI to mine version 1 of the AlphaFold Database, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships not previously reported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to previously characterized families, generating functional hypotheses that can be explored in downstream structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.
Affiliation(s)
- Liisa Holm
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Aleksi Laiho
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Petri Törönen
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Marco Salgado
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
53
Pauza AG, Murphy D, Paton JFR. Transcriptomics of the Carotid Body. Adv Exp Med Biol 2023; 1427:1-11. [PMID: 37322330 DOI: 10.1007/978-3-031-32371-3_1] [Indexed: 06/17/2023]
Abstract
The carotid body (CB) has emerged as a potential therapeutic target for treating sympathetically mediated cardiovascular, respiratory, and metabolic diseases. In addition to its classical role as an arterial O2 sensor, the CB is a multimodal sensor activated by a range of stimuli in the circulation. However, there is no consensus on how CB multimodality is achieved; even O2 sensing, the best-studied modality, appears to involve multiple convergent mechanisms. One strategy for understanding multimodal sensing is to adopt a hypothesis-free, high-throughput transcriptomic approach. This has proven instrumental for understanding fundamental mechanisms of the CB response to hypoxia and other stimulants, its developmental niche, cellular heterogeneity, laterality, and pathophysiological remodeling in disease states. Herein, we review this published work, which reveals novel molecular mechanisms underpinning multimodal sensing and exposes numerous gaps in knowledge that require experimental testing.
Affiliation(s)
- Audrys G Pauza
- Manaaki Manawa - The Centre for Heart Research, Department of Physiology, Faculty of Medical & Health Sciences, University of Auckland, Auckland, New Zealand
- David Murphy
- Molecular Neuroendocrinology Research Group, Bristol Medical School, Translational Health Sciences, University of Bristol, Bristol, UK
- Julian F R Paton
- Manaaki Manawa - The Centre for Heart Research, Department of Physiology, Faculty of Medical & Health Sciences, University of Auckland, Auckland, New Zealand
54
De Paolis Kaluza MC, Jain S, Radivojac P. An Approach to Identifying and Quantifying Bias in Biomedical Data. Pac Symp Biocomput 2023; 28:311-322. [PMID: 36540987 PMCID: PMC9782737] [Indexed: 12/25/2022]
Abstract
Data biases are a known impediment to the development of trustworthy machine learning models and their application to many biomedical problems. When biased data is suspected, the assumption that the labeled data is representative of the population must be relaxed, and methods that exploit unlabeled data, which is typically representative, must be developed. To mitigate the adverse effects of unrepresentative data, we consider a binary semi-supervised setting and focus on identifying whether, and to what extent, the labeled data is biased. We assume that the class-conditional distributions were generated by a family of component distributions represented in different proportions in labeled and unlabeled data. We also assume that the training data can be transformed to, and subsequently modeled by, a nested mixture of multivariate Gaussian distributions. We then develop a multi-sample expectation-maximization algorithm that learns all individual and shared parameters of the model from the combined data. Using these parameters, we develop a statistical test for the presence of the general form of bias in labeled data and estimate the level of this bias by computing the distance between corresponding class-conditional distributions in labeled and unlabeled data. We first study the new methods on synthetic data to understand their behavior and then apply them to real-world biomedical data to provide evidence that the bias estimation procedure is both possible and effective.
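The abstract's model is a nested mixture of multivariate Gaussians fitted with a multi-sample EM algorithm. The sketch below is a deliberately simplified, one-dimensional stand-in and not the authors' method: it assumes two samples (labeled and unlabeled) that share the same Gaussian components but mix them in different proportions, and it uses the total variation distance between the fitted mixing weights as a crude bias proxy in place of the paper's distance between class-conditional distributions. All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multi_sample_em(samples, n_components=2, n_iter=200, min_var=1e-6):
    """Fit Gaussian components shared by all samples, with per-sample mixing weights.

    samples: list of 1-D samples, e.g. [labeled, unlabeled]
    Returns (means, variances, weights); weights[s][c] is the share of component c in sample s.
    """
    pooled = sorted(x for xs in samples for x in xs)
    # Initialise shared means at evenly spaced quantiles of the pooled data.
    means = [pooled[int((c + 0.5) * len(pooled) / n_components)] for c in range(n_components)]
    variances = [max(statistics.pvariance(pooled), min_var)] * n_components
    weights = [[1.0 / n_components] * n_components for _ in samples]
    for _ in range(n_iter):
        r_sum = [0.0] * n_components
        r_x = [0.0] * n_components
        r_xx = [0.0] * n_components
        new_weights = []
        for s, xs in enumerate(samples):
            w_acc = [0.0] * n_components
            for x in xs:
                # E-step: responsibilities use this sample's own mixing weights.
                dens = [weights[s][c] * normal_pdf(x, means[c], variances[c]) for c in range(n_components)]
                total = sum(dens) or 1e-300
                for c in range(n_components):
                    r = dens[c] / total
                    w_acc[c] += r
                    r_sum[c] += r
                    r_x[c] += r * x
                    r_xx[c] += r * x * x
            new_weights.append([a / len(xs) for a in w_acc])
        # M-step: per-sample weights, then means and variances pooled across samples.
        weights = new_weights
        for c in range(n_components):
            if r_sum[c] > 0:
                means[c] = r_x[c] / r_sum[c]
                variances[c] = max(r_xx[c] / r_sum[c] - means[c] ** 2, min_var)
    return means, variances, weights


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Synthetic data: the labeled sample over-represents the component centred at 0.
rng = random.Random(0)
labeled = [rng.gauss(0, 1) for _ in range(90)] + [rng.gauss(5, 1) for _ in range(10)]
unlabeled = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(5, 1) for _ in range(50)]
means, variances, (w_lab, w_unl) = multi_sample_em([labeled, unlabeled])
print("means", [round(m, 2) for m in means])
print("labeled weights", [round(w, 2) for w in w_lab], "unlabeled weights", [round(w, 2) for w in w_unl])
print("bias proxy (TV distance of weights)", round(total_variation(w_lab, w_unl), 2))
```

Sharing the component parameters while freeing the mixing weights is what lets a difference between samples show up purely as a shift in proportions, which is the kind of bias the abstract describes.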
55
Papadakos KS, Ekström A, Slipek P, Skourti E, Reid S, Pietras K, Blom AM. Sushi domain-containing protein 4 binds to epithelial growth factor receptor and initiates autophagy in an EGFR phosphorylation independent manner. J Exp Clin Cancer Res 2022; 41:363. [PMID: 36578014 PMCID: PMC9798675 DOI: 10.1186/s13046-022-02565-1] [Received: 09/02/2022] [Accepted: 12/07/2022] [Indexed: 12/29/2022]
Abstract
BACKGROUND: Sushi domain-containing protein 4 (SUSD4) is a recently discovered protein with unknown cellular functions. We previously revealed that SUSD4 can act as a complement inhibitor and as a potential tumor suppressor.
METHODS: In a syngeneic mouse model of breast cancer, tumors expressing SUSD4 had a smaller volume than the corresponding mock control tumors. Additionally, data from three different expression databases and online analysis tools confirm that, for breast cancer patients, high mRNA expression of SUSD4 in the tumor tissue correlates with a better prognosis. In vitro experiments used triple-negative breast cancer cell lines (BT-20 and MDA-MB-468) stably expressing SUSD4. Moreover, we established a BT-20-based cell line in which the gene for EGFR was knocked out with the CRISPR-Cas9 method.
RESULTS: We discovered that the Epithelial Growth Factor Receptor (EGFR) interacts with SUSD4. Furthermore, triple-negative breast cancer cell lines stably expressing SUSD4 had higher autophagic flux. The initiation of autophagy required the expression of EGFR but not phosphorylation of the receptor. Expression of SUSD4 in the breast cancer cells led to activation of the tumor suppressor LKB1 and, consequently, of AMPKα1. Finally, autophagy was initiated after stimulation of the ULK1, Atg14 and Beclin-1 axis in SUSD4-expressing cells.
CONCLUSIONS: In this study, we provide novel insight into the molecular mechanism whereby SUSD4 acts as an EGFR inhibitor without affecting the phosphorylation of the receptor and may potentially influence the recycling of EGFR to the plasma membrane.
Affiliation(s)
- Konstantinos S. Papadakos
- Division of Medical Protein Chemistry, Department of Translational Medicine, Lund University, Inga Maria Nilsson’s street 53, 214 28 Malmö, Sweden
- Alexander Ekström
- Division of Medical Protein Chemistry, Department of Translational Medicine, Lund University, Inga Maria Nilsson’s street 53, 214 28 Malmö, Sweden
- Piotr Slipek
- Division of Medical Protein Chemistry, Department of Translational Medicine, Lund University, Inga Maria Nilsson’s street 53, 214 28 Malmö, Sweden
- Eleni Skourti
- Division of Medical Protein Chemistry, Department of Translational Medicine, Lund University, Inga Maria Nilsson’s street 53, 214 28 Malmö, Sweden
- Steven Reid
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Kristian Pietras
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Anna M. Blom
- Division of Medical Protein Chemistry, Department of Translational Medicine, Lund University, Inga Maria Nilsson’s street 53, 214 28 Malmö, Sweden
56
Delmas M, Filangi O, Duperier C, Paulhe N, Vinson F, Rodriguez-Mier P, Giacomoni F, Jourdan F, Frainay C. Suggesting disease associations for overlooked metabolites using literature from metabolic neighbors. Gigascience 2023; 12:giad065. [PMID: 37712592 PMCID: PMC10502579 DOI: 10.1093/gigascience/giad065] [Received: 01/17/2023] [Revised: 06/13/2023] [Accepted: 07/28/2023] [Indexed: 09/16/2023]
Abstract
In human health research, metabolic signatures extracted from metabolomics data add considerable value for stratifying patients and identifying biomarkers. Nevertheless, one of the main challenges is to interpret these lists of discriminant metabolites and relate them to pathological mechanisms. This task requires experts to combine their knowledge with information extracted from databases and the scientific literature. However, we show that most compounds (>99%) in the PubChem database lack annotated literature. This dearth of available information can directly affect the interpretation of metabolic signatures, which is often restricted to a subset of significant metabolites. To suggest potential pathological phenotypes related to overlooked metabolites that lack annotated literature, we extend the "guilt-by-association" principle to literature information by using a Bayesian framework. The underlying assumption is that the literature associated with the metabolic neighbors of a compound can provide valuable insights, in the form of a prior, into its biomedical context. The metabolic neighborhood of a compound can be defined from a metabolic network and corresponds to the metabolites to which it is connected through biochemical reactions. With the proposed approach, we suggest more than 35,000 associations between 1,047 overlooked metabolites and 3,288 diseases (or disease families). All these newly inferred associations are freely available on the FORUM ftp server (see information at https://github.com/eMetaboHUB/Forum-LiteraturePropagation).
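To make the guilt-by-association idea concrete, here is a minimal Beta-binomial sketch. It assumes that a prior for how often a compound's articles mention a disease can be pooled from its metabolic neighbors' co-mention rates; the equal neighbor weights, the Laplace smoothing, the `strength` pseudo-count and all names are illustrative assumptions, and this is not the published FORUM model.

```python
from dataclasses import dataclass


@dataclass
class LiteratureCounts:
    n_articles: int      # articles annotated with the compound
    n_with_disease: int  # of those, articles also annotated with the disease


def neighbor_prior(neighbors, weights, strength=10.0):
    """Beta(alpha, beta) prior on P(disease | compound article), pooled from metabolic neighbors."""
    total_w = sum(weights)
    # Weighted mean of Laplace-smoothed neighbor co-mention rates.
    mean = sum(w * (c.n_with_disease + 1) / (c.n_articles + 2) for c, w in zip(neighbors, weights)) / total_w
    return mean * strength, (1.0 - mean) * strength


def posterior_mean(own: LiteratureCounts, alpha: float, beta: float) -> float:
    """Beta-binomial update: the prior dominates when the compound has little literature."""
    return (alpha + own.n_with_disease) / (alpha + beta + own.n_articles)


# Toy counts for two hypothetical neighbors of an overlooked metabolite.
neighbors = [LiteratureCounts(100, 30), LiteratureCounts(50, 5)]
alpha, beta = neighbor_prior(neighbors, weights=[1.0, 1.0])
overlooked = LiteratureCounts(0, 0)  # no annotated literature at all
studied = LiteratureCounts(20, 0)    # 20 articles, none mentioning the disease
print(round(posterior_mean(overlooked, alpha, beta), 3), round(posterior_mean(studied, alpha, beta), 3))
```

For a compound with no literature the posterior simply equals the neighbor-derived prior, which is exactly the situation of the overlooked metabolites the abstract targets; as a compound's own literature grows, its own counts progressively outweigh the neighbors.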
Affiliation(s)
- Maxime Delmas
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- Olivier Filangi
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
- Christophe Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
- Nils Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
- Florence Vinson
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
- Pablo Rodriguez-Mier
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- Franck Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
- Fabien Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
- Clément Frainay
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
57
Stankiewicz AM, Jaszczyk A, Goscik J, Juszczak GR. Stress and the brain transcriptome: Identifying commonalities and clusters in standardized data from published experiments. Prog Neuropsychopharmacol Biol Psychiatry 2022; 119:110558. [PMID: 35405299 DOI: 10.1016/j.pnpbp.2022.110558] [Received: 07/06/2021] [Revised: 03/17/2022] [Accepted: 04/04/2022] [Indexed: 12/28/2022]
Abstract
Interpretation of transcriptomic experiments is hindered by many problems, including false positives/negatives inherent to big-data methods and changes in gene nomenclature. To find the most consistent effects of stress on the brain transcriptome, we retrieved data from 79 studies applying animal models and 3 human studies investigating post-traumatic stress disorder (PTSD). The analyzed data were obtained with either microarrays or RNA sequencing applied to samples collected from more than 1887 laboratory animals and from 121 human subjects. Based on the initial database containing a quarter million differential expression effect sizes representing transcripts in three species, we identified the most frequently reported genes in 223 stress-control comparisons. Additionally, the analysis considers sex, individual vulnerability and the contribution of glucocorticoids. We also found an overlap between gene expression in PTSD patients and in animals, which indicates the relevance of laboratory models for the human stress response. Our analysis points to genes that, as far as we know, have not been specifically tested for their role in the stress response (Pllp, Arrdc2, Midn, Mfsd2a, Ccn1, Htra1, Csrnp1, Tenm4, Tnfrsf25, Sema3b, Fmo2, Adamts4, Gjb1, Errfi1, Fgf18, Galnt6, Slc25a42, Ifi30, Slc4a1, Cemip, Klf10, Tom1, Dcdc2c, Fancd2, Luzp2, Trpm1, Abcc12, Osbpl1a, Ptp4a2). The provided transcriptomic resource will be useful for guiding new research.
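The frequency ranking described above, including the nomenclature problem, can be sketched in a few lines. This assumes each comparison has already been reduced to a list of reported gene symbols and that an alias table for renamed genes is available; the gene lists are toy inputs, not data from the study, and the function name `rank_genes` is hypothetical. The alias used here reflects the renaming of Cyr61 to Ccn1.

```python
from collections import Counter


def rank_genes(comparisons, aliases=None):
    """Rank genes by the number of stress-control comparisons reporting them.

    comparisons: iterable of gene-symbol lists, one per comparison
    aliases: optional mapping from outdated symbols to current ones
    """
    aliases = aliases or {}
    counts = Counter()
    for genes in comparisons:
        # Harmonise nomenclature, then count each gene at most once per comparison.
        counts.update({aliases.get(g, g) for g in genes})
    # Sort by frequency (descending), breaking ties alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


comparisons = [["Fos", "Sgk1", "Cyr61"], ["Sgk1", "Ccn1"], ["Fos", "Sgk1"]]
print(rank_genes(comparisons, aliases={"Cyr61": "Ccn1"}))
# [('Sgk1', 3), ('Ccn1', 2), ('Fos', 2)]
```

Without the alias map, Cyr61 and Ccn1 would each be counted once, which shows how renaming can hide a consistently reported gene.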
Affiliation(s)
- Adrian M Stankiewicz
- Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Jastrzebiec, Poland
- Aneta Jaszczyk
- Department of Animal Behavior and Welfare, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Jastrzebiec, Poland
- Joanna Goscik
- Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland
- Grzegorz R Juszczak
- Department of Animal Behavior and Welfare, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Jastrzebiec, Poland
58
Functional genomic tools for emerging model species. Trends Ecol Evol 2022; 37:1104-1115. [PMID: 35914975 DOI: 10.1016/j.tree.2022.07.004] [Received: 02/14/2022] [Revised: 07/08/2022] [Accepted: 07/11/2022] [Indexed: 01/12/2023]
Abstract
Few studies in the field of ecology and evolution that aim to connect genotype to phenotype validate the identified loci using functional tools. Recent developments in RNA interference (RNAi) and clustered regularly interspaced short palindromic repeats (CRISPR)-Cas genome editing have dramatically increased the feasibility of functional validation. However, these methods come with specific challenges when applied to emerging model organisms, including limited spatial control of gene silencing, low knock-in efficiencies, and low throughput of functional validation. Moreover, many functional studies to date do not recapitulate ecologically relevant variation, which limits their scope for deeper insights into evolutionary processes. We therefore argue that increased use of gene editing by allelic replacement through homology-directed repair (HDR) would greatly benefit the field of ecology and evolution.
59
Yu JSL, Heineike BM, Hartl J, Aulakh SK, Correia-Melo C, Lehmann A, Lemke O, Agostini F, Lee CT, Demichev V, Messner CB, Mülleder M, Ralser M. Inorganic sulfur fixation via a new homocysteine synthase allows yeast cells to cooperatively compensate for methionine auxotrophy. PLoS Biol 2022; 20:e3001912. [PMID: 36455053 PMCID: PMC9757880 DOI: 10.1371/journal.pbio.3001912] [Received: 10/10/2022] [Revised: 12/16/2022] [Accepted: 11/14/2022] [Indexed: 12/03/2022]
Abstract
The assimilation, incorporation, and metabolism of sulfur is a fundamental process across all domains of life, yet how cells deal with varying sulfur availability is not well understood. We studied an unresolved conundrum of sulfur fixation in yeast, in which organosulfur auxotrophy caused by deletion of the homocysteine synthase Met17p is overcome when cells are inoculated at high cell density. By combining self-establishing metabolically cooperating (SeMeCo) communities with proteomic, genetic, and biochemical approaches, we discovered an uncharacterized gene product, YLL058Wp, herein named Hydrogen Sulfide Utilizing-1 (HSU1). Hsu1p acts as a homocysteine synthase and allows the cells to substitute for Met17p by reassimilating hydrosulfide ions leaked from met17Δ cells into O-acetyl-homoserine, forming homocysteine. Our results show that cells can cooperate to achieve sulfur fixation, indicating that the collective properties of microbial communities facilitate their basic metabolic capacity to overcome sulfur limitation.
Affiliation(s)
- Jason S. L. Yu
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Benjamin M. Heineike
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Johannes Hartl
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Simran K. Aulakh
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Clara Correia-Melo
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Andrea Lehmann
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Oliver Lemke
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Federica Agostini
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Cory T. Lee
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Vadim Demichev
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Christoph B. Messner
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Michael Mülleder
- Core Facility - High Throughput Mass Spectrometry, Charité Universitätsmedizin, Berlin, Germany
- Markus Ralser
- Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, United Kingdom
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
60
Stoeger T, Grant RA, McQuattie-Pimentel AC, Anekalla KR, Liu SS, Tejedor-Navarro H, Singer BD, Abdala-Valencia H, Schwake M, Tetreault MP, Perlman H, Balch WE, Chandel NS, Ridge KM, Sznajder JI, Morimoto RI, Misharin AV, Budinger GRS, Nunes Amaral LA. Aging is associated with a systemic length-associated transcriptome imbalance. Nat Aging 2022; 2:1191-1206. [PMID: 37118543 PMCID: PMC10154227 DOI: 10.1038/s43587-022-00317-6] [Received: 03/26/2021] [Accepted: 10/21/2022] [Indexed: 12/14/2022]
Abstract
Aging is among the most important risk factors for morbidity and mortality. To contribute toward a molecular understanding of aging, we analyzed age-resolved transcriptomic data from multiple studies. Here, we show that transcript length alone explains most transcriptional changes observed with aging in mice and humans. We present three lines of evidence supporting the biological importance of the uncovered transcriptome imbalance. First, in vertebrates the length association primarily displays a lower relative abundance of long transcripts in aging. Second, eight antiaging interventions of the Interventions Testing Program of the National Institute on Aging can counter this length association. Third, we find that in humans and mice the genes with the longest transcripts enrich for genes reported to extend lifespan, whereas those with the shortest transcripts enrich for genes reported to shorten lifespan. Our study opens fundamental questions on aging and the organization of transcriptomes.
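The first line of evidence, a lower relative abundance of long transcripts in aging, can be illustrated with a rank correlation between transcript length and age-related log2 fold change. The sketch below assumes per-gene lengths and fold changes are already available; the toy values are invented, and Spearman's rho is only one simple way to summarize such an association, not necessarily the statistic used in the study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


lengths = [500, 1200, 3000, 8000, 20000]  # toy transcript lengths (nt)
log2_fc = [0.4, 0.2, 0.0, -0.3, -0.6]     # toy old-vs-young log2 fold changes
print(spearman(lengths, log2_fc))  # -1.0
```

A negative rho means that longer transcripts tend to be relatively less abundant in older samples, the direction the abstract reports for vertebrates.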
Affiliation(s)
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA
- Rogan A Grant
- Department of Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Kishore R Anekalla
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Sophia S Liu
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Benjamin D Singer
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Simpson Querrey Lung Institute for Translational Science at Northwestern University (SQLIFTSNU), Evanston, IL, USA
- Department of Biochemistry and Molecular Genetics, Northwestern University, Evanston, IL, USA
- Hiam Abdala-Valencia
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Michael Schwake
- Department of Neurology, Northwestern University, Evanston, IL, USA
- Faculty of Chemistry, University of Bielefeld, Bielefeld, Germany
- Marie-Pier Tetreault
- Division of Gastroenterology and Hepatology, Northwestern University, Evanston, IL, USA
- Harris Perlman
- Division of Rheumatology, Northwestern University, Evanston, IL, USA
- Navdeep S Chandel
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Simpson Querrey Lung Institute for Translational Science at Northwestern University (SQLIFTSNU), Evanston, IL, USA
- Karen M Ridge
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Simpson Querrey Lung Institute for Translational Science at Northwestern University (SQLIFTSNU), Evanston, IL, USA
- Jacob I Sznajder
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Simpson Querrey Lung Institute for Translational Science at Northwestern University (SQLIFTSNU), Evanston, IL, USA
- Richard I Morimoto
- Department of Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Rice Institute for Biomedical Research, Northwestern University, Evanston, IL, USA
- Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Simpson Querrey Lung Institute for Translational Science at Northwestern University (SQLIFTSNU), Evanston, IL, USA
- G R Scott Budinger
- Division of Pulmonary and Critical Care Medicine, Northwestern University, Evanston, IL, USA
- Simpson Querrey Lung Institute for Translational Science at Northwestern University (SQLIFTSNU), Evanston, IL, USA
- Luis A Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, USA
61
Byrne JA, Park Y, Richardson RAK, Pathmendra P, Sun M, Stoeger T. Protection of the human gene research literature from contract cheating organizations known as research paper mills. Nucleic Acids Res 2022; 50:12058-12070. [PMID: 36477580 PMCID: PMC9757046 DOI: 10.1093/nar/gkac1139] [Received: 07/06/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022]
Abstract
Human gene research generates new biology insights with translational potential, yet few studies have considered the health of the human gene literature. The accessibility of human genes for targeted research, combined with unreasonable publication pressures and recent developments in scholarly publishing, may have created a market for low-quality or fraudulent human gene research articles, including articles produced by contract cheating organizations known as paper mills. This review summarises the evidence that paper mills contribute to the human gene research literature at scale and outlines why targeted gene research may be particularly vulnerable to systematic research fraud. To raise awareness of targeted gene research from paper mills, we highlight features of problematic manuscripts and publications that can be detected by gene researchers and/or journal staff. As improved awareness and detection could drive the further evolution of paper mill-supported publications, we also propose changes to academic publishing to more effectively deter and correct problematic publications at scale. In summary, the threat of paper mill-supported gene research highlights the need for all researchers to approach the literature with a more critical mindset, and demand publications that are underpinned by plausible research justifications, rigorous experiments and fully transparent reporting.
Affiliation(s)
- Jennifer A Byrne
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
- NSW Health Statewide Biobank, NSW Health Pathology, Camperdown, NSW, Australia
- Yasunori Park
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
- Reese A K Richardson
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
- Pranujan Pathmendra
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, NSW, Australia
- Mengyi Sun
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, USA
- Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Evanston, USA
- Center for Genetic Medicine, Northwestern University School of Medicine, Chicago, USA
62
Sales de Queiroz A, Sales Santa Cruz G, Jean-Marie A, Mazauric D, Roux J, Cazals F. Gene prioritization based on random walks with restarts and absorbing states, to define gene sets regulating drug pharmacodynamics from single-cell analyses. PLoS One 2022; 17:e0268956. [PMID: 36342924 PMCID: PMC9639845 DOI: 10.1371/journal.pone.0268956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Accepted: 05/12/2022] [Indexed: 11/09/2022]
Abstract
Prioritizing genes for their role in drug sensitivity is an important step in understanding drugs' mechanisms of action and discovering new molecular targets for co-treatment. To formalize this problem, we consider two sets of genes X and P, respectively composing the gene signature of cell sensitivity at the drug IC50 and the genes involved in its mechanism of action, as well as a protein interaction network (PPIN) containing the products of X and P as nodes. We introduce Genetrank, a method to prioritize the genes in X for their likelihood to regulate the genes in P. Genetrank uses asymmetric random walks with restarts, absorbing states, and a suitable renormalization scheme. Using novel so-called saturation indices, we show that the conjunction of absorbing states and renormalization yields an exploration of the PPIN which is much more progressive than that afforded by random walks with restarts only. Using MINT as the underlying network, we apply Genetrank to a predictive gene signature of cancer cell sensitivity to tumor-necrosis-factor-related apoptosis-inducing ligand (TRAIL), derived from single-cell analyses. Our ranking provides biological insights on drug sensitivity and a gene set considerably enriched in genes regulating TRAIL pharmacodynamics when compared to the most significant differentially expressed genes obtained from a statistical analysis framework alone. We also introduce gene expression radars, a visualization tool embedded in MA plots to assess all pairwise interactions at a glance on graphical representations of transcriptomics data. Genetrank is made available in the Structural Bioinformatics Library (https://sbl.inria.fr/doc/Genetrank-user-manual.html). It should prove useful for mining gene sets in conjunction with a signaling pathway, whenever other approaches yield relatively large sets of genes.
Affiliation(s)
- Jérémie Roux
- CNRS UMR 7284, Inserm U 1081, Institut de Recherche sur le Cancer et le Vieillissement de Nice, Centre Antoine Lacassagne, Université Côte d’Azur, Nice, France
- Frédéric Cazals
- Inria, Université Côte d’Azur, Nice, France
63
Gable AL, Szklarczyk D, Lyon D, Matias Rodrigues JF, von Mering C. Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments. Brief Bioinform 2022; 23:bbac355. [PMID: 36088548 PMCID: PMC9487593 DOI: 10.1093/bib/bbac355] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2022] [Revised: 07/13/2022] [Accepted: 07/30/2022] [Indexed: 11/14/2022]
Abstract
A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform best in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.
Affiliation(s)
- Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- David Lyon
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
64
Seale C, Tepeli Y, Gonçalves JP. Overcoming selection bias in synthetic lethality prediction. Bioinformatics 2022; 38:4360-4368. [PMID: 35876858 PMCID: PMC9477536 DOI: 10.1093/bioinformatics/btac523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/19/2021] [Revised: 07/13/2022] [Accepted: 07/22/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Synthetic lethality (SL) between two genes occurs when simultaneous loss of function leads to cell death. This holds great promise for developing anti-cancer therapeutics that target synthetic lethal pairs of endogenously disrupted genes. Identifying novel SL relationships through exhaustive experimental screens is challenging, due to the vast number of candidate pairs. Computational SL prediction is therefore sought to identify promising SL gene pairs for further experimentation. However, current SL prediction methods lack consideration for generalizability in the presence of selection bias in SL data. RESULTS We show that SL data exhibit considerable gene selection bias. Our experiments designed to assess the robustness of SL prediction reveal that models driven by the topology of known SL interactions (e.g. graph, matrix factorization) are especially sensitive to selection bias. We introduce selection bias-resilient synthetic lethality (SBSL) prediction using regularized logistic regression or random forests. Each gene pair is described by 27 molecular features derived from cancer cell line, cancer patient tissue and healthy donor tissue samples. SBSL models are built and tested using approximately 8000 experimentally derived SL pairs across breast, colon, lung and ovarian cancers. Compared to other SL prediction methods, SBSL showed higher predictive performance, better generalizability and robustness to selection bias. Gene dependency, quantifying the essentiality of a gene for cell survival, contributed most to SBSL predictions. Random forests were superior to linear models in the absence of dependency features, highlighting the relevance of mutual exclusivity of somatic mutations, co-expression in healthy tissue and differential expression in tumour samples. AVAILABILITY AND IMPLEMENTATION https://github.com/joanagoncalveslab/sbsl. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Colm Seale
- Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands
- Holland Proton Therapy Center (HollandPTC), Delft 2600 AC, The Netherlands
- Yasin Tepeli
- Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands
- Joana P Gonçalves
- Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands
65
de Crécy-lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:baac062. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue, funded by the National Science Foundation, was held on 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Major issues obstructing the field were also identified and discussed, which ultimately led to proposed solutions on how to move forward.
Affiliation(s)
- Valérie de Crécy-lagard
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
- Jill Babor
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Crysten Blaby-Haas
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
- Alan J Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Stacey Cleveland
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Lucy J Colwell
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Ana Conesa
- Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
- Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
- Antoine Danchin
- School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
- Anita de Waard
- Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
- Adam Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Raquel Dias
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Yousong Ding
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
- Gang Fang
- NYU-Shanghai, Shanghai 200120, China
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- John Gerlt
- Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Joshua Goldford
- Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Mark Gorelik
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Christopher Henry
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Geoffrey Hutinet
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Marshall Jaroch
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Aron Marchler-Bauer
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Claire McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
- Gaurav D Moghe
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Paul Monaghan
- Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
- Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Darren A Natale
- Georgetown University Medical Center, Washington, DC 20007, USA
- William C Nelson
- Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
- Seán O’Donoghue
- School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
- Christine Orengo
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Colbie Reed
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Dmitri Rodionov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Irina A Rodionova
- Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
- Jeffrey D Rudolf
- Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
- Lana Saleh
- New England Biolabs, Ipswich, MA 01938, USA
- Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Francoise Thibaud-Nissen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Peter Uetz
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
- Erica Watson Carter
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
- Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jin Xu
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
66
Li C, Feng Y, Fu Z, Deng J, Gu Y, Wang H, Wu X, Huang Z, Zhu Y, Liu Z, Huang M, Wang T, Hu S, Yao B, Zeng Y, Zhou CJ, Brown SDM, Liu Y, Vidal-Puig A, Dong Y, Xu Y. Human-specific gene CT47 blocks PRMT5 degradation to lead to meiosis arrest. Cell Death Discov 2022; 8:345. [PMID: 35918318 PMCID: PMC9345867 DOI: 10.1038/s41420-022-01139-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 07/13/2022] [Accepted: 07/18/2022] [Indexed: 11/25/2022]
Abstract
Exploring the functions of human-specific genes (HSGs) is challenging due to the lack of a tractable genetic model system. Testosterone is essential for maintaining human spermatogenesis and fertility, but the underlying mechanism is unclear. Here, we identified Cancer/Testis Antigen gene family 47 (CT47) as an essential regulator of human-specific spermatogenesis by stabilizing protein arginine methyltransferase 5 (PRMT5). A humanized mouse model revealed that CT47 functions to arrest spermatogenesis by interacting with and regulating CT47/PRMT5 accumulation in the nucleus during the leptotene/zygotene-to-pachytene transition of meiosis. We demonstrate that testosterone induces nuclear depletion of CT47/PRMT5 and rescues leptotene-arrested spermatocyte progression in humanized testes. Loss of CT47 in human embryonic stem cells (hESCs) by CRISPR/Cas9 led to an increase in haploid cells but blocked the testosterone-induced increase in haploid cells when hESCs were differentiated into haploid spermatogenic cells. Moreover, CT47 levels were decreased in nonobstructive azoospermia. Together, these results establish CT47 as a crucial regulator of human spermatogenesis by preventing meiosis initiation before the testosterone surge.
Affiliation(s)
- Chao Li
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Yuming Feng
- Department of Reproductive Medical Center, Jinling Hospital, Medical School of Nanjing University, Nanjing, Jiangsu, 210002, China
- Zhenxin Fu
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Junjie Deng
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Yue Gu
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Hanben Wang
- State Key Laboratory of Reproductive Medicine (SKLRM), Nanjing Medical University, Nanjing, Jiangsu, 210029, China
- Xin Wu
- State Key Laboratory of Reproductive Medicine (SKLRM), Nanjing Medical University, Nanjing, Jiangsu, 210029, China
- Zhengyun Huang
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Yichen Zhu
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Zhiwei Liu
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Moli Huang
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Tao Wang
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Shijun Hu
- Department of Cardiovascular Surgery of the First Affiliated Hospital & Institute for Cardiovascular Science, Collaborative Innovation Center of Hematology, State Key Laboratory of Radiation Medicine and Protection, Medical College, Soochow University, Suzhou, 215000, China
- Bing Yao
- Department of Reproductive Medical Center, Jinling Hospital, Medical School of Nanjing University, Nanjing, Jiangsu, 210002, China
- Yizhun Zeng
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Chengji J Zhou
- Department of Biochemistry and Molecular Medicine, University of California at Davis, School of Medicine, Sacramento, CA, USA
- Steve D M Brown
- Medical Research Council (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, UK
- Yi Liu
- Department of Physiology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Antonio Vidal-Puig
- University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, MDU MRC, Cambridge, UK
- Yingying Dong
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
- Ying Xu
- Cambridge-Su Genomic Resource Center, Jiangsu Key Laboratory of Neuropsychiatric Diseases, Medical School of Soochow University, Suzhou, Jiangsu, 215123, China
67
Wasilewska K, Gambin T, Rydzanicz M, Szczałuba K, Płoski R. Postzygotic mutations and where to find them - Recent advances and future implications in the field of non-neoplastic somatic mosaicism. Mutat Res Rev Mutat Res 2022; 790:108426. [PMID: 35690331 DOI: 10.1016/j.mrrev.2022.108426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2021] [Revised: 05/05/2022] [Accepted: 06/03/2022] [Indexed: 01/01/2023]
Abstract
The technological progress of massively parallel sequencing (MPS) has triggered a remarkable development in research on postzygotic mutations. Although the overwhelming majority of studies in the field focus on oncogenesis, non-neoplastic diseases are attracting more and more attention. The aim of this review was to summarize some of the most recent findings in the field of somatic mosaicism in diseases other than neoplastic events. We discuss the abundance and role of postzygotic mutations, with a special emphasis on disorders which occur only in a mosaic form (obligatory mosaic diseases; OMDs). Based on the list of OMDs compiled from the published literature and three databases (OMIM, Orphanet and MosaicBase), we demonstrate the prevalence of cancer-related genes across OMDs and suggest other sources to further explore OMDs and OMD-related genes. Additionally, we comment on some practical aspects related to mosaic diseases, such as approaches to tissue sampling, the MPS coverage required to detect variants at a very low frequency, as well as on bioinformatic and molecular tools dedicated to detecting somatic mutations in MPS data.
Affiliation(s)
- Krystyna Wasilewska
- Department of Medical Genetics, Medical University of Warsaw, ul. Pawińskiego 3c, 02-106 Warsaw, Poland
- Tomasz Gambin
- Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
- Małgorzata Rydzanicz
- Department of Medical Genetics, Medical University of Warsaw, ul. Pawińskiego 3c, 02-106 Warsaw, Poland
- Krzysztof Szczałuba
- Department of Medical Genetics, Medical University of Warsaw, ul. Pawińskiego 3c, 02-106 Warsaw, Poland
- Rafał Płoski
- Department of Medical Genetics, Medical University of Warsaw, ul. Pawińskiego 3c, 02-106 Warsaw, Poland
68
69
Sharma VS, Fossati A, Ciuffa R, Buljan M, Williams EG, Chen Z, Shao W, Pedrioli PGA, Purcell AW, Martínez MR, Song J, Manica M, Aebersold R, Li C. PCfun: a hybrid computational framework for systematic characterization of protein complex function. Brief Bioinform 2022; 23:bbac239. [PMID: 35724564 PMCID: PMC9310514 DOI: 10.1093/bib/bbac239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 05/05/2022] [Accepted: 05/21/2022] [Indexed: 11/14/2022]
Abstract
In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes—or macromolecular machines—are, in turn, the key functional entities mediating and modulating most biological processes. Although identifying protein complexes and their subunit composition can now be done inexpensively and at scale, determining their function remains challenging and labor intensive. This study describes Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding using natural language processing techniques based on 1 million open access PubMed Central articles. Specifically, PCfun leverages two approaches for accurately identifying protein complex function, including: (i) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector and (ii) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric statistical test to enrich the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
Affiliation(s)
- Varun S Sharma
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Andrea Fossati
- Quantitative Biosciences Institute (QBI) and Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA 94158, USA
- J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Rodolfo Ciuffa
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
- Marija Buljan
- Empa - Swiss Federal Laboratories for Materials Science and Technology, St. Gallen, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Evan G Williams
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
- Wenguang Shao
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
- Patrick G A Pedrioli
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
- Anthony W Purcell
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
- Faculty of Science, University of Zurich, Switzerland
- Chen Li
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
70
Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J. An open invitation to the Understudied Proteins Initiative. Nat Biotechnol 2022; 40:815-817. [PMID: 35534555 DOI: 10.1038/s41587-022-01316-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
Affiliation(s)
- Georg Kustatscher
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK.
| | | | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Sinai Health System, Toronto, Ontario, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Tiannan Guo
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China.,Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, China
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
- Emma Lundberg
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden.,Department of Bioengineering, Stanford University, Stanford, CA, USA.,Department of Pathology, Stanford University, Stanford, CA, USA.,Chan Zuckerberg Biohub, San Francisco, CA, USA
- Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, USA
- Markus Ralser
- Department of Biochemistry, Charité University Medicine, Berlin, Germany.,The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
- Juri Rappsilber
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK.,Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany.,Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK.
71
Amaral LAN. A cautionary tale from the machine scientist. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00491-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
72
73
Park Y, West RA, Pathmendra P, Favier B, Stoeger T, Capes-Davis A, Cabanac G, Labbé C, Byrne JA. Identification of human gene research articles with wrongly identified nucleotide sequences. Life Sci Alliance 2022; 5:e202101203. [PMID: 35022248 PMCID: PMC8807875 DOI: 10.26508/lsa.202101203] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/20/2021] [Revised: 12/27/2021] [Accepted: 12/28/2021] [Indexed: 01/01/2023]
Abstract
Nucleotide sequence reagents underpin molecular techniques that have been applied across hundreds of thousands of publications. We have previously reported wrongly identified nucleotide sequence reagents in human research publications and described a semi-automated screening tool Seek & Blastn to fact-check their claimed status. We applied Seek & Blastn to screen >11,700 publications across five literature corpora, including all original publications in Gene from 2007 to 2018 and all original open-access publications in Oncology Reports from 2014 to 2018. After manually checking Seek & Blastn outputs for >3,400 human research articles, we identified 712 articles across 78 journals that described at least one wrongly identified nucleotide sequence. Verifying the claimed identities of >13,700 sequences highlighted 1,535 wrongly identified sequences, most of which were claimed targeting reagents for the analysis of 365 human protein-coding genes and 120 non-coding RNAs. The 712 problematic articles have received >17,000 citations, including citations by human clinical trials. Given our estimate that approximately one-quarter of problematic articles may misinform the future development of human therapies, urgent measures are required to address unreliable gene research articles.
Affiliation(s)
- Yasunori Park
- Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Rachael A West
- Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Children's Cancer Research Unit, Kids Research, The Children's Hospital at Westmead, Westmead, Australia
- Bertrand Favier
- Université Grenoble Alpes, Translationnelle et Innovation en Médecine et Complexité, Grenoble, France
- Thomas Stoeger
- Successful Clinical Response in Pneumonia Therapy Systems Biology Center, Northwestern University, Evanston, IL, USA
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Center for Genetic Medicine, Northwestern University School of Medicine, Chicago, IL, USA
- Amanda Capes-Davis
- Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- CellBank Australia, Children's Medical Research Institute, Westmead, Australia
- Guillaume Cabanac
- Computer Science Department, Institut de Recherche en Informatique de Toulouse, Unité Mixte de Recherche 5505 Centre National de la Recherche Scientifique (CNRS), University of Toulouse, Toulouse, France
- Cyril Labbé
- Université Grenoble Alpes, CNRS, Grenoble INP, Laboratoire d'Informatique de Grenoble, Grenoble, France
- Jennifer A Byrne
- Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- New South Wales Health Statewide Biobank, New South Wales Health Pathology, Camperdown, Australia
74
Cho NH, Cheveralls KC, Brunner AD, Kim K, Michaelis AC, Raghavan P, Kobayashi H, Savy L, Li JY, Canaj H, Kim JY, Stewart EM, Gnann C, McCarthy F, Cabrera JP, Brunetti RM, Chhun BB, Dingle G, Hein MY, Huang B, Mehta SB, Weissman JS, Gómez-Sjöberg R, Itzhak DN, Royer LA, Mann M, Leonetti MD. OpenCell: Endogenous tagging for the cartography of human cellular organization. Science 2022; 375:eabi6983. [PMID: 35271311 PMCID: PMC9119736 DOI: 10.1126/science.abi6983] [Citation(s) in RCA: 275] [Impact Index Per Article: 91.7] [Indexed: 12/13/2022]
Abstract
Elucidating the wiring diagram of the human cell is a central goal of the postgenomic era. We combined genome engineering, confocal live-cell imaging, mass spectrometry, and data science to systematically map the localization and interactions of human proteins. Our approach provides a data-driven description of the molecular and spatial networks that organize the proteome. Unsupervised clustering of these networks delineates functional communities that facilitate biological discovery. We found that remarkably precise functional information can be derived from protein localization patterns, which often contain enough information to identify molecular interactions, and that RNA binding proteins form a specific subgroup defined by unique interaction and localization properties. Paired with a fully interactive website (opencell.czbiohub.org), our work constitutes a resource for the quantitative cartography of human cellular organization.
Affiliation(s)
- Andreas-David Brunner
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Kibeom Kim
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- André C. Michaelis
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Laura Savy
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Jason Y. Li
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Hera Canaj
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Christian Gnann
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden
- Rachel M. Brunetti
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Greg Dingle
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Bo Huang
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Jonathan S. Weissman
- Whitehead Institute, Koch Institute, Howard Hughes Medical Institute, and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- Matthias Mann
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- NNF Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
75
Peterson KA, Murray SA. Progress towards completing the mutant mouse null resource. Mamm Genome 2022; 33:123-134. [PMID: 34698892 PMCID: PMC8913489 DOI: 10.1007/s00335-021-09905-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/01/2021] [Accepted: 08/10/2021] [Indexed: 11/13/2022]
Abstract
The generation of a comprehensive catalog of null alleles covering all protein-coding genes is the goal of the International Mouse Phenotyping Consortium (IMPC). Over the past 20 years, significant progress has been made towards achieving this goal through the combined efforts of many large-scale programs that built an embryonic stem cell resource to generate knockout mice and more recently employed CRISPR/Cas9-based mutagenesis to delete critical regions predicted to result in frameshift mutations, thus ablating gene function. The IMPC initiative builds on prior and ongoing work by individual research groups creating gene knockouts in the mouse. Here, we analyze the collective efforts focusing on the combined null allele resource resulting from strains developed by the research community and large-scale production programs. Based upon this pooled analysis, we examine the remaining fraction of protein-coding genes focusing on clearly defined mouse-human orthologs as the highest priority for completing the mutant mouse null resource. In summary, we find that there are fewer than 3400 mouse-human orthologs remaining in the genome without a targeted null allele that can be further prioritized to achieve our overall goal of the complete functional annotation of the protein-coding portion of a mammalian genome.
76
Multiplexed Genome Editing for Efficient Phenotypic Screening in Zebrafish. Vet Sci 2022; 9:vetsci9020092. [PMID: 35202345 PMCID: PMC8879510 DOI: 10.3390/vetsci9020092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2021] [Revised: 02/14/2022] [Accepted: 02/15/2022] [Indexed: 12/30/2022]
Abstract
Zebrafish are widely used to investigate candidate genes for human diseases. While the emergence of CRISPR-Cas9 technology has revolutionized gene editing, the use of individual guide RNAs limits the efficiency and application of this technology in functional genetics research. Multiplexed genome editing significantly enhances the efficiency and scope of gene editing. Herein, we describe an efficient multiplexed genome editing strategy to generate zebrafish mutants. Following behavioural tests and histological examination, we identified one new candidate gene (tmem183a) for hearing loss. This study provides a robust genetic platform to quickly obtain zebrafish mutants and to identify candidate genes by phenotypic readouts.
77
Roberts RG, on behalf of the PLOS Biology Staff Editors. The first six years of meta-research at PLOS Biology. PLoS Biol 2022; 20:e3001553. [PMID: 35100252 PMCID: PMC8830785 DOI: 10.1371/journal.pbio.3001553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 02/10/2022] [Indexed: 11/18/2022]
Affiliation(s)
- Roland G. Roberts
- Public Library of Science, San Francisco, California, United States of America and Cambridge, United Kingdom
78
Saxena A, Sharma V, Muthuirulan P, Neufeld SJ, Tran MP, Gutierrez HL, Chen KD, Erberich JM, Birmingham A, Capellini TD, Cobb J, Hiller M, Cooper KL. Interspecies transcriptomics identify genes that underlie disproportionate foot growth in jerboas. Curr Biol 2022; 32:289-303.e6. [PMID: 34793695 PMCID: PMC8792248 DOI: 10.1016/j.cub.2021.10.063] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/07/2021] [Revised: 07/16/2021] [Accepted: 10/28/2021] [Indexed: 01/26/2023]
Abstract
Despite the great diversity of vertebrate limb proportion and our deep understanding of the genetic mechanisms that drive skeletal elongation, little is known about how individual bones reach different lengths in any species. Here, we directly compare the transcriptomes of homologous growth cartilages of the mouse (Mus musculus) and bipedal jerboa (Jaculus jaculus), the latter of which has "mouse-like" arms but extremely long metatarsals of the feet. Intersecting gene-expression differences in metatarsals and forearms of the two species revealed that about 10% of orthologous genes are associated with the disproportionately rapid elongation of neonatal jerboa feet. These include genes and enriched pathways not previously associated with endochondral elongation as well as those that might diversify skeletal proportion in addition to their known requirements for bone growth throughout the skeleton. We also identified transcription regulators that might act as "nodes" for sweeping differences in genome expression between species. Among these, Shox2, which is necessary for proximal limb elongation, has gained expression in jerboa metatarsals where it has not been detected in other vertebrates. We show that Shox2 is sufficient to increase mouse distal limb length, and a nearby putative cis-regulatory region is preferentially accessible in jerboa metatarsals. In addition to mechanisms that might directly promote growth, we found evidence that jerboa foot elongation may occur in part by de-repressing latent growth potential. The genes and pathways that we identified here provide a framework to understand the modular genetic control of skeletal growth and the remarkable malleability of vertebrate limb proportion.
Affiliation(s)
- Aditya Saxena
- Division of Biological Sciences, Section of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, Dresden 01307, Germany; Max Planck Institute for the Physics of Complex Systems, Nothnitzerstraße 38, Dresden 01187, Germany
- Pushpanathan Muthuirulan
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue, Cambridge, MA 02138, USA
- Stanley J Neufeld
- Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada
- Mai P Tran
- Division of Biological Sciences, Section of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Haydee L Gutierrez
- Division of Biological Sciences, Section of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Kevin D Chen
- Division of Biological Sciences, Section of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Joel M Erberich
- Division of Biological Sciences, Section of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Amanda Birmingham
- Center for Computational Biology and Bioinformatics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Terence D Capellini
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue, Cambridge, MA 02138, USA
- John Cobb
- Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada
- Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, Dresden 01307, Germany; Max Planck Institute for the Physics of Complex Systems, Nothnitzerstraße 38, Dresden 01187, Germany
- Kimberly L Cooper
- Division of Biological Sciences, Section of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
79
Erbe R, Gore J, Gemmill K, Gaykalova DA, Fertig EJ. The use of machine learning to discover regulatory networks controlling biological systems. Mol Cell 2022; 82:260-273. [PMID: 35016036 PMCID: PMC8905511 DOI: 10.1016/j.molcel.2021.12.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/21/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021] [Indexed: 01/22/2023]
Abstract
Biological systems are composed of a vast web of multiscale molecular interactors and interactions. High-throughput technologies, both bulk and single cell, now allow for investigation of the properties and quantities of these interactors. Computational algorithms and machine learning methods then provide the tools to derive meaningful insights from the resulting data sets. One such approach is graphical network modeling, which provides a computational framework to explicitly model the molecular interactions within and between the cells comprising biological systems. These graphical networks aim to describe a putative chain of cause and effect between interacting molecules. This feature allows for determination of key molecules in a biological process, accelerated generation of mechanistic hypotheses, and simulation of experimental outcomes. We review the computational concepts and applications of graphical network models across molecular scales for both intracellular and intercellular regulatory biology, examples of successful applications, and the future directions needed to overcome current limitations.
Affiliation(s)
- Rossin Erbe
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Jessica Gore
- Institute for Genome Sciences, University of Maryland Medical Center, Baltimore, MD, USA
- Kelly Gemmill
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Daria A Gaykalova
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Institute for Genome Sciences, University of Maryland Medical Center, Baltimore, MD, USA; Department of Otorhinolaryngology-Head and Neck Surgery, University of Maryland Medical Center, Baltimore, MD, USA; Marlene & Stewart Greenebaum Comprehensive Cancer Center, University of Maryland Medical Center, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
80
Williams EG, Pfister N, Roy S, Statzer C, Haverty J, Ingels J, Bohl C, Hasan M, Čuklina J, Bühlmann P, Zamboni N, Lu L, Ewald CY, Williams RW, Aebersold R. Multiomic profiling of the liver across diets and age in a diverse mouse population. Cell Syst 2022; 13:43-57.e6. [PMID: 34666007 PMCID: PMC8776606 DOI: 10.1016/j.cels.2021.09.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/24/2021] [Revised: 07/12/2021] [Accepted: 09/14/2021] [Indexed: 01/21/2023]
Abstract
We profiled the liver transcriptome, proteome, and metabolome in 347 individuals from 58 isogenic strains of the BXD mouse population across age (7 to 24 months) and diet (low or high fat) to link molecular variations to metabolic traits. Several hundred genes are affected by diet and/or age at the transcript and protein levels. Orthologs of two aging-associated genes, St7 and Ctsd, were knocked down in C. elegans, reducing longevity in wild-type and mutant long-lived strains. The multiomics data were analyzed as segregating gene networks according to each independent variable, providing causal insight into dietary and aging effects. Candidates were cross-examined in an independent diversity outbred mouse liver dataset segregating for similar diets, with ∼80%-90% of diet-related candidate genes found in common across datasets. Together, we have developed a large multiomics resource for multivariate analysis of complex traits and demonstrate a methodology for moving from observational associations to causal connections.
Affiliation(s)
- Evan G Williams
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
- Niklas Pfister
- Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
- Suheeta Roy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Cyril Statzer
- Department of Health Sciences and Technology, ETH Zürich, Zurich, Switzerland
- Jack Haverty
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Jesse Ingels
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Casey Bohl
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Moaraj Hasan
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Jelena Čuklina
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Peter Bühlmann
- Department of Mathematics, Seminar for Statistics, ETH Zürich, Zurich, Switzerland
- Nicola Zamboni
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Lu Lu
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Collin Y Ewald
- Department of Health Sciences and Technology, ETH Zürich, Zurich, Switzerland
- Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland; Faculty of Science, University of Zürich, Zurich, Switzerland
81
Rodriguez-Esteban R. The speed of information propagation in the scientific network distorts biomedical research. PeerJ 2022; 10:e12764. [PMID: 35070506 PMCID: PMC8759377 DOI: 10.7717/peerj.12764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Accepted: 12/17/2021] [Indexed: 01/07/2023]
Abstract
Delays in the propagation of scientific discoveries across scientific communities have been an oft-maligned feature of scientific research for introducing a bias towards knowledge that is produced within a scientist's closest community. The vastness of the scientific literature has been commonly blamed for this phenomenon, despite recent improvements in information retrieval and text mining. Its actual negative impact on scientific progress, however, has never been quantified. This analysis attempts to do so by exploring its effects on biomedical discovery, particularly in the discovery of relations between diseases, genes and chemical compounds. Results indicate that the probability that two scientific facts will enable the discovery of a new fact depends on how far apart these two facts were originally within the scientific landscape. In particular, the probability decreases exponentially with the citation distance. Thus, the direction of scientific progress is distorted based on the location in which each scientific fact is published, representing a path-dependent bias in which originally closely-located discoveries drive the sequence of future discoveries. To counter this bias, scientists should open the scope of their scientific work with modern information retrieval and extraction approaches.
Affiliation(s)
- Raul Rodriguez-Esteban
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
82
Stoeger T, Nunes Amaral LA. The characteristics of early-stage research into human genes are substantially different from subsequent research. PLoS Biol 2022; 20:e3001520. [PMID: 34990452 PMCID: PMC8769369 DOI: 10.1371/journal.pbio.3001520] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2021] [Revised: 01/19/2022] [Accepted: 12/21/2021] [Indexed: 11/19/2022]
Abstract
Throughout the last 2 decades, several scholars observed that present day research into human genes rarely turns toward genes that had not already been extensively investigated in the past. Guided by hypotheses derived from studies of science and innovation, we present here a literature-wide data-driven meta-analysis to identify the specific scientific and organizational contexts that coincided with early-stage research into human genes throughout the past half century. We demonstrate that early-stage research into human genes differs in team size, citation impact, funding mechanisms, and publication outlet, but that generalized insights derived from studies of science and innovation only partially apply to early-stage research into human genes. Further, we demonstrate that, presently, genome biology accounts for most of the initial early-stage research, while subsequent early-stage research can engage other life sciences fields. We therefore anticipate that the specificity of our findings will enable scientists and policymakers to better promote early-stage research into human genes and increase overall innovation within the life sciences.
Affiliation(s)
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, Illinois, United States of America
- Center for Genetic Medicine, Northwestern University, Chicago, Illinois, United States of America
- Luís A. Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, Illinois, United States of America
- Department of Molecular Bioscience, Northwestern University, Evanston, Illinois, United States of America
- Department of Physics and Astronomy, Northwestern University, Evanston, Illinois, United States of America
- Department of Medicine, Northwestern University School of Medicine, Chicago, Illinois, United States of America
83
Clark KC, Kwitek AE. Multi-Omic Approaches to Identify Genetic Factors in Metabolic Syndrome. Compr Physiol 2021; 12:3045-3084. [PMID: 34964118 PMCID: PMC9373910 DOI: 10.1002/cphy.c210010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
Metabolic syndrome (MetS) is a highly heritable disease and a major public health burden worldwide. MetS diagnosis criteria are met by the simultaneous presence of any three of the following: high triglycerides, low HDL/high LDL cholesterol, insulin resistance, hypertension, and central obesity. These diseases act synergistically in people suffering from MetS and dramatically increase risk of morbidity and mortality due to stroke and cardiovascular disease, as well as certain cancers. Each of these component features is itself a complex disease, as is MetS. Because MetS is genetically complex, its genetic risk factors are numerous but individually weak, often requiring specific environmental stressors for the disease to manifest. When taken together, all sequence variants that contribute to MetS disease risk explain only a fraction of the heritable variance, suggesting additional, novel loci have yet to be discovered. In this article, we will give a brief overview of the genetic concepts needed to interpret genome-wide association studies (GWAS) and quantitative trait locus (QTL) data, summarize the state of the field of MetS physiological genomics, and introduce tools and resources that can be used by the physiologist to integrate genomics into their own research on MetS and any of its component features. There is a wealth of phenotypic and molecular data in animal models and humans that can be leveraged as outlined in this article. Integrating these multi-omic QTL data for complex diseases such as MetS provides a means to unravel the pathways and mechanisms leading to complex disease and holds promise for novel treatments. © 2022 American Physiological Society. Compr Physiol 12:1-40, 2022.
Affiliation(s)
- Karen C Clark
- Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Anne E Kwitek
- Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
84
Janbain A, Reynès C, Assaghir Z, Zeineddine H, Sabatier R, Journot L. TopoFun: a machine learning method to improve the functional similarity of gene co-expression modules. NAR Genom Bioinform 2021; 3:lqab103. [PMID: 34761220 PMCID: PMC8573820 DOI: 10.1093/nargab/lqab103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 09/22/2021] [Accepted: 10/13/2021] [Indexed: 11/14/2022]
Abstract
A comprehensive, accurate functional annotation of genes is key to systems-level approaches. As functionally related genes tend to be co-expressed, one possible approach to identify functional modules or supplement existing gene annotations is to analyse gene co-expression. We describe TopoFun, a machine learning method that combines topological and functional information to improve the functional similarity of gene co-expression modules. Using LASSO, we selected topological descriptors that discriminated modules made of functionally related genes and random modules. Using the selected topological descriptors, we performed linear discriminant analysis to construct a topological score that predicted the type of a module, random-like or functional-like. We combined the topological score with a functional similarity score in a fitness function that we used in a genetic algorithm to explore the co-expression network. To illustrate the use of TopoFun, we started from a subset of the Gene Ontology Biological Processes (GO-BPs) and showed that TopoFun efficiently retrieved genes that we omitted, and aggregated a number of novel genes to the initial GO-BP while improving module topology and functional similarity. Using an independent protein-protein interaction database, we confirmed that the novel genes gathered by TopoFun were functionally related to the original gene set.
Affiliation(s)
- Ali Janbain
- IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France
- Zainab Assaghir
- Applied Mathematics Department, Lebanese University, Beirut 1003, Lebanon
- Hassan Zeineddine
- Applied Mathematics Department, Lebanese University, Beirut 1003, Lebanon
- Robert Sabatier
- IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France
- Laurent Journot
- IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France
85
Stupp D, Sharon E, Bloch I, Zitnik M, Zuk O, Tabach Y. Co-evolution based machine-learning for predicting functional interactions between human genes. Nat Commun 2021; 12:6454. [PMID: 34753957 PMCID: PMC8578642 DOI: 10.1038/s41467-021-26792-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/26/2020] [Accepted: 10/09/2021] [Indexed: 12/20/2022]
Abstract
Over the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype and phenotype crosstalk, gene function and interactions, and answer evolutionary questions. Here, we develop a machine-learning approach for utilizing phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmark our approach showing a 14% performance increase (auROC) compared to previous methods. Using this approach, we predict functional annotations for less studied genes. We focus on DNA repair and verify that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. The manuscript is accompanied by a webserver available at: https://mlpp.cs.huji.ac.il. With the rise in number of eukaryotic species being fully sequenced, large-scale phylogenetic profiling can give insights into gene function. Here, the authors describe a machine-learning approach that integrates co-evolution across eukaryotic clades to predict gene function and functional interactions among human genes.
Affiliation(s)
- Doron Stupp
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Elad Sharon
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Idit Bloch
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA, 02115, USA
- Or Zuk
- Department of Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem, 9190501, Israel.
- Yuval Tabach
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel.
86
Amici DR, Pinal-Fernandez I, Christopher-Stine L, Mammen AL, Mendillo ML. A network of core and subtype-specific gene expression programs in myositis. Acta Neuropathol 2021; 142:887-898. [PMID: 34499219 PMCID: PMC8555743 DOI: 10.1007/s00401-021-02365-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/06/2021] [Revised: 07/30/2021] [Accepted: 08/25/2021] [Indexed: 12/29/2022]
Abstract
Myositis comprises a heterogeneous group of skeletal muscle disorders which converge on chronic muscle inflammation and weakness. Our understanding of myositis pathogenesis is limited, and many myositis patients lack effective therapies. Using muscle biopsy transcriptome profiles from 119 myositis patients (spanning major clinical and serological disease subtypes) and 20 normal controls, we generated a co-expression network of 8101 dynamically regulated transcripts. This network organized the myositis transcriptome into a map of gene expression modules representing interrelated biological processes and disease signatures. Universally myositis-upregulated network modules included muscle regeneration, specific cytokine signatures, the acute phase response, and neutrophil degranulation. Universally myositis-suppressed pathways included a specific subset of myofilaments, the mitochondrial envelope, and nuclear isoforms of the anti-apoptotic humanin protein. Myositis subtype-specific modules included type 1 interferon signaling and titin (dermatomyositis), RNA processing (antisynthetase syndrome), and vasculogenesis (inclusion body myositis). Importantly, therapies exist to target influential proteins in many myositis-dysregulated modules, and nearly all modules contained understudied proteins and non-coding RNAs - many of which were extraordinarily dysregulated in myositis and may represent novel therapeutic targets. Finally, we apply our network to patient classification, finding that a deep learning algorithm trained on patient-level network "images" successfully assigned patients to clinical groups and further into molecular subclusters. Altogether, we provide a global resource to probe and contextualize differential gene expression in myositis.
Affiliation(s)
- David R Amici
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Simpson Querrey Center for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Medical Scientist Training Program, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Iago Pinal-Fernandez
- Muscle Disease Unit, Laboratory of Muscle Stem Cells and Gene Regulation, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Faculty of Health Sciences, Universitat Oberta de Catalunya, Barcelona, Spain
- Lisa Christopher-Stine
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Medicine, Division of Rheumatology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Andrew L Mammen
- Muscle Disease Unit, Laboratory of Muscle Stem Cells and Gene Regulation, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA.
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Medicine, Division of Rheumatology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Marc L Mendillo
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
- Simpson Querrey Center for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA.
- Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
87
de Magalhães JP. Every gene can (and possibly will) be associated with cancer. Trends Genet 2021; 38:216-217. [PMID: 34756472 DOI: 10.1016/j.tig.2021.09.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 08/05/2021] [Revised: 09/02/2021] [Accepted: 09/04/2021] [Indexed: 10/20/2022]
Abstract
A PubMed analysis shows that the vast majority of human genes have been studied in the context of cancer. As such, the study of nearly any human gene can be justified based on existing literature by its potential relevance to cancer. Moreover, these results have implications for analyzing and interpreting large-scale analyses.
Affiliation(s)
- João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK.
88
Brown SDM. Advances in mouse genetics for the study of human disease. Hum Mol Genet 2021; 30:R274-R284. [PMID: 34089057 PMCID: PMC8490014 DOI: 10.1093/hmg/ddab153] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/15/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 01/11/2023]
Abstract
The mouse is the pre-eminent model organism for studies of mammalian gene function and has provided an extraordinarily rich range of insights into basic genetic mechanisms and biological systems. Over several decades, the characterization of mouse mutants has illuminated the relationship between gene and phenotype, providing transformational insights into the genetic bases of disease. However, if we are to deliver the promise of genomic and precision medicine, we must develop a comprehensive catalogue of mammalian gene function that uncovers the dark genome and elucidates pleiotropy. Advances in large-scale mouse mutagenesis programmes allied to high-throughput mouse phenomics are now addressing this challenge and systematically revealing novel gene function and multi-morbidities. Alongside the development of these pan-genomic mutational resources, mouse genetics is employing a range of diversity resources to delineate gene-gene and gene-environment interactions and to explore genetic context. Critically, mouse genetics is a powerful tool for assessing the functional impact of human genetic variation and determining the causal relationship between variant and disease. Together these approaches provide unique opportunities to dissect in vivo mechanisms and systems to understand pathophysiology and disease. Moreover, the provision and utility of mouse models of disease has flourished and engages cumulatively at numerous points across the translational spectrum from basic mechanistic studies to pre-clinical studies, target discovery and therapeutic development.
89
Pleiotropy data resource as a primer for investigating co-morbidities/multi-morbidities and their role in disease. Mamm Genome 2021; 33:135-142. [PMID: 34524473 PMCID: PMC8913486 DOI: 10.1007/s00335-021-09917-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Accepted: 09/06/2021] [Indexed: 11/06/2022]
Abstract
Most current biomedical and protein research focuses only on a small proportion of genes, which results in a lost opportunity to identify new gene-disease associations and explore new opportunities for therapeutic intervention. The International Mouse Phenotyping Consortium (IMPC) focuses on elucidating gene function at scale for poorly characterized and/or under-studied genes. A key component of the IMPC initiative is the implementation of a broad phenotyping pipeline, which is facilitating the discovery of pleiotropy. Characterizing pleiotropy is essential to identify gene-disease associations, and it is of particular importance when elucidating the genetic causes of syndromic disorders. Here we show how the IMPC is effectively uncovering pleiotropy and how the new mouse models and gene function hypotheses generated by the IMPC are increasing our understanding of the mammalian genome, forming the basis of new research and identifying new gene-disease associations.
90
Huang JH, Liao YR, Lin TC, Tsai CH, Lai WY, Chou YK, Leu JY, Tsai HK, Kao CF. iTARGEX analysis of yeast deletome reveals novel regulators of transcriptional buffering in S phase and protein turnover. Nucleic Acids Res 2021; 49:7318-7329. [PMID: 34197604 PMCID: PMC8287957 DOI: 10.1093/nar/gkab555] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/21/2021] [Revised: 05/12/2021] [Accepted: 06/29/2021] [Indexed: 11/24/2022]
Abstract
Integrating omics data with quantification of biological traits provides unparalleled opportunities for discovery of genetic regulators by in silico inference. However, current approaches to analyze genetic-perturbation screens are limited by their reliance on annotation libraries for prioritization of hits and subsequent targeted experimentation. Here, we present iTARGEX (identification of Trait-Associated Regulatory Genes via mixture regression using EXpectation maximization), an association framework with no requirement of a priori knowledge of gene function. After creating this tool, we used it to test associations between gene expression profiles and two biological traits in single-gene deletion budding yeast mutants, including transcription homeostasis during S phase and global protein turnover. For each trait, we discovered novel regulators without prior functional annotations. The functional effects of the novel candidates were then validated experimentally, providing solid evidence for their roles in the respective traits. Hence, we conclude that iTARGEX can reliably identify novel factors involved in given biological traits. As such, it is capable of converting genome-wide observations into causal gene function predictions. Further application of iTARGEX in other contexts is expected to facilitate the discovery of new regulators and provide observations for novel mechanistic hypotheses regarding different biological traits and phenotypes.
Affiliation(s)
- Jia-Hsin Huang
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- You-Rou Liao
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 115, Taiwan
- Tzu-Chieh Lin
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Cheng-Hung Tsai
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Wei-Yun Lai
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Yang-Kai Chou
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Jun-Yi Leu
- Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan
- Huai-Kuang Tsai
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Cheng-Fu Kao
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 115, Taiwan
91
Levine TP. TMEM106B in humans and Vac7 and Tag1 in yeast are predicted to be lipid transfer proteins. Proteins 2021; 90:164-175. [PMID: 34347309 DOI: 10.1002/prot.26201] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2021] [Revised: 07/11/2021] [Accepted: 07/23/2021] [Indexed: 11/05/2022]
Abstract
TMEM106B is an integral membrane protein of late endosomes and lysosomes involved in neuronal function, its overexpression being associated with familial frontotemporal lobar degeneration, and point mutation linked to hypomyelination. It has also been identified in multiple screens for host proteins required for productive SARS-CoV-2 infection. Because standard approaches to understand TMEM106B at the sequence level find no homology to other proteins, it has remained a protein of unknown function. Here, the standard tool PSI-BLAST was used in a nonstandard way to show that the lumenal portion of TMEM106B is a member of the late embryogenesis abundant-2 (LEA-2) domain superfamily. More sensitive tools (HMMER, HHpred, and trRosetta) extended this to predict LEA-2 domains in two yeast proteins. One is Vac7, a regulator of PI(3,5)P2 production in the degradative vacuole, equivalent to the lysosome, which has a LEA-2 domain in its lumenal domain. The other is Tag1, another vacuolar protein, which signals to terminate autophagy and has three LEA-2 domains in its lumenal domain. Further analysis of LEA-2 structures indicated that LEA-2 domains have a long, conserved lipid-binding groove. This implies that TMEM106B, Vac7, and Tag1 may all be lipid transfer proteins in the lumen of late endocytic organelles.
92
Serrano Nájera G, Narganes Carlón D, Crowther DJ. TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery. Sci Rep 2021; 11:15747. [PMID: 34344904 PMCID: PMC8333311 DOI: 10.1038/s41598-021-94897-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/16/2021] [Accepted: 07/08/2021] [Indexed: 02/07/2023]
Abstract
Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine coverage of human genes by omic-scale research make it difficult to maintain meaningful overviews from which to identify promising new trends. Here we propose an automated yet flexible pipeline that identifies trends in the scientific corpus which align with the specific interests of a researcher and facilitate an initial prioritisation of opportunities. Using a procedure based on co-citation networks and machine learning, genes and diseases are first parsed from PubMed articles using a novel named entity recognition system together with publication date and supporting information. Then recurrent neural networks are trained to predict the publication dynamics of all human genes. For a user-defined therapeutic focus, genes generating more publications or citations are identified as high-interest targets. We also used topic detection routines to help understand why a gene is trendy and implement a system to propose the most prominent review articles for a potential target. This TrendyGenes pipeline detects emerging targets and pathways and provides a new way to explore the literature for individual researchers, pharmaceutical companies and funding agencies.
Affiliation(s)
- Guillermo Serrano Nájera
- Division of Cell and Developmental Biology, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- David Narganes Carlón
- Division of Cell and Developmental Biology, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- Division of Population Health and Genomics, Ninewells Hospital, School of Medicine, University of Dundee, Dundee, DD1 9SY, UK
- Exscientia Ltd, Dundee One, River Court, 5 West Victoria Dock Road, Dundee, DD1 3JT, UK
- Daniel J Crowther
- Exscientia Ltd, Dundee One, River Court, 5 West Victoria Dock Road, Dundee, DD1 3JT, UK.
93
Jakutis G, Stainier DYR. Genotype-Phenotype Relationships in the Context of Transcriptional Adaptation and Genetic Robustness. Annu Rev Genet 2021; 55:71-91. [PMID: 34314597 DOI: 10.1146/annurev-genet-071719-020342] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Genetic manipulations with a robust and predictable outcome are critical to investigate gene function, as well as for therapeutic genome engineering. For many years, knockdown approaches and reagents including RNA interference and antisense oligonucleotides dominated functional studies; however, with the advent of precise genome editing technologies, CRISPR-based knockout systems have become the state-of-the-art tools for such studies. These technologies have helped decipher the role of thousands of genes in development and disease. Their use has also revealed how limited our understanding of genotype-phenotype relationships is. The recent discovery that certain mutations can trigger the transcriptional modulation of other genes, a phenomenon called transcriptional adaptation, has provided an additional explanation for the contradicting phenotypes observed in knockdown versus knockout models and increased awareness about the use of each of these approaches. In this review, we first cover the strengths and limitations of different gene perturbation strategies. Then we highlight the diverse ways in which the genotype-phenotype relationship can be discordant between these different strategies. Finally, we review the genetic robustness mechanisms that can lead to such discrepancies, paying special attention to the recently discovered phenomenon of transcriptional adaptation.
Affiliation(s)
- Gabrielius Jakutis
- Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
- Didier Y R Stainier
- Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
- German Centre for Cardiovascular Research (DZHK), Partner site Rhine-Main, 60590 Frankfurt am Main, Germany
- Excellence Cluster Cardio-Pulmonary Institute (CPI), 35392 Giessen, Germany
94
Pauža AG, Mecawi AS, Paterson A, Hindmarch CCT, Greenwood M, Murphy D, Greenwood MP. Osmoregulation of the transcriptome of the hypothalamic supraoptic nucleus: A resource for the community. J Neuroendocrinol 2021; 33:e13007. [PMID: 34297454 DOI: 10.1111/jne.13007] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/30/2021] [Revised: 05/25/2021] [Accepted: 06/20/2021] [Indexed: 01/13/2023]
Abstract
The hypothalamic supraoptic nucleus (SON) is a core osmoregulatory control centre that deciphers information about the metabolic state of the organism and orchestrates appropriate homeostatic (endocrine) and allostatic (behavioural) responses. We have used RNA sequencing to describe the polyadenylated transcriptome of the SON of the male Wistar Han rat. These data have been mined to generate comprehensive catalogues of functional classes of genes (enzymes, transcription factors, endogenous peptides, G protein coupled receptors, transporters, catalytic receptors, channels and other pharmacological targets) expressed in this nucleus in the euhydrated state, and that together form the basal substrate for its physiological interactions. We have gone on to show that fluid deprivation for 3 days (dehydration) results in changes in the expression levels of 2247 RNA transcripts, which have similarly been functionally catalogued, and further mined to describe enriched gene categories and putative regulatory networks (Regulons) that may have physiological importance in SON function related plasticity. We hope that the revelation of these genes, pathways and networks, most of which have no characterised roles in the SON, will encourage the neuroendocrine community to pursue new investigations into the new 'known-unknowns' reported in the present study.
Affiliation(s)
- Audrys G Pauža
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, University of Bristol, Bristol, UK
- André Souza Mecawi
- Laboratory of Neuroendocrinology, Department of Biophysics, Paulista School of Medicine, Federal University of São Paulo, São Paulo, Brazil
- Alex Paterson
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, University of Bristol, Bristol, UK
- Bristol Genomics Facility, University of Bristol, Bristol, UK
- Charles C T Hindmarch
- Queen's Cardiopulmonary Unit (QCPU), Department of Medicine, Translational Institute of Medicine (TIME), Queen's University, Kingston, ON, Canada
- Mingkwan Greenwood
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, University of Bristol, Bristol, UK
- David Murphy
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, University of Bristol, Bristol, UK
- Michael P Greenwood
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, University of Bristol, Bristol, UK
95
Gold ER. The fall of the innovation empire and its possible rise through open science. Res Policy 2021; 50:104226. [PMID: 34083844 PMCID: PMC8024784 DOI: 10.1016/j.respol.2021.104226] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/28/2020] [Revised: 01/26/2021] [Accepted: 02/28/2021] [Indexed: 12/13/2022]
Abstract
There is growing concern that the innovation system's ability to create wealth and attain social benefit is declining in effectiveness. This article explores the reasons for this decline and suggests a structure, the open science partnership, as one mechanism through which to slow down or reverse this decline. The article examines the empirical literature of the last century to document the decline. This literature suggests that the cost of research and innovation is increasing exponentially, that researcher productivity is declining, and, third, that these two phenomena have led to an overall flat or declining level of innovation productivity. The article then turns to three explanations for the decline - the growing complexity of science, a mismatch of incentives, and a balkanization of knowledge. Finally, the article explores the role that open science partnerships - public-private partnerships based on open access publications, open data and materials, and the avoidance of restrictive forms of intellectual property - can play in increasing the efficiency of the innovation system.
Affiliation(s)
- E. Richard Gold
- McGill University, Faculty of Law and Faculty of Medicine, Canada
96
Drew K, Wallingford JB, Marcotte EM. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol Syst Biol 2021; 17:e10016. [PMID: 33973408 PMCID: PMC8111494 DOI: 10.15252/msb.202010016] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 09/23/2020] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 12/30/2022]
Abstract
A general principle of biology is the self-assembly of proteins into functional complexes. Characterizing their composition is, therefore, required for our understanding of cellular functions. Unfortunately, we lack knowledge of the comprehensive set of identities of protein complexes in human cells. To address this gap, we developed a machine learning framework to identify protein complexes in over 15,000 mass spectrometry experiments which resulted in the identification of nearly 7,000 physical assemblies. We show our resource, hu.MAP 2.0, is more accurate and comprehensive than previous state of the art high-throughput protein complex resources and gives rise to many new hypotheses, including for 274 completely uncharacterized proteins. Further, we identify 253 promiscuous proteins that participate in multiple complexes pointing to possible moonlighting roles. We have made hu.MAP 2.0 easily searchable in a web interface (http://humap2.proteincomplexes.org/), which will be a valuable resource for researchers across a broad range of interests including systems biology, structural biology, and molecular explanations of disease.
Affiliation(s)
- Kevin Drew
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA
- Present address: Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, USA
- John B Wallingford
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA
- Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA
97
Birling MC, Yoshiki A, Adams DJ, Ayabe S, Beaudet AL, Bottomley J, Bradley A, Brown SDM, Bürger A, Bushell W, Chiani F, Chin HJG, Christou S, Codner GF, DeMayo FJ, Dickinson ME, Doe B, Donahue LR, Fray MD, Gambadoro A, Gao X, Gertsenstein M, Gomez-Segura A, Goodwin LO, Heaney JD, Hérault Y, de Angelis MH, Jiang ST, Justice MJ, Kasparek P, King RE, Kühn R, Lee H, Lee YJ, Liu Z, Lloyd KCK, Lorenzo I, Mallon AM, McKerlie C, Meehan TF, Fuentes VM, Newman S, Nutter LMJ, Oh GT, Pavlovic G, Ramirez-Solis R, Rosen B, Ryder EJ, Santos LA, Schick J, Seavitt JR, Sedlacek R, Seisenberger C, Seong JK, Skarnes WC, Sorg T, Steel KP, Tamura M, Tocchini-Valentini GP, Wang CKL, Wardle-Jones H, Wattenhofer-Donzé M, Wells S, Wiles MV, Willis BJ, Wood JA, Wurst W, Xu Y, Teboul L, Murray SA. A resource of targeted mutant mouse lines for 5,061 genes. Nat Genet 2021; 53:416-419. [PMID: 33833456 PMCID: PMC8397259 DOI: 10.1038/s41588-021-00825-y] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Indexed: 12/21/2022]
Affiliation(s)
- Shinya Ayabe
- RIKEN BioResource Research Center, Tsukuba, Japan
- Arthur L Beaudet
- Baylor College of Medicine, Houston, TX, USA
- Luna Genetics, Houston, TX, USA
- Allan Bradley
- Wellcome Sanger Institute, Hinxton, UK
- Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Antje Bürger
- Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Wendy Bushell
- Wellcome Sanger Institute, Hinxton, UK
- IONTAS, Cambridge, UK
- Francesco Chiani
- Monterotondo Mouse Clinic, Italian National Research Council (CNR), Institute of Cell Biology and Neurobiology, Monterotondo Scalo, Italy
- Hsian-Jean Genie Chin
- National Laboratory Animal Center, National Applied Research Laboratories (NARLabs), Taipei, Taiwan
- Francesco J DeMayo
- Baylor College of Medicine, Houston, TX, USA
- National Institute for Environmental Health Science Research, Durham, NC, USA
- Alessia Gambadoro
- Monterotondo Mouse Clinic, Italian National Research Council (CNR), Institute of Cell Biology and Neurobiology, Monterotondo Scalo, Italy
- Xiang Gao
- SKL of Pharmaceutical Biotechnology and Model Animal Research Center, Collaborative Innovation Center for Genetics and Development, Nanjing Biomedical Research Institute, Nanjing University, Nanjing, China
- Alba Gomez-Segura
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Yann Hérault
- Université de Strasbourg, CNRS, INSERM, PHENOMIN-ICS, IGBMC, Illkirch, France
- Martin Hrabe de Angelis
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Experimental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising-Weihenstephan, Germany
- German Center for Diabetes Research, Neuherberg, Germany
- Si-Tse Jiang
- National Laboratory Animal Center, National Applied Research Laboratories (NARLabs), Taipei, Taiwan
- Monica J Justice
- Baylor College of Medicine, Houston, TX, USA
- Centre for Phenogenomics, Toronto, Ontario, Canada
- Hospital for Sick Children, Toronto, Ontario, Canada
- Petr Kasparek
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Vestec, Czech Republic
- Ralf Kühn
- Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Max Delbrueck Center for Molecular Medicine, Berlin, Germany
- Berlin Institute of Health, Berlin, Germany
- Ho Lee
- Korea Mouse Phenotyping Center (KMPC) and Graduate School of Cancer Science and Policy, National Cancer Center, Gyeonggi, Republic of Korea
- Young Jae Lee
- Korea Mouse Phenotyping Center (KMPC) and Lee Gil Ya Cancer and Diabetes Institute, Gachon University, Incheon, Republic of Korea
- Zhiwei Liu
- CAM-SU Genomic Resource Center, Soochow University, Suzhou, China
- K C Kent Lloyd
- Mouse Biology Program, University of California, Davis, Davis, CA, USA
- Colin McKerlie
- Centre for Phenogenomics, Toronto, Ontario, Canada
- Hospital for Sick Children, Toronto, Ontario, Canada
- Terrence F Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Kymab Group, Cambridge, UK
- Violeta Munoz Fuentes
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Stuart Newman
- Wellcome Sanger Institute, Hinxton, UK
- PetMedix, Cambridge, UK
- Lauryl M J Nutter
- Centre for Phenogenomics, Toronto, Ontario, Canada
- Hospital for Sick Children, Toronto, Ontario, Canada
- Goo Taeg Oh
- Immune and Vascular Cell Network Research Center, National Creative Initiatives and Department of Life Sciences, Ewha Womans University, Seoul, Republic of Korea
- Guillaume Pavlovic
- Université de Strasbourg, CNRS, INSERM, PHENOMIN-ICS, IGBMC, Illkirch, France
- Barry Rosen
- Wellcome Sanger Institute, Hinxton, UK
- AstraZeneca, Discovery Sciences, Cambridge, UK
- Edward J Ryder
- Wellcome Sanger Institute, Hinxton, UK
- LGC, Sport and Specialised Analytical Services, Fordham, UK
- Luis A Santos
- MRC Harwell Institute, Mammalian Genetics Unit, Didcot, UK
- Joel Schick
- Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Genetics and Cellular Engineering Group, Institute of Molecular Toxicology and Pharmacology, Helmholtz Zentrum Munich, Neuherberg, Germany
- Radislav Sedlacek
- Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Vestec, Czech Republic
- Claudia Seisenberger
- Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Je Kyung Seong
- Korea Mouse Phenotyping Center (KMPC) and BK21 Program for Veterinary Science, Research Institute for Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, Republic of Korea
- William C Skarnes
- Wellcome Sanger Institute, Hinxton, UK
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Tania Sorg
- Université de Strasbourg, CNRS, INSERM, PHENOMIN-ICS, IGBMC, Illkirch, France
- Karen P Steel
- Wellcome Sanger Institute, Hinxton, UK
- Wolfson Centre for Age-Related Diseases, King's College London, London, UK
- Glauco P Tocchini-Valentini
- Monterotondo Mouse Clinic, Italian National Research Council (CNR), Institute of Cell Biology and Neurobiology, Monterotondo Scalo, Italy
- Chi-Kuang Leo Wang
- National Laboratory Animal Center, National Applied Research Laboratories (NARLabs), Taipei, Taiwan
- Sara Wells
- MRC Harwell Institute, Mary Lyon Centre, Didcot, UK
- Brandon J Willis
- Mouse Biology Program, University of California, Davis, Davis, CA, USA
- Joshua A Wood
- Mouse Biology Program, University of California, Davis, Davis, CA, USA
- Wolfgang Wurst
- Institute of Developmental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Developmental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising-Weihenstephan, Germany
- German Center for Neurodegenerative Diseases, Munich, Germany
- Ying Xu
- CAM-SU Genomic Resource Center, Soochow University, Suzhou, China
- Lydia Teboul
- MRC Harwell Institute, Mary Lyon Centre, Didcot, UK
98
Meta-Analysis of Gene Popularity: Less Than Half of Gene Citations Stem from Gene Regulatory Networks. Genes (Basel) 2021; 12:genes12020319. [PMID: 33672419 PMCID: PMC7926953 DOI: 10.3390/genes12020319] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/27/2021] [Revised: 02/14/2021] [Accepted: 02/20/2021] [Indexed: 12/04/2022]
Abstract
The reasons for selecting a gene for further study might vary from historical momentum to funding availability, thus leading to unequal attention distribution among all genes. However, certain biological features tend to be overlooked in evaluating a gene’s popularity. Here we present a meta-analysis of the reasons why different genes have been studied and to what extent, with a focus on the gene-specific biological features. From unbiased datasets we can define biological properties of genes that reasonably may affect their perceived importance. We make use of both linear and nonlinear computational approaches for estimating gene popularity to then compare their relative importance. We find that roughly 25% of the studies are the result of a historical positive feedback, which we may think of as social reinforcement. Of the remaining features, gene family membership is the most indicative followed by disease relevance and finally regulatory pathway association. Disease relevance was an important driver until the 1990s, after which the focus shifted to exploring every single gene. We also present a resource that allows one to study the impact of reinforcement, which may guide our research toward genes that have not yet received proportional attention.
99
Rentzsch P, Schubach M, Shendure J, Kircher M. CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med 2021; 13:31. [PMID: 33618777 PMCID: PMC7901104 DOI: 10.1186/s13073-021-00835-9] [Citation(s) in RCA: 421] [Impact Index Per Article: 105.3] [Received: 06/29/2020] [Accepted: 01/20/2021] [Indexed: 02/08/2023]
Abstract
Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.
Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.
Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.
Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-021-00835-9.
Affiliation(s)
- Philipp Rentzsch
- Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
- Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Max Schubach
- Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
- Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Jay Shendure
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Martin Kircher
- Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany
- Berlin Institute of Health (BIH), 10178, Berlin, Germany
100
Palmer D, Fabris F, Doherty A, Freitas AA, de Magalhães JP. Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues. Aging (Albany NY) 2021; 13:3313-3341. [PMID: 33611312 PMCID: PMC7906136 DOI: 10.18632/aging.202648] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 04/01/2020] [Accepted: 10/29/2020] [Indexed: 12/22/2022]
Abstract
By combining transcriptomic data with other data sources, inferences can be made about functional changes during ageing. Thus, we conducted a meta-analysis on 127 publicly available microarray and RNA-Seq datasets from mice, rats and humans, identifying a transcriptomic signature of ageing across species and tissues. Analyses on subsets of these datasets produced transcriptomic signatures of ageing for brain, heart and muscle. We then applied enrichment analysis and machine learning to functionally describe these signatures, revealing overexpression of immune and stress response genes and underexpression of metabolic and developmental genes. Further analyses revealed little overlap between genes differentially expressed with age in different tissues, despite ageing differentially expressed genes typically being widely expressed across tissues. Additionally, we show that the ageing gene expression signatures (particularly the overexpressed signatures) of the whole meta-analysis, brain and muscle tend to include genes that are central in protein-protein interaction networks. We also show that genes underexpressed with age in the brain are highly central in a co-expression network, suggesting that underexpression of these genes may have broad phenotypic consequences. In sum, we show numerous functional similarities between the ageing transcriptomes of these important tissues, along with unique network properties of genes differentially expressed with age in both protein-protein interaction and co-expression networks.
Affiliation(s)
- Daniel Palmer
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock, Germany
- Fabio Fabris
- School of Computing, University of Kent, Canterbury, Kent, UK
- Aoife Doherty
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Alex A Freitas
- School of Computing, University of Kent, Canterbury, Kent, UK
- João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK