1
Hoffmann M, Poschenrieder JM, Incudini M, Baier S, Fitz A, Maier A, Hartung M, Hoffmann C, Trummer N, Adamowicz K, Picciani M, Scheibling E, Harl MV, Lesch I, Frey H, Kayser S, Wissenberg P, Schwartz L, Hafner L, Acharya A, Hackl L, Grabert G, Lee SG, Cho G, Cloward M, Jankowski J, Lee HK, Tsoy O, Wenke N, Pedersen AG, Bønnelykke K, Mandarino A, Melograna F, Schulz L, Climente-González H, Wilhelm M, Iapichino L, Wienbrandt L, Ellinghaus D, Van Steen K, Grossi M, Furth PA, Hennighausen L, Di Pierro A, Baumbach J, Kacprowski T, List M, Blumenthal DB. Network medicine-based epistasis detection in complex diseases: ready for quantum computing. medRxiv 2023:2023.11.07.23298205. [PMID: 38076997 PMCID: PMC10705612 DOI: 10.1101/2023.11.07.23298205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) [1-3]. Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant than those found by existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease. Additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
Affiliation(s)
- Markus Hoffmann
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
  - Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
  - National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Julian M. Poschenrieder
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Massimiliano Incudini
  - Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
- Sylvie Baier
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Amelie Fitz
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
  - Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Andreas Maier
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Michael Hartung
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Christian Hoffmann
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Nico Trummer
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Klaudia Adamowicz
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Mario Picciani
  - Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
- Evelyn Scheibling
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Maximilian V. Harl
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Ingmar Lesch
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Hunor Frey
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Simon Kayser
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Paul Wissenberg
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Leon Schwartz
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Leon Hafner
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
  - Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- Aakriti Acharya
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
- Lena Hackl
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Gordon Grabert
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
- Sung-Gwon Lee
  - National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
  - School of Biological Sciences and Technology, Chonnam National University, Gwangju, Korea
- Gyuhyeok Cho
  - Department of Chemistry, Gwangju Institute of Science and Technology, Gwangju, Korea
- Matthew Cloward
  - Department of Biology, Brigham Young University, Provo, UT, USA
- Jakub Jankowski
  - National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Hye Kyung Lee
  - National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Olga Tsoy
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Nina Wenke
  - Institute for Computational Systems Biology, University of Hamburg, Germany
- Anders Gorm Pedersen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
- Klaus Bønnelykke
  - Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Antonio Mandarino
  - International Centre for Theory of Quantum Technologies, University of Gdańsk, 80-309 Gdańsk, Poland
- Federico Melograna
  - BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
  - BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
- Laura Schulz
  - Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
- Luigi Iapichino
  - Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
- Lars Wienbrandt
  - Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
- David Ellinghaus
  - Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
- Kristel Van Steen
  - BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
  - BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
- Michele Grossi
  - European Organization for Nuclear Research (CERN), Geneva 1211, Switzerland
- Priscilla A. Furth
  - National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
  - Departments of Oncology & Medicine, Georgetown University, Washington, DC, USA
- Lothar Hennighausen
  - Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
  - National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Alessandra Di Pierro
  - Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Germany
  - Computational BioMedicine Lab, University of Southern Denmark, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- David B. Blumenthal
  - Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
2
Climente-González H, Azencott CA, Yamada M. A network-guided protocol to discover susceptibility genes in genome-wide association studies using stability selection. STAR Protoc 2023; 4:101998. [PMID: 36609152 PMCID: PMC9850185 DOI: 10.1016/j.xpro.2022.101998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 11/20/2022] [Accepted: 12/15/2022] [Indexed: 01/09/2023]
Abstract
We present a network-based protocol to discover susceptibility genes in case-control genome-wide association studies (GWASs). In short, this protocol looks for biomarkers that are informative of disease status and interconnected in an underlying biological network. This boosts discovery and interpretability. Moreover, the protocol tackles the instability of network methods, producing a stable set of genes most likely to replicate in external cohorts. To apply the procedure to a provided GWAS dataset, install the required software and execute our command-line tool. For complete details on the use and execution of this protocol, please refer to Climente-González et al. [1].
Affiliation(s)
- Chloé-Agathe Azencott
  - Centre for Computational Biology (CBIO), MINES ParisTech, PSL Research University, 75005 Paris, France; Institut Curie, PSL Research University, 75005 Paris, France; INSERM, U900, 75005 Paris, France
- Makoto Yamada
  - RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan; Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan; Machine Learning and Data Science Unit, Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa 904-0495, Japan
3
Duroux D, Climente-González H, Azencott CA, Van Steen K. Interpretable network-guided epistasis detection. Gigascience 2022; 11:6521880. [PMID: 35134928 PMCID: PMC8848319 DOI: 10.1093/gigascience/giab093] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/04/2021] [Revised: 10/12/2021] [Accepted: 12/13/2021] [Indexed: 11/15/2022]
Abstract
Background: Detecting epistatic interactions at the gene level is essential to understanding the biological mechanisms of complex diseases. Unfortunately, genome-wide interaction association studies involve many statistical challenges that make such detection hard. We propose a multi-step protocol for epistasis detection along the edges of a gene-gene co-function network. Such an approach reduces the number of tests performed and provides interpretable interactions while keeping type I error controlled. Yet, mapping gene interactions into testable single-nucleotide polymorphism (SNP)-interaction hypotheses, as well as computing gene pair association scores from SNP pair ones, is not trivial. Results: Here we compare 3 SNP-gene mappings (positional overlap, expression quantitative trait loci, and proximity in 3D structure) and use the adaptive truncated product method to compute gene pair scores. This method is non-parametric, does not require a known null distribution, and is fast to compute. We apply multiple variants of this protocol to a genome-wide association study dataset on inflammatory bowel disease. Different configurations produced different results, highlighting that various mechanisms are implicated in inflammatory bowel disease, while at the same time, results overlapped with known disease characteristics. Importantly, the proposed pipeline also differs from a conventional approach where no network is used, showing the potential for additional discoveries when prior biological knowledge is incorporated into epistasis detection.
Affiliation(s)
- Diane Duroux
  - BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000 Liège, Belgium, 11 Liège 4000, Belgium
- Héctor Climente-González
  - Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France; High-Dimensional Statistical Modeling Team, RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan
- Chloé-Agathe Azencott
  - Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Kristel Van Steen
  - BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000 Liège, Belgium, 11 Liège 4000, Belgium; BIO3 - Systems Medicine, Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium, 49 3000 Leuven, Belgium
4
Climente-González H, Lonjou C, Lesueur F, Stoppa-Lyonnet D, Andrieu N, Azencott CA. Boosting GWAS using biological networks: A study on susceptibility to familial breast cancer. PLoS Comput Biol 2021; 17:e1008819. [PMID: 33735170 PMCID: PMC8009366 DOI: 10.1371/journal.pcbi.1008819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2020] [Revised: 03/30/2021] [Accepted: 02/18/2021] [Indexed: 11/20/2022]
Abstract
Genome-wide association studies (GWAS) explore the genetic causes of complex diseases. However, classical approaches ignore the biological context of the genetic variants and genes under study. To address this shortcoming, one can use biological networks, which model functional relationships, to search for functionally related susceptibility loci. Many such network methods exist, each arising from different mathematical frameworks, pre-processing steps, and assumptions about the network properties of the susceptibility mechanism. Unsurprisingly, this results in disparate solutions. To explore how to exploit these heterogeneous approaches, we selected six network methods and applied them to GENESIS, a nationwide French study on familial breast cancer. First, we verified that network methods recovered more interpretable results than a standard GWAS. We addressed the heterogeneity of their solutions by studying their overlap, computing what we called the consensus. The key gene in this consensus solution was COPS5, a gene related to multiple cancer hallmarks. Another issue we observed was that network methods were unstable, selecting very different genes on different subsamples of GENESIS. Therefore, we proposed a stable consensus solution formed by the 68 genes most consistently selected across multiple subsamples. This solution was also enriched in genes known to be associated with breast cancer susceptibility (BLM, CASP8, CASP10, DNAJC1, FGFR2, MRPS30, and SLC4A7, P-value = 3 × 10⁻⁴). The most connected gene was CUL3, a regulator of several genes linked to cancer progression. Lastly, we evaluated the biases of each method and the impact of their parameters on the outcome. In general, network methods preferred highly connected genes, even after random rewirings that stripped the connections of any biological meaning. In conclusion, we present the advantages of network-guided GWAS, characterize their shortcomings, and provide strategies to address them.
To compute the consensus networks, implementations of all six methods are available at https://github.com/hclimente/gwas-tools.
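The idea of a stable consensus, keeping only the genes that are selected consistently across subsamples, can be illustrated with a minimal Python sketch. It assumes each subsample run yields a set of selected genes and applies a selection-frequency cutoff; the function name `stable_consensus`, the `min_fraction` parameter, and the placeholder gene labels are hypothetical and are not taken from the paper or from gwas-tools, which may rank or threshold genes differently.

```python
from collections import Counter


def stable_consensus(selections, min_fraction=0.5):
    """Return genes selected in at least `min_fraction` of the subsample runs.

    `selections` is a list of sets, one per subsample run (hypothetical format).
    """
    if not selections:
        return []
    # Count in how many runs each gene was selected.
    counts = Counter(gene for run in selections for gene in set(run))
    cutoff = min_fraction * len(selections)
    # Order by descending selection frequency, then by name for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, n in ranked if n >= cutoff]


# Toy input with placeholder labels (not real results).
runs = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneC"},
    {"geneA", "geneD"},
    {"geneB", "geneA"},
]
consensus = stable_consensus(runs, min_fraction=0.5)
print(consensus)
```

A frequency cutoff is only one possible design; taking the top-k most frequently selected genes would be an equally simple alternative.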
Affiliation(s)
- Héctor Climente-González
  - Institut Curie, PSL Research University, Paris, France
  - INSERM, U900, Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
  - RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan
- Christine Lonjou
  - Institut Curie, PSL Research University, Paris, France
  - INSERM, U900, Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Fabienne Lesueur
  - Institut Curie, PSL Research University, Paris, France
  - INSERM, U900, Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Dominique Stoppa-Lyonnet
  - Service de Génétique, Institut Curie, Paris, France
  - INSERM, U830, Paris, France
  - Université Paris Descartes, Paris, France
- Nadine Andrieu
  - Institut Curie, PSL Research University, Paris, France
  - INSERM, U900, Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Chloé-Agathe Azencott
  - Institut Curie, PSL Research University, Paris, France
  - INSERM, U900, Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
5
Abstract
Motivation: Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including among others lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not present the previous drawbacks. Results: We compare block HSIC Lasso to other state-of-the-art feature selection techniques in both synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than those selected by other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus. We discovered that many genes linked in the past to brain development and function are involved in the biological differences between the types of neurons. Availability and implementation: Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Héctor Climente-González
  - Institut Curie, PSL Research University, Paris, France; INSERM, U900, Paris, France; MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France; RIKEN AIP, Tokyo, Japan
- Chloé-Agathe Azencott
  - Institut Curie, PSL Research University, Paris, France; INSERM, U900, Paris, France; MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Samuel Kaski
  - Department of Computer Science, Aalto University, Espoo, Finland
- Makoto Yamada
  - RIKEN AIP, Tokyo, Japan; Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan
6
Jayasinghe RG, Cao S, Gao Q, Wendl MC, Vo NS, Reynolds SM, Zhao Y, Climente-González H, Chai S, Wang F, Varghese R, Huang M, Liang WW, Wyczalkowski MA, Sengupta S, Li Z, Payne SH, Fenyö D, Miner JH, Walter MJ, Vincent B, Eyras E, Chen K, Shmulevich I, Chen F, Ding L. Systematic Analysis of Splice-Site-Creating Mutations in Cancer. Cell Rep 2018; 23:270-281.e3. [PMID: 29617666 PMCID: PMC6055527 DOI: 10.1016/j.celrep.2018.03.052] [Citation(s) in RCA: 144] [Impact Index Per Article: 28.8] [Received: 11/06/2017] [Revised: 02/21/2018] [Accepted: 03/13/2018] [Indexed: 12/31/2022]
Abstract
For the past decade, cancer genomic studies have focused on mutations leading to splice-site disruption, overlooking those having splice-creating potential. Here, we applied a bioinformatic tool, MiSplice, for the large-scale discovery of splice-site-creating mutations (SCMs) across 8,656 TCGA tumors. We report 1,964 originally mis-annotated mutations having clear evidence of creating alternative splice junctions. TP53 and GATA3 have 26 and 18 SCMs, respectively, and ATRX has 5 from lower-grade gliomas. Mutations in 11 genes, including PARP1, BRCA1, and BAP1, were experimentally validated for splice-site-creating function. Notably, we found that neoantigens induced by SCMs are likely several-fold more immunogenic than those induced by missense mutations, exemplified by the recurrent GATA3 SCM. Further, high expression of PD-1 and PD-L1 was observed in tumors with SCMs, suggesting candidates for immune blockade therapy. Our work highlights the importance of integrating DNA and RNA data for understanding the functional and clinical implications of mutations in human diseases.
Affiliation(s)
- Reyka G Jayasinghe
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Song Cao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Qingsong Gao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Michael C Wendl
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA; Department of Mathematics, Washington University in St. Louis, St. Louis, MO 63130, USA
- Nam Sy Vo
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yanyan Zhao
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Héctor Climente-González
  - Institut Curie, 75248 Paris Cedex, France; MINES ParisTech, PSL-Research University, CBIO-Centre for Computational Biology, 77300 Fontainebleau, France; INSERM U900, 75248 Paris Cedex, France
- Shengjie Chai
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Fang Wang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rajees Varghese
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; Division of Nephrology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Mo Huang
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Wen-Wei Liang
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Matthew A Wyczalkowski
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Sohini Sengupta
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Zhi Li
  - Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY 10016, USA; Institute for Systems Genetics, New York University School of Medicine, New York, NY 10016, USA
- Samuel H Payne
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- David Fenyö
  - Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY 10016, USA; Institute for Systems Genetics, New York University School of Medicine, New York, NY 10016, USA
- Jeffrey H Miner
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; Division of Nephrology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Matthew J Walter
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63110, USA
- Benjamin Vincent
  - Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Eduardo Eyras
  - Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain; Computational RNA Biology Group, Pompeu Fabra University (UPF), 08003 Barcelona, Spain
- Ken Chen
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Feng Chen
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; Division of Nephrology, Washington University in St. Louis, St. Louis, MO 63110, USA
- Li Ding
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Division of Oncology, Washington University in St. Louis, St. Louis, MO 63110, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63110, USA
7
Blanco JD, Radusky L, Climente-González H, Serrano L. FoldX accurate structural protein-DNA binding prediction using PADA1 (Protein Assisted DNA Assembly 1). Nucleic Acids Res 2018; 46:3852-3863. [PMID: 29608705 PMCID: PMC5934639 DOI: 10.1093/nar/gky228] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/18/2018] [Accepted: 03/20/2018] [Indexed: 12/20/2022]
Abstract
The speed at which new genomes are being sequenced highlights the need for genome-wide methods capable of predicting protein–DNA interactions. Here, we present PADA1, a generic algorithm that accurately models structural complexes and predicts the DNA-binding regions of resolved protein structures. PADA1 relies on a library of protein and double-stranded DNA fragment pairs obtained from a training set of 2103 DNA–protein complexes. It includes a fast statistical force field, computed from atom-atom distances, to evaluate and filter the 3D docking models. Using published benchmark validation sets and 212 DNA–protein structures published after 2016, we predicted the DNA-binding regions with an RMSD of <1.8 Å per residue in >95% of the cases. We show that the quality of the docked templates is compatible with the FoldX protein design tool suite, which identifies the crystallized DNA molecule sequence as the most energetically favorable in 80% of the cases. We highlighted the biological potential of PADA1 by reconstituting DNA and protein conformational changes upon protein mutagenesis of a meganuclease and its variants, and by predicting DNA-binding regions and nucleotide sequences in proteins crystallized without DNA. These results open up new perspectives for the engineering of DNA–protein interfaces.
Affiliation(s)
- Javier Delgado Blanco
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Leandro Radusky
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Héctor Climente-González
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain
8
Climente-González H, Porta-Pardo E, Godzik A, Eyras E. The Functional Impact of Alternative Splicing in Cancer. Cell Rep 2017; 20:2215-2226. [DOI: 10.1016/j.celrep.2017.08.012] [Citation(s) in RCA: 352] [Impact Index Per Article: 50.3] [Received: 06/08/2017] [Revised: 07/15/2017] [Accepted: 07/26/2017] [Indexed: 12/29/2022]
9
Lluch-Senar M, Mancuso FM, Climente-González H, Peña-Paz MI, Sabido E, Serrano L. Rescuing discarded spectra: Full comprehensive analysis of a minimal proteome. Proteomics 2016; 16:554-63. [PMID: 26702875 DOI: 10.1002/pmic.201500187] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/19/2015] [Revised: 11/06/2015] [Accepted: 12/21/2015] [Indexed: 02/05/2023]
Abstract
A common problem encountered when performing large-scale MS proteome analysis is the loss of information due to the high percentage of unassigned spectra. To determine the causes behind this loss, we have analyzed the proteome of one of the smallest living bacteria that can be grown axenically, Mycoplasma pneumoniae (729 ORFs). The proteome of M. pneumoniae cells, grown in defined media, was analyzed by MS. An initial search with both Mascot and a species-specific NCBInr database with common contaminants (NCBImpn) resulted in around 79% of the acquired spectra not having an assignment. The percentage of non-assigned spectra was reduced to 27% after re-analysis of the data with the PEAKS software, thereby increasing the proteome coverage of M. pneumoniae from the initial 60% to over 76%. Nonetheless, 33,413 spectra with assigned amino acid sequences could not be mapped to any NCBInr database protein sequence. Approximately 1% of these unassigned peptides corresponded to PTMs and 4% to M. pneumoniae protein variants (deamidation and translation inaccuracies). The most abundant peptide sequence variants (Phe-Tyr and Ala-Ser) could be explained by alterations in the editing capacity of the corresponding tRNA synthases. About another 1% of the peptides not associated with any protein had repetitions of the same aromatic/hydrophobic amino acid at the N-terminus, or had Arg/Lys at the C-terminus. Thus, in a model system, we have maximized the number of assigned spectra to 73% (51,453 out of the 70,040 initial acquired spectra). All MS data have been deposited in the ProteomeXchange with identifier PXD002779 (http://proteomecentral.proteomexchange.org/dataset/PXD002779).
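As a quick consistency check on the figures quoted in this abstract, the following minimal Python sketch recomputes the assigned and unassigned percentages from the two spectrum counts. The variable names are illustrative only; the counts are the ones stated in the abstract.

```python
# Spectrum counts as quoted in the abstract.
total_spectra = 70_040
assigned_spectra = 51_453

# Share of spectra with an assignment, and the complementary unassigned share.
assigned_pct = 100 * assigned_spectra / total_spectra
unassigned_pct = 100 - assigned_pct

print(f"assigned: {assigned_pct:.1f}%, unassigned: {unassigned_pct:.1f}%")
```

Rounded to whole percentages, the two values agree with the 73% assigned and 27% non-assigned spectra reported after re-analysis with PEAKS.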
Affiliation(s)
- Maria Lluch-Senar
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Francesco M Mancuso
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain; Proteomics Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Héctor Climente-González
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Marcia I Peña-Paz
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain; Proteomics Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Eduard Sabido
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain; Proteomics Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Luis Serrano
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain