1
Xie X, Gu H, Ma J, Fu L, Ma J, Zhang J, Wu R, Chen Z. FOXO1 Single-Nucleotide Polymorphisms Are Associated with Bleeding Severity and Sensitivity of Glucocorticoid Treatment of Pediatric Immune Thrombocytopenia. DNA Cell Biol 2024; 43:279-287. [PMID: 38683649 DOI: 10.1089/dna.2023.0431]
Abstract
Immune thrombocytopenia (ITP) is an autoimmune-mediated hemorrhagic disease. Emerging evidence indicates that FOXO1 SNPs are related to the immune dysregulation of several autoimmune diseases, suggesting that FOXO1 may be involved in inflammation and pathologic activities in patients with ITP. This study aimed to evaluate whether FOXO1 gene single-nucleotide polymorphisms (SNPs) are associated with susceptibility to ITP and with two clinically important outcomes: bleeding severity and sensitivity to glucocorticoid treatment. This study recruited 327 patients with newly diagnosed ITP and 220 healthy controls. Four SNPs (rs17446593, rs17446614, rs2721068, and rs2755213) of the FOXO1 gene were detected using the Sequenom MassArray system. Patients were classified into mild and severe bleeding groups based on their bleeding scores. ITP patients were classified as sensitive or insensitive to glucocorticoid treatment according to the practice guideline for ITP (2019 version). The frequencies of the four SNPs did not differ significantly between the ITP and healthy control groups. Patients with the AA genotype at rs17446593 (p = 0.009) and the GG genotype at rs17446614 (p = 0.009) had more severe bleeding than patients without these genotypes. Carriers of haplotype Grs17446593Ars17446614Crs2721068Trs2755213 were protected against severe bleeding (p = 0.002). The AA genotype at rs17446593 was significantly more frequent in ITP patients sensitive to glucocorticoid treatment than in those insensitive to it (p = 0.03). Haplotype Grs17446593Grs17446614Trs2721068Trs2755213 increases the risk of glucocorticoid resistance (p = 0.007). Although FOXO1 gene polymorphisms were not associated with susceptibility to ITP, the AA genotype at rs17446593 and the GG genotype at rs17446614 were associated with bleeding severity. Haplotype GACT has a protective effect against severe bleeding. Patients with the AA genotype at rs17446593 may tend to respond well to glucocorticoid treatment.
However, the FOXO1 gene haplotype GGTT increases the risk of glucocorticoid resistance. Trial registration: ChiCTR1900022419.
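The abstract reports genotype comparisons such as AA versus non-AA at rs17446593 against mild versus severe bleeding as p-values only, and does not describe the exact tests used. As a minimal sketch, assuming a 2 × 2 table of hypothetical counts (not data from the study), an odds ratio with a Woolf (log-based) 95% confidence interval for this kind of carrier-versus-non-carrier comparison can be computed as follows; the function name and the counts are illustrative.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table (hypothetical layout).

                      severe  mild
        AA carriers      a      b
        non-AA           c      d
    """
    if min(a, b, c, d) == 0:
        # Woolf's method is undefined with an empty cell; adding 0.5 to every
        # cell (Haldane correction) is a common workaround.
        raise ValueError("all cells must be non-zero")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, ci_low, ci_high

# Hypothetical counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=10, b=20, c=30, d=40)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 corresponds to a significant association at roughly the 5% level; haplotype-level associations additionally require phase inference, which this sketch does not attempt.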
Affiliation(s)
- Xingjuan Xie
- Hematologic Disease Laboratory, Beijing Key Laboratory of Pediatric Hematology Oncology, National Key Discipline of Pediatrics (Capital Medical University); Key Laboratory of Major Diseases in Children, Ministry of Education; Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Hao Gu
- Hematologic Disease Laboratory, Beijing Key Laboratory of Pediatric Hematology Oncology, National Key Discipline of Pediatrics (Capital Medical University); Key Laboratory of Major Diseases in Children, Ministry of Education; Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Department of Immunology, Ministry of Education Key Laboratory of Major Diseases in Children, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Jingyao Ma
- Department of Hematology, Beijing Key Laboratory of Pediatric Hematology Oncology; Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Lingling Fu
- Department of Hematology, Beijing Key Laboratory of Pediatric Hematology Oncology; Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Jie Ma
- Department of Hematology, Beijing Key Laboratory of Pediatric Hematology Oncology; Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Jialu Zhang
- Department of Hematology, Beijing Key Laboratory of Pediatric Hematology Oncology; Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Runhui Wu
- Department of Hematology, Beijing Key Laboratory of Pediatric Hematology Oncology; Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Zhenping Chen
- Hematologic Disease Laboratory, Beijing Key Laboratory of Pediatric Hematology Oncology, National Key Discipline of Pediatrics (Capital Medical University); Key Laboratory of Major Diseases in Children, Ministry of Education; Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
2
PSnpBind-ML: predicting the effect of binding site mutations on protein-ligand binding affinity. J Cheminform 2023; 15:31. [PMID: 36864534 PMCID: PMC9983232 DOI: 10.1186/s13321-023-00701-3] [Received: 10/21/2022] [Accepted: 02/17/2023]
Abstract
Protein mutations, especially those which occur in the binding site, play an important role in inter-individual drug response and may alter binding affinity and thus impact the drug's efficacy and side effects. Unfortunately, large-scale experimental screening of ligand binding against protein variants is still time-consuming and expensive. Alternatively, in silico approaches can play a role in guiding those experiments. Methods ranging from computationally cheaper machine learning (ML) to the more expensive molecular dynamics have been applied to predict mutation effects accurately. However, these effects have mostly been studied on small, limited datasets, whereas ideally a large dataset of binding affinity changes due to binding site mutations is needed. In this work, we used the PSnpBind database with six hundred thousand docking experiments to train a machine learning model predicting protein-ligand binding affinity for both wild-type proteins and their variants with a single-point mutation in the binding site. A numerical representation of the protein, binding site, mutation, and ligand information was encoded using 256 features, half of which were manually selected based on domain knowledge. A machine learning approach composed of two regression models is proposed: the first predicts wild-type protein-ligand binding affinity, and the second predicts mutated protein-ligand binding affinity. The best performing models reported an RMSE of 0.5 to 0.6 kcal/mol on an independent test set, with an R² of 0.87 to 0.90. We report an improvement in prediction performance compared to several reported models developed for protein-ligand binding affinity prediction. The obtained models can be used as a complementary method in early-stage drug discovery.
They can be applied to rapidly obtain a better overview of the ligand binding affinity changes across protein variants carried by people in the population and narrow down the search space where more time-demanding methods can be used to identify potential leads that achieve a better affinity for all protein variants.
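The two evaluation metrics quoted in this abstract can be computed from paired observed and predicted affinities. The sketch below uses the standard definitions (RMSE, and R² as 1 − SS_res/SS_tot); whether the paper computed R² this way or as a squared Pearson correlation is not stated in the abstract, and the affinity values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the targets (e.g. kcal/mol)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical docking affinities (kcal/mol) and model predictions.
observed = [-7.2, -8.1, -6.5, -9.0, -7.8]
predicted = [-7.0, -8.4, -6.9, -8.6, -7.5]
print(f"RMSE = {rmse(observed, predicted):.2f} kcal/mol")
print(f"R2 = {r_squared(observed, predicted):.2f}")
```

RMSE carries the units of the target, which is why it is reported in kcal/mol, while R² is dimensionless.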
3
Petrosino M, Novak L, Pasquo A, Chiaraluce R, Turina P, Capriotti E, Consalvi V. Analysis and Interpretation of the Impact of Missense Variants in Cancer. Int J Mol Sci 2021; 22:5416. [PMID: 34063805 PMCID: PMC8196604 DOI: 10.3390/ijms22115416] [Received: 03/30/2021] [Revised: 05/03/2021] [Accepted: 05/17/2021]
Abstract
Large-scale genome sequencing has allowed the identification of a massive number of genetic variations, whose impact on human health is still unknown. In this review we analyze, using an in silico strategy, the impact of missense variants in cancer-related genes whose effects on protein stability and function have been experimentally determined. We collected a set of 164 variants from 11 proteins to analyze the impact of missense mutations at the structural and functional levels, and to assess the performance of state-of-the-art methods (FoldX and Meta-SNP) for predicting protein stability change and pathogenicity. Our analysis shows that combining experimental data on protein stability with in silico pathogenicity predictions allowed the identification of a subset of variants with a high probability of having a deleterious phenotypic effect, as confirmed by the significant enrichment of this subset in variants annotated in the COSMIC database as putative cancer-driving variants. Our analysis suggests that integrating experimental and computational approaches may help evaluate the risk for complex disorders and develop more effective treatment strategies.
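The combination step described in this abstract amounts to intersecting a stability filter with a pathogenicity filter and then checking how often the selected variants are COSMIC-annotated. A minimal sketch, assuming a per-variant stability change where positive means destabilizing (a sign convention chosen here; conventions differ between tools), a Meta-SNP-style probability with 0.5 as the disease cut-off, and a hypothetical 1 kcal/mol destabilization threshold; the field names and records are hypothetical, not the review's criteria or data.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    ddg: float            # stability change in kcal/mol; positive = destabilizing (assumed convention)
    pathogenicity: float  # predicted probability of being disease-related, 0 to 1
    in_cosmic: bool       # hypothetical flag: annotated as a putative cancer driver

def high_risk(variants, ddg_cutoff=1.0, prob_cutoff=0.5):
    """Keep variants that are both destabilizing and predicted pathogenic."""
    return [v for v in variants if v.ddg > ddg_cutoff and v.pathogenicity > prob_cutoff]

def cosmic_fraction(variants):
    """Fraction of variants carrying the COSMIC flag (0.0 for an empty list)."""
    return sum(v.in_cosmic for v in variants) / len(variants) if variants else 0.0

# Hypothetical records, for illustration only.
variants = [
    Variant("V1", 2.3, 0.91, True),
    Variant("V2", 0.2, 0.80, False),
    Variant("V3", 1.5, 0.30, False),
    Variant("V4", 3.1, 0.75, True),
    Variant("V5", -0.4, 0.10, False),
]
subset = high_risk(variants)
print([v.name for v in subset])
print(f"COSMIC fraction: subset {cosmic_fraction(subset):.2f} vs all {cosmic_fraction(variants):.2f}")
```

Comparing raw fractions only hints at enrichment; a formal claim would need a test such as Fisher's exact test on the subset-versus-rest table.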
Affiliation(s)
- Maria Petrosino
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Leonore Novak
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Alessandra Pasquo
- ENEA CR Frascati, Diagnostics and Metrology Laboratory FSN-TECFIS-DIM, 00044 Frascati, Italy
- Roberta Chiaraluce
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Paola Turina
- Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
- Emidio Capriotti
- Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
- Corresponding author
- Valerio Consalvi
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Corresponding author
4
Richard M, Chuffart F, Duplus-Bottin H, Pouyet F, Spichty M, Fulcrand E, Entrevan M, Barthelaix A, Springer M, Jost D, Yvert G. Assigning function to natural allelic variation via dynamic modeling of gene network induction. Mol Syst Biol 2018; 14:e7803. [PMID: 29335276 PMCID: PMC5787706 DOI: 10.15252/msb.20177803]
Abstract
More and more natural DNA variants are being linked to physiological traits. Yet, understanding how they alter molecular regulation remains challenging. Important properties of gene regulatory networks can be captured by computational models. If model parameters can be “personalized” according to the genotype, their variation may then reveal how DNA variants operate in the network. Here, we combined experiments and computations to visualize natural alleles of the yeast GAL3 gene in a space of model parameters describing the galactose response network. Alleles altering the activation of Gal3p by galactose were discriminated from those affecting its activity (production/degradation or efficiency of the activated protein). The approach allowed us to correctly predict that a non-synonymous SNP would change the binding affinity of Gal3p for the Gal80p transcriptional repressor. Our results illustrate how personalizing gene regulatory models can be used for the mechanistic interpretation of genetic variants.
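To illustrate the distinction between activation and activity alleles, here is a deliberately simplified, hypothetical induction model, not the authors' GAL network model: a regulator is activated by the inducer through a Hill function with constant k_act, and a reporter is produced at rate v_max times the active fraction and degraded at rate k_deg. All parameter names and values are assumptions chosen for illustration. Under these assumptions an allele that changes k_act shifts the dose response but leaves the plateau unchanged, whereas an allele that changes v_max scales the plateau.

```python
def hill(signal, k_act, n=2.0):
    """Fraction of activated regulator at a given inducer level (Hill function)."""
    return signal ** n / (k_act ** n + signal ** n)

def steady_reporter(galactose, k_act, v_max, k_deg=0.1, dt=0.1, t_end=200.0):
    """Integrate dX/dt = v_max * hill(galactose) - k_deg * X with forward Euler from X = 0."""
    x = 0.0
    for _ in range(int(t_end / dt)):
        x += dt * (v_max * hill(galactose, k_act) - k_deg * x)
    return x

# Hypothetical parameter sets standing in for a reference allele and two variant alleles.
alleles = {
    "reference": dict(k_act=1.0, v_max=1.0),
    "activation": dict(k_act=2.0, v_max=1.0),  # weaker activation by the inducer
    "activity": dict(k_act=1.0, v_max=0.5),    # lower production or efficiency
}
for label, params in alleles.items():
    low = steady_reporter(1.0, **params)
    high = steady_reporter(100.0, **params)
    print(f"{label:>10}: low inducer {low:.2f}, saturating inducer {high:.2f}")
```

Measuring the response at several inducer levels is what lets the two kinds of allele be told apart: at a single low dose both variants simply look weaker than the reference.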
Affiliation(s)
- Magali Richard
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France; Univ. Grenoble Alpes, CNRS, CHU Grenoble Alpes, Grenoble INP, TIMC-IMAG, Grenoble, France
- Florent Chuffart
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
- Hélène Duplus-Bottin
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
- Fanny Pouyet
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
- Martin Spichty
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
- Etienne Fulcrand
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
- Marianne Entrevan
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
- Audrey Barthelaix
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
- Michael Springer
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Daniel Jost
- Univ. Grenoble Alpes, CNRS, CHU Grenoble Alpes, Grenoble INP, TIMC-IMAG, Grenoble, France
- Gaël Yvert
- Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France
5
Chen Z, Guo Z, Ma J, Liu F, Gao C, Liu S, Wang A, Wu R. STAT1 single nucleotide polymorphisms and susceptibility to immune thrombocytopenia. Autoimmunity 2015; 48:305-12. [PMID: 25707685 DOI: 10.3109/08916934.2015.1016218]
Abstract
Primary immune thrombocytopenia (ITP) is an acquired autoimmune bleeding disorder. One of the key mediators of IFN-γ signaling is the signal transducer and activator of transcription 1 protein (STAT1). We evaluated the relationship between STAT1 gene single nucleotide polymorphisms (SNPs) and the associated risk of ITP in a prospective case-control study. A total of 548 children were recruited: 328 children with ITP and 220 sex- and age-matched healthy children as normal controls. The Sequenom MassArray system (Sequenom, San Diego, CA) was used to genotype three SNPs in the STAT1 gene: rs10208033, rs12693591, and rs1467199. STAT1 rs1467199 allele frequencies differed significantly when the four clinical subgroups of ITP patients were compared with the normal controls (p = 0.0432). Newly diagnosed ITP patients and chronic ITP patients also showed significantly different genotype distributions (χ² = 8.511, p = 0.0142) and allele frequencies (p = 0.0055). Although a positive association between STAT1 rs1467199 genotype subgroups and the STAT1 mRNA expression level could not be established, there is a weak correlation between the STAT1 mRNA level and the activity ratio of Type 1 to Type 2 T helper lymphocytes (Th1/Th2 ratio) (p = 0.0544); the correlation with IFN-γ alone did not reach statistical significance (p = 0.1715). The findings of our study suggest that the STAT1 rs1467199 SNP plays a potential role in the IFN-γ dependent development of autoimmunity in children with ITP. The clinical value of STAT1 SNP testing as a predictor of chronic pediatric ITP remains to be validated in future molecular and protein functional analyses.
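A genotype comparison between two patient groups over three genotypes is a 2 × 3 contingency table with (2 − 1)(3 − 1) = 2 degrees of freedom. For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(−χ²/2), so the reported statistic (χ² = 8.511) can be checked directly: exp(−8.511/2) ≈ 0.0142, matching the reported p-value. A minimal sketch, assuming a Pearson test without continuity correction; the genotype counts in the usage lines are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_df2(stat):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Check against the statistic reported in the abstract.
print(f"p for chi2 = 8.511, df = 2: {p_value_df2(8.511):.4f}")

# Hypothetical genotype counts (rows: patient groups; columns: three genotypes).
genotypes = [[60, 45, 15], [40, 55, 30]]
stat, dof = chi_square(genotypes)
print(f"chi2 = {stat:.3f}, df = {dof}, p = {p_value_df2(stat):.4f}")
```

The closed form only holds for 2 degrees of freedom; other tables need a general chi-square survival function, for example from a statistics library.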
Affiliation(s)
- Zhenping Chen
- Beijing Key Laboratory of Pediatric Hematology Oncology, Capital Medical University, Beijing, China
6
Talwar P, Silla Y, Grover S, Gupta M, Grewal GK, Kukreti R. Systems Pharmacology and Pharmacogenomics for Drug Discovery and Development. Systems and Synthetic Biology 2015. [DOI: 10.1007/978-94-017-9514-2_9]
7
Klein A, Riazanov A, Hindle MM, Baker CJO. Benchmarking infrastructure for mutation text mining. J Biomed Semantics 2014; 5:11. [PMID: 24568600 PMCID: PMC3939821 DOI: 10.1186/2041-1480-5-11] [Received: 03/22/2013] [Accepted: 02/05/2014]
Abstract
BACKGROUND Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. RESULTS We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. CONCLUSION We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
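The infrastructure described here computes performance metrics with SPARQL queries over RDF annotations. The arithmetic behind such metrics can be sketched in plain Python, assuming each annotation is reduced to a hashable key (here a hypothetical (document, mutation) pair) and that a prediction counts only on an exact match; this is not the SPARQL implementation from the paper.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall and F1 between two sets of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical annotations: (document id, normalized mutation) pairs.
gold = {("doc1", "E6V"), ("doc1", "G12D"), ("doc2", "R175H"), ("doc3", "V600E")}
predicted = {("doc1", "E6V"), ("doc2", "R175H"), ("doc2", "R273H")}
p, r, f = precision_recall_f1(gold, predicted)
print(f"precision {p:.2f}, recall {r:.2f}, F1 {f:.2f}")
```

Relaxed matching (for example, accepting a correct mutation grounded to the wrong protein) would require a custom comparison instead of set intersection.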
Affiliation(s)
- Artjom Klein
- Computer Science And Applied Statistics Department, University of New Brunswick, Saint John, Canada
- Matthew M Hindle
- Synthetic and Systems Biology, Edinburgh University, Edinburgh, UK
- Christopher JO Baker
- Computer Science And Applied Statistics Department, University of New Brunswick, Saint John, Canada
8
Yates CM, Sternberg MJE. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. J Mol Biol 2013; 425:3949-63. [PMID: 23867278 DOI: 10.1016/j.jmb.2013.07.012] [Received: 04/28/2013] [Revised: 07/02/2013] [Accepted: 07/09/2013]
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) are single base changes leading to a change to the amino acid sequence of the encoded protein. Many of these variants are associated with disease, so nsSNPs have been well studied, with studies looking at the effects of nsSNPs on individual proteins, for example, on stability and enzyme active sites. In recent years, the impact of nsSNPs upon protein-protein interactions has also been investigated, giving a greater insight into the mechanisms by which nsSNPs can lead to disease. In this review, we summarize these studies, looking at the various mechanisms by which nsSNPs can affect protein-protein interactions. We focus on structural changes that can impair interaction, changes to disorder, gain of interaction, and post-translational modifications before looking at some examples of nsSNPs at human-pathogen protein-protein interfaces and the analysis of nsSNPs from a network perspective.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Sir Ernst Chain Building, Imperial College London, South Kensington, SW7 2AZ, UK.
9
Furlong LI. Human diseases through the lens of network biology. Trends Genet 2013; 29:150-9. [DOI: 10.1016/j.tig.2012.11.004] [Received: 09/04/2012] [Revised: 10/24/2012] [Accepted: 11/09/2012]
10
Abstract
Cytoscape is an open-source software platform for visualizing, analyzing, and modeling biological networks. This chapter explains how to use Cytoscape to analyze the functional effect of sequence variations in the context of biological networks such as protein-protein interaction networks and signaling pathways. The chapter is divided into five parts: (1) obtaining information about the functional effect of sequence variation in a Cytoscape-readable format, (2) loading and displaying different types of biological networks in Cytoscape, (3) integrating the genomic information (SNPs and mutations) with the biological networks, and (4) analyzing the effect of the genomic perturbation on the network structure using Cytoscape built-in functions. Finally, we briefly outline how the integrated data can help in building mathematical network models for analyzing the effect of the sequence variation on the dynamics of the biological system. Each part is illustrated by step-by-step instructions on an example use case and visualized by many screenshots and figures.
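Steps (3) and (4) of this chapter, mapping variants onto a network and assessing the structural effect, can be approximated outside Cytoscape to show the idea. A minimal sketch, assuming a hypothetical protein-protein interaction edge list and a hypothetical set of genes carrying predicted damaging variants: the effect is summarized as the degree of each affected node and the number of connected components left after removing those nodes. Cytoscape's built-in analyses report richer statistics; this is not its implementation.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_components(graph, removed=frozenset()):
    """Connected components after deleting the `removed` nodes (breadth-first search)."""
    seen, components = set(removed), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

# Hypothetical interaction network and variant-carrying genes.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "F")]
damaged = {"B"}
graph = build_graph(edges)
for gene in sorted(damaged):
    print(gene, "degree", len(graph.get(gene, set())))
print("components before:", len(connected_components(graph)))
print("components after removing damaged nodes:", len(connected_components(graph, damaged)))
```

Removing a node is a crude stand-in for a loss-of-function variant; an edge-specific perturbation (a variant that disrupts only one interface) would remove individual edges instead.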
11
Avillach P, Dufour JC, Diallo G, Salvo F, Joubert M, Thiessard F, Mougin F, Trifirò G, Fourrier-Réglat A, Pariente A, Fieschi M. Design and validation of an automated method to detect known adverse drug reactions in MEDLINE: a contribution from the EU-ADR project. J Am Med Inform Assoc 2012. [PMID: 23195749 DOI: 10.1136/amiajnl-2012-001083]
Abstract
OBJECTIVES The aim of this research was to automate the search of publications concerning adverse drug reactions (ADR) by defining the queries used to search MEDLINE and by determining the required threshold for the number of extracted publications to confirm the drug/event association in the literature. METHODS We defined an approach based on the medical subject headings (MeSH) 'descriptor records' and 'supplementary concept records' thesaurus, using the subheadings 'chemically induced' and 'adverse effects' with the 'pharmacological action' knowledge. An expert-built validation set of true positive and true negative drug/adverse event associations (n=61) was used to validate our method. RESULTS Using a threshold of three or more extracted publications, the automated search method presented a sensitivity of 90% and a specificity of 100%. For nine different drug/event pairs selected, the recall of the automated search ranged from 24% to 64% and the precision from 48% to 93%. CONCLUSIONS This work presents a method to find previously established relationships between drugs and adverse events in the literature. Using MEDLINE, following a MeSH approach to filter the signals, is a valid option. Our contribution is available as a web service that will be integrated in the final European EU-ADR project (Exploring and Understanding Adverse Drug Reactions by integrative mining of clinical records and biomedical knowledge) automated system.
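The threshold rule in this abstract turns a count of retrieved publications into a binary call per drug/event pair, which is then scored against the expert validation set. A minimal sketch of how sensitivity and specificity follow from such a rule, assuming each validation pair has a publication count and an expert true/false label; the drug and event names and the counts are hypothetical, and the MeSH-based retrieval itself is not reproduced.

```python
def evaluate_threshold(pairs, threshold=3):
    """Sensitivity and specificity when a pair is confirmed at >= threshold publications.

    `pairs` maps a (drug, event) pair to (publication_count, is_true_association).
    """
    tp = fn = tn = fp = 0
    for count, is_true in pairs.values():
        confirmed = count >= threshold
        if is_true and confirmed:
            tp += 1
        elif is_true:
            fn += 1
        elif confirmed:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Hypothetical validation pairs, for illustration only.
validation = {
    ("drug_a", "event_1"): (12, True),
    ("drug_b", "event_2"): (2, True),
    ("drug_c", "event_3"): (5, True),
    ("drug_d", "event_4"): (0, False),
    ("drug_e", "event_5"): (1, False),
}
for threshold in (1, 3):
    sens, spec = evaluate_threshold(validation, threshold)
    print(f">= {threshold} publications: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, which is the trade-off the chosen cut-off of three publications resolves.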
Affiliation(s)
- Paul Avillach
- LESIM, ISPED, University of Bordeaux, Bordeaux, France.
12
Naderi N, Witte R. Automated extraction and semantic analysis of mutation impacts from the biomedical literature. BMC Genomics 2012; 13 Suppl 4:S10. [PMID: 22759648 PMCID: PMC3395893 DOI: 10.1186/1471-2164-13-s4-s10]
Abstract
BACKGROUND Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing mutation information and its impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast-growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. RESULTS We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguate genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects, and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data, and end-user and developer documentation, is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner.
CONCLUSION We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions.
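Detecting single-point mutation mentions, the part this abstract describes as well covered by existing systems, is commonly done by pattern matching on wild-type residue, position, and mutant residue. A minimal regular-expression sketch, assuming only the one-letter (E6V) and three-letter (Gly12Asp) protein notations; it is not the Open Mutation Miner implementation, it does not handle nucleotide-level notation, insertions, or deletions, and it can produce false positives on identifiers that happen to fit the pattern.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = "|".join(THREE_TO_ONE)

# Either three-letter (Gly12Asp) or one-letter (E6V) wild type / position / mutant.
POINT_MUTATION = re.compile(
    rf"\b(?:(?P<wt3>{THREE_LETTER})(?P<pos3>\d+)(?P<mt3>{THREE_LETTER})"
    rf"|(?P<wt1>[{ONE_LETTER}])(?P<pos1>\d+)(?P<mt1>[{ONE_LETTER}]))\b"
)

def find_point_mutations(text):
    """Return (wild type, position, mutant) tuples in one-letter code, in text order."""
    hits = []
    for m in POINT_MUTATION.finditer(text):
        if m.group("wt3"):
            hits.append((THREE_TO_ONE[m.group("wt3")], int(m.group("pos3")),
                         THREE_TO_ONE[m.group("mt3")]))
        else:
            hits.append((m.group("wt1"), int(m.group("pos1")), m.group("mt1")))
    return hits

print(find_point_mutations("The E6V and Gly12Asp substitutions reduced thermal stability."))
```

Normalizing both notations to one-letter tuples is what makes later grounding and de-duplication of mentions straightforward.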
Affiliation(s)
- Nona Naderi
- Semantic Software Lab, Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
13
Sequence variants and haplotype analysis of cat ERBB2 gene: a survey on spontaneous cat mammary neoplastic and non-neoplastic lesions. Int J Mol Sci 2012; 13:2783-2800. [PMID: 22489125 PMCID: PMC3317687 DOI: 10.3390/ijms13032783] [Received: 12/12/2011] [Revised: 01/29/2012] [Accepted: 02/24/2012]
Abstract
The human ERBB2 proto-oncogene is widely considered a key gene involved in human breast cancer onset and progression. Among spontaneous tumors, mammary tumors are the most frequent cause of cancer death in cats and the second most frequent in humans. In fact, naturally occurring tumors in domestic animals, particularly cat mammary tumors, have been proposed as a good model for human breast cancer, but critical genetic and molecular information is still scarce. The aims of this study include the analysis of partial sequences of the cat ERBB2 gene (between exons 17 and 20) in order to characterize heterogeneous populations of normal and mammary lesion samples. Cat genomic DNA was extracted from normal frozen samples (n = 16) and from frozen and formalin-fixed paraffin-embedded mammary lesion samples (n = 41). We amplified and sequenced two cat ERBB2 DNA fragments comprising exons 17 to 20. We identified five sequence variants and six haplotypes in the total population. Two sequence variants and two haplotypes appear to be specific to cat mammary tumor samples. Bioinformatics analysis predicts that four of the sequence variants can produce alternative transcripts or activate cryptic splicing sites. A possible association was also identified between clinicopathological traits and the variant haplotypes. As far as we know, this is the first attempt to examine ERBB2 genetic variations in the cat mammary genome and their possible association with the onset and progression of cat mammary tumors. The demonstration of a possible association between primary tumor size (one of the two most important prognostic factors) and the number of masses with the cat ERBB2 variant haplotypes reveals the importance of analyzing this gene in veterinary medicine.
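Once haplotypes have been resolved, finding haplotypes observed only in lesion samples is a set comparison, and haplotype frequencies are simple proportions. A minimal sketch, assuming already-phased haplotype strings (one per chromosome) over five variant sites; the sequences are hypothetical, and the sketch does not perform phase inference, which for unphased genotypes requires a statistical method.

```python
from collections import Counter

def haplotype_frequencies(haplotypes):
    """Relative frequency of each distinct haplotype in a group."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

def group_specific(group, others):
    """Haplotypes observed in `group` but never in `others`, sorted."""
    return sorted(set(group) - set(others))

# Hypothetical phased haplotypes over five variant sites.
normal = ["CGTAC", "CGTAC", "CATAC", "CGTGC"]
lesion = ["CGTAC", "CATAC", "TGTAC", "TGTAT", "CGTAC"]

print(len(set(normal) | set(lesion)), "distinct haplotypes")
print("lesion-specific:", group_specific(lesion, normal))
print(haplotype_frequencies(lesion))
```

With small sample sizes (16 normal and 41 lesion samples in this study), a haplotype absent from the normal group may simply be rare rather than truly tumor-specific.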
14
Bhaskara RM, Srinivasan N. Stability of domain structures in multi-domain proteins. Sci Rep 2011; 1:40. [PMID: 22355559 PMCID: PMC3216527 DOI: 10.1038/srep00040] [Received: 05/12/2011] [Accepted: 06/27/2011]
Abstract
Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between the domain-domain interactions and the stability of domains in isolation. We provide quantitative treatment and proof for prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. Stability of such folds to exist independently is optimized by evolution. Specific residue mutations in the sites equivalent to inter-domain interface enhance the overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability leading to disease manifestation. Our analysis provides safe guidelines for mutagenesis which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR.
15
Riazanov A, Laurila JB, Baker CJO. Deploying mutation impact text-mining software with the SADI Semantic Web Services framework. BMC Bioinformatics 2011; 12 Suppl 4:S6. [PMID: 21992079 PMCID: PMC3194198 DOI: 10.1186/1471-2105-12-s4-s6]
Abstract
Background: Mutation impact extraction is an important task designed to harvest relevant annotations from scientific documents for reuse in multiple contexts. Our previous work on text mining for mutation impacts resulted in (i) the development of a GATE-based pipeline that mines texts for information about impacts of mutations on proteins, (ii) the population of this information into our OWL DL mutation impact ontology, and (iii) establishing an experimental semantic database for storing the results of text mining.
Results: This article explores the possibility of using the SADI framework as a medium for publishing our mutation impact software and data. SADI is a set of conventions for creating web services with semantic descriptions that facilitate automatic discovery and orchestration. We describe a case study exploring and demonstrating the utility of the SADI approach in our context. We describe several SADI services we created based on our text mining API and data, and demonstrate how they can be used in a number of biologically meaningful scenarios through a SPARQL interface (SHARE) to SADI services. In all cases we pay special attention to the integration of mutation impact services with external SADI services providing information about related biological entities, such as proteins, pathways, and drugs.
Conclusion: We have identified that SADI provides an effective way of exposing our mutation impact data such that it can be leveraged by a variety of stakeholders in multiple use cases. The solutions we provide for our use cases can serve as examples to potential SADI adopters trying to solve similar integration problems.
Affiliation(s)
- Alexandre Riazanov
- Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada
- Jonas Bergman Laurila
- Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada
- Christopher JO Baker
- Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada
16
Bauer-Mehren A, Bundschus M, Rautschka M, Mayer MA, Sanz F, Furlong LI. Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PLoS One 2011; 6:e20284. [PMID: 21695124 PMCID: PMC3114846 DOI: 10.1371/journal.pone.0020284] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Received: 01/11/2011] [Accepted: 04/27/2011] [Indexed: 02/05/2023]
Abstract
Background: Scientists have been trying to understand the molecular mechanisms of diseases to design preventive and therapeutic strategies for a long time. For some diseases, it has become evident that it is not enough to obtain a catalogue of the disease-related genes but to uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, with the unprecedented wealth of information available, even obtaining such catalogue is extremely difficult.
Principal Findings: We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on the current knowledge of human genetic diseases including mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that for most diseases, including mendelian, complex and environmental diseases, functional modules exist. Moreover, a core set of biological pathways is found to be associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise due to dysfunction of common biological processes in the cell.
Conclusions: For the first time, we include mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies for all of them. We furthermore provide a functional analysis of disease-related modules providing important new biological insights, which might not be discovered when considering each of the gene-disease association repositories independently. Hence, we present a suitable framework for the study of how genetic and environmental factors, such as drugs, contribute to diseases.
Availability: The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download.
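To illustrate the network idea in this abstract (this is not the authors' DisGeNET pipeline or their topology analysis), here is a minimal sketch assuming a plain list of (gene, disease) association pairs. It projects the bipartite gene-disease graph onto diseases, linking two diseases whenever they share at least one gene. The function name and the gene and disease labels are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict
from itertools import combinations


def project_to_disease_network(associations):
    """Project (gene, disease) pairs onto a disease-disease network.

    Returns a dict mapping a sorted (disease, disease) pair to the set of
    genes the two diseases share. Pairs with no shared gene are omitted.
    """
    # Group diseases by gene: each gene links every pair of its diseases.
    diseases_by_gene = defaultdict(set)
    for gene, disease in associations:
        diseases_by_gene[gene].add(disease)

    edges = defaultdict(set)
    for gene, diseases in diseases_by_gene.items():
        # Sorting gives each undirected edge a single canonical key.
        for d1, d2 in combinations(sorted(diseases), 2):
            edges[(d1, d2)].add(gene)
    return dict(edges)


if __name__ == "__main__":
    # Hypothetical toy associations, not taken from the study.
    toy = [
        ("GENE_A", "disease_1"),
        ("GENE_A", "disease_2"),
        ("GENE_B", "disease_2"),
        ("GENE_B", "disease_3"),
        ("GENE_C", "disease_1"),
    ]
    for (d1, d2), shared in sorted(project_to_disease_network(toy).items()):
        print(d1, d2, sorted(shared))
```

Run on the toy list, this prints two edges: disease_1 and disease_2 linked through GENE_A, and disease_2 and disease_3 linked through GENE_B. The size of each shared-gene set is one simple way to weight an edge; module detection and functional annotation, as used in the study, would sit on top of such a graph and are not attempted here.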
Affiliation(s)
- Anna Bauer-Mehren
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
- Markus Bundschus
- Institute for Computer Science, Ludwig-Maximilians-University Munich, Munich, Germany
- Michael Rautschka
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
- Miguel A. Mayer
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
- Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
- Laura I. Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain
17
Laurila JB, Naderi N, Witte R, Riazanov A, Kouznetsov A, Baker CJO. Algorithms and semantic infrastructure for mutation impact extraction and grounding. BMC Genomics 2010; 11 Suppl 4:S24. [PMID: 21143808 PMCID: PMC3005927 DOI: 10.1186/1471-2164-11-s4-s24] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 02/02/2023]
Abstract
Background: Mutation impact extraction is a hitherto unaccomplished task in state of the art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases.
Results: We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore protein and mutation mentions are grounded to their respective UniProtKB IDs and selected protein properties, namely protein functions to concepts found in the Gene Ontology. The extracted entities are populated to an OWL-DL Mutation Impact ontology facilitating complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework.
Conclusion: We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state of the art levels of precision and recall for Mutation Grounding and respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
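The extraction task described in this abstract can be illustrated with a deliberately simple rule-based sketch. It is not the authors' GATE-based system; the cue-word lists and function names are hypothetical. It assumes point mutations are written in one-letter wild-type/position/mutant form (for example W141F) and labels a sentence's impact direction from a small set of cue words. Grounding to UniProtKB IDs or Gene Ontology concepts is not attempted.

```python
import re

# One-letter codes for the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"
# Point mutation as wild type, position, mutant, e.g. "W141F".
# Three-letter forms such as "Trp141Phe" are not handled by this sketch.
MUTATION_RE = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")

# Hypothetical cue words; a real rule set would be far richer.
POSITIVE_CUES = {"increased", "enhanced", "improved", "higher"}
NEGATIVE_CUES = {"decreased", "reduced", "abolished", "lower"}


def extract_mutations(sentence):
    """Return (wild_type, position, mutant) tuples found in a sentence."""
    return [(wt, int(pos), mt) for wt, pos, mt in MUTATION_RE.findall(sentence)]


def impact_direction(sentence):
    """Label the sentence positive, negative or neutral from cue words.

    Simplification: "neutral" is returned both when no cue is present and
    when positive and negative cues conflict.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    has_pos = bool(words & POSITIVE_CUES)
    has_neg = bool(words & NEGATIVE_CUES)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    s = "The W141F mutant showed increased activity."
    print(extract_mutations(s), impact_direction(s))
```

For the example sentence this prints `[('W', 141, 'F')] positive`. Keeping mention detection and direction labelling as separate functions mirrors the two sub-tasks the abstract names, so either could be swapped for a stronger method independently.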
Affiliation(s)
- Jonas B Laurila
- Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, Canada.
18
Anderson MW, Schrijver I. Next generation DNA sequencing and the future of genomic medicine. Genes (Basel) 2010; 1:38-69. [PMID: 24710010 PMCID: PMC3960862 DOI: 10.3390/genes1010038] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 04/15/2010] [Revised: 05/20/2010] [Accepted: 05/21/2010] [Indexed: 12/20/2022]
Abstract
In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpretation, laboratory workflow, data storage, and ethical considerations. This review describes the current high-throughput sequencing platforms commercially available, and compares the inherent advantages and disadvantages of each. The potential applications for clinical diagnostics are considered, as well as the need for software and analysis tools to interpret the vast amount of data generated. Finally, we discuss the clinical and ethical implications of the wealth of genetic information generated by these methods. Despite the challenges, we anticipate that the evolution and refinement of high-throughput DNA sequencing technologies will catalyze a new era of personalized medicine based on individualized genomic analysis.
Affiliation(s)
- Matthew W Anderson
- Department of Pathology, Stanford University Medical Center, 300 Pasteur Drive, Room L235, Stanford, CA 94305-5627, USA.
- Iris Schrijver
- Department of Pathology, Stanford University Medical Center, 300 Pasteur Drive, Room L235, Stanford, CA 94305-5627, USA.
19
Pattin KA, Moore JH. Role for protein-protein interaction databases in human genetics. Expert Rev Proteomics 2010; 6:647-59. [PMID: 19929610 DOI: 10.1586/epr.09.86] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 12/30/2022]
Abstract
Proteomics and the study of protein-protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein-protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein-protein interactions in human genetics and genetic epidemiology. Since protein-protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies.
Affiliation(s)
- Kristine A Pattin
- Computational Genetics Laboratory and Department of Genetics, Dartmouth Medical School, Lebanon, NH, USA.
20
Baker CJO, Rebholz-Schuhmann D. Between proteins and phenotypes: annotation and interpretation of mutations. BMC Bioinformatics 2009; 10 Suppl 8:I1. [PMID: 19758463 PMCID: PMC2745581 DOI: 10.1186/1471-2105-10-s8-i1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]