1
Quaye JA, Moni BM, Kugblenu JE, Gadda G. Oxidation of α-hydroxy acids by D-2-hydroxyglutarate dehydrogenase enzymes. Arch Biochem Biophys 2025; 768:110355. [PMID: 39993590 DOI: 10.1016/j.abb.2025.110355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Revised: 02/15/2025] [Accepted: 02/21/2025] [Indexed: 02/26/2025]
Abstract
α-Hydroxy acids are naturally occurring organic molecules with various medical and industrial applications. However, some α-hydroxy acids, like D-2-hydroxyglutarate (D2HG), have been implicated in cancers and neurometabolic disorders such as D2HG aciduria. Several studies on the D2HG oxidizing enzyme D-2-hydroxyglutarate dehydrogenase (D2HGDH) from various eukaryotic and prokaryotic sources focus on the use and application of the enzyme as biosensors for detecting D2HG. A recent gene knockout study on the bacterial D2HGDH homologs from Pseudomonas stutzeri and Pseudomonas aeruginosa identified the D2HGDH to be essential for bacterial survival by driving L-serine biosynthesis. Thus, D2HGDH is a good candidate for a therapeutic target against the multidrug-resistant P. aeruginosa. However, there is no consensus on the D2HGDH catalytic mechanism, and several D2HGDH homologs have not been characterized in their structural properties, which are two crucial features for therapeutic design. P. aeruginosa D2HGDH, the most extensively studied D2HGDH homolog, is emerging as a paradigm for D2HGDH and flavoproteins with metal ions in their active site. In this review, we have explored the structures of all published D2HGDH homologs from 12 species using AlphaFold 3 and highlighted the fully conserved structure and active site topologies of all D2HGDH homologs. Additionally, evolutionary and functional studies coupled with analyses of enzymatic activities reveal that prokaryotic and eukaryotic D2HGDH homologs, diverging from two distinct ancestors, may have differentially evolved to specialize in their α-hydroxy acid catalysis. Additionally, this review identifies all D2HGDH homologs as metal and FAD-dependent enzymes that employ a metal-triggered FAD reduction in their catalysis. Elucidation of the D2HGDH mechanism will allow designing antibiotics that target these enzymes as potential therapeutics against pathogenic bacteria like P. aeruginosa in addition to the application of D2HGDH homologs as biosensors.
Affiliation(s)
- Joanna Afokai Quaye
- Departments of Chemistry, Georgia State University, Atlanta, GA, 30302-3965, USA
- Bilkis Mehrin Moni
- Departments of Chemistry, Georgia State University, Atlanta, GA, 30302-3965, USA; The Center for Diagnostics and Therapeutics, Georgia State University, Atlanta, GA, 30302-3965, USA
- Giovanni Gadda
- Departments of Chemistry, Georgia State University, Atlanta, GA, 30302-3965, USA; Departments of Biology, Georgia State University, Atlanta, GA, 30302-3965, USA; The Center for Diagnostics and Therapeutics, Georgia State University, Atlanta, GA, 30302-3965, USA.
2
Suraci CM, Morrison ML, Roth MB. Oxygen is toxic in the cold in C. elegans. Front Physiol 2024; 15:1471249. [PMID: 39777359 PMCID: PMC11703811 DOI: 10.3389/fphys.2024.1471249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 12/05/2024] [Indexed: 01/11/2025] Open
Abstract
Introduction Temperature and oxygen are two factors that profoundly affect survival limits of animals; too much or too little of either is lethal. However, humans and other animals can exhibit exceptional survival when oxygen and temperature are simultaneously low. This research investigates the role of oxygen in the cold shock death of Caenorhabditis elegans. Methods The survival of C. elegans populations in combinations of oxygen concentrations and temperatures was assayed. Additionally, the effect of cold acclimatization, mutations in the cold acclimatization pathway, compounds, and antioxidant proteins on survival in low temperatures and high oxygen were investigated. Results We demonstrate that C. elegans have increased survival in 2°C when deprived of oxygen, and an increase to just 0.25 kPa of oxygen decreased survival. Additionally, we show that oxygen toxicity produced by a 35-fold increase above atmospheric oxygen levels was fatal for nematodes in 8 h at room temperature and 2 h at 2°C. We found that cold acclimatization and mutations in the cold acclimatization pathway improve survival in room temperature oxygen toxicity. Furthermore, we found that the compounds glucose, manganese (II), and ascorbate improve both cold shock and high oxygen survival, while the antioxidant proteins catalase and peroxiredoxin are essential to wild type survival in these conditions. Discussion Our results suggest that oxygen toxicity contributes to the death of C. elegans during cold shock. The changes in survival induced by cold acclimatization and mutations in the cold acclimatization pathway suggest that oxygen toxicity in the cold exerts evolutionary pressure, leading to the development of protections against it. Additionally, the resistance provided by diverse compounds and antioxidant proteins in both low temperature and high oxygen suggests these conditions have similar chemical environments. We discuss evidence that similar phenomena may function in humans.
Affiliation(s)
- Mark B. Roth
- Roth Lab, Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, United States
3
Taha K. Employing Machine Learning Techniques to Detect Protein Function: A Survey, Experimental, and Empirical Evaluations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1965-1986. [PMID: 39008392 DOI: 10.1109/tcbb.2024.3427381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
This review article delves deeply into the various machine learning (ML) methods and algorithms employed in discerning protein functions. Each method discussed is assessed for its efficacy, limitations, potential improvements, and future prospects. We present an innovative hierarchical classification system that arranges algorithms into intricate categories and unique techniques. This taxonomy is based on a tri-level hierarchy, starting with the methodology category and narrowing down to specific techniques. Such a framework allows for a structured and comprehensive classification of algorithms, assisting researchers in understanding the interrelationships among diverse algorithms and techniques. The study incorporates both empirical and experimental evaluations to differentiate between the techniques. The empirical evaluation ranks the techniques based on four criteria. The experimental assessments rank: (1) individual techniques under the same methodology sub-category, (2) different sub-categories within the same category, and (3) the broad categories themselves. Integrating the innovative methodological classification, empirical findings, and experimental assessments, the article offers a well-rounded understanding of ML strategies in protein function identification. The paper also explores techniques for multi-task and multi-label detection of protein functions, in addition to focusing on single-task methods. Moreover, the paper sheds light on the future avenues of ML in protein function determination.
4
Ulusoy E, Doğan T. Mutual annotation-based prediction of protein domain functions with Domain2GO. Protein Sci 2024; 33:e4988. [PMID: 38757367 PMCID: PMC11099699 DOI: 10.1002/pro.4988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/25/2024] [Accepted: 03/30/2024] [Indexed: 05/18/2024]
Abstract
Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease states. The domain composition of a protein can reveal critical information in this context, as domains are structural and functional units that dictate how the protein should act at the molecular level. The expensive and time-consuming nature of wet-lab experimental approaches prompted researchers to develop computational strategies for predicting the functions of proteins. In this study, we proposed a new method called Domain2GO that infers associations between protein domains and function-defining gene ontology (GO) terms, thus redefining the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings via literature review. Then, we applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we employed Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as producing interpretable results and having an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex and large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. 
Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO, which simplifies the prediction of functions of previously unannotated proteins solely using amino acid sequences.
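The co-annotation idea in this abstract can be illustrated with a minimal Python sketch. It is not the Domain2GO implementation: the abstract describes statistical resampling, whereas this toy version substitutes a plain co-occurrence ratio with a fixed cutoff. All names here (`protein_domains`, `protein_go`, `co_annotation_scores`, `predict_go`, the `D_...` and `GO:...` labels, and the 0.75 threshold) are hypothetical and the data are made up.

```python
from collections import Counter
from itertools import product

# Toy annotations with made-up identifiers (not real proteins, domains, or GO terms).
protein_domains = {
    "P1": {"D_kinase"},
    "P2": {"D_kinase", "D_sh2"},
    "P3": {"D_sh2"},
}
protein_go = {
    "P1": {"GO:kinase_activity"},
    "P2": {"GO:kinase_activity", "GO:signal_binding"},
    "P3": {"GO:signal_binding"},
}


def co_annotation_scores(domains, go_terms):
    """For each (domain, GO term) pair, return the fraction of proteins
    carrying the domain that are also annotated with the GO term."""
    pair_counts = Counter()
    domain_counts = Counter()
    for protein, doms in domains.items():
        terms = go_terms.get(protein, set())
        for d in doms:
            domain_counts[d] += 1
        for d, t in product(doms, terms):
            pair_counts[(d, t)] += 1
    return {pair: n / domain_counts[pair[0]] for pair, n in pair_counts.items()}


def predict_go(query_domains, scores, threshold=0.75):
    """Propagate GO terms whose score with any query domain meets the cutoff."""
    return {t for (d, t), s in scores.items() if d in query_domains and s >= threshold}


scores = co_annotation_scores(protein_domains, protein_go)
# A hypothetical unannotated protein that carries only the kinase domain.
predicted = predict_go({"D_kinase"}, scores)
print(sorted(predicted))
```

A simple ratio like this is sensitive to small counts, which is presumably one reason a resampling-based association test is preferable in practice.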
Affiliation(s)
- Erva Ulusoy
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- Tunca Doğan
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
5
Chen L, Han W, Jing W, Feng M, Zhou Q, Cheng X. Novel anti-Acanthamoeba effects elicited by a repurposed poly (ADP-ribose) polymerase inhibitor AZ9482. Front Cell Infect Microbiol 2024; 14:1414135. [PMID: 38863831 PMCID: PMC11165085 DOI: 10.3389/fcimb.2024.1414135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 05/13/2024] [Indexed: 06/13/2024] Open
Abstract
Introduction Acanthamoeba infection is a serious public health concern, necessitating the development of effective and safe anti-Acanthamoeba chemotherapies. Poly (ADP-ribose) polymerases (PARPs) govern a large number of biological processes, such as DNA damage repair, protein degradation and apoptosis. Multiple PARP-targeted compounds have been approved for cancer treatment. However, repurposing of PARP inhibitors to treat Acanthamoeba is poorly understood. Methods In the present study, we attempted to fill these knowledge gaps by performing anti-Acanthamoeba efficacy assays, cell biology experiments, bioinformatics, and transcriptomic analyses. Results Using a homology model of Acanthamoeba poly (ADP-ribose) polymerases (PARPs), molecular docking of approved drugs revealed three potential inhibitory compounds: olaparib, venadaparib and AZ9482. In particular, venadaparib exhibited superior docking scores (-13.71) and favorable predicted binding free energy (-89.28 kcal/mol), followed by AZ9482, which showed a docking score of -13.20 and a binding free energy of -92.13 kcal/mol. Notably, the positively charged cyclopropylamine in venadaparib established a salt bridge (through E535) and a hydrogen bond (via N531) within the binding pocket. For comparison, AZ9482 was well stacked by the surrounding aromatic residues including H625, Y652, Y659 and Y670. In an assessment of trophozoite viability, AZ9482 exhibited a dose- and time-dependent anti-trophozoite effect by suppressing Acanthamoeba PARP activity, unlike olaparib and venadaparib. An Annexin V-fluorescein isothiocyanate/propidium iodide apoptosis assay revealed AZ9482 induced trophozoite necrotic cell death rather than apoptosis.
Transcriptomics analyses conducted on Acanthamoeba trophozoites treated with AZ9482 demonstrated an atlas of differentially regulated proteins and genes, and found that AZ9482 rapidly upregulates a multitude of DNA damage repair pathways in trophozoites, and intriguingly downregulates several virulent genes. Analyzing gene expression related to DNA damage repair pathway and the rate of apurinic/apyrimidinic (AP) sites indicated DNA damage efficacy and repair modulation in Acanthamoeba trophozoites following AZ9482 treatment. Discussion Collectively, these findings highlight that AZ9482, a structurally unique PARP inhibitor, provides a promising prototype for advancing anti-Acanthamoeba drug research.
Affiliation(s)
- Lijun Chen
- Department of Medical Microbiology and Parasitology, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Wei Han
- Research Center for Intelligent Computing Platforms, Zhejiang Lab, Hangzhou, China
- Wenwen Jing
- Department of Medical Microbiology and Parasitology, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Meng Feng
- Department of Medical Microbiology and Parasitology, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Qingtong Zhou
- Department of Pharmacology, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Xunjia Cheng
- Department of Medical Microbiology and Parasitology, School of Basic Medical Sciences, Fudan University, Shanghai, China
6
Garcia CA, Gardner JG. RNAseq analysis of Cellvibrio japonicus during starch utilization differentiates between genes encoding carbohydrate active enzymes controlled by substrate detection or growth rate. Microbiol Spectr 2023; 11:e0245723. [PMID: 37800973 PMCID: PMC10714805 DOI: 10.1128/spectrum.02457-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 08/22/2023] [Indexed: 10/07/2023] Open
Abstract
IMPORTANCE Understanding the bacterial metabolism of starch is important as this polysaccharide is a ubiquitous ingredient in foods, supplements, and medicines, all of which influence gut microbiome composition and health. Our RNAseq and growth data set provides a valuable resource to those who want to better understand the regulation of starch utilization in Gram-negative bacteria. These data are also useful as they provide an example of how to approach studying a starch-utilizing bacterium that has many putative amylases by coupling transcriptomic data with growth assays to overcome the potential challenges of functional redundancy. The RNAseq data can also be used as a part of larger meta-analyses to compare how C. japonicus regulates carbohydrate active enzymes, or how this bacterium compares to gut microbiome constituents in terms of starch utilization potential.
Affiliation(s)
- Cecelia A. Garcia
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland, USA
- Jeffrey G. Gardner
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland, USA
7
Cankara F, Doğan T. ASCARIS: Positional feature annotation and protein structure-based representation of single amino acid variations. Comput Struct Biotechnol J 2023; 21:4743-4758. [PMID: 37822561 PMCID: PMC10562615 DOI: 10.1016/j.csbj.2023.09.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 09/15/2023] [Accepted: 09/15/2023] [Indexed: 10/13/2023] Open
Abstract
Background Genomic variations may cause deleterious effects on protein functionality and perturb biological processes. Elucidating the effects of variations is critical for developing novel treatment strategies for diseases of genetic origin. Computational approaches have been aiding the work in this field by modeling and analyzing the mutational landscape. However, new approaches are required, especially for accurate representation and data-centric analysis of sequence variations. Method In this study, we propose ASCARIS (Annotation and StruCture-bAsed RepresentatIon of Single amino acid variations), a method for the featurization (i.e., quantitative representation) of single amino acid variations (SAVs), which could be used for a variety of purposes, such as predicting their functional effects or building multi-omics-based integrative models. ASCARIS utilizes the direct and spatial correspondence between the location of the SAV on the sequence/structure and 30 different types of positional feature annotations (e.g., active/lipidation/glycosylation sites; calcium/metal/DNA binding, inter/transmembrane regions, etc.), along with structural features and physicochemical properties. The main novelty of this method lies in constructing reusable numerical representations of SAVs via functional annotations. Results We statistically analyzed the relationship between these features and the consequences of variations and found that each carries information in this regard. To investigate potential applications of ASCARIS, we trained variant effect prediction models that utilize our SAV representations as input. We carried out an ablation study and a comparison against the state-of-the-art methods and observed that ASCARIS has a competing and complementary performance against widely-used predictors. ASCARIS can be used alone or in combination with other approaches to represent SAVs from a functional perspective. 
ASCARIS is available as a programmatic tool at https://github.com/HUBioDataLab/ASCARIS and as a web-service at https://huggingface.co/spaces/HUBioDataLab/ASCARIS.
Affiliation(s)
- Fatma Cankara
- Biological Data Science Laboratory, Dept. of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Department of Computational Sciences and Engineering, Koc University, Istanbul, Turkey
- Tunca Doğan
- Biological Data Science Laboratory, Dept. of Computer Engineering, Hacettepe University, Ankara, Turkey
- Institute of Informatics, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
8
Fekete FJ, Marotta NJ, Liu X, Weinert EE. An O2-sensing diguanylate cyclase broadly affects the aerobic transcriptome in the phytopathogen Pectobacterium carotovorum. Front Microbiol 2023; 14:1134742. [PMID: 37485529 PMCID: PMC10360401 DOI: 10.3389/fmicb.2023.1134742] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/30/2022] [Accepted: 06/26/2023] [Indexed: 07/25/2023] Open
Abstract
Pectobacterium carotovorum is an important plant pathogen responsible for the destruction of crops through bacterial soft rot, which is modulated by oxygen (O2) concentration. A soluble globin coupled sensor protein, Pcc DgcO (also referred to as PccGCS) is one way through which P. carotovorum senses oxygen. DgcO contains a diguanylate cyclase output domain producing c-di-GMP. Synthesis of the bacterial second messenger c-di-GMP is increased upon oxygen binding to the sensory globin domain. This work seeks to understand regulation of function by DgcO at the transcript level. RNA sequencing and differential expression analysis revealed that the deletion of DgcO only affects transcript levels in cells grown under aerobic conditions. Differential expression analysis showed that DgcO deletion alters transcript levels for metal transporters. These results, followed by inductively coupled plasma-mass spectrometry showing decreased concentrations of six biologically relevant metals upon DgcO deletion, provide evidence that a globin coupled sensor can affect cellular metal content. These findings improve the understanding of the transcript level control of O2-dependent phenotypes in an important phytopathogen and establish a basis for further studies on c-di-GMP-dependent functions in P. carotovorum.
Affiliation(s)
- Florian J. Fekete
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, United States
- Nick J. Marotta
- Graduate Program in Molecular, Cellular, and Integrative Biosciences, Penn State University, University Park, PA, United States
- Xuanyu Liu
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, United States
- Emily E. Weinert
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, United States
- Department of Chemistry, Penn State University, University Park, PA, United States
9
Gedikbasi A, Toksoy G, Karaca M, Gulec C, Balci MC, Gunes D, Gunes S, Aslanger AD, Unverengil G, Karaman B, Basaran S, Demirkol M, Gokcay GF, Uyguner ZO. Clinical and bi-genomic DNA findings of patients suspected to have mitochondrial diseases. Front Genet 2023; 14:1191159. [PMID: 37377599 PMCID: PMC10292751 DOI: 10.3389/fgene.2023.1191159] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 05/02/2023] [Indexed: 06/29/2023] Open
Abstract
Background: Mitochondrial diseases are the most common group of inherited metabolic disorders, causing difficulties in definite diagnosis due to clinical and genetic heterogeneity. Clinical components are predominantly associated with pathogenic variants shown in nuclear or mitochondrial genomes that affect vital respiratory chain function. The development of high-throughput sequencing technologies has accelerated the elucidation of the genetic etiology of many genetic diseases that previously remained undiagnosed. Methods: Thirty affected patients from 24 unrelated families with clinical, radiological, biochemical, and histopathological evaluations considered for mitochondrial diseases were investigated. DNA isolated from the peripheral blood samples of probands was sequenced for nuclear exome and mitochondrial DNA (mtDNA) analyses. MtDNA sequencing was also performed from the muscle biopsy material in one patient. For segregation, Sanger sequencing was performed for pathogenic alterations in five other affected family members and healthy parents. Results: Exome sequencing revealed 14 different pathogenic variants in nine genes encoding mitochondrial function peptides (AARS2, EARS2, ECHS1, FBXL4, MICOS13, NDUFAF6, OXCT1, POLG, and TK2) in 12 patients from nine families and four variants in genes important for muscle structure (CAPN3, DYSF, and TCAP) in six patients from four families. Three probands carried pathogenic mtDNA variations in two genes (MT-ATP6 and MT-TL1). Nine variants in five genes are reported for the first time with disease association: AARS2: c.277C>T/p.(R93*), c.845C>G/p.(S282C); EARS2: c.319C>T/p.(R107C), c.1283delC/p.(P428Lfs*); ECHS1: c.161G>A/p.(R54His), c.202G>A/p.(E68Lys); NDUFAF6: c.479delA/p.(N162Ifs*27); and OXCT1: c.1370C>T/p.(T457I), c.1173-139G>T/p.(?). Conclusion: Bi-genomic DNA sequencing clarified genetic etiology in 67% (16/24) of the families.
Diagnostic utility by mtDNA sequencing in 13% (3/24) and exome sequencing in 54% (13/24) of the families prioritized searching for nuclear genome pathologies for the first-tier test. Weakness and muscle wasting observed in 17% (4/24) of the families underlined that limb-girdle muscular dystrophy, similar to mitochondrial myopathy, is an essential point for differential diagnosis. The correct diagnosis is crucial for comprehensive genetic counseling of families. Also, it contributes to making treatment-helpful referrals, such as ensuring early access to medication for patients with mutations in the TK2 gene.
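The percentages quoted in this Conclusion follow directly from the family counts, and a few lines of Python make the arithmetic easy to re-check. The helper name `percent` and the dictionary labels are illustrative choices, not terms from the paper; only the counts (16, 3, 13, and 4 out of 24 families) come from the abstract. Integer arithmetic with half-up rounding is used because Python's built-in `round` rounds 12.5 to 12.

```python
def percent(part, total):
    """Whole-number percentage, rounded half up, using integer arithmetic only."""
    return (100 * part + total // 2) // total


families = 24
counts = {
    "diagnosed by bi-genomic sequencing": 16,
    "diagnosed by mtDNA sequencing": 3,
    "diagnosed by exome sequencing": 13,
    "limb-girdle muscular dystrophy findings": 4,
}

report = {label: percent(n, families) for label, n in counts.items()}
for label, value in report.items():
    print(f"{label}: {value}% ({counts[label]}/{families})")

# The mtDNA and exome diagnoses together account for the overall yield.
combined = counts["diagnosed by mtDNA sequencing"] + counts["diagnosed by exome sequencing"]
```

The exome figure is also consistent with the Results, where nine families carried variants in mitochondrial-function genes and four in muscle-structure genes.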
Affiliation(s)
- Asuman Gedikbasi
- Department of Pediatric Basic Sciences, Institute of Child Health Istanbul University, Istanbul, Türkiye
- Division of Pediatric Nutrition and Metabolism, Department of Pediatrics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Guven Toksoy
- Department of Medical Genetics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Meryem Karaca
- Division of Pediatric Nutrition and Metabolism, Department of Pediatrics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Cagri Gulec
- Department of Medical Genetics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Mehmet Cihan Balci
- Division of Pediatric Nutrition and Metabolism, Department of Pediatrics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Dilek Gunes
- Division of Pediatric Nutrition and Metabolism, Department of Pediatrics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Seda Gunes
- Division of Pediatric Nutrition and Metabolism, Department of Pediatrics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Ayca Dilruba Aslanger
- Department of Medical Genetics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Gokcen Unverengil
- Department of Pathology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Birsen Karaman
- Department of Pediatric Basic Sciences, Institute of Child Health Istanbul University, Istanbul, Türkiye
- Department of Medical Genetics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Seher Basaran
- Department of Medical Genetics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Mubeccel Demirkol
- Division of Pediatric Nutrition and Metabolism, Department of Pediatrics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Gulden Fatma Gokcay
- Division of Pediatric Nutrition and Metabolism, Department of Pediatrics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
- Zehra Oya Uyguner
- Department of Medical Genetics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
10
Dosch J, Bergmann H, Tran V, Ebersberger I. FAS: assessing the similarity between proteins using multi-layered feature architectures. Bioinformatics 2023; 39:btad226. [PMID: 37084276 PMCID: PMC10185405 DOI: 10.1093/bioinformatics/btad226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 02/23/2023] [Accepted: 04/13/2023] [Indexed: 04/23/2023] Open
Abstract
MOTIVATION Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in the resolution of overlapping and redundant feature annotations. RESULTS Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximize the pair-wise architecture similarity. In a large-scale evaluation on more than 10 000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications. AVAILABILITY AND IMPLEMENTATION FAS is available as a Python package: https://pypi.org/project/greedyFAS/.
Affiliation(s)
- Julian Dosch
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Holger Bergmann
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Vinh Tran
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Ingo Ebersberger
- Applied Bioinformatics Group, Goethe University Frankfurt, Faculty of Biosciences, Institute of Cell Biology and Neuroscience, Frankfurt, 60438, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, 60325, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt, 60325, Germany
11
Özdilek AS, Atakan A, Özsarı G, Acar A, Atalay MV, Doğan T, Rifaioğlu AS. ProFAB-open protein functional annotation benchmark. Brief Bioinform 2023; 24:7025464. [PMID: 36736370 DOI: 10.1093/bib/bbac627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Revised: 11/12/2022] [Accepted: 12/25/2022] [Indexed: 02/05/2023] Open
Abstract
As the number of protein sequences increases in biological databases, computational methods are required to provide accurate functional annotation with high coverage. Although several machine learning methods have been proposed for this purpose, there are still two main issues: (i) construction of reliable positive and negative training and validation datasets, and (ii) fair evaluation of their performances based on predefined experimental settings. To address these issues, we have developed ProFAB: Open Protein Functional Annotation Benchmark, which is a platform providing an infrastructure for a fair comparison of protein function prediction methods. ProFAB provides filtered and preprocessed protein annotation datasets and enables the training and evaluation of function prediction methods via several options. We believe that ProFAB will be useful for both computational and experimental researchers by enabling the utilization of ready-to-use datasets and machine learning algorithms for protein function prediction based on Gene Ontology terms and Enzyme Commission numbers. ProFAB is available at https://github.com/kansil/ProFAB and https://profab.kansil.org.
Affiliation(s)
- A Samet Özdilek
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Ahmet Atakan
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Department of Computer Engineering, Erzincan Binali Yıldırım University, Erzincan, Turkey
| | - Gökhan Özsarı
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Department of Computer Engineering, Niğde Ömer Halisdemir University, Niğde, Turkey
| | - Aybar Acar
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - M Volkan Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
| | - Tunca Doğan
- Department of Computer Engineering and Artificial Intelligence Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
| | - Ahmet S Rifaioğlu
- Department of Electrical-Electronics Engineering, İskenderun Technical University, Hatay, Turkey
- Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University and Heidelberg University Hospital, Heidelberg, Germany
|
12
|
Atas Guvenilir H, Doğan T. How to approach machine learning-based prediction of drug/compound-target interactions. J Cheminform 2023; 15:16. [PMID: 36747300 PMCID: PMC9901167 DOI: 10.1186/s13321-023-00689-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/02/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023] Open
Abstract
The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. 
Our main findings can be summarized under 3 items: (i) random splitting of datasets into train and test folds leads to near-complete data memorization and produces highly over-optimistic results, and should therefore be avoided, (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties (e.g., structures) of proteins are unused during their self-supervised model training, and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the requirement for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
Affiliation(s)
- Heval Atas Guvenilir
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
| | - Tunca Doğan
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey.
- Institute of Informatics, Hacettepe University, Ankara, Turkey.
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey.
|
13
|
Ciray F, Doğan T. Machine learning-based prediction of drug approvals using molecular, physicochemical, clinical trial, and patent-related features. Expert Opin Drug Discov 2022; 17:1425-1441. [PMID: 36444655 DOI: 10.1080/17460441.2023.2153830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
BACKGROUND Drug development productivity has been declining lately due to elevated costs and reduced discovery rates. Therefore, pharmaceutical companies have been seeking alternative ways to determine and evaluate drug candidates. RESEARCH DESIGN AND METHODS In this work, we proposed a new computational approach to directly predict the regulatory approval of drug candidates, and implemented it as a method called 'DrugApp.' To accomplish this task, we employed multiple types of features including molecular and physicochemical properties of drug candidates, together with clinical trial and patent-related features, which are then processed by random forest classifiers to train our disease group-specific approval prediction models. RESULTS Our evaluations indicated DrugApp has a high and robust prediction performance. Within a use-case study, we showed our method can predict phase IV trial drugs that are later withdrawn from the market due to severe side effects. Finally, we used DrugApp models to forecast the approval of drug candidates that are currently in phases I/II/III of clinical trials. CONCLUSIONS We hope that our study will aid the research community in terms of evaluating and improving the process of drug development. The datasets, source code, results, and pre-trained models of DrugApp are freely available at https://github.com/HUBioDataLab/DrugApp.
Affiliation(s)
- Fulya Ciray
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey; Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
| | - Tunca Doğan
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey; Department of Health Informatics, Institute of Informatics, Hacettepe University, Ankara, Turkey; Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
|
14
|
Ding R, He M, Huang H, Chen J, Huang M, Su Y. An 85-amino-acid polypeptide from Myrmeleon bore larvae (antlions) homologous to heat shock factor binding protein 1 with antiproliferative activity against MG-63 osteosarcoma cells in vitro. ASIAN BIOMED 2022; 16:201-211. [PMID: 37551169 PMCID: PMC10321181 DOI: 10.2478/abm-2022-0024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Background Venomous arthropods have substances in their venom with antiproliferative potential for neoplastic cells. Objectives To identify a polypeptide from Myrmeleon bore (antlion) with antiproliferative activity against neoplastic cells, and to elucidate the molecular mechanism of the activity. Methods We used gel filtration and ion exchange chromatography to purify a polypeptide with antiproliferative activity against MG-63 human osteosarcoma cells from a proteinaceous extract of antlion. The polypeptide was sequenced and the stability of its antiproliferative activity was tested under a range of conditions in vitro. A 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay was used to determine the antiproliferative activity of the polypeptide against the MG-63 osteosarcoma cells and MC3T3-E1 mouse calvarial osteoblasts, which were used as a non-neoplastic control. We used western blotting to compare the levels of expression of heat shock transcription factor 1 (HSF1), heat shock protein 90 (HSP90), cyclin-dependent kinase 4 (CDK4), and protein kinase B alpha (AKT1) in MG-63 osteosarcoma cells and their mouse homologs in MC3T3-E1 osteoblasts after their treatment with the antlion antiproliferative polypeptide (ALAPP). Results The 85-amino-acid ALAPP has a 56% sequence identity with the human heat shock factor binding protein 1 (HSBP1). The antiproliferative activity of the polypeptide is relatively insensitive to temperature, pH, and metal ions. ALAPP has a strong concentration-dependent antiproliferative activity against MG-63 osteosarcoma cells compared with its effect on MC3T3-E1 osteoblasts. ALAPP significantly upregulates the expression of HSF1 in MC3T3-E1 osteoblasts, but not in MG-63 osteosarcoma. ALAPP significantly downregulated the expression of HSP90, CDK4, and AKT1 in MG-63 osteosarcoma, but not in the osteoblasts. 
Conclusions ALAPP has significant antiproliferative activity against MG-63 osteosarcoma cells, but not nonneoplastic MC3T3-E1 osteoblasts. We speculate that non-neoplastic cells may evade the antiproliferative effect of ALAPP by upregulating HSF1 to maintain their HSP90, CDK4, and AKT1 expression at a relatively constant level.
Affiliation(s)
- Rui Ding
- Department of General Surgery, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, Guangdong 519000, China
| | - Ming He
- Department of General Surgery, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, Guangdong 519000, China
| | - Huoying Huang
- Department of Biological Engineering, School of Biomedical and Pharmaceutical Science, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
| | - Jing Chen
- Department of Biological Engineering, School of Biomedical and Pharmaceutical Science, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
| | - Mingxing Huang
- Department of Biological Engineering, School of Biomedical and Pharmaceutical Science, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
| | - Yonghui Su
- Department of General Surgery, The Fifth Affiliated Hospital, Sun Yat-Sen University, Zhuhai, Guangdong 519000, China
|
15
|
Doğan T, Akhan Güzelcan E, Baumann M, Koyas A, Atas H, Baxendale IR, Martin M, Cetin-Atalay R. Protein domain-based prediction of drug/compound-target interactions and experimental validation on LIM kinases. PLoS Comput Biol 2021; 17:e1009171. [PMID: 34843456 PMCID: PMC8659301 DOI: 10.1371/journal.pcbi.1009171] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/06/2021] [Revised: 12/09/2021] [Accepted: 11/09/2021] [Indexed: 12/23/2022] Open
Abstract
Predictive approaches such as virtual screening have been used in drug discovery with the objective of reducing developmental time and costs. Current machine learning and network-based approaches have issues related to generalization, usability, or model interpretability, especially due to the complexity of target proteins' structure/function, and bias in system training datasets. Here, we propose a new method "DRUIDom" (DRUg Interacting Domain prediction) to identify bio-interactions between drug candidate compounds and targets by utilizing the domain modularity of proteins, to overcome problems associated with current approaches. DRUIDom is composed of two methodological steps. First, ligands/compounds are statistically mapped to structural domains of their target proteins, with the aim of identifying their interactions. As such, other proteins containing the same mapped domain or domain pair become new candidate targets for the corresponding compounds. Next, a million-scale dataset of small molecule compounds, including those mapped to domains in the previous step, are clustered based on their molecular similarities, and their domain associations are propagated to other compounds within the same clusters. Experimentally verified bioactivity data points, obtained from public databases, are meticulously filtered to construct datasets of active/interacting and inactive/non-interacting drug/compound-target pairs (~2.9M data points), and used as training data for calculating parameters of compound-domain mappings, which led to 27,032 high-confidence associations between 250 domains and 8,165 compounds, and a finalized output of ~5 million new compound-protein interactions. DRUIDom is experimentally validated by syntheses and bioactivity analyses of compounds predicted to target LIM-kinase proteins, which play critical roles in the regulation of cell motility, cell cycle progression, and differentiation through actin filament dynamics. 
We showed that LIMK-inhibitor-2 and its derivatives significantly block cancer cell migration through inhibition of LIMK phosphorylation and the downstream protein cofilin. One of the derivative compounds (LIMKi-2d) was identified as a promising candidate due to its action on resistant Mahlavu liver cancer cells. The results demonstrated that DRUIDom can be exploited to identify drug candidate compounds for intended targets and to predict new target proteins based on the defined compound-domain relationships. Datasets, results, and the source code of DRUIDom are fully available at: https://github.com/cansyl/DRUIDom.
Affiliation(s)
- Tunca Doğan
- Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Institute of Informatics, Hacettepe University, Ankara, Turkey
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Ece Akhan Güzelcan
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Center for Genomics and Rare Diseases & Biobank for Rare Diseases, Hacettepe University, Ankara, Turkey
| | - Marcus Baumann
- School of Chemistry, University College Dublin, Dublin, Ireland
| | - Altay Koyas
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Heval Atas
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Ian R. Baxendale
- Department of Chemistry, University of Durham, Durham, United Kingdom
| | - Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Rengul Cetin-Atalay
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, Illinois, United States of America
|
16
|
Exploring the Meta-regulon of the CRP/FNR Family of Global Transcriptional Regulators in a Partial-Nitritation Anammox Microbiome. mSystems 2021; 6:e0090621. [PMID: 34636676 PMCID: PMC8510549 DOI: 10.1128/msystems.00906-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022] Open
Abstract
Microorganisms must respond to environmental changes to survive, often by controlling transcription initiation. Intermittent aeration during wastewater treatment presents a cyclically changing environment to which microorganisms must react. We used an intermittently aerated bioreactor performing partial nitritation and anammox (PNA) to investigate how the microbiome responds to recurring change. Meta-transcriptomic analysis revealed a dramatic disconnect between the relative DNA abundance and gene expression within the metagenome-assembled genomes (MAGs) of community members, suggesting the importance of transcriptional regulation in this microbiome. To explore how community members responded to cyclic aeration via transcriptional regulation, we searched for homologs of the catabolite repressor protein/fumarate and nitrate reductase regulatory protein (CRP/FNR) family of transcription factors (TFs) within the MAGs. Using phylogenetic analyses, evaluation of sequence conservation in important amino acid residues, and prediction of genes regulated by TFs in the MAGs, we identified homologs of the oxygen-sensing FNR in Nitrosomonas and Rhodocyclaceae, nitrogen-sensing dissimilative nitrate respiration regulator that responds to nitrogen species (DNR) in Rhodocyclaceae, and nitrogen-sensing nitrite and nitric oxide reductase regulator that responds to nitrogen species (NnrR) in Nitrospira MAGs. Our data also predict that CRP/FNR homologs in Ignavibacteria, Flavobacteriales, and Saprospiraceae MAGs sense carbon availability. In addition, a CRP/FNR homolog in a Brocadia MAG was most closely related to CRP TFs known to sense carbon sources in well-studied organisms. However, we predict that in autotrophic Brocadia, this TF most likely regulates a diverse set of functions, including a response to stress during the cyclic aerobic/anoxic conditions. 
Overall, this analysis allowed us to define a meta-regulon of the PNA microbiome that explains functions and interactions of the most active community members. IMPORTANCE Microbiomes are important contributors to many ecosystems, including ones where nutrient cycling is stimulated by aeration control. Optimizing cyclic aeration helps reduce energy needs and maximize microbiome performance during wastewater treatment; however, little is known about how most microbial community members respond to these alternating conditions. We defined the meta-regulon of a PNA microbiome by combining existing knowledge of how the CRP/FNR family of bacterial TFs respond to stimuli, with metatranscriptomic analyses to characterize gene expression changes during aeration cycles. Our results indicated that, for some members of the community, prior knowledge is sufficient for high-confidence assignments of TF function, whereas other community members have CRP/FNR TFs for which inferences of function are limited by lack of prior knowledge. This study provides a framework to begin elucidating meta-regulons in microbiomes, where pure cultures are not available for traditional transcriptional regulation studies. Defining the meta-regulon can help in optimizing microbiome performance.
|
17
|
Huang J, Swieringa F, Solari FA, Provenzale I, Grassi L, De Simone I, Baaten CCFMJ, Cavill R, Sickmann A, Frontini M, Heemskerk JWM. Assessment of a complete and classified platelet proteome from genome-wide transcripts of human platelets and megakaryocytes covering platelet functions. Sci Rep 2021; 11:12358. [PMID: 34117303 PMCID: PMC8196183 DOI: 10.1038/s41598-021-91661-x] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 03/05/2021] [Accepted: 05/26/2021] [Indexed: 02/06/2023] Open
Abstract
Novel platelet and megakaryocyte transcriptome analysis allows prediction of the full or theoretical proteome of a representative human platelet. Here, we integrated the established platelet proteomes from six cohorts of healthy subjects, encompassing 5.2 k proteins, with two novel genome-wide transcriptomes (57.8 k mRNAs). For 14.8 k protein-coding transcripts, we assigned the proteins to 21 UniProt-based classes, based on their preferential intracellular localization and presumed function. This classified transcriptome-proteome profile of platelets revealed: (i) Absence of 37.2 k genome-wide transcripts. (ii) High quantitative similarity of platelet and megakaryocyte transcriptomes (R = 0.75) for 14.8 k protein-coding genes, but not for 3.8 k RNA genes or 1.9 k pseudogenes (R = 0.43-0.54), suggesting redistribution of mRNAs upon platelet shedding from megakaryocytes. (iii) Copy numbers of 3.5 k proteins that were restricted in size by the corresponding transcript levels. (iv) Near complete coverage of identified proteins in the relevant transcriptome (log2fpkm > 0.20) except for plasma-derived secretory proteins, pointing to adhesion and uptake of such proteins. (v) Underrepresentation in the identified proteome of nuclear-related, membrane and signaling proteins, as well as proteins with low-level transcripts. We then constructed a prediction model, based on protein function, transcript level and (peri)nuclear localization, and calculated the achievable proteome at ~10 k proteins. Model validation identified 1.0 k additional proteins in the predicted classes. Network and database analysis revealed the presence of 2.4 k proteins with a possible role in thrombosis and hemostasis, and 138 proteins linked to platelet-related disorders. This genome-wide platelet transcriptome and (non)identified proteome database thus provides a scaffold for discovering the roles of unknown platelet proteins in health and disease.
Affiliation(s)
- Jingnan Huang
- Department of Biochemistry, CARIM, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands.
- Leibniz-Institut Für Analytische Wissenschaften-ISAS-E.V, Dortmund, Germany.
| | - Frauke Swieringa
- Department of Biochemistry, CARIM, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands
- Leibniz-Institut Für Analytische Wissenschaften-ISAS-E.V, Dortmund, Germany
| | - Fiorella A Solari
- Leibniz-Institut Für Analytische Wissenschaften-ISAS-E.V, Dortmund, Germany
| | - Isabella Provenzale
- Department of Biochemistry, CARIM, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands
| | - Luigi Grassi
- Department of Haematology, University of Cambridge, National Health Service Blood and Transplant (NHSBT), Cambridge Biomedical Campus, Cambridge, UK
| | - Ilaria De Simone
- Department of Biochemistry, CARIM, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands
| | - Constance C F M J Baaten
- Department of Biochemistry, CARIM, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands
- Institute for Molecular Cardiovascular Research (IMCAR), University Hospital RWTH, Aachen, Germany
| | - Rachel Cavill
- Department of Data Science and Knowledge Engineering, FSE, Maastricht University, Maastricht, The Netherlands
| | - Albert Sickmann
- Leibniz-Institut Für Analytische Wissenschaften-ISAS-E.V, Dortmund, Germany
- Medizinische Fakultät, Medizinische Proteom-Center, Ruhr-Universität Bochum, Germany
- Department of Chemistry, College of Physical Sciences, University of Aberdeen, Aberdeen, UK
| | - Mattia Frontini
- Department of Haematology, University of Cambridge, National Health Service Blood and Transplant (NHSBT), Cambridge Biomedical Campus, Cambridge, UK
- Institute of Biomedical & Clinical Science, College of Medicine and Health, University of Exeter Medical School, Exeter, UK
| | - Johan W M Heemskerk
- Department of Biochemistry, CARIM, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands.
|
18
|
The molecular basis for the pH-dependent calcium affinity of the pattern recognition receptor langerin. J Biol Chem 2021; 296:100718. [PMID: 33989634 PMCID: PMC8219899 DOI: 10.1016/j.jbc.2021.100718] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/16/2021] [Revised: 04/12/2021] [Accepted: 04/27/2021] [Indexed: 02/07/2023] Open
Abstract
The C-type lectin receptor langerin plays a vital role in the mammalian defense against invading pathogens. Langerin requires a Ca2+ cofactor, the binding affinity of which is regulated by pH. Thus, Ca2+ is bound when langerin is on the membrane but released when langerin and its pathogen substrate traffic to the acidic endosome, allowing the substrate to be degraded. The change in pH is sensed by protonation of the allosteric pH sensor histidine H294. However, the mechanism by which Ca2+ is released from the buried binding site is not clear. We studied the structural consequences of protonating H294 by molecular dynamics simulations (total simulation time: about 120 μs) and Markov models. We discovered a relay mechanism in which a proton is moved into the vicinity of the Ca2+-binding site without transferring the initial proton from H294. Protonation of H294 unlocks a conformation in which a protonated lysine side chain forms a hydrogen bond with a Ca2+-coordinating aspartic acid. This destabilizes Ca2+ in the binding pocket, which we probed by steered molecular dynamics. After Ca2+ release, the proton is likely transferred to the aspartic acid and stabilized by a dyad with a nearby glutamic acid, triggering a conformational transition and thus preventing Ca2+ rebinding. These results show how pH regulation of a buried orthosteric binding site from a solvent-exposed allosteric pH sensor can be realized by information transfer through a specific chain of conformational arrangements.
|
19
|
Scheuer K, Helbing C, Firkowska-Boden I, Jandt KD. Self-assembled fibrinogen–fibronectin hybrid protein nanofibers with medium-sensitive stability. RSC Adv 2021; 11:14113-14120. [PMID: 35423936 PMCID: PMC8697752 DOI: 10.1039/d0ra10749b] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/22/2020] [Accepted: 04/02/2021] [Indexed: 01/15/2023] Open
Abstract
Hybrid protein nanofibers (hPNFs) have been identified as promising nano building blocks for numerous applications in nanomedicine and tissue engineering. We have recently reported a nature-inspired, self-assembly route to create hPNFs from human plasma proteins, i.e., albumin and hemoglobin. However, it is still unclear whether the same route can be applied to other plasma proteins and whether it is possible to control the composition of the resulting fibers. In this context, to further understand the hPNFs self-assembly mechanism and to optimize their properties, we report herein on ethanol-induced self-assembly of two different plasma proteins, i.e., fibrinogen (FG) and fibronectin (FN). We show that by varying initial protein ratios, the composition and thus the properties of the resulting hPNFs can be fine-tuned. Specifically, atomic force microscopy, hydrodynamic diameter, and zeta potential data together revealed a strong correlation of the hPNFs dimensions and surface charge to their initial protein mixing ratio. The composition-independent prompt dissolution of hPNFs in ultrapure water, in contrast to their stability in PBS, indicates that the molecular arrangement of FN and FG in hPNFs is mainly based on electrostatic interactions. Supported by experimental data we introduce a feasible mechanism that explains the interactions between FN and FG and their self-assembly to hPNFs. These findings contribute to the understanding of dual protein interactions, which can be beneficial in designing innovative biomaterials with multifaceted biological and physical characteristics.
Affiliation(s)
- Karl Scheuer
- Chair of Materials Science, Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, Germany
| | - Christian Helbing
- Chair of Materials Science, Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, Germany
| | - Izabela Firkowska-Boden
- Chair of Materials Science, Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, Germany
| | - Klaus D. Jandt
- Chair of Materials Science, Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, Germany
- Jena Center for Soft Matter
|
20
|
Karhadkar TR, Meek TD, Gomer RH. Inhibiting Sialidase-Induced TGF-β1 Activation Attenuates Pulmonary Fibrosis in Mice. J Pharmacol Exp Ther 2021; 376:106-117. [PMID: 33144389 PMCID: PMC7788355 DOI: 10.1124/jpet.120.000258] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/30/2020] [Accepted: 10/06/2020] [Indexed: 02/06/2023] Open
Abstract
The active form of transforming growth factor-β1 (TGF-β1) plays a key role in potentiating fibrosis. TGF-β1 is sequestered in an inactive state by a latency-associated glycopeptide (LAP). Sialidases (also called neuraminidases (NEU)) cleave terminal sialic acids from glycoconjugates. The sialidase NEU3 is upregulated in fibrosis, and mice lacking Neu3 show attenuated bleomycin-induced increases in active TGF-β1 in the lungs and attenuated pulmonary fibrosis. Here we observe that recombinant human NEU3 upregulates active human TGF-β1 by releasing active TGF-β1 from its latent inactive form by desialylating LAP. Based on the proposed mechanism of action of NEU3, we hypothesized that compounds with a ring structure resembling picolinic acid might be transition state analogs and thus possible NEU3 inhibitors. Some compounds in this class showed nanomolar IC50 for recombinant human NEU3 releasing active human TGF-β1 from the latent inactive form. The compounds given as daily 0.1-1-mg/kg injections starting at day 10 strongly attenuated lung inflammation, lung TGF-β1 upregulation, and pulmonary fibrosis at day 21 in a mouse bleomycin model of pulmonary fibrosis. These results suggest that NEU3 participates in fibrosis by desialylating LAP and releasing TGF-β1 and that the new class of NEU3 inhibitors are potential therapeutics for fibrosis. SIGNIFICANCE STATEMENT: The extracellular sialidase NEU3 appears to be a key driver of pulmonary fibrosis. The significance of this report is that 1) we show the mechanism (NEU3 desialylates the latency-associated glycopeptide protein that keeps the profibrotic cytokine transforming growth factor-β1 (TGF-β1) in an inactive state, causing active TGF-β1 release), 2) we then use the predicted NEU3 mechanism to identify nM IC50 NEU3 inhibitors, and 3) these new NEU3 inhibitors are potent therapeutics in a mouse model of pulmonary fibrosis.
Affiliation(s)
- Tejas R Karhadkar
- Departments of Biology (T.R.K., R.H.G.) and Biochemistry and Biophysics (T.D.M.), Texas A&M University, College Station, Texas
| | - Thomas D Meek
- Departments of Biology (T.R.K., R.H.G.) and Biochemistry and Biophysics (T.D.M.), Texas A&M University, College Station, Texas
| | - Richard H Gomer
- Departments of Biology (T.R.K., R.H.G.) and Biochemistry and Biophysics (T.D.M.), Texas A&M University, College Station, Texas
|
21
|
Zandonadi FS, Ferreira SP, Alexandrino AV, Carnielli CM, Artier J, Barcelos MP, Nicolela NCS, Prieto EL, Goto LS, Belasque J, Novo-Mansur MTM. Periplasm-enriched fractions from Xanthomonas citri subsp. citri type A and X. fuscans subsp. aurantifolii type B present distinct proteomic profiles under in vitro pathogenicity induction. PLoS One 2020; 15:e0243867. [PMID: 33338036 PMCID: PMC7748154 DOI: 10.1371/journal.pone.0243867] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/08/2020] [Accepted: 11/29/2020] [Indexed: 12/24/2022] Open
Abstract
The causative agent of Asiatic citrus canker, the Gram-negative bacterium Xanthomonas citri subsp. citri (XAC), produces more severe symptoms and attacks a larger number of citrus hosts than Xanthomonas fuscans subsp. aurantifolii XauB and XauC, the causative agents of cancrosis, a milder form of the disease. Here we report a comparative proteomic analysis of periplasmic-enriched fractions of XAC and XauB in XAM-M, a pathogenicity-inducing culture medium, for identification of differential proteins. Proteins were resolved by two-dimensional electrophoresis combined with liquid chromatography-mass spectrometry. Among the 12 proteins identified from the 4 unique spots from XAC in XAM-M (p<0.05) were phosphoglucomutase (PGM), enolase, xylose isomerase (XI), transglycosylase, NAD(P)H-dependent glycerol 3-phosphate dehydrogenase, succinyl-CoA synthetase β subunit, 6-phosphogluconate dehydrogenase, and conserved hypothetical proteins XAC0901 and XAC0223; most of them were not detected as differential for XAC when both bacteria were grown in NB medium, a pathogenicity non-inducing medium. XauB showed a very different profile from XAC in XAM-M, presenting 29 unique spots containing proteins related to a great diversity of metabolic pathways. Preponderant expression of PGM and XI in XAC was validated by Western blot analysis in the periplasmic-enriched fractions of both bacteria. This work shows remarkable differences between the periplasmic-enriched proteomes of XAC and XauB, bacteria that cause symptoms with distinct degrees of severity during citrus infection. The results suggest that some proteins identified in XAC may have an important role in XAC pathogenicity.
Affiliation(s)
- Flávia S. Zandonadi
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- Sílvia P. Ferreira
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- André V. Alexandrino
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- Carolina M. Carnielli
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- Juliana Artier
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- Mariana P. Barcelos
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- Nicole C. S. Nicolela
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- Evandro L. Prieto
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- Leandro S. Goto
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
- José Belasque
  - Departamento de Fitopatologia e Nematologia, Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo, USP, Piracicaba, São Paulo, Brazil
- Maria Teresa Marques Novo-Mansur
  - Laboratório de Bioquímica e Biologia Molecular Aplicada, Departamento de Genética e Evolução, Universidade Federal de São Carlos, UFSCar, São Carlos, São Paulo, Brazil
|
22
|
Abstract
The UniProt Knowledgebase is a collection of sequences and annotations for over 120 million proteins across all branches of life. Detailed annotations extracted from the literature by expert curators have been collected for over half a million of these proteins. These annotations are supplemented by annotations provided by rule-based automated systems and by those imported from other resources. In this article we describe significant updates that we have made to the resource over the last 2 years. We have greatly expanded the number of Reference Proteomes that we provide, and in particular we have focussed on increasing the number of viral Reference Proteomes. The UniProt website has been augmented with new data visualizations for the subcellular localization of proteins as well as their structure and interactions. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
Affiliation(s)
- The UniProt Consortium
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, CH-1211 Geneva 4, Switzerland
  - Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street NW, Suite 1200, Washington, DC 20007, USA
  - Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark DE 19711, USA
  - To whom correspondence should be addressed. Tel: +44 1223 494 100; Fax: +44 1223 494 468
|
23
|
Rifaioglu AS, Nalbat E, Atalay V, Martin MJ, Cetin-Atalay R, Doğan T. DEEPScreen: high performance drug-target interaction prediction with convolutional neural networks using 2-D structural compound representations. Chem Sci 2020; 11:2531-2557. [PMID: 33209251 PMCID: PMC7643205 DOI: 10.1039/c9sc03414e] [Citation(s) in RCA: 123] [Impact Index Per Article: 24.6] [Received: 07/10/2019] [Accepted: 01/05/2020] [Indexed: 12/12/2022]
Abstract
The identification of physical interactions between drug candidate compounds and target biomolecules is an important process in drug discovery. Since conventional screening procedures are expensive and time-consuming, computational approaches are employed to provide aid by automatically predicting novel drug-target interactions (DTIs). In this study, we propose a large-scale DTI prediction system, DEEPScreen, for early-stage drug discovery, using deep convolutional neural networks. One of the main advantages of DEEPScreen is that it employs readily available 2-D structural representations of compounds at the input level instead of conventional descriptors that display limited performance. DEEPScreen learns complex features inherently from the 2-D representations, thus producing highly accurate predictions. The DEEPScreen system was trained for 704 target proteins (using curated bioactivity data) and finalized with rigorous hyper-parameter optimization tests. We compared the performance of DEEPScreen against the state-of-the-art on multiple benchmark datasets to indicate the effectiveness of the proposed approach and verified selected novel predictions through molecular docking analysis and literature-based validation. Finally, JAK proteins that were predicted by DEEPScreen as new targets of the well-known drug cladribine were experimentally demonstrated in vitro on cancer cells through phosphorylation of STAT3, which is the downstream effector protein. The DEEPScreen system can be exploited in the fields of drug discovery and repurposing for in silico screening of the chemogenomic space, to provide novel DTIs which can be experimentally pursued. The source code, trained "ready-to-use" prediction models, all datasets and the results of this study are available at https://github.com/cansyl/DEEPscreen.
Affiliation(s)
- Ahmet Sureyya Rifaioglu
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey. Tel: +903122105576
  - Department of Computer Engineering, İskenderun Technical University, Hatay, 31200, Turkey
  - KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Esra Nalbat
  - KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Volkan Atalay
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey. Tel: +903122105576
  - KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Maria Jesus Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
- Rengul Cetin-Atalay
  - KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey
  - Section of Pulmonary and Critical Care Medicine, The University of Chicago, Chicago, IL 60637, USA
- Tunca Doğan
  - Department of Computer Engineering, Hacettepe University, Ankara, 06800, Turkey. Tel: +903122977193/117
  - Institute of Informatics, Hacettepe University, Ankara, 06800, Turkey
|
24
|
Sureyya Rifaioglu A, Doğan T, Jesus Martin M, Cetin-Atalay R, Atalay V. DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks. Sci Rep 2019; 9:7344. [PMID: 31089211 PMCID: PMC6517386 DOI: 10.1038/s41598-019-43708-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 08/13/2018] [Accepted: 04/27/2019] [Indexed: 01/22/2023]
Abstract
Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning based methods have outperformed conventional algorithms in computer vision and natural language processing due to the prevention of overfitting and efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests, and benchmarked using three types of protein descriptors, training datasets of varying sizes and GO terms from different levels. Furthermore, in order to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using CAFA2 and CAFA3 challenge datasets, in comparison with the state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the 'biofilm formation process' in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred.
Affiliation(s)
- Ahmet Sureyya Rifaioglu
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey
  - Department of Computer Engineering, İskenderun Technical University, Hatay, 31200, Turkey
- Tunca Doğan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
  - KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Maria Jesus Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
- Rengul Cetin-Atalay
  - KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey
- Volkan Atalay
  - Department of Computer Engineering, METU, Ankara, 06800, Turkey
  - KanSiL, Department of Health Informatics, Graduate School of Informatics, METU, Ankara, 06800, Turkey
|
25
|
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and within larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
|
26
|
Dalkiran A, Rifaioglu AS, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature. BMC Bioinformatics 2018; 19:334. [PMID: 30241466 PMCID: PMC6150975 DOI: 10.1186/s12859-018-2368-y] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 06/20/2018] [Accepted: 09/10/2018] [Indexed: 11/29/2022]
Abstract
BACKGROUND: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Besides, most of the previous methods incorporated only a single input feature type, which limits their applicability to the wide functional space. Here, we propose a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers. RESULTS: In ECPred, each EC number constitutes an individual class and therefore has an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total, including 6 main classes, 55 subclasses, 163 sub-subclasses and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools by using independent temporal hold-out and no-Pfam datasets constructed during this study. CONCLUSIONS: ECPred is presented both as a stand-alone and a web-based tool to provide probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. Also, the datasets of this study will be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at: https://github.com/cansyl/ECPred. The ECPred webserver can be accessed through http://cansyl.metu.edu.tr/ECPred.html.
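The hierarchical prediction idea described in this abstract (an enzyme vs. non-enzyme decision followed by a descent through the EC tree) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the scores, the tree fragment, the 0.5 threshold and the function name `predict_ec` are hypothetical and are not taken from ECPred, whose actual decision rule may differ.

```python
# Hypothetical per-class scores for one query protein. In a real system each
# EC class would have its own trained model producing such a score.
scores = {
    "enzyme": 0.9,
    "1": 0.2, "3": 0.8,
    "3.1": 0.7, "3.4": 0.3,
    "3.1.3": 0.6,
    "3.1.3.16": 0.4,
}

# Hypothetical fragment of the EC tree: parent -> children.
children = {
    "enzyme": ["1", "3"],
    "3": ["3.1", "3.4"],
    "3.1": ["3.1.3"],
    "3.1.3": ["3.1.3.16"],
}


def predict_ec(scores, children, threshold=0.5):
    """Walk the EC tree top-down; stop when no child clears the threshold."""
    # Level 0: enzyme vs. non-enzyme decision.
    if scores.get("enzyme", 0.0) < threshold:
        return "non-enzyme"
    node = "enzyme"
    while True:
        candidates = [
            child for child in children.get(node, [])
            if scores.get(child, 0.0) >= threshold
        ]
        if not candidates:
            # Return the deepest class that was still confidently predicted.
            return node
        node = max(candidates, key=lambda child: scores[child])


prediction = predict_ec(scores, children)
print(prediction)
```

With these toy scores the walk stops at the sub-subclass level because the only fourth-level candidate falls below the assumed threshold, which shows how a hierarchical scheme can return a partial EC number instead of forcing a full one.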
Affiliation(s)
- Alperen Dalkiran
  - Department of Computer Engineering, Middle East Technical University, 06800 Ankara, Turkey
  - Department of Computer Engineering, Adana Science and Technology University, 01250 Adana, Turkey
- Ahmet Sureyya Rifaioglu
  - Department of Computer Engineering, Middle East Technical University, 06800 Ankara, Turkey
  - Department of Computer Engineering, Iskenderun Technical University, Hatay, 31200 İskenderun, Turkey
- Maria Jesus Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD UK
- Rengul Cetin-Atalay
  - KanSiL, Graduate School of Informatics, Middle East Technical University, 06800 Ankara, Turkey
  - Graduate School of Informatics, Middle East Technical University, 06800 Ankara, Turkey
- Volkan Atalay
  - Department of Computer Engineering, Middle East Technical University, 06800 Ankara, Turkey
  - KanSiL, Graduate School of Informatics, Middle East Technical University, 06800 Ankara, Turkey
- Tunca Doğan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD UK
  - KanSiL, Graduate School of Informatics, Middle East Technical University, 06800 Ankara, Turkey
  - Graduate School of Informatics, Middle East Technical University, 06800 Ankara, Turkey
|
27
|
Iyer MS, Joshi AG, Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol Omics 2018; 14:266-280. [PMID: 29971307 DOI: 10.1039/c8mo00008e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023]
Abstract
Domains are the basic building blocks of proteins, which can combine to give rise to different domain architectures. Annotation of domains in a sequence is the first step towards understanding its biological function. Since there are a limited number of folds and evolutionarily related proteins have a similar structure, function can be inferred through remote homology. Computational sequence searches were performed for remote homologues on genomes of ∼160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of the existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and reveals that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues and these data were compiled for each superfamily and organism.
Affiliation(s)
- Meenakshi S Iyer
  - National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, Karnataka 560 065, India
|
28
|
Doğan T. HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences. PeerJ 2018; 6:e5298. [PMID: 30083448 PMCID: PMC6076985 DOI: 10.7717/peerj.5298] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 03/04/2018] [Accepted: 07/03/2018] [Indexed: 01/24/2023]
Abstract
Analysing the relationships between biomolecules and genetic diseases is a highly active area of research, where the aim is to identify the genes and their products that cause a particular disease due to functional changes originating from mutations. Biological ontologies are frequently employed in these studies, which provide researchers with extensive opportunities for knowledge discovery through computational data analysis. In this study, a novel approach is proposed for the identification of relationships between biomedical entities by automatically mapping phenotypic abnormality defining HPO terms with biomolecular function defining GO terms, where each association indicates the occurrence of the abnormality due to the loss of the biomolecular function expressed by the corresponding GO term. The proposed HPO2GO mappings were extracted by calculating the frequency of the co-annotations of the terms on the same genes/proteins, using already existing curated HPO and GO annotation sets. This was followed by the filtering of the unreliable mappings that could be observed due to chance, by statistical resampling of the co-occurrence similarity distributions. Furthermore, the biological relevance of the finalized mappings was discussed over selected cases, using the literature. The resulting HPO2GO mappings can be employed in different settings to predict and to analyse novel gene/protein–ontology term–disease relations. As an application of the proposed approach, HPO term–protein associations (i.e., HPO2protein) were predicted. In order to test the predictive performance of the method on a quantitative basis, and to compare it with the state-of-the-art, the CAFA2 challenge HPO prediction target protein set was employed. The results of the benchmark indicated the potential of the proposed approach, as HPO2GO performance was among the best (Fmax = 0.35). The automated cross ontology mapping approach developed in this work may be extended to other ontologies as well, to identify unexplored relation patterns at the systemic level. The datasets, results and the source code of HPO2GO are available for download at: https://github.com/cansyl/HPO2GO.
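The co-annotation counting step described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the HPO2GO implementation: the gene names and term IDs are placeholders, the function name `co_occurrence_scores` is hypothetical, and the Dice-style similarity is an assumed stand-in for the co-occurrence similarity used in the paper. The resampling-based filtering of chance mappings is omitted.

```python
from collections import Counter
from itertools import product

# Hypothetical toy annotations: gene -> set of terms (all IDs are placeholders).
hpo_annotations = {
    "geneA": {"HP:1", "HP:2"},
    "geneB": {"HP:1"},
    "geneC": {"HP:2"},
}
go_annotations = {
    "geneA": {"GO:1"},
    "geneB": {"GO:1", "GO:2"},
    "geneC": {"GO:2"},
}


def co_occurrence_scores(hpo, go):
    """Score each (HPO, GO) pair by how often both terms annotate the same gene."""
    hpo_counts = Counter()
    go_counts = Counter()
    pair_counts = Counter()
    for terms in hpo.values():
        hpo_counts.update(terms)
    for terms in go.values():
        go_counts.update(terms)
    # Only genes annotated in both ontologies can contribute co-annotations.
    for gene in hpo.keys() & go.keys():
        pair_counts.update(product(hpo[gene], go[gene]))
    # Assumed similarity (Dice-style): 2 * shared genes / (genes with HPO term
    # + genes with GO term). The paper's exact measure may differ.
    return {
        (hpo_term, go_term): 2 * n / (hpo_counts[hpo_term] + go_counts[go_term])
        for (hpo_term, go_term), n in pair_counts.items()
    }


scores = co_occurrence_scores(hpo_annotations, go_annotations)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

In this toy data, the pair that is co-annotated on every gene carrying either term receives the maximum score, while pairs that share only one gene score lower. A real pipeline would then discard pairs whose score is not distinguishable from chance.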
Affiliation(s)
- Tunca Doğan
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  - Cancer Systems Biology Laboratory (KanSiL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
|
29
|
Rifaioglu AS, Doğan T, Saraç ÖS, Ersahin T, Saidi R, Atalay MV, Martin MJ, Cetin-Atalay R. Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants. Proteins 2017; 86:135-151. [PMID: 29098713 DOI: 10.1002/prot.25416] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/09/2017] [Revised: 10/24/2017] [Accepted: 11/01/2017] [Indexed: 12/24/2022]
Abstract
Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present an automated prediction system, UniGOPred, for GO annotations and a database of GO term predictions for proteomes of several organisms in the UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functionality spectrum in the Gene Ontology system and it can predict both generic and specific GO terms. UniGOPred was run on CAFA2 challenge target protein sequences and it ranked within the top 10 best performing methods for the molecular function category. In addition, the performance of UniGOPred is higher than that of the baseline BLAST classifier in all categories of GO. UniGOPred predictions are compared with UniProtKB/TrEMBL database annotations as well. Furthermore, the proposed tool's ability to predict negatively associated GO terms, which define the functions that a protein does not possess, is discussed. UniGOPred annotations were also validated experimentally by a case study on PTEN protein variants and with the literature on CHD8 protein variants. The UniGOPred protein functional annotation system is available as an open access tool at http://cansyl.metu.edu.tr/UniGOPred.html.
Affiliation(s)
- Ahmet Sureyya Rifaioglu
  - Department of Computer Engineering, Middle East Technical University, Ankara, 06800, Turkey
  - Department of Computer Engineering, İskenderun Technical University, Hatay, 31200, Turkey
- Tunca Doğan
  - Protein Function Development Team, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Ömer Sinan Saraç
  - Department of Computer Engineering, Istanbul Technical University, İstanbul, 34467, Turkey
- Tulin Ersahin
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Rabie Saidi
  - Protein Function Development Team, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Mehmet Volkan Atalay
  - Department of Computer Engineering, Middle East Technical University, Ankara, 06800, Turkey
- Maria Jesus Martin
  - Protein Function Development Team, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Rengul Cetin-Atalay
  - CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
|
30
|
Koehorst JJ, Saccenti E, Schaap PJ, Martins Dos Santos VAP, Suarez-Diez M. Protein domain architectures provide a fast, efficient and scalable alternative to sequence-based methods for comparative functional genomics. F1000Res 2016; 5:1987. [PMID: 27703668 PMCID: PMC5031134 DOI: 10.12688/f1000research.9416.3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Accepted: 06/26/2017] [Indexed: 11/20/2022]
Abstract
A functional comparative genome analysis is essential to understand the mechanisms underlying bacterial evolution and adaptation. Detection of functional orthologs using standard global sequence similarity methods faces several problems: the need to define arbitrary acceptance thresholds for similarity and alignment length, lateral gene acquisition, and the high computational cost of finding bi-directional best matches at a large scale. We investigated the use of protein domain architectures for large-scale functional comparative analysis as an alternative method. The performance of both approaches was assessed through functional comparison of 446 bacterial genomes sampled at different taxonomic levels. We show that protein domain architectures provide a fast and efficient alternative to methods based on sequence similarity for identifying groups of functionally equivalent proteins within and across taxonomic boundaries, and that they are suitable for large-scale comparative analysis. Running both methods in parallel pinpoints potential functional adaptations that may add to bacterial fitness.
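The domain-architecture alternative described in this abstract can be illustrated with a minimal sketch: proteins are grouped by their ordered list of domains rather than by pairwise sequence similarity. The protein names, domain IDs, and the functions `architecture` and `group_by_architecture` below are hypothetical and chosen for illustration only; the study's actual pipeline and architecture definition may differ.

```python
from collections import defaultdict

# Hypothetical domain annotations: protein -> list of (start position, domain ID).
domain_hits = {
    "genome1|p1": [(120, "DOM_B"), (5, "DOM_A")],
    "genome2|p7": [(8, "DOM_A"), (130, "DOM_B")],
    "genome2|p9": [(10, "DOM_C")],
}


def architecture(hits):
    """Return the domain IDs ordered from N- to C-terminus as a hashable tuple."""
    return tuple(domain for _, domain in sorted(hits))


def group_by_architecture(domain_hits):
    """Group proteins sharing the same ordered domain architecture."""
    groups = defaultdict(list)
    for protein, hits in domain_hits.items():
        groups[architecture(hits)].append(protein)
    return dict(groups)


groups = group_by_architecture(domain_hits)
for arch, proteins in groups.items():
    print(" > ".join(arch), proteins)
```

Grouping is a single pass with a dictionary lookup per protein, so it avoids the all-against-all comparisons and similarity thresholds that the abstract lists as problems of sequence-based ortholog detection.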
Affiliation(s)
- Jasper J Koehorst
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
- Peter J Schaap
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
- Vitor A P Martins Dos Santos
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
  - LifeGlimmer GmBH, Berlin, Germany
- Maria Suarez-Diez
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Wageningen, Netherlands
|