1
Denger A, Helms V. Identifying optimal substrate classes of membrane transporters. PLoS One 2024; 19:e0315330. [PMID: 39700222 DOI: 10.1371/journal.pone.0315330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Accepted: 11/22/2024] [Indexed: 12/21/2024] Open
Abstract
Membrane transporters are responsible for moving a wide variety of molecules across biological membranes, making them integral to key biological pathways in all organisms. Identifying all membrane transporters within a (meta-)proteome, along with their specific substrates, provides important information for various research fields, including biotechnology, pharmacology, and metabolomics. Protein datasets are frequently annotated with thousands of molecular functions that form complex networks, often with partial or full redundancy and hierarchical relationships. This complexity, along with the low sample count for more specific functions, makes them unsuitable as classes for supervised learning methods, meaning that the creation of an optimal subset of annotations is required. However, selection of this subset requires extensive manual effort, along with knowledge about the biology behind the respective functions. Here, we present an automated pipeline to address this problem. Unlike previous approaches for reducing redundancy in GO datasets, we employ machine learning to identify a subset of functional annotations in a training dataset. Classes in the resulting predictive model meet four essential criteria: sufficient sample size for training predictive models, minimal redundancy, strong class separability, and relevance to substrate transport. Furthermore, we implemented a pipeline for creating training datasets of transmembrane transporters that cover a wide range of organisms, including plants, bacteria, mammals, and single-cell eukaryotes. For a dataset containing 98.1% of transporters from S. cerevisiae, the pipeline automatically reduced the number of functional annotations from 287 to 11 GO terms that could be classified with a median pairwise F1 score of 0.87±0.16. For a meta-organism dataset containing 96% of all transport proteins from S. cerevisiae, A. thaliana, E. coli and human, the number of classes was reduced from 695 to 49, with a median F1 score of 0.92±0.10 between pairs of GO terms. When lowering the percentage of covered proteins down to 67%, the pipeline found a subset of 30 GO terms with a median F1 score of 0.95±0.06.
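To make the reported metric concrete, here is a minimal Python sketch of an F1 score for one binary (pairwise) classification task and of the median taken over several such pairs. The function names, the toy labels, and the assumption that each pair of GO terms is scored as a separate binary problem are illustrative and are not taken from the paper; the published pipeline may compute and cross-validate its scores differently.

```python
from statistics import median


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def median_pairwise_f1(results):
    """Median F1 over a dict {(class_a, class_b): (y_true, y_pred)}."""
    return median(f1_score(t, p) for t, p in results.values())


# Hypothetical toy data: two class pairs with made-up labels.
toy_results = {
    ("term_A", "term_B"): ([1, 1, 0, 0], [1, 0, 0, 1]),
    ("term_A", "term_C"): ([1, 1, 0], [1, 1, 0]),
}
print(median_pairwise_f1(toy_results))
```

A median over pairs is less sensitive to a few poorly separable class pairs than a mean, which is one plausible reason to report it together with a spread.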
Affiliation(s)
- Andreas Denger
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Volkhard Helms
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
2
Delihas N. Evolution of a Human-Specific De Novo Open Reading Frame and Its Linked Transcriptional Silencer. Int J Mol Sci 2024; 25:3924. [PMID: 38612733 PMCID: PMC11011693 DOI: 10.3390/ijms25073924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 03/23/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024] Open
Abstract
In the human genome, two short open reading frames (ORFs) separated by a transcriptional silencer and a small intervening sequence stem from the gene SMIM45. The two ORFs show different translational characteristics, and they also show divergent patterns of evolutionary development. The studies presented here describe the evolution of the components of SMIM45. One ORF consists of an ultra-conserved 68 amino acid (aa) sequence, whose origins can be traced beyond the evolutionary age of divergence of the elephant shark, ~462 MYA. The silencer also has ancient origins, but it has a complex and divergent pattern of evolutionary formation, as it overlaps both at the 68 aa ORF and the intervening sequence. The other ORF consists of 107 aa. It develops during primate evolution but is found to originate de novo from an ancestral non-coding genomic region with root origins within the Afrothere clade of placental mammals, whose evolutionary age of divergence is ~99 MYA. The formation of the complete 107 aa ORF during primate evolution is outlined, whereby sequence development is found to occur through biased mutations, with disruptive random mutations that also occur but lead to a dead-end. The 107 aa ORF is of particular significance, as there is evidence to suggest it is a protein that may function in human brain development. Its evolutionary formation presents a view of a human-specific ORF and its linked silencer that were predetermined in non-primate ancestral species. The genomic position of the silencer offers interesting possibilities for the regulation of transcription of the 107 aa ORF. A hypothesis is presented with respect to possible spatiotemporal expression of the 107 aa ORF in embryonic tissues.
Affiliation(s)
- Nicholas Delihas
  - Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY 11794, USA
3
Degli-Innocenti F, Breton T, Chinaglia S, Esposito E, Pecchiari M, Pennacchio A, Pischedda A, Tosin M. Microorganisms that produce enzymes active on biodegradable polyesters are ubiquitous. Biodegradation 2023; 34:489-518. [PMID: 37354274 DOI: 10.1007/s10532-023-10031-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 05/30/2023] [Indexed: 06/26/2023]
Abstract
Biodegradability standards measure ultimate biodegradation of polymers by exposing the material under test to a natural microbial inoculum. Available tests developed by the International Organization for Standardization (ISO) use inoculums sampled from different environments, e.g., soil, marine sediments, seawater. Understanding whether each inoculum is to be considered as microbially unique or not can be relevant for the interpretation of test results. In this review, we address this question by consideration of the following: (i) the chemical nature of biodegradable plastics (virtually all biodegradable plastics are polyesters); (ii) the diffusion of ester bonds in nature both in simple molecules and in polymers (ubiquitous); (iii) the diffusion of decomposers capable of producing enzymes, called esterases, which accelerate the hydrolysis of esters, including polyesters (ubiquitous); (iv) the evidence showing that synthetic polyesters can be depolymerized by esterases (large and growing); (v) the evidence showing that these esterases are ubiquitous (growing and confirmed by bioinformatics studies). By combining the relevant available facts, it can be concluded that if a certain polyester shows ultimate biodegradation when exposed to a natural inoculum, it can be considered biodegradable and need not be retested using other inoculums. Obviously, if the polymer does not show ultimate biodegradation, it must be considered recalcitrant until proven otherwise.
Affiliation(s)
- Tony Breton
  - Novamont S.p.A., via Fauser 8, 28100, Novara, Italy
4
Horsfield ST, Tonkin-Hill G, Croucher NJ, Lees JA. Accurate and fast graph-based pangenome annotation and clustering with ggCaller. Genome Res 2023; 33:1622-1637. [PMID: 37620118 PMCID: PMC10620059 DOI: 10.1101/gr.277733.123] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/24/2023] [Accepted: 08/18/2023] [Indexed: 08/26/2023]
Abstract
Bacterial genomes differ in both gene content and sequence mutations, which underlie extensive phenotypic diversity, including variation in susceptibility to antimicrobials or vaccine-induced immunity. To identify and quantify important variants, all genes within a population must be predicted, functionally annotated, and clustered, representing the "pangenome." Despite the volume of genome data available, gene prediction and annotation are currently conducted in isolation on individual genomes, which is computationally inefficient and frequently inconsistent across genomes. Here, we introduce the open-source software graph-gene-caller (ggCaller). ggCaller combines gene prediction, functional annotation, and clustering into a single workflow using population-wide de Bruijn graphs, removing redundancy in gene annotation and resulting in more accurate gene predictions and orthologue clustering. We applied ggCaller to simulated and real-world bacterial data sets containing hundreds or thousands of genomes, comparing it to current state-of-the-art tools. ggCaller has considerable speed-ups with equivalent or greater accuracy, particularly with data sets containing complex sources of error, such as assembly contamination or fragmentation. ggCaller is also an important extension to bacterial genome-wide association studies, enabling querying of annotated graphs for functional analyses. We highlight this application by functionally annotating DNA sequences with significant associations to tetracycline and macrolide resistance in Streptococcus pneumoniae, identifying key resistance determinants that were missed when using only a single reference genome. ggCaller is a novel bacterial genome analysis tool with applications in bacterial evolution and epidemiology.
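As a rough illustration of the data structure named in this abstract, the Python sketch below builds a toy de Bruijn graph from several sequences, so that k-mers shared between genomes are stored only once. It is a simplified, uncoloured, uncompacted graph written for this listing; the function name and the toy sequences are hypothetical, and it does not reproduce ggCaller's actual graph construction or gene-calling logic.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it.

    Every k-mer in every sequence contributes one edge from its prefix
    to its suffix; identical k-mers from different sequences collapse
    into the same edge.
    """
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Hypothetical toy "genomes" that share the prefix ATG and then diverge.
toy_graph = build_de_bruijn(["ATGC", "ATGA"], k=3)
for node in sorted(toy_graph):
    print(node, "->", sorted(toy_graph[node]))
```

In this toy case the shared k-mer ATG yields a single edge, and the node TG branches to GC and GA where the two sequences differ; this is the kind of redundancy reduction a population-wide graph offers.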
Affiliation(s)
- Samuel T Horsfield
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Gerry Tonkin-Hill
  - Department of Biostatistics, University of Oslo, Blindern, 0372 Oslo, Norway
- Nicholas J Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
- John A Lees
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
5
Fettach S, Thari FZ, Hafidi Z, Karrouchi K, Bouathmany K, Cherrah Y, El Achouri M, Benbacer L, El Mzibri M, Sefrioui H, Bougrin K, Faouzi MEA. Biological, toxicological and molecular docking evaluations of isoxazoline-thiazolidine-2,4-dione analogues as new class of anti-hyperglycemic agents. J Biomol Struct Dyn 2023; 41:1072-1084. [PMID: 34957934 DOI: 10.1080/07391102.2021.2017348] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/18/2023]
Abstract
In this work, three isoxazoline-thiazolidine-2,4-dione derivatives were synthesized and characterized by FT-IR, 1H-NMR, 13C-NMR and ESI-MS spectrometry. All compounds were investigated for their α-amylase and α-glucosidase inhibitory activities. In vitro enzymatic evaluation revealed that all compounds were potent inhibitors of α-glucosidase, with IC50 values ranging from 40.67 ± 1.81 to 92.54 ± 0.43 µM, and of α-amylase, with IC50 values ranging from 7.01 ± 0.02 to 75.10 ± 1.06 µM. One of the tested compounds was found to be a more potent inhibitor than the other compounds and the standard drug Acarbose (IC50 glucosidase = 97.12 ± 0.35 µM and IC50 amylase = 2.97 ± 0.01 µM). All compounds were then evaluated for their acute toxicity in vivo and showed safety at a high dose, with LD > 2000 mg/kg BW. A cell-based toxicity evaluation was performed to determine the safety of the compounds on liver cells, using the MTT assay against HepG2 cells, and the results showed that all compounds were non-toxic with respect to cell viability and proliferation compared to the reference drug (Pioglitazone). Furthermore, the molecular homology analysis, SAR and the molecular binding properties of the compounds with the active site of α-amylase and α-glucosidase were confirmed through computational analysis. This study has identified the inhibitory potential of a new class of synthesized isoxazoline-thiazolidine-2,4-dione derivatives in controlling both hyperglycemia and type 2 diabetes mellitus without any hepatic toxicity. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Saad Fettach
  - Laboratory of Pharmacology and Toxicology, Biopharmaceutical and Toxicological Analysis Research Team, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, Rabat, Morocco
- Fatima Zahra Thari
  - Equipe de Chimie des Plantes et de Synthèse Organique et Bioorganique, URAC23, Faculty of Science, B.P. 1014, Geophysics, Natural Patrimony and Green Chemistry (GEOPAC) Research Center, Mohammed V University in Rabat, Rabat, Morocco
- Zakaria Hafidi
  - Department of Surfactants and Nanobiotechnology, IQAC-CSIC, c/Jordi Girona, Barcelona, Spain
- Khalid Karrouchi
  - Laboratory of Analytical Chemistry and Bromatology, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, Rabat, Morocco
- Kaoutar Bouathmany
  - Biology and Molecular Research Unit, Department of Life Sciences, National Center for Energy, Nuclear Science and Technology (CNESTEN), Rabat, Morocco
- Yahia Cherrah
  - Laboratory of Pharmacology and Toxicology, Biopharmaceutical and Toxicological Analysis Research Team, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, Rabat, Morocco
- Mohammed El Achouri
  - Laboratoire de Physico-Chimie des Matériaux Inorganiques et Organiques, Centre des Sciences des Matériaux, Ecole Normale Supérieure-Rabat, Mohammed V University, Rabat, Morocco
- Laila Benbacer
  - Biology and Molecular Research Unit, Department of Life Sciences, National Center for Energy, Nuclear Science and Technology (CNESTEN), Rabat, Morocco
- Mohammed El Mzibri
  - Biology and Molecular Research Unit, Department of Life Sciences, National Center for Energy, Nuclear Science and Technology (CNESTEN), Rabat, Morocco
- Hassan Sefrioui
  - Moroccan Foundation for Science, Innovation & Research (MAScIR), Centre de Biotechnologie Médicale, Rabat, Morocco
- Khalid Bougrin
  - Equipe de Chimie des Plantes et de Synthèse Organique et Bioorganique, URAC23, Faculty of Science, B.P. 1014, Geophysics, Natural Patrimony and Green Chemistry (GEOPAC) Research Center, Mohammed V University in Rabat, Rabat, Morocco
  - Chemical and Biochemical Sciences Green Process Engineering (CBS), Mohammed VI Polytechnic University (UM6P), Benguerir, Morocco
- My El Abbes Faouzi
  - Laboratory of Pharmacology and Toxicology, Biopharmaceutical and Toxicological Analysis Research Team, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, Rabat, Morocco
6
Llinares-López F, Berthet Q, Blondel M, Teboul O, Vert JP. Deep embedding and alignment of protein sequences. Nat Methods 2023; 20:104-111. [PMID: 36522501 DOI: 10.1038/s41592-022-01700-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/30/2021] [Accepted: 10/24/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence alignment is a key component of most bioinformatics pipelines to study the structures and functions of proteins. Aligning highly divergent sequences remains, however, a difficult task that current algorithms often fail to perform accurately, leaving many proteins or open reading frames poorly annotated. Here we leverage recent advances in deep learning for language modeling and differentiable programming to propose DEDAL (deep embedding and differentiable alignment), a flexible model to align protein sequences and detect homologs. DEDAL is a machine learning-based model that learns to align sequences by observing large datasets of raw protein sequences and of correct alignments. Once trained, we show that DEDAL improves by up to two- or threefold the alignment correctness over existing methods on remote homologs and better discriminates remote homologs from evolutionarily unrelated sequences, paving the way to improvements on many downstream tasks relying on sequence alignment in structural and functional genomics.
7
Denger A, Helms V. Optimized Data Set and Feature Construction for Substrate Prediction of Membrane Transporters. J Chem Inf Model 2022; 62:6242-6257. [PMID: 36454173 DOI: 10.1021/acs.jcim.2c00850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
α-Helical transmembrane proteins termed membrane transporters mediate the passage of small hydrophilic substrate molecules across biological lipid bilayer membranes. Annotating the specific substrates of the dozens to hundreds of individual transporters of an organism is an important task. In the past, machine learning classifiers have been successfully trained on pan-organism data sets to predict putative substrates of transporters. Here, we critically examine the selection of an optimal data set of protein sequence features for the classification task. We focus on membrane transporters of the three model organisms Escherichia coli, Arabidopsis thaliana, and Saccharomyces cerevisiae, as well as human. We show that organism-specific classifiers can be robustly trained if at least 20 samples are available for each substrate class. If information from position-specific scoring matrices is included, such classifiers have F1 scores between 0.85 and 1.00. For the largest data set (A. thaliana), a 4-class classifier yielded an F-score of 0.97. On a pan-organism data set composed of transporters of all four organisms, amino acid and sugar transporters were predicted with an F1 score of 0.91.
Affiliation(s)
- Andreas Denger
  - Center for Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
- Volkhard Helms
  - Center for Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
8
Sengupta S, Azad RK. Reconstructing horizontal gene flow network to understand prokaryotic evolution. Open Biol 2022; 12:220169. [PMID: 36446404 PMCID: PMC9708380 DOI: 10.1098/rsob.220169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/05/2022] Open
Abstract
Horizontal gene transfer (HGT) is a major source of phenotypic innovation and a mechanism of niche adaptation in prokaryotes. Quantification of HGT is critical to decipher its myriad roles in microbial evolution and adaptation. Advances in genome sequencing and bioinformatics have augmented our ability to understand the microbial world, particularly the direct or indirect influence of HGT on diverse life forms. Methods for detecting HGT can be classified into phylogenetic-based and parametric or composition-based approaches. Here, we exploited the complementary strengths of both the approaches to construct a high confidence horizontal gene flow network. Our network is unique in its ability to detect the transfer of native genes of a genome to genomes from other taxa, thus establishing donor and recipient organisms (taxa), rather than through a post hoc analysis as is the practice with several other approaches. The scale-free horizontal gene flow network presented here provides new insights into modes of transfer for the exchange of genetic information and also illuminates differential gene flow across phyla.
Affiliation(s)
- Soham Sengupta
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Rajeev K. Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
  - Department of Mathematics, University of North Texas, Denton, TX 76203, USA
9
Sengupta S, Azad RK. Reconstructing horizontal gene flow network to understand prokaryotic evolution. Open Biol 2022. [PMID: 36446404 DOI: 10.6084/m9.figshare.c.6307519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/05/2022] Open
Abstract
Horizontal gene transfer (HGT) is a major source of phenotypic innovation and a mechanism of niche adaptation in prokaryotes. Quantification of HGT is critical to decipher its myriad roles in microbial evolution and adaptation. Advances in genome sequencing and bioinformatics have augmented our ability to understand the microbial world, particularly the direct or indirect influence of HGT on diverse life forms. Methods for detecting HGT can be classified into phylogenetic-based and parametric or composition-based approaches. Here, we exploited the complementary strengths of both the approaches to construct a high confidence horizontal gene flow network. Our network is unique in its ability to detect the transfer of native genes of a genome to genomes from other taxa, thus establishing donor and recipient organisms (taxa), rather than through a post hoc analysis as is the practice with several other approaches. The scale-free horizontal gene flow network presented here provides new insights into modes of transfer for the exchange of genetic information and also illuminates differential gene flow across phyla.
Affiliation(s)
- Soham Sengupta
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Rajeev K Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
  - Department of Mathematics, University of North Texas, Denton, TX 76203, USA
10
Joint inference of ancestry and genotypes of parents from children. iScience 2022; 25:104768. [PMID: 35942102 PMCID: PMC9356179 DOI: 10.1016/j.isci.2022.104768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2022] [Revised: 05/18/2022] [Accepted: 07/11/2022] [Indexed: 12/02/2022] Open
Abstract
In this paper, we address the following problem: can we perform ancestry inference for parents from one or more children’s DNA samples? That is, suppose the parents’ genomes consist of segments of different ancestry; our goal is to infer parental ancestry and, at the same time, call parental genotypes from the children’s genetic data. Such ancestry inference may provide insights into recent ancestors from children’s genomes, and potentially has applications in understanding genetic traits. At present, there exists no method for this inference problem. We present parMix, a method based on a hidden Markov model (HMM) that can jointly infer parental ancestry and call parental genotypes from data of a small number of children. Simulation results show that parMix performs well in practice. It can provide reasonably accurate parental inference given data from a small number (say three) of children. parMix becomes more accurate when data from more children are used.
- Presented a method for inferring ancestry and genotypes of parents from children
- Recombination events can be detected using parMix
- parMix can deal with the genotypes with phasing errors
- parMix can be used to infer admixture proportion of parents
11
Gorbalenya AE, Anisimova M. Editorial overview: Virus bioinformatics - empowering genomics of pathogens, viromes, and the virosphere across divergence scales. Curr Opin Virol 2022; 52:161-165. [PMID: 34942540 DOI: 10.1016/j.coviro.2021.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Affiliation(s)
- Alexander E Gorbalenya
  - Department of Medical Microbiology, Leiden University Medical Center, Leiden, The Netherlands
  - Faculty of Bioengineering and Bioinformatics and Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- Maria Anisimova
  - Institute of Applied Simulation, Zurich University of Applied Sciences, ZHAW Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
12
Garcia AK, Fer E, Sephus C, Kacar B. An Integrated Method to Reconstruct Ancient Proteins. Methods Mol Biol 2022; 2569:267-281. [PMID: 36083453 DOI: 10.1007/978-1-0716-2691-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Proteins have played a fundamental role throughout life's history on Earth. Despite their biological importance, ancient origin, early function, and evolution of proteins are seldom able to be directly studied because few of these attributes are preserved across geologic timescales. Ancestral sequence reconstruction (ASR) provides a method to infer ancestral amino acid sequences and determine the evolutionary predecessors of modern-day proteins using phylogenetic tools. Laboratory application of ASR allows ancient sequences to be deduced from genetic information available in extant organisms and then experimentally resurrected to elucidate ancestral characteristics. In this article, we provide a generalized, stepwise protocol that considers the major elements of a well-designed ASR study and details potential sources of reconstruction bias that can reduce the relevance of historical inferences. We underscore key stages in our approach so that it may be broadly utilized to reconstruct the evolutionary histories of proteins.
Affiliation(s)
- Amanda K Garcia
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Evrim Fer
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
  - Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, WI, USA
- Cathryn Sephus
  - Scripps Institution of Oceanography, University of California at San Diego, La Jolla, CA, USA
- Betul Kacar
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
13
Fogha J, Bayry J, Diharce J, de Brevern AG. Structural and evolutionary exploration of the IL-3 family and its alpha subunit receptors. Amino Acids 2021; 53:1211-1227. [PMID: 34196789 DOI: 10.1007/s00726-021-03026-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/15/2021] [Accepted: 06/21/2021] [Indexed: 12/14/2022]
Abstract
Interleukin-3 (IL-3) is a cytokine belonging to the family of common β (βc) and is involved in various biological systems. Its activity is mediated by the interaction with its receptor (IL-3R), a heterodimer composed of two distinct subunits: IL-3Rα and βc. IL-3 and its receptor, especially IL-3Rα, play a crucial role in pathologies like inflammatory diseases and therefore are interesting therapeutic targets. Here, we have performed an analysis of these proteins and their interaction based on structural and evolutionary information. We highlighted that IL-3 and IL-3Rα structural architectures are conserved across evolution and shared with other proteins belonging to the same βc family: interleukin-5 (IL-5) and granulocyte-macrophage colony-stimulating factor (GM-CSF). The IL-3Rα/IL-3 interaction is mediated by a large interface in which most residues are surprisingly not conserved during evolution and across family members. In spite of this high variability, we suggest small regions, consisting of a few residues conserved during evolution in both proteins, that could be important for the binding affinity.
Affiliation(s)
- Jade Fogha
  - UMR_S 1134, DSIMB, Université de Paris, Inserm, Biologie Intégrée du Globule Rouge, 75739, Paris, France
  - Institut National de La Transfusion Sanguine (INTS), 75739, Paris, France
  - Laboratoire D'Excellence GR-Ex, 75739, Paris, France
- Jagadeesh Bayry
  - Centre de Recherche Des Cordeliers, Institut National de La Santé Et de La Recherche Médicale, Sorbonne Université, Université de Paris, 75006, Paris, France
  - Indian Institute of Technology Palakkad, Kozhippara, Palakkad, 678 557, India
- Julien Diharce
  - UMR_S 1134, DSIMB, Université de Paris, Inserm, Biologie Intégrée du Globule Rouge, 75739, Paris, France
  - Institut National de La Transfusion Sanguine (INTS), 75739, Paris, France
  - Laboratoire D'Excellence GR-Ex, 75739, Paris, France
- Alexandre G de Brevern
  - UMR_S 1134, DSIMB, Université de Paris, Inserm, Biologie Intégrée du Globule Rouge, 75739, Paris, France
  - Institut National de La Transfusion Sanguine (INTS), 75739, Paris, France
  - Laboratoire D'Excellence GR-Ex, 75739, Paris, France
  - UMR_S 1134, DSIMB, Université de La Réunion, Inserm, Biologie Intégrée du Globule Rouge, La Réunion, 97744, Saint-Denis, France
14
Hafidi Z, El Achouri M, O Sousa FF, Pérez L. Antifungal activity of amino-alcohols based cationic surfactants and in silico, homology modeling, docking and molecular dynamics studies against lanosterol 14-α-demethylase enzyme. J Biomol Struct Dyn 2021; 40:7762-7778. [PMID: 33754947 DOI: 10.1080/07391102.2021.1902396] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/21/2022]
Abstract
Fungi are responsible for causing serious infections in humans and animals. These opportunistic microorganisms contaminate health and storage facilities and represent a serious concern for health security. The present work investigates the antifungal activity of two amino-alcohol-based cationic surfactants, CnEtOH and CnPrOH (where n = 14 or 16 is the number of carbons in the alkyl chain, EtOH = ethanol and PrOH = propanol), against a collection of different Candida species (Candida tropicalis, Candida albicans, Candida auris, Cyberlindnera jadinii, Candida parapsilosis, Candida glabrata and Candida rugosa). The amino-alcohol-based cationic surfactants exhibited good antifungal activity against all Candida strains tested, with minimum inhibitory concentrations (MIC) ranging from 0.002 to 0.30 mM. The MIC evaluation shows an increase as a function of the hydrophobicity of all inhibitors against the majority of the Candida strains tested. The location of the alcoholic OH function in the polar head influences the availability of the N+ responsible for electrostatic interactions with the Candida cell walls, which remains a very important step in the mode of action of quaternary ammonium cationic surfactants. Hence, a 3D structure of the lanosterol 14-α-demethylase enzyme from C. auris was constructed by homology modeling using the online SWISS-MODEL server. The predicted model was analyzed by several servers. Furthermore, a molecular docking study was carried out to better understand the binding mechanism of the lanosterol homologous protein with surfactant ligands. Then, the docked lanosterol-surfactant complexes were refined by molecular dynamics simulation to analyze their interaction behavior during the simulation. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Zakaria Hafidi
- Laboratoire de Physico-Chimie des Matériaux Inorganiques et Organiques, Ecole Normale supérieure-Rabat, Mohammed V University in Rabat, Centre des Sciences des Matériaux, Rabat, Morocco; Surfactants and Nanobiotechnology Department, IQAC, CSIC, Barcelona, Spain
- Mohammed El Achouri
- Laboratoire de Physico-Chimie des Matériaux Inorganiques et Organiques, Ecole Normale supérieure-Rabat, Mohammed V University in Rabat, Centre des Sciences des Matériaux, Rabat, Morocco
- Francisco F O Sousa
- Surfactants and Nanobiotechnology Department, IQAC, CSIC, Barcelona, Spain; Graduate Program on Pharmaceutical Innovation, Department of Biological & Health Sciences, Federal University of Amapa, Rodovia Juscelino Kubitschek, Macapa, Amapá, Brazil
- Lourdes Pérez
- Surfactants and Nanobiotechnology Department, IQAC, CSIC, Barcelona, Spain
|
15
|
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
|
16
|
Huang TC, Fischer WB. Sequence–function correlation of the transmembrane domains in NS4B of HCV using a computational approach. AIMS Biophysics 2021. [DOI: 10.3934/biophy.2021013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
|
17
|
Urban G, Torrisi M, Magnan CN, Pollastri G, Baldi P. Protein profiles: Biases and protocols. Comput Struct Biotechnol J 2020; 18:2281-2289. [PMID: 32994887 PMCID: PMC7486441 DOI: 10.1016/j.csbj.2020.08.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/14/2020] [Revised: 08/14/2020] [Accepted: 08/15/2020] [Indexed: 11/13/2022] Open
Abstract
The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions. While profiles can enhance structural signals, their role remains somewhat surprising as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons since profiles may facilitate the bleeding of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy for profile-based predictors, over sequence-based predictors, strongly relies on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be mitigated by implementing a new protocol (EVALpro) which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also reduces the impact of choosing a given similarity cutoff when selecting test proteins.
The EVALpro program is available in the SCRATCH suite (www.scratch.proteomics.ics.uci.edu) and can be downloaded at: www.download.igb.uci.edu/#evalpro.
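The evaluation idea described in this abstract, reporting accuracy as a function of train/test profile similarity instead of as a single number, can be illustrated with a minimal Python sketch. This is not the EVALpro implementation: the function name `accuracy_by_similarity`, the equal-width binning, and the example values are assumptions made purely for illustration.

```python
from collections import defaultdict


def accuracy_by_similarity(records, n_bins=5):
    """Average per-protein accuracy inside equal-width similarity bins.

    records: iterable of (similarity, accuracy) pairs, where similarity is a
    hypothetical score in [0, 1] between a test protein's profile and its
    closest training profile, and accuracy is that protein's prediction accuracy.
    Returns {bin_index: mean_accuracy} for every non-empty bin.
    """
    bins = defaultdict(list)
    for similarity, accuracy in records:
        # A similarity of exactly 1.0 is placed in the last bin.
        index = min(int(similarity * n_bins), n_bins - 1)
        bins[index].append(accuracy)
    return {index: sum(vals) / len(vals) for index, vals in sorted(bins.items())}


if __name__ == "__main__":
    # Made-up numbers: two low-similarity and two high-similarity test proteins.
    example = [(0.15, 0.70), (0.12, 0.74), (0.85, 0.90), (0.95, 0.94)]
    for index, mean_acc in accuracy_by_similarity(example).items():
        print(f"similarity bin {index}: mean accuracy {mean_acc:.2f}")
```

Comparing two predictors bin by bin, instead of on the pooled test set, is one way to compare them on equally hard or easy examples.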
Affiliation(s)
- Gregor Urban
- Department of Computer Science & Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Mirko Torrisi
- UCD Institute for Discovery, University College Dublin, Dublin, 4, Ireland
- Christophe N Magnan
- Department of Computer Science & Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Gianluca Pollastri
- UCD Institute for Discovery, University College Dublin, Dublin, 4, Ireland
- Pierre Baldi
- Department of Computer Science & Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
|
18
|
Ferruz N, Lobos F, Lemm D, Toledo-Patino S, Farías-Rico JA, Schmidt S, Höcker B. Identification and Analysis of Natural Building Blocks for Evolution-Guided Fragment-Based Protein Design. J Mol Biol 2020; 432:3898-3914. [PMID: 32330481 PMCID: PMC7322520 DOI: 10.1016/j.jmb.2020.04.013] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 12/23/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/15/2022]
Abstract
Natural evolution has generated an impressively diverse protein universe via duplication and recombination from a set of protein fragments that served as building blocks. The application of these concepts to the design of new proteins using subdomain-sized fragments from different folds has proven to be experimentally successful. To better understand how evolution has shaped our protein universe, we performed an all-against-all comparison of protein domains representing all naturally existing folds and identified conserved homologous protein fragments. Overall, we found more than 1000 protein fragments of various lengths among different folds through similarity network analysis. These fragments are present in very different protein environments and represent versatile building blocks for protein design. These data are available in our web server called F(old P)uzzle (fuzzle.uni-bayreuth.de), which allows users to filter the dataset individually and create customized networks for folds of interest. We believe that our results serve as an invaluable resource for structural and evolutionary biologists and as raw material for the design of custom-made proteins.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Francisco Lobos
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Dominik Lemm
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Saacnicteh Toledo-Patino
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Steffen Schmidt
- Max Planck Institute for Developmental Biology, Tübingen, Germany; Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
|
19
|
Vakirlis N, Carvunis AR, McLysaght A. Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes. eLife 2020; 9:e53500. [PMID: 32066524 PMCID: PMC7028367 DOI: 10.7554/elife.53500] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 11/11/2019] [Accepted: 01/07/2020] [Indexed: 12/20/2022] Open
Abstract
The origin of 'orphan' genes, species-specific sequences that lack detectable homologues, has remained mysterious since the dawn of the genomic era. There are two dominant explanations for orphan genes: complete sequence divergence from ancestral genes, such that homologues are not readily detectable; and de novo emergence from ancestral non-genic sequences, such that homologues genuinely do not exist. The relative contribution of the two processes remains unknown. Here, we harness the special circumstance of conserved synteny to estimate the contribution of complete divergence to the pool of orphan genes. By separately comparing yeast, fly and human genes to related taxa using conservative criteria, we find that complete divergence accounts, on average, for at most a third of eukaryotic orphan and taxonomically restricted genes. We observe that complete divergence occurs at a stable rate within a phylum but at different rates between phyla, and is frequently associated with gene shortening akin to pseudogenization.
Affiliation(s)
- Nikolaos Vakirlis
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, United States
- Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
|
20
|
Jain A, Perisa D, Fliedner F, von Haeseler A, Ebersberger I. The Evolutionary Traceability of a Protein. Genome Biol Evol 2019; 11:531-545. [PMID: 30649284 PMCID: PMC6394115 DOI: 10.1093/gbe/evz008] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Accepted: 01/11/2019] [Indexed: 12/12/2022] Open
Abstract
Orthologs document the evolution of genes and metabolic capacities encoded in extant and ancient genomes. However, the similarity between orthologs decays with time, and ultimately it becomes insufficient to infer common ancestry. This leaves ancient gene set reconstructions incomplete and distorted to an unknown extent. Here we introduce the “evolutionary traceability” as a measure that quantifies, for each protein, the evolutionary distance beyond which the sensitivity of the ortholog search becomes limiting. Using yeast, we show that genes that were thought to date back to the last universal common ancestor are of high traceability. Their functions mostly involve catalysis, ion transport, and ribonucleoprotein complex assembly. In turn, the fraction of yeast genes whose traceability is not sufficient to infer their presence in last universal common ancestor is enriched for regulatory functions. Computing the traceabilities of genes that have been experimentally characterized as being essential for a self-replicating cell reveals that many of the genes that lack orthologs outside bacteria have low traceability. This leaves open whether their orthologs in the eukaryotic and archaeal domains have been overlooked. Looking at the example of REC8, a protein essential for chromosome cohesion, we demonstrate how a traceability-informed adjustment of the search sensitivity identifies hitherto missed orthologs in the fast-evolving microsporidia. Taken together, the evolutionary traceability helps to differentiate between true absence and nondetection of orthologs, and thus improves our understanding about the evolutionary conservation of functional protein networks. “protTrace,” a software tool for computing evolutionary traceability, is freely available at https://github.com/BIONF/protTrace.git; last accessed February 10, 2019.
Affiliation(s)
- Arpit Jain
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Dominik Perisa
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Fabian Fliedner
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University Vienna, Austria; Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Austria
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Center (BiK-F), Frankfurt, Germany; LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
|
21
|
Krishnakumar P, Riemer S, Perera R, Lingner T, Goloborodko A, Khalifa H, Bontems F, Kaufholz F, El-Brolosy MA, Dosch R. Functional equivalence of germ plasm organizers. PLoS Genet 2018; 14:e1007696. [PMID: 30399145 PMCID: PMC6219760 DOI: 10.1371/journal.pgen.1007696] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 03/14/2018] [Accepted: 09/16/2018] [Indexed: 11/18/2022] Open
Abstract
The proteins Oskar (Osk) in Drosophila and Bucky ball (Buc) in zebrafish act as germ plasm organizers. Both proteins recapitulate germ plasm activities but seem to be unique to their animal groups. Here, we discover that Osk and Buc show similar activities during germ cell specification. Drosophila Osk induces additional PGCs in zebrafish. Surprisingly, Osk and Buc do not show homologous protein motifs that would explain their related function. Nonetheless, we detect that both proteins contain stretches of intrinsically disordered regions (IDRs), which seem to be involved in protein aggregation. IDRs are known to rapidly change their sequence during evolution, which might obscure biochemical interaction motifs. Indeed, we show that Buc binds to the known Oskar interactors Vasa protein and nanos mRNA indicating conserved biochemical activities. These data provide a molecular framework for two proteins with unrelated sequence but with equivalent function to assemble a conserved core-complex nucleating germ plasm. Multicellular organisms use gametes for their propagation. Gametes are formed from germ cells, which are specified during embryogenesis in some animals by the inheritance of RNP granules known as germ plasm. Transplantation of germ plasm induces extra germ cells, whereas germ plasm ablation leads to the loss of gametes and sterility. Therefore, germ plasm is key for germ cell formation and reproduction. However, the molecular mechanisms of germ cell specification by germ plasm in the vertebrate embryo remain an unsolved question. Proteins, which assemble the germ plasm, are known as germ plasm organizers. Here, we show that the two germ plasm organizers Oskar from the fly and Bucky ball from the fish show similar functions by using a cross species approach. Both are intrinsically disordered proteins, which rapidly changed their sequence during evolution. Moreover, both proteins still interact with conserved components of the germ cell specification pathway. 
These data might provide a first example of two proteins with the same biological role, but distinct sequence.
Affiliation(s)
- Pritesh Krishnakumar
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Stephan Riemer
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Roshan Perera
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Thomas Lingner
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Alexander Goloborodko
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Hazem Khalifa
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Franck Bontems
- Laboratory of Metabolism, Department of Internal Medicine Specialties, Faculty of Medicine, University of Geneva, Switzerland
- Felix Kaufholz
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Mohamed A. El-Brolosy
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Roland Dosch
- Institute for Developmental Biochemistry, University Medical Center, Göttingen, Germany
- Institute of Human Genetics, University Medical Center, Göttingen, Germany
|
22
|
Identification of (4-(9H-fluoren-9-yl) piperazin-1-yl) methanone derivatives as falcipain 2 inhibitors active against Plasmodium falciparum cultures. Biochim Biophys Acta Gen Subj 2018; 1862:2911-2923. [PMID: 30253205 DOI: 10.1016/j.bbagen.2018.09.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/21/2018] [Revised: 09/19/2018] [Accepted: 09/19/2018] [Indexed: 12/22/2022]
Abstract
BACKGROUND Falcipain 2 (FP-2) is the hemoglobin-degrading cysteine protease of Plasmodium falciparum most extensively targeted to develop novel antimalarials. However, no commercial antimalarial drugs based on FP-2 inhibition are available yet due to the low selectivity of most FP-2 inhibitors against the human cysteine proteases. METHODS A structure-based virtual screening (SBVS) using the Maybridge HitFinder™ compound database was conducted to identify potential FP-2 inhibitors. In vitro enzymatic and cell-growth inhibition assays were performed for the top-scoring compounds. Docking, molecular dynamics (MD) simulations and free energy calculations were employed to study the interaction of the best hits with FP-2 and other related enzymes. RESULTS AND CONCLUSIONS Two hits based on the (4-(9H-fluoren-9-yl) piperazin-1-yl) methanone scaffold, HTS07940 and HTS08262, were identified as inhibitors of FP-2 (half-maximal inhibitory concentration (IC50) = 64 μM and 14.7 μM, respectively) without detectable inhibition of the human off-target cathepsin K (hCatK). HTS07940 and HTS08262 inhibited the growth of the multidrug-resistant P. falciparum strain FCR3 in culture (IC50 = 2.91 μM and 34 μM, respectively) and exhibited only moderate cytotoxicity against HeLa cells (half-maximal cytotoxic concentration (CC50) = 133 μM and 350 μM, respectively). Free energy calculations reproduced the experimental affinities of the hits for FP-2 and explained the selectivity with respect to hCatK. GENERAL SIGNIFICANCE To the best of our knowledge, HTS07940 stands among the most selective FP-2 inhibitors identified by SBVS reported so far, displaying moderate antiplasmodial activity and low cytotoxicity against human cells. Hence, this compound constitutes a promising lead for the design of more potent and selective FP-2 inhibitors.
|
23
|
Rubio-Largo Á, Vanneschi L, Castelli M, Vega-Rodríguez MA. Multiobjective characteristic-based framework for very-large multiple sequence alignment. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.06.022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
24
|
Rubio-Largo Á, Castelli M, Vanneschi L, Vega-Rodríguez MA. A Parallel Multiobjective Metaheuristic for Multiple Sequence Alignment. J Comput Biol 2018; 25:1009-1022. [PMID: 29671616 DOI: 10.1089/cmb.2018.0031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
The simultaneous alignment of three or more nucleotide or amino acid sequences is known as multiple sequence alignment (MSA), a nondeterministic polynomial time (NP)-hard optimization problem. The time complexity of finding an optimal alignment rises exponentially as the number of sequences to align increases. In this work, we deal with a multiobjective version of the MSA problem wherein the goal is to simultaneously optimize the accuracy and conservation of the alignment. A parallel version of the hybrid multiobjective memetic metaheuristics for MSA is proposed. To evaluate the parallel performance of our proposal, we selected a pool of data sets with different numbers of sequences (up to 1000 sequences) and studied its parallel performance against other well-known parallel metaheuristics published in the literature, such as MSAProbs, tree-based consistency objective function for alignment evaluation (T-Coffee), Clustal Ω, and multiple alignment using fast Fourier transform (MAFFT). The comparative study reveals that our parallel aligner obtains better results than MSAProbs, T-Coffee, Clustal Ω, and MAFFT. In addition, the parallel version is around 25 times faster than the sequential version with 32 cores, obtaining an efficiency around 80%.
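The efficiency quoted at the end of this abstract follows from the standard definition of parallel efficiency, speedup divided by the number of cores. The short Python sketch below reproduces the arithmetic; the function name `parallel_efficiency` is chosen here for illustration and does not come from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency: achieved speedup as a fraction of the ideal speedup (one per core)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


if __name__ == "__main__":
    # Figures from the abstract: about 25x faster on 32 cores.
    print(f"{parallel_efficiency(25, 32):.2f}")  # prints 0.78
```

A value of 0.78 is consistent with the "around 80%" reported, given that the speedup itself is stated only approximately.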
Affiliation(s)
- Mauro Castelli
- NOVA IMS, Universidade Nova de Lisboa, Lisbon, Portugal
- Miguel A Vega-Rodríguez
- Department of Computer and Communications Technologies, University of Extremadura, Caceres, Spain
|
25
|
On the Regularities of the Polar Profiles of Proteins Related to Ebola Virus Infection and their Functional Domains. Cell Biochem Biophys 2018; 76:411-431. [PMID: 29511990 PMCID: PMC7090660 DOI: 10.1007/s12013-018-0839-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2017] [Accepted: 02/16/2018] [Indexed: 11/25/2022]
Abstract
The number of fatalities and economic losses caused by the Ebola virus infection across the planet culminated in the havoc that occurred between August and November 2014. However, little is known about the molecular protein profile of this devastating virus. This work represents a thorough bioinformatics analysis of the regularities of charge distribution (polar profiles) in two groups of proteins and their functional domains associated with Ebola virus disease: Ebola virus proteins and human proteins interacting with Ebola virus. Our analysis reveals that a fragment exists in each of these proteins, named the “functional domain”, with a polar profile similar to that of the protein that contains it. Each protein is formed by a group of short sub-sequences, where each fragment has a different and distinctive polar profile and where the polar profile between adjacent short sub-sequences changes in an orderly and gradual way to coincide with the polar profile of the whole protein. When using the charge distribution as a metric, it was observed that it effectively discriminates the proteins from their functional domains. As a counterexample, the same test was applied to a set of synthetic proteins built for that purpose, revealing that none of the regularities reported here for the Ebola virus proteins and human proteins interacting with Ebola virus were present in the synthetic proteins. Our results indicate that the polar profile of each protein studied and its corresponding functional domain are similar. Thus, when building each protein from its functional domain (adding one amino acid at a time and plotting its polar profile each time), it was observed that the resulting graphs can be divided into groups with similar polar profiles.
|
26
|
Rubio-Largo A, Vanneschi L, Castelli M, Vega-Rodriguez MA. A Characteristic-Based Framework for Multiple Sequence Aligners. IEEE Transactions on Cybernetics 2018; 48:41-51. [PMID: 27831898 DOI: 10.1109/tcyb.2016.2621129] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
Multiple sequence alignment is a well-known bioinformatics problem that consists of aligning three or more biological sequences (protein or nucleic acid). In the literature, a number of tools have been proposed for dealing with this biological sequence alignment problem, such as progressive methods, consistency-based methods, or iterative methods, among others. These aligners often use a default parameter configuration for all the input sequences to align. However, the default configuration is not always the best choice; the alignment accuracy of the tool may be greatly improved if specific parameter configurations are used, depending on the biological characteristics of the input sequences. In this paper, we propose a characteristic-based framework for multiple sequence aligners. The idea of the framework is, given an input set of unaligned sequences, to extract its characteristics and run the aligner with the best parameter configuration found for another set of unaligned sequences with similar characteristics. In order to test the framework, we have used the well-known multiple sequence comparison by log-expectation (MUSCLE) v3.8 aligner with different benchmarks, such as benchmark alignments database v3.0, protein reference alignment benchmark v4.0, and sequence alignment benchmark v1.65. The results show that the alignment accuracy and conservation of MUSCLE might be greatly improved with the proposed framework, especially in those scenarios with a low percentage of identity. The characteristic-based framework for multiple sequence aligners is freely available for downloading at http://arco.unex.es/arl/fwk-msa/cbf-msa.zip.
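The lookup step described in this abstract (reuse the best configuration found for a previously seen sequence set with similar characteristics) can be sketched as a nearest-neighbour search. The Python example below is only an illustration under stated assumptions: the characteristic vectors, the Euclidean distance, the `gap_open` key, and the function name `nearest_configuration` are all hypothetical and are not taken from the paper or from MUSCLE's actual options.

```python
import math


def nearest_configuration(query, knowledge_base):
    """Return the configuration stored for the most similar known sequence set.

    query: tuple of numeric characteristics of the new, unaligned sequence set
           (hypothetical features, assumed to be scaled to comparable ranges).
    knowledge_base: list of (characteristics, configuration) pairs collected
           from sequence sets whose best configuration is already known.
    """
    if not knowledge_base:
        raise ValueError("knowledge_base must not be empty")
    # Euclidean distance is an assumption; any similarity measure could be used.
    _, config = min(knowledge_base, key=lambda entry: math.dist(query, entry[0]))
    return config


if __name__ == "__main__":
    # Hypothetical, normalised features: (relative set size, fraction of identity).
    known = [
        ((0.2, 0.1), {"gap_open": -3.0}),
        ((0.9, 0.8), {"gap_open": -1.0}),
    ]
    print(nearest_configuration((0.85, 0.7), known))  # {'gap_open': -1.0}
```

The returned dictionary would then be translated into whatever parameters the chosen aligner actually accepts.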
|
27
|
Rubio-Largo Á, Vanneschi L, Castelli M, Vega-Rodríguez MA. Using biological knowledge for multiple sequence aligner decision making. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.08.069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
28
|
Rubio-Largo Á, Vanneschi L, Castelli M, Vega-Rodríguez MA. Reducing Alignment Time Complexity of Ultra-Large Sets of Sequences. J Comput Biol 2017; 24:1144-1154. [PMID: 28686466 DOI: 10.1089/cmb.2017.0097] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Abstract
The alignment of three or more protein or nucleotide sequences is known as the Multiple Sequence Alignment problem. The complexity of this problem increases exponentially with the number of sequences; therefore, many of the current approaches published in the literature suffer a computational overhead when thousands of sequences must be aligned. We introduce a new approach for dealing with ultra-large sets of sequences. A two-level clustering method is considered. The first level clusters the input sequences by using their biological composition, that is, the number of positive, negative, polar, special, and hydrophobic amino acids. In the second level, each cluster is divided into different clusters according to their similarity. Then, each cluster is aligned by using any method/aligner. After aligning the centroid sequences of each second-level cluster, we extrapolate the new gaps to each cluster of sequences to obtain the final alignment. We present a study on biological data with up to ∼100,000 sequences, showing that the proposed approach is able to obtain accurate alignments in a reduced amount of time; for example, in datasets of >10,000 sequences, it is able to reduce the runtime required by the well-known Kalign by up to ∼45 times.
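The first clustering level described in this abstract relies on a five-number composition of each sequence. The Python sketch below shows how such a composition vector could be computed and used to group sequences. It rests on two assumptions that are not stated in the abstract: the assignment of amino acids to the five classes follows one common textbook grouping, and sequences are grouped by identical composition as a stand-in for whatever clustering rule the authors actually use.

```python
# Assumed residue classes (a common textbook grouping, not taken from the paper).
GROUPS = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("STNQ"),
    "special": set("CGP"),
    "hydrophobic": set("AVILMFYW"),
}


def composition(sequence):
    """Count residues per class, in the order positive, negative, polar, special, hydrophobic."""
    sequence = sequence.upper()
    return tuple(
        sum(1 for residue in sequence if residue in members)
        for members in GROUPS.values()
    )


def first_level_clusters(sequences):
    """Group sequences with identical composition vectors (a deliberate simplification)."""
    clusters = {}
    for seq in sequences:
        clusters.setdefault(composition(seq), []).append(seq)
    return clusters


if __name__ == "__main__":
    print(composition("KDSGA"))  # (1, 1, 1, 1, 1)
    print(first_level_clusters(["KD", "DK", "AA"]))
```

In a realistic setting the composition vectors would feed a distance-based clustering method, with the second level then splitting each cluster by sequence similarity as the abstract describes.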
Affiliation(s)
- Álvaro Rubio-Largo
- Nova Information Management School-NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
- Leonardo Vanneschi
- Nova Information Management School-NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
- Mauro Castelli
- Nova Information Management School-NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
- Miguel A Vega-Rodríguez
- Department of Technologies of Computers and Communications, University of Extremadura, Cáceres, Spain
|
29
|
Slavkin HC, Graham E, Zeichner-David M, Hildemann W. ENAMEL-LIKE ANTIGENS IN HAGFISH: POSSIBLE EVOLUTIONARY SIGNIFICANCE. Evolution 2017; 37:404-412. [DOI: 10.1111/j.1558-5646.1983.tb05548.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 01/26/1982] [Revised: 04/27/1982] [Indexed: 11/29/2022]
Affiliation(s)
- H. C. Slavkin
- Laboratory for Developmental Biology; Graduate Program in Craniofacial Biology; University Park Los Angeles California 90007
- Department of Biochemistry, School of Dentistry; University of Southern California; University Park Los Angeles California 90007
- Edward Graham
- Laboratory for Developmental Biology; Graduate Program in Craniofacial Biology; University Park Los Angeles California 90007
- Department of Biochemistry, School of Dentistry; University of Southern California; University Park Los Angeles California 90007
- Margarita Zeichner-David
- Laboratory for Developmental Biology; Graduate Program in Craniofacial Biology; University Park Los Angeles California 90007
- Department of Biochemistry, School of Dentistry; University of Southern California; University Park Los Angeles California 90007
- William Hildemann
- Laboratory for Developmental Biology; Graduate Program in Craniofacial Biology; University Park Los Angeles California 90007
- Department of Biochemistry, School of Dentistry; University of Southern California; University Park Los Angeles California 90007
- Dental Research Institute; University of California; Los Angeles California 90024
|
30
|
Thomas RDK, Reif WE. THE SKELETON SPACE: A FINITE SET OF ORGANIC DESIGNS. Evolution 2017; 47:341-360. [DOI: 10.1111/j.1558-5646.1993.tb02098.x] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 02/24/1992] [Accepted: 08/25/1992] [Indexed: 11/29/2022]
Affiliation(s)
- R. D. K. Thomas
- Department of Geosciences; Franklin and Marshall College; P.O. Box 3003 Lancaster PA 17604-3003 USA
- W.-E. Reif
- Institut für Geologie und Paläontologie, Universität Tübingen; Sigwartstrasse 10, 7400 Tübingen 1 GERMANY
|
31
|
Ohta T. FURTHER SIMULATION STUDIES ON EVOLUTION BY GENE DUPLICATION. Evolution 2017; 42:375-386. [PMID: 28567848 DOI: 10.1111/j.1558-5646.1988.tb04140.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 09/12/1986] [Accepted: 09/30/1987] [Indexed: 11/29/2022]
Abstract
In order to understand the origin of multigene families, Monte Carlo simulations were performed to see how a genetic system evolves under unequal crossing-over, mutation, random genetic drift and natural selection, starting from a single gene copy. Both haploid and diploid models were examined. Beneficial, neutral, and detrimental mutations were incorporated, and "positive" selection favors those chromosomes (haploid) or individuals (diploid) with more beneficial mutations than others. The same model for haploids was previously investigated with special reference to the evolution of gene organization, and the ratio of the numbers of beneficial genes to pseudogenes was found to be a rough indicator of the relative strengths of positive and negative (against deleterious alleles) natural selection (Ohta, 1987b). In the present paper, the evolution of gene organization and of sequence divergence among genes in the multigene family is examined. It is shown that positive selection accelerates the accumulation of arrays containing different beneficial mutations, but that total divergence including both neutral and beneficial mutations is not very sensitive to positive selection, under this model. The proportion of beneficial mutations in the total mutations accumulated is a better indicator of positive selection than is the total divergence. It is pointed out that various observed examples in which amino-acid substitutions are accelerated, as compared with synonymous substitutions in duplicated genes (Li, 1985), may reflect the effect of selection similar to the present scheme. The diploid model is shown to be more efficient for accumulating beneficial mutations in duplicated genes than the haploid one, and the relevance of this finding to the advantage of sexual reproduction is discussed.
Affiliation(s)
- Tomoko Ohta
- National Institute of Genetics, Mishima 411, Japan
| |
|
32
|
Bazylinski DA, Morillo V, Lefèvre CT, Viloria N, Dubbels BL, Williams TJ. Endothiovibrio diazotrophicus gen. nov., sp. nov., a novel nitrogen-fixing, sulfur-oxidizing gammaproteobacterium isolated from a salt marsh. Int J Syst Evol Microbiol 2017; 67:1491-1498. [DOI: 10.1099/ijsem.0.001743] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Affiliation(s)
- Dennis A Bazylinski
- School of Life Sciences, University of Nevada at Las Vegas, Las Vegas, Nevada 89154-4004, USA
| | - Viviana Morillo
- School of Life Sciences, University of Nevada at Las Vegas, Las Vegas, Nevada 89154-4004, USA
| | - Christopher T Lefèvre
- CEA Cadarache/CNRS/Université Aix-Marseille, UMR7265 Biosciences and Biotechnologies Institute, Laboratoire de Bioénergétique Cellulaire, Saint Paul lez Durance 13108, France
| | - Nathan Viloria
- School of Life Sciences, University of Nevada at Las Vegas, Las Vegas, Nevada 89154-4004, USA
| | - Bradley L Dubbels
- Novozymes North America Inc., 9000 Development Drive, Morrisville, North Carolina 27560, USA
| | - Timothy J Williams
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney NSW 2052, Australia
| |
|
33
|
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification. J Virol 2017; 91:JVI.02275-16. [PMID: 28122979 PMCID: PMC5375668 DOI: 10.1128/jvi.02275-16] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/22/2016] [Accepted: 01/13/2017] [Indexed: 11/20/2022] Open
Abstract
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids.
|
34
|
Shi J, Chen WF, Zhang B, Fan SH, Ai X, Liu NN, Rety S, Xi XG. A helical bundle in the N-terminal domain of the BLM helicase mediates dimer and potentially hexamer formation. J Biol Chem 2017; 292:5909-5920. [PMID: 28228481 DOI: 10.1074/jbc.m116.761510] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 10/11/2016] [Revised: 02/14/2017] [Indexed: 12/11/2022] Open
Abstract
Helicases play a critical role in processes such as replication or recombination by unwinding double-stranded DNA; mutations of these genes can therefore have devastating biological consequences. In humans, mutations in genes of three members of the RecQ family helicases (blm, wrn, and recq4) give rise to three strikingly distinctive clinical phenotypes: Bloom syndrome, Werner syndrome, and Rothmund-Thomson syndrome, respectively. However, the molecular basis for these varying phenotypic outcomes is unclear, in part because a full mechanistic description of helicase activity is lacking. Because the helicase core domains are highly conserved, it has been postulated that functional differences among family members might be explained by significant differences in the N-terminal domains, but these domains are poorly characterized. To help fill this gap, we now describe bioinformatics, biochemical, and structural data for three vertebrate BLM proteins. We pair high resolution crystal structures with SAXS analysis to describe an internal, highly conserved sequence we term the dimerization helical bundle in N-terminal domain (DHBN). We show that, despite the N-terminal domain being loosely structured and potentially lacking a defined three-dimensional structure in general, the DHBN exists as a dimeric structure required for higher order oligomer assembly. Interestingly, the unwinding amplitude and rate decrease as BLM is assembled from dimer into hexamer, and also, the stable DHBN dimer can be dissociated upon ATP hydrolysis. Thus, the structural and biochemical characterizations of N-terminal domains will provide new insights into how the N-terminal domain affects the structural and functional organization of the full BLM molecule.
Affiliation(s)
- Jing Shi
- From the College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Wei-Fei Chen
- From the College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Bo Zhang
- From the College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - San-Hong Fan
- From the College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Xia Ai
- From the College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Na-Nv Liu
- From the College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Stephane Rety
- the Institut de Biochimie et Chimie des Protéines, CNRS UMR 5086, 7 Passage du Vercors, 69367 Lyon, France, and
| | - Xu-Guang Xi
- From the College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China, and the Laboratoire de Biologie et Pharmacologie Appliquée, ENS de Cachan, Université Paris-Saclay, CNRS, 61 Avenue du Président Wilson, 94235 Cachan, France
| |
|
35
|
Abstract
Recent technological advances in sequencing and high-throughput DNA cloning have resulted in the generation of vast quantities of biological sequence data. Ideally the functions of individual genes and proteins predicted by these methods should be assessed experimentally within the context of a defined hypothesis. However, if no hypothesis is known a priori, or the number of sequences to be assessed is large, bioinformatics techniques may be useful in predicting function. This chapter proposes a pipeline of freely available Web-based tools to analyze protein-coding DNA and peptide sequences of unknown function. Accumulated information obtained during each step of the pipeline is used to build a testable hypothesis of function. The following methods are described in detail: 1. Annotation of gene function through protein domain detection (SMART and Pfam). 2. Sequence similarity methods for homolog detection (BLAST and DELTA-BLAST). 3. Comparing sequences to whole genome data.
Affiliation(s)
- Tom C Giles
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK
- Advanced Data Analysis Centre, University of Nottingham, Leicestershire, LE12 5RD, UK
| | - Richard D Emes
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK.
- Advanced Data Analysis Centre, University of Nottingham, Leicestershire, LE12 5RD, UK.
| |
|
36
|
Koteswara Reddy G, Nagamalleswara Rao K, Yarrakula K. Insights into structure and function of 30S Ribosomal Protein S2 (30S2) in Chlamydophila pneumoniae: A potent target of pneumonia. Comput Biol Chem 2016; 66:11-20. [PMID: 27866051 DOI: 10.1016/j.compbiolchem.2016.10.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/14/2016] [Revised: 10/04/2016] [Accepted: 10/29/2016] [Indexed: 02/01/2023]
Abstract
The 30S ribosomal protein S2 (30S2) gene has been identified as a potential drug and vaccine target for pneumonia. Its structural characterization is important for understanding its mechanism of action and for identifying its receptor and/or other binding partners. Comparative genomics and proteomics studies are useful for the structural characterization of 30S2 in C. pneumoniae using different bioinformatics tools and web servers. In this study, the structure of the 30S2 protein was modelled and validated with a Ramachandran plot. In the modelled protein, 88.7% of residues fell in the most favoured "core" region, and the overall average G-factor score was -0.20. Seven sequential motifs were identified for 30S2, with reference codes (PR0095, PF0038, TIGR01012, PTHR11489, SSF52313 and PTHR11489). In addition, seven structurally highly conserved residues were identified in the large cleft, including Lys160, Gly161 and Arg162; the cleft has a volume of 1288.83 Å³ and an average depth of 10.75 Å. Moreover, the biological functions, biochemical processes and structural constituents of the ribosome are also explored. The study will help us to understand the sequential, structural, functional and evolutionary clues of unknown proteins available in C. pneumoniae.
Affiliation(s)
- G Koteswara Reddy
- Centre for Disaster Mitigation and Management, VIT University, Vellore-632014, India.
| | | | - Kiran Yarrakula
- Centre for Disaster Mitigation and Management, VIT University, Vellore-632014, India
| |
|
37
|
Wren JD. Bioinformatics programs are 31-fold over-represented among the highest impact scientific papers of the past two decades. Bioinformatics 2016; 32:2686-91. [PMID: 27153671 DOI: 10.1093/bioinformatics/btw284] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 01/27/2016] [Accepted: 04/21/2016] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION To analyze the relative proportion of bioinformatics papers and their non-bioinformatics counterparts in the top 20 most cited papers annually for the past two decades. RESULTS When defining bioinformatics papers as encompassing both those that provide software for data analysis or methods underlying data analysis software, we find that over the past two decades, more than a third (34%) of the most cited papers in science were bioinformatics papers, which is approximately a 31-fold enrichment relative to the total number of bioinformatics papers published. More than half of the most cited papers during this span were bioinformatics papers. Yet, the average 5-year JIF of top 20 bioinformatics papers was 7.7, whereas the average JIF for top 20 non-bioinformatics papers was 25.8, significantly higher (P < 4.5 × 10⁻²⁹). The 20-year trend in the average JIF between the two groups suggests the gap does not appear to be significantly narrowing. For a sampling of the journals producing top papers, bioinformatics journals tended to have higher Gini coefficients, suggesting that development of novel bioinformatics resources may be somewhat 'hit or miss'. That is, relative to other fields, bioinformatics produces some programs that are extremely widely adopted and cited, yet there are fewer of intermediate success. CONTACT jdwren@gmail.com SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jonathan D Wren
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104-5005, USA, Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, USA
| |
|
38
|
Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL. Hybrid multiobjective artificial bee colony for multiple sequence alignment. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.12.034] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 10/22/2022]
|
39
|
Tsu BV, Saier MH. The LysE Superfamily of Transport Proteins Involved in Cell Physiology and Pathogenesis. PLoS One 2015; 10:e0137184. [PMID: 26474485 PMCID: PMC4608589 DOI: 10.1371/journal.pone.0137184] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 12/06/2014] [Accepted: 08/13/2015] [Indexed: 01/13/2023] Open
Abstract
The LysE superfamily consists of transmembrane transport proteins that catalyze export of amino acids, lipids and heavy metal ions. Statistical means were used to show that it includes newly identified families including transporters specific for (1) tellurium, (2) iron/lead, (3) manganese, (4) calcium, (5) nickel/cobalt, (6) amino acids, and (7) peptidoglycolipids as well as (8) one family of transmembrane electron carriers. Internal repeats and conserved motifs were identified, and multiple alignments, phylogenetic trees and average hydropathy, amphipathicity and similarity plots provided evidence that all members of the superfamily derived from a single common 3-TMS precursor peptide via intragenic duplication. Their common origin implies that they share common structural, mechanistic and functional attributes. The transporters of this superfamily play important roles in ionic homeostasis, cell envelope assembly, and protection from excessive cytoplasmic heavy metal/metabolite concentrations. They thus influence the physiology and pathogenesis of numerous microbes, being potential targets of drug action.
Affiliation(s)
- Brian V. Tsu
- Department of Molecular Biology, Division of Biological Sciences, University of California at San Diego, La Jolla, California, United States of America
| | - Milton H. Saier
- Department of Molecular Biology, Division of Biological Sciences, University of California at San Diego, La Jolla, California, United States of America
| |
|
40
|
Ndhlovu A, Hazelhurst S, Durand PM. Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix. BMC Bioinformatics 2015; 16:255. [PMID: 26269100 PMCID: PMC4535666 DOI: 10.1186/s12859-015-0688-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2015] [Accepted: 07/29/2015] [Indexed: 11/27/2022] Open
Abstract
Background Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE couples a newer and improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database EvoDB. Results The evolutionary rate based approach was coupled with a conventional BLOSUM substitution matrix. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. The dynamic scoring function is based on a coupled additive approach that scores aligned sites based on the level of conservation inferred from the ω values. Evaluation of the accuracy of this new implementation, BLOSUM-FIRE, using MAFFT alignment as reference alignments has shown that it is more accurate than its predecessor FIRE. Comparison of the alignment quality with widely used algorithms (MUSCLE, T-COFFEE, and CLUSTAL Omega) revealed that the BLOSUM-FIRE algorithm performs as well as conventional algorithms. Its main strength lies in that it provides greater potential for aligning divergent sequences and addresses the problem of low specificity inherent in the original FIRE algorithm. 
The utility of this algorithm is demonstrated using the Hepatitis B virus X (HBx) protein, a protein of unknown function, as a test case. Conclusion This study describes the utility of an evolutionary rate based approach coupled to the BLOSUM62 amino acid substitution matrix in inferring protein domain function. We demonstrate that such an approach is robust and performs as well as an array of conventional algorithms.
Affiliation(s)
- Andrew Ndhlovu
- Evolutionary Medicine Laboratory, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa; Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa.
| | - Scott Hazelhurst
- School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa; Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa.
| | - Pierre M Durand
- Evolutionary Medicine Laboratory, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa; Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa; Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA; Department of Biodiversity and Conservation Biology, Faculty of Natural Sciences, University of the Western Cape, Private Bag X17, Belville, Cape Town, 7530, South Africa.
| |
|
41
|
Najibi S, Faghihi M, Golalizadeh M, Arab S. Bayesian alignment of proteins via Delaunay tetrahedralization. J Appl Stat 2015. [DOI: 10.1080/02664763.2014.995605] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]
|
42
|
Evolutionary relationship of two ancient protein superfolds. Nat Chem Biol 2014; 10:710-5. [PMID: 25038785 DOI: 10.1038/nchembio.1579] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 03/13/2014] [Accepted: 06/02/2014] [Indexed: 01/29/2023]
Abstract
Proteins are the molecular machines of the cell that fold into specific three-dimensional structures to fulfill their functions. To improve our understanding of how the structure and function of proteins arises, it is crucial to understand how evolution has generated the structural diversity we observe today. Classically, proteins that adopt different folds are considered to be nonhomologous. However, using state-of-the-art tools for homology detection, we found evidence of homology between proteins of two ancient and highly populated protein folds, the (βα)8-barrel and the flavodoxin-like fold. We detected a family of sequences that show intermediate features between both folds and determined what is to our knowledge the first representative crystal structure of one of its members, giving new insights into the evolutionary link of two of the earliest folds. Our findings contribute to an emergent vision where protein superfolds share common ancestry and encourage further approaches to complete the mapping of structure space onto sequence space.
|
43
|
Wong WC, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation. BMC Bioinformatics 2014; 15:166. [PMID: 24890864 PMCID: PMC4061105 DOI: 10.1186/1471-2105-15-166] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/25/2013] [Accepted: 05/27/2014] [Indexed: 02/01/2023] Open
Abstract
Background Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. 
Conclusions Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison.
Affiliation(s)
- Wing-Cheong Wong
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore.
| | | | | | | |
|
44
|
Waldispühl J, O'Donnell CW, Will S, Devadas S, Backofen R, Berger B. Simultaneous alignment and folding of protein sequences. J Comput Biol 2014; 21:477-91. [PMID: 24766258 DOI: 10.1089/cmb.2013.0163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise and multiple sequence alignment tools in the most difficult low-sequence homology case. It also improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families (partiFold-Align is available at http://partifold.csail.mit.edu/ ).
|
45
|
Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns. Bioinformatics 2013; 29:2112-21. [DOI: 10.1093/bioinformatics/btt360] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 01/09/2023] Open
|
46
|
Zheng WH, Västermark Å, Shlykov MA, Reddy V, Sun EI, Saier MH. Evolutionary relationships of ATP-Binding Cassette (ABC) uptake porters. BMC Microbiol 2013; 13:98. [PMID: 23647830 PMCID: PMC3654945 DOI: 10.1186/1471-2180-13-98] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 10/29/2012] [Accepted: 04/19/2013] [Indexed: 11/20/2022] Open
Abstract
Background The ATP-Binding Cassette (ABC) functional superfamily includes integral transmembrane exporters that have evolved three times independently, forming three families termed ABC1, ABC2 and ABC3, upon which monophyletic ATPases have been superimposed for energy-coupling purposes [e.g., J Membr Biol 231(1):1-10, 2009]. The goal of the work reported in this communication was to understand how the integral membrane constituents of ABC uptake transporters with different numbers of predicted or established transmembrane segments (TMSs) evolved. In a few cases, high resolution 3-dimensional structures were available, and in these cases, their structures plus primary sequence analyses allowed us to predict evolutionary pathways of origin. Results All of the 35 currently recognized families of ABC uptake proteins except for one (family 21) were shown to be homologous using quantitative statistical methods. These methods involved using established programs that compare native protein sequences with each other, after having compared each sequence with thousands of its own shuffled sequences, to gain evidence for homology. Topological analyses suggested that these porters contain numbers of TMSs ranging from four or five to twenty. Intragenic duplication events occurred multiple times during the evolution of these porters. They originated from a simple primordial protein containing 3 TMSs which duplicated to 6 TMSs, and then produced porters of the various topologies via insertions, deletions and further duplications. Except for family 21 which proved to be related to ABC1 exporters, they are all related to members of the previously identified ABC2 exporter family. Duplications that occurred in addition to the primordial 3 → 6 duplication included 5 → 10, 6 → 12 and 10 → 20 TMSs. In one case, protein topologies were uncertain as different programs gave discrepant predictions. 
It could not be concluded with certainty whether a 4 TMS ancestral protein or a 5 TMS ancestral protein duplicated to give an 8 or a 10 TMS protein. Evidence is presented suggesting but not proving that the 2TMS repeat unit in ABC1 porters derived from the two central TMSs of ABC2 porters. These results provide structural information and plausible evolutionary pathways for the appearance of most integral membrane constituents of ABC uptake transport systems. Conclusions Almost all integral membrane uptake porters of the ABC superfamily belong to the ABC2 family, previously established for exporters. Most of these proteins can have 5, 6, 10, 12 or 20 TMSs per polypeptide chain. Evolutionary pathways for their appearance are proposed.
Affiliation(s)
- Wei Hao Zheng
- Department of Molecular Biology, University of California at San Diego, La Jolla, CA 92093-0116, USA
| | | | | | | | | | | |
|
47
|
Dargahi D, Baillie D, Pio F. Bioinformatics analysis identify novel OB fold protein coding genes in C. elegans. PLoS One 2013; 8:e62204. [PMID: 23638006 PMCID: PMC3636199 DOI: 10.1371/journal.pone.0062204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/13/2012] [Accepted: 03/20/2013] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The C. elegans genome has been extensively annotated by the WormBase consortium, which uses state-of-the-art bioinformatics pipelines, functional genomics and manual curation approaches. As a result, the identification of novel genes in silico in this model organism is becoming more challenging, requiring new approaches. The Oligonucleotide-oligosaccharide binding (OB) fold is a highly divergent protein family, in which protein sequences, in spite of having the same fold, share very little sequence identity (5-25%). Therefore, evidence from sequence-based annotation may not be sufficient to identify all the members of this family. In C. elegans, the number of OB-fold proteins reported is remarkably low (n=46) compared to other evolutionarily related eukaryotes, such as the yeast S. cerevisiae (n=344) or the fruit fly D. melanogaster (n=84). Gene loss during evolution or differences in the level of annotation for this protein family may explain these discrepancies. METHODOLOGY/PRINCIPAL FINDINGS This study examines the possibility that novel OB-fold coding genes exist in the worm. We developed a bioinformatics approach that uses the most sensitive sequence-sequence, sequence-profile and profile-profile similarity search methods, followed by 3D-structure prediction as a filtering step to eliminate false positive candidate sequences. We have predicted 18 coding genes containing the OB-fold that have, remarkably, been partially characterized in C. elegans. CONCLUSIONS/SIGNIFICANCE This study raises the possibility that the annotation of highly divergent protein fold families can be improved in C. elegans. Similar strategies could be implemented for large-scale analysis by the WormBase consortium when novel versions of the genome sequence of C. elegans, or of other evolutionarily related species, are released. This approach is of general interest to the scientific community since it can be used to annotate any genome.
Affiliation(s)
- Daryanaz Dargahi
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada
- David Baillie
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada
- Frederic Pio
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada
|
48
|
Graham DJ. A new bioinformatics approach to natural protein collections: permutation structure contrasts of viral and cellular systems. Protein J 2013; 32:275-87. [PMID: 23605224 DOI: 10.1007/s10930-013-9485-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Biological cells and viruses operate by different replication and symmetry paradigms. Cells are able to replicate independently and express little spatial symmetry; viruses require cells for replication while manifesting high symmetry. The author inquired whether different paradigms were reflected in the permutations of amino acid sequences. The hypothesis was that the permutation structure level and symmetry within viral protein collections exceed that of living cells. The rationale was that one symmetry aspect generally accompanies and promotes others in a system. The inquiry was readily answered given abundant sequence archives for proteins. The analysis of collections from diverse viral and cellular sources lends strong support. Additional insights into protein primary structure, the design of collections, and the role of information are provided as well.
Affiliation(s)
- Daniel J Graham
- Department of Chemistry, Loyola University Chicago, 6525 North Sheridan Road, Chicago, IL 60626, USA.
|
49
|
Polyanovskii VO, Tumanyan VG. Estimation of the quality of global alignment of amino acid sequences based on evolution criterion. Biophysics (Nagoya-shi) 2013. [DOI: 10.1134/s0006350913020140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
|
50
|
Abstract
Ferredoxins are electron carrier proteins that contain active sites consisting of nonheme iron and inorganic sulfur. They are ubiquitous in living cells and are believed to be among the earliest redox proteins to have appeared in primitive organisms. The small size of ferredoxins allows their amino acid sequences to be determined with relative ease, and nearly a hundred primary structures have been elucidated over the past two decades. Most of these proteins belong to two distinct groups, which have been used to construct phylogenetic trees of bacteria and oxygenic photosynthetic organisms, respectively. A number of other ferredoxins, however, seem to be unrelated to either of these two families and thus raise the problem of the origin of ferredoxins: are they all derived from a common ancestor, or have they appeared and evolved independently several times in the course of biological evolution? This issue is critical in view of the importance of ferredoxins as evolutionary markers. There is evidence suggesting that presently known ferredoxins belong to at least five independent phyletic lines.
Affiliation(s)
- J Meyer
- DRF-LBio-Biochimie Microbienne CENG, 38041 Grenoble, Cedex, France
|