1. Perin C, Cretin G, Gelly JC. Hierarchical Analysis of Protein Structures: From Secondary Structures to Protein Units and Domains. Methods Mol Biol 2025; 2870:357-370. [PMID: 39543044 DOI: 10.1007/978-1-0716-4213-9_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
The three-dimensional structure of proteins is traditionally organized into hierarchical levels, specifically secondary structures and domains. However, different studies suggest the existence of intermediate levels, such as Protein Units (PUs), which provide a refined understanding of protein architecture. PUs, characterized by their compactness and independence, serve as an intermediate organizational level, bridging the gap between secondary structures and domains. This new view not only enhances our comprehension of protein structure, folding, and evolutionary mechanisms but also provides a robust methodology for identifying and categorizing protein domains. Based on the concept of PUs, alternative structural partitioning solutions can be proposed that address the structural ambiguity of proteins, leading to more meaningful domain identification.
Affiliation(s)
- Charlotte Perin
- TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse, France
2. Sidhanta SPD, Sowdhamini R, Srinivasan N. Comparative analysis of permanent and transient domain-domain interactions in multi-domain proteins. Proteins 2025; 93:197-208. [PMID: 37828826 DOI: 10.1002/prot.26581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/09/2023] [Accepted: 08/11/2023] [Indexed: 10/14/2023]
Abstract
Protein domains are structural, functional, and evolutionary units. These domains give rise to functional diversity through interactions with other co-existing domains, and they also provide stability. Hence, it is important to study intra-protein inter-domain interactions from the perspective of the types of interactions involved. Domains within a chain could interact over short timeframes or permanently, rather like protein-protein interactions (PPIs). However, no systematic study comparing these two classes, namely permanent and transient domain-domain interactions, has been carried out. In this work, we studied 263 two-domain proteins belonging to either of these classes and analyzed their interfaces on the basis of several factors, such as interface area and details of interactions (number, strength, and types of interactions). We also characterized them based on residue conservation at the interface, correlation of residue motions across domains, their involvement in repeat formation, and their involvement in particular molecular processes. Finally, we analyzed the interactions arising from domains in two-domain monomeric proteins and observed significant differences, as well as a few similarities, between these two classes of domain interactions. This study will help to obtain a better understanding of structure-function and folding principles of multi-domain proteins.
Affiliation(s)
- Ramanathan Sowdhamini
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Computational Approaches to Protein Science, National Centre for Biological Sciences, Bangalore, India
- Computational Biology, Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
3. Altenhoff A, Nevers Y, Tran V, Jyothi D, Martin M, Cosentino S, Majidian S, Marcet-Houben M, Fuentes-Palacios D, Persson E, Walsh T, Lecompte O, Gabaldón T, Kelly S, Hu Y, Iwasaki W, Capella-Gutierrez S, Dessimoz C, Thomas P, Ebersberger I, Sonnhammer E. New developments for the Quest for Orthologs benchmark service. NAR Genom Bioinform 2024; 6:lqae167. [PMID: 39664814 PMCID: PMC11632614 DOI: 10.1093/nargab/lqae167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 09/17/2024] [Accepted: 11/12/2024] [Indexed: 12/13/2024] [Open Access]
Abstract
The Quest for Orthologs (QfO) orthology benchmark service (https://orthology.benchmarkservice.org) hosts a wide range of standardized benchmarks for orthology inference evaluation. It is supported and maintained by the QfO consortium, and is used to gather ortholog predictions and to examine strengths and weaknesses of newly developed and existing orthology inference methods. The web server allows different inference methods to be compared in a standardized way using the same proteome data. The benchmark results are useful for developing new methods and can help researchers to guide their choice of orthology method for applications in comparative genomics and phylogenetic analysis. We here present a new release of the Orthology Benchmark Service with a new benchmark based on feature architecture similarity as well as updated reference proteomes. We further provide a meta-analysis of the public predictions from 18 different orthology assignment methods to reveal how they relate in terms of ortholog predictions and benchmark performance. These results can guide users of orthologs to the best suited method for their purpose.
Affiliation(s)
- Adrian Altenhoff
- ETH Zurich, Department of Computer Science, Universitätstrasse 19, 8092 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, 1015 Lausanne, Switzerland
- Yannis Nevers
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Génopode, 1015 Lausanne, Switzerland
- Vinh Tran
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Department of Biosciences, Goethe University, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany
- Dushyanth Jyothi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Salvatore Cosentino
- Department of Integrated Biosciences, University of Tokyo, Tokyo 277-0882, Japan
- Sina Majidian
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Génopode, 1015 Lausanne, Switzerland
- Marina Marcet-Houben
- Barcelona Supercomputing Center (BSC-CNS), Plaça d'Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Carrer Baldiri Reixac, 10, 08028 Barcelona, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Monforte de Lemos, 3-5. Pabellón 11, 28029 Madrid, Spain
- Diego Fuentes-Palacios
- Barcelona Supercomputing Center (BSC-CNS), Plaça d'Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Carrer Baldiri Reixac, 10, 08028 Barcelona, Spain
- Emma Persson
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden
- Thomas Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, Centre de Recherche en Biomédecine de Strasbourg, University of Strasbourg, CNRS, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Toni Gabaldón
- Barcelona Supercomputing Center (BSC-CNS), Plaça d'Eusebi Güell, 1-3, 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Carrer Baldiri Reixac, 10, 08028 Barcelona, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Monforte de Lemos, 3-5. Pabellón 11, 28029 Madrid, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08003 Barcelona, Spain
- Steven Kelly
- Department of Biology, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Yanhui Hu
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wataru Iwasaki
- Department of Integrated Biosciences, University of Tokyo, Tokyo 277-0882, Japan
- Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Génopode, 1015 Lausanne, Switzerland
- Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Department of Biosciences, Goethe University, Max-von-Laue-Str. 13, D-60438 Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Senckenberganlage 25, D-60325 Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Senckenberganlage 25, D-60325 Frankfurt am Main, Germany
- Erik Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden
4. Li W, Almirantis Y, Provata A. Range-limited Heaps' law for functional DNA words in the human genome. J Theor Biol 2024; 592:111878. [PMID: 38901778 DOI: 10.1016/j.jtbi.2024.111878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 05/31/2024] [Accepted: 06/10/2024] [Indexed: 06/22/2024]
Abstract
Heaps' or Herdan-Heaps' law is a linguistic law stating that the relationship between vocabulary/dictionary size (types) and word count (tokens) follows a power-law function. Its existence in genomes, under a given definition of DNA words, is unclear, partly because the dictionary size in a genome could be much smaller than that in a human language. We define a DNA word as a coding region in a genome that codes for a protein domain. Using human chromosomes and chromosome arms as individual samples, we establish the existence of Heaps' law in the human genome within a limited range. Our definition of words in a genomic or proteomic context differs from other definitions, such as over-represented k-mers, which are much shorter. Although an approximate power-law distribution of protein domain sizes due to gene duplication, and the related Zipf's law, are well known, their translation to Heaps' law for DNA words is not automatic. Several other animal genomes are also shown herein to exhibit a range-limited Heaps' law under our definition of DNA words, though with various exponents. When tokens were randomly sampled and sample sizes reached the maximum level, a deviation from Heaps' law was observed, but a quadratic regression in the log-log type-token plot fits the data perfectly. Investigation of the type-token plot and its regression coefficients could provide an alternative narrative of the reuse and redundancy of protein domains, as well as the creation of new protein domains, from a linguistic perspective.
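The type-token analysis described in this abstract can be illustrated with a short sketch. It is not the authors' code. It assumes that each token is a protein-domain label (for example a Pfam accession) listed in genomic order, builds the cumulative type-token curve, and fits two models by ordinary least squares: the standard power-law form of Heaps' law, V = K·N^β (a straight line in log-log space), and a quadratic regression in log-log space. The helper names (`type_token_curve`, `polyfit`, `fit_heaps`, `fit_loglog_quadratic`) are hypothetical, and only the Python standard library is used.

```python
import math
from typing import List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (N tokens seen, V distinct types seen)


def type_token_curve(tokens: Sequence[str]) -> Curve:
    """Cumulative type-token curve: after each token, record (tokens so far, distinct types so far)."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Ordinary least-squares polynomial fit; returns coefficients from lowest to highest order."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def _log_log(curve: Curve) -> Tuple[List[float], List[float]]:
    """Natural-log transform of both axes of the type-token curve."""
    return [math.log(n) for n, _ in curve], [math.log(v) for _, v in curve]


def fit_heaps(curve: Curve) -> Tuple[float, float]:
    """Fit V = K * N**beta as the straight line log V = log K + beta * log N; returns (K, beta)."""
    xs, ys = _log_log(curve)
    intercept, beta = polyfit(xs, ys, 1)
    return math.exp(intercept), beta


def fit_loglog_quadratic(curve: Curve) -> List[float]:
    """Fit log V = a + b * log N + c * (log N)**2; returns [a, b, c]."""
    xs, ys = _log_log(curve)
    return polyfit(xs, ys, 2)
```

In a real analysis, `tokens` would hold the domain labels along one chromosome or chromosome arm. A negative quadratic coefficient c would indicate that new domain types accumulate more slowly than a pure power law predicts.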
Affiliation(s)
- Wentian Li
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Yannis Almirantis
- Theoretical Biology and Computational Genomics Laboratory, Institute of Bioscience and Applications, National Center for Scientific Research "Demokritos", 15341 Athens, Greece
- Astero Provata
- Statistical Mechanics and Dynamical Systems Laboratory, Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15341 Athens, Greece
5. Barone F, Russo ET, Villegas Garcia EN, Punta M, Cozzini S, Ansuini A, Cazzaniga A. Protein family annotation for the Unified Human Gastrointestinal Proteome by DPCfam clustering. Sci Data 2024; 11:568. [PMID: 38824125 PMCID: PMC11144186 DOI: 10.1038/s41597-024-03131-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 03/08/2024] [Indexed: 06/03/2024] [Open Access]
Abstract
Technological advances in massively parallel sequencing have led to an exponential growth in the number of known protein sequences. Much of this growth originates from metagenomic projects producing new sequences from environmental and clinical samples. The Unified Human Gastrointestinal Proteome (UHGP) catalogue is one of the most relevant metagenomic datasets with applications ranging from medicine to biology. However, the low levels of sequence annotation may impair its usability. This work aims to produce a family classification of UHGP sequences to facilitate downstream structural and functional annotation. This is achieved through the release of the DPCfam-UHGP50 dataset containing 10,778 putative protein families generated using DPCfam clustering, an unsupervised pipeline grouping sequences into single or multi-domain architectures. DPCfam-UHGP50 considerably improves family coverage at protein and residue levels compared to the manually curated repository Pfam. In the hope that DPCfam-UHGP50 will foster future discoveries in the field of metagenomics of the human gut, we release a FAIR-compliant database of our results that is easily accessible via a searchable web server and Zenodo repository.
Affiliation(s)
- Federico Barone
- Area Science Park, Padriciano, 99, 34149, Trieste, Italy
- University of Trieste, Trieste, 34127, Italy
- Marco Punta
- IRCCS San Raffaele Institute, Center for Omics Sciences, Milan, 20132, Italy
- IRCCS San Raffaele Institute, Unit of Immunogenetics, Leukemia Genomics and Immunobiology, Division of Immunology, Transplantation and Infectious Disease, Milan, 20132, Italy
6. Klemm P, Stadler PF, Lechner M. Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs. Front Bioinform 2023; 3:1322477. [PMID: 38152702 PMCID: PMC10751348 DOI: 10.3389/fbinf.2023.1322477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 11/06/2023] [Indexed: 12/29/2023] [Open Access]
Abstract
Proteinortho is a widely used tool to predict (co-)orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all vs. all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy, the pseudo-reciprocal best alignment heuristic, which reduces the number of required sequence comparisons by one-half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.
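The reciprocal best alignment criterion referred to in this abstract can be sketched as follows. This is the generic reciprocal-best-hit test, not Proteinortho6's implementation. It assumes precomputed alignment scores between the proteins of two species, keyed as (query, subject). The optional shortcut of deriving the reverse direction by transposing a single score table (which presumes symmetric scores and so avoids a second search) is an illustrative assumption of this sketch and is not claimed to be the published pseudo-reciprocal heuristic. The function names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# (query protein, subject protein) -> alignment score, e.g. a bit score
ScoreTable = Dict[Tuple[str, str], float]


def best_hits(scores: ScoreTable) -> Dict[str, str]:
    """Return each query's highest-scoring subject; ties go to the smaller subject id."""
    best: Dict[str, Tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        current = best.get(query)
        if current is None or score > current[0] or (score == current[0] and subject < current[1]):
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(
    scores_ab: ScoreTable, scores_ba: Optional[ScoreTable] = None
) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in species B and a is b's best hit in species A.

    If scores_ba is omitted, the reverse direction is approximated by transposing scores_ab,
    which assumes symmetric scores (an illustrative shortcut, not Proteinortho6's method).
    """
    if scores_ba is None:
        scores_ba = {(b, a): s for (a, b), s in scores_ab.items()}
    best_ab = best_hits(scores_ab)
    best_ba = best_hits(scores_ba)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}
```

Tie-breaking by identifier keeps the result deterministic. A fuller treatment of co-orthologs would also need to keep near-best hits, which this sketch omits.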
Affiliation(s)
- Paul Klemm
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Marburg, Germany
- Peter F. Stadler
- Bioinformatics Group, Institute of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Santa Fe Institute, Santa Fe, NM, United States
- Marcus Lechner
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-Universität Marburg, Marburg, Germany
7. Mahlich Y, Zhu C, Chung H, Velaga PK, De Paolis Kaluza M, Radivojac P, Friedberg I, Bromberg Y. Learning from the unknown: exploring the range of bacterial functionality. Nucleic Acids Res 2023; 51:10162-10175. [PMID: 37739408 PMCID: PMC10602916 DOI: 10.1093/nar/gkad757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 09/11/2023] [Indexed: 09/24/2023] [Open Access]
Abstract
Determining the repertoire of a microbe's molecular functions is a central question in microbial biology. Modern techniques achieve this goal by comparing microbial genetic material against reference databases of functionally annotated genes/proteins or known taxonomic markers such as 16S rRNA. Here, we describe a novel approach to exploring bacterial functional repertoires without reference databases. Our Fusion scheme establishes functional relationships between bacteria and assigns organisms to Fusion-taxa that differ from otherwise defined taxonomic clades. Three key findings of our work stand out. First, bacterial functional comparisons outperform marker genes in assigning taxonomic clades. Fusion profiles are also better for this task than other functional annotation schemes. Second, Fusion-taxa are robust to addition of novel organisms and are, arguably, able to capture the environment-driven bacterial diversity. Finally, our alignment-free nucleic acid-based Siamese Neural Network model, created using Fusion functions, enables finding shared functionality of very distant, possibly structurally different, microbial homologs. Our work can thus help annotate functional repertoires of bacterial organisms and further guide our understanding of microbial communities.
Affiliation(s)
- Yannick Mahlich
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Chengsheng Zhu
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Xbiome Inc., 1 Broadway, 14th fl, Cambridge, MA 02142, USA
- Henri Chung
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Interdepartmental program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Pavan K Velaga
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- M Clara De Paolis Kaluza
- Khoury College of Computer Sciences, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Interdepartmental program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA 30322, USA
8. Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
Novel functions arise during evolution when genes duplicate, mutate, recombine, fuse, or undergo fission to produce new genes, or when genes form de novo. Researchers have tried to quantify the causes of these molecular diversification processes to understand how genes increase molecular complexity over time, for instance through protein domain organization. In contrast to global sequence similarity, protein domain architectures can capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In prokaryotes and eukaryotes, domain architectures have been shown to be retained over significant evolutionary distances. Protein domain architectures are now used to categorize and distinguish evolutionarily related proteins and to find homologs among species that are evolutionarily distant from one another. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to identify protein domain content, domain order, domain recurrence, and domain position, as all of these contribute to the accuracy of protein function prediction. In this review, an attempt is made to summarise facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
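To make three of the architecture features named in this abstract concrete (domain content, domain order, and domain recurrence), the following sketch scores two architectures written as ordered lists of domain labels. The metrics are generic illustrations chosen for this example: a Jaccard index on domain types, a normalised longest common subsequence for order, and a multiset overlap for copy number. They are not the scoring scheme of any particular annotation tool discussed in the review, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def content_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index of the domain types present (ignores order and copy number)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Longest common subsequence of the domain order, divided by the longer architecture."""
    if not a and not b:
        return 1.0
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)] / max(len(a), len(b))


def recurrence_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Weighted Jaccard on domain copy numbers: sum of minimum counts over sum of maximum counts."""
    count_a, count_b = Counter(a), Counter(b)
    types = set(count_a) | set(count_b)
    if not types:
        return 1.0
    shared = sum(min(count_a[t], count_b[t]) for t in types)
    total = sum(max(count_a[t], count_b[t]) for t in types)
    return shared / total
```

For example, a three-repeat architecture `["WD40", "WD40", "WD40"]` compared with a single `["WD40"]` scores 1.0 on content but only 1/3 on order and on recurrence, which shows why domain content alone can miss differences in domain copy number.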
Affiliation(s)
- Pavan Gollapalli
- Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
- Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
9. Cretin G, Périn C, Zimmermann N, Galochkina T, Gelly JC. ICARUS: flexible protein structural alignment based on Protein Units. Bioinformatics 2023; 39:btad459. [PMID: 37498544 PMCID: PMC10400377 DOI: 10.1093/bioinformatics/btad459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 07/04/2023] [Accepted: 07/26/2023] [Indexed: 07/28/2023] [Open Access]
Abstract
MOTIVATION Alignment of protein structures is a major problem in structural biology. The first approach commonly used is to consider proteins as rigid bodies. However, alignment of protein structures can be very complex due to conformational variability, or complex evolutionary relationships between proteins such as insertions, circular permutations or repetitions. In such cases, introducing flexibility becomes useful for two reasons: (i) it can help compare two protein chains which adopted two different conformational states, such as due to proteins/ligands interaction or post-translational modifications, and (ii) it aids in the identification of conserved regions in proteins that may have distant evolutionary relationships. RESULTS We propose ICARUS, a new approach for flexible structural alignment based on identification of Protein Units, evolutionarily preserved structural descriptors of intermediate size, between secondary structures and domains. ICARUS significantly outperforms reference methods on a dataset of very difficult structural alignments. AVAILABILITY AND IMPLEMENTATION Code is freely available online at https://github.com/DSIMB/ICARUS.
Affiliation(s)
- Gabriel Cretin
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Charlotte Périn
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- TBI, Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Nicolas Zimmermann
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Tatiana Galochkina
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Jean-Christophe Gelly
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
10. Lin ZJ, Huang BX, Su LF, Zhu SY, He JW, Chen GZ, Lin PX. Sub-region analysis of DMD gene in cases with idiopathic generalized epilepsy. Neurogenetics 2023; 24:161-169. [PMID: 37022522 DOI: 10.1007/s10048-023-00715-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/24/2023] [Indexed: 04/07/2023]
Abstract
The protein domain encoded by a gene sub-region is the basic unit of protein structure and function. The DMD gene is the largest coding gene in humans, and its phenotype is relevant to idiopathic generalized epilepsy. We hypothesized that variants cluster in sub-regions of idiopathic generalized epilepsy genes and investigated the relationship between the DMD gene and idiopathic generalized epilepsy. Whole exome sequencing was performed in 106 individuals with idiopathic generalized epilepsy. DMD variants were filtered by variant type, allele frequency, in silico prediction, hemizygous or homozygous status in the population, inheritance mode, and domain location. Variants located in the sub-regions were selected using the subRVIS software. The pathogenicity of variants was evaluated according to the American College of Medical Genetics and Genomics criteria. Articles on epilepsy-related functional studies of variants clustered in protein domains were reviewed. In sub-regions of the DMD gene, two variants were identified in two unrelated cases with juvenile absence epilepsy or juvenile myoclonic epilepsy. Both variants were classified as of uncertain significance. The allele frequency of both variants in probands with idiopathic generalized epilepsy reached statistical significance compared with the population (Fisher's test, p = 2.02 × 10⁻⁶, adjusted α = 4.52 × 10⁻⁶). The variants clustered in the spectrin domain of dystrophin, which binds to glycoprotein complexes and indirectly affects ion channels contributing to epileptogenesis. Gene sub-region analysis suggests a weak association between the DMD gene and idiopathic generalized epilepsy. Functional analysis of gene sub-regions helps infer the pathogenesis of idiopathic generalized epilepsy.
Affiliation(s)
- Zhi-Jian Lin
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
- Bi-Xia Huang
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
- Li-Fang Su
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
- Sheng-Yin Zhu
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
- Jun-Wei He
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
- Guo-Zhang Chen
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
- Peng-Xing Lin
- Department of Neurology, The Affiliated Hospital of Putian University, Brain Science Institute of Putian University, 999 Dongzhen East Road, Licheng District, Putian, 351100, China
11. Bolz SN, Schroeder M. Promiscuity in drug discovery on the verge of the structural revolution: recent advances and future chances. Expert Opin Drug Discov 2023; 18:973-985. [PMID: 37489516 DOI: 10.1080/17460441.2023.2239700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 07/19/2023] [Indexed: 07/26/2023]
Abstract
INTRODUCTION Promiscuity denotes the ability of ligands and targets to specifically interact with multiple binding partners. Despite negative aspects like side effects, promiscuity is receiving increasing attention in drug discovery as it can enhance drug efficacy and provides a molecular basis for drug repositioning. The three-dimensional structure of ligand-target complexes delivers exclusive insights into the molecular mechanisms of promiscuity and structure-based methods enable the identification of promiscuous interactions. With the recent breakthrough in protein structure prediction, novel possibilities open up to reveal unknown connections in ligand-target interaction networks. AREAS COVERED This review highlights the significance of structure in the identification and characterization of promiscuity and evaluates the potential of protein structure prediction to advance our knowledge of drug-target interaction networks. It discusses the definition and relevance of promiscuity in drug discovery and explores different approaches to detecting promiscuous ligands and targets. EXPERT OPINION Examination of structural data is essential for understanding and quantifying promiscuity. The recent advancements in structure prediction have resulted in an abundance of targets that are well-suited for structure-based methods like docking. In silico approaches may eventually completely transform our understanding of drug-target networks by complementing the millions of predicted protein structures with billions of predicted drug-target interactions.
Affiliation(s)
- Sarah Naomi Bolz
- Biotechnology Center (BIOTEC), CMCB, Technische Universität Dresden, Dresden, Germany
- Michael Schroeder
- Biotechnology Center (BIOTEC), CMCB, Technische Universität Dresden, Dresden, Germany
12
Carss KJ, Deaton AM, Del Rio-Espinola A, Diogo D, Fielden M, Kulkarni DA, Moggs J, Newham P, Nelson MR, Sistare FD, Ward LD, Yuan J. Using human genetics to improve safety assessment of therapeutics. Nat Rev Drug Discov 2023; 22:145-162. [PMID: 36261593 DOI: 10.1038/s41573-022-00561-w] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Accepted: 09/02/2022] [Indexed: 02/07/2023]
Abstract
Human genetics research has discovered thousands of proteins associated with complex and rare diseases. Genome-wide association studies (GWAS) and studies of Mendelian disease have resulted in an increased understanding of the role of gene function and regulation in human conditions. Although the application of human genetics has been explored primarily as a method to identify potential drug targets and support their relevance to disease in humans, there is increasing interest in using genetic data to identify potential safety liabilities of modulating a given target. Human genetic variants can be used as a model to anticipate the effect of lifelong modulation of therapeutic targets and identify the potential risk for on-target adverse events. This approach is particularly useful for non-clinical safety evaluation of novel therapeutics that lack pharmacologically relevant animal models and can contribute to the intrinsic safety profile of a drug target. This Review illustrates applications of human genetics to safety studies during drug discovery and development, including assessing the potential for on- and off-target associated adverse events, carcinogenicity risk assessment, and guiding translational safety study designs and monitoring strategies. A summary of available human genetic resources and recommended best practices is provided. The challenges and future perspectives of translating human genetic information to identify risks for potential drug effects in preclinical and clinical development are discussed.
Affiliation(s)
- Aimee M Deaton
- Amgen, Cambridge, MA, USA; Alnylam Pharmaceuticals, Cambridge, MA, USA
- Alberto Del Rio-Espinola
- Novartis Institutes for BioMedical Research, Basel, Switzerland; GentiBio Inc., Cambridge, MA, USA
- Mark Fielden
- Amgen, Thousand Oaks, CA, USA; Kate Therapeutics, San Diego, CA, USA
- Jonathan Moggs
- Novartis Institutes for BioMedical Research, Basel, Switzerland
- Frank D Sistare
- Merck & Co., West Point, PA, USA; 315 Meadowmont Ln, Chapel Hill, NC, USA
- Lucas D Ward
- Amgen, Cambridge, MA, USA; Alnylam Pharmaceuticals, Cambridge, MA, USA
- Jing Yuan
- Amgen, Cambridge, MA, USA; Pfizer, Cambridge, MA, USA
13
Launay R, Teppa E, Esque J, André I. Modeling Protein Complexes and Molecular Assemblies Using Computational Methods. Methods Mol Biol 2023; 2553:57-77. [PMID: 36227539 DOI: 10.1007/978-1-0716-2617-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Many biological molecules are assembled into supramolecular complexes that are necessary to perform functions in the cell. Better understanding and characterization of these molecular assemblies are thus essential to further elucidate molecular mechanisms and key protein-protein interactions that could be targeted to modulate protein binding affinity or develop new binders. Experimental access to structural information on these supramolecular assemblies is often hampered by the size of these systems, which makes their recombinant production and characterization rather difficult. Computational methods combining structural data, molecular modeling techniques, and sequence coevolution information can thus offer a good alternative for gaining access to the structural organization of protein complexes and assemblies. Herein, we present some computational methods to predict structural models of the protein partners, to search for interacting regions using coevolution information, and to build molecular assemblies. The approach is exemplified by a case study modeling the succinate-quinone oxidoreductase heterocomplex.
Affiliation(s)
- Romain Launay
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
- Elin Teppa
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
- Jérémy Esque
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
- Isabelle André
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
14
Fukuchi S, Noguchi T, Anbo H, Homma K. Exon Elongation Added Intrinsically Disordered Regions to the Encoded Proteins and Facilitated the Emergence of the Last Eukaryotic Common Ancestor. Mol Biol Evol 2022; 40:6931801. [PMID: 36529689 PMCID: PMC9825244 DOI: 10.1093/molbev/msac272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/06/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Most prokaryotic proteins consist of a single structural domain (SD) with few intrinsically disordered regions (IDRs), which by themselves do not adopt stable structures, whereas the typical eukaryotic protein comprises multiple SDs and IDRs. How eukaryotic proteins evolved to differ from prokaryotic proteins has not been fully elucidated. Here, we found that the longer the internal exons are, the more frequently they encode IDRs in eight eukaryotes including vertebrates, invertebrates, a fungus, and plants. Based on this observation, we propose the "small bang" model from the proteomic viewpoint: the protoeukaryotic genes had no introns and mostly encoded one SD each, but a majority of them were subsequently divided into multiple exons (step 1). Many exons unconstrained by SDs elongated to encode IDRs (step 2). The elongated exons encoding IDRs frequently facilitated the acquisition of multiple SDs to make the last common ancestor of eukaryotes (step 3). One prediction of the model is that long internal exons are mostly unconstrained exons. Analyses of the eight eukaryotes are consistent with this prediction. In support of the model, we identified cases of internal exons that elongated after the rat-mouse divergence and discovered that the expanded sections are mostly in unconstrained exons and preferentially encode IDRs. The model also predicts that SDs followed by long internal exons tend to have other SDs downstream. This prediction was also verified in all the eukaryotic species analyzed. Our model accounts for the dichotomy between prokaryotic and eukaryotic proteins and proposes a selective advantage conferred by IDRs.
Affiliation(s)
- Satoshi Fukuchi
- Program for Information Systems, Division of Informatics, Bioengineering and Bioscience, Maebashi Institute of Technology, Maebashi-shi, Japan
- Tamotsu Noguchi
- Pharmaceutical Education Research Center, Meiji Pharmaceutical University, Kiyose, Tokyo, Japan
- Hiroto Anbo
- Program for Information Systems, Division of Informatics, Bioengineering and Bioscience, Maebashi Institute of Technology, Maebashi-shi, Japan
15
Tripathy M, Srivastava A, Sastry S, Rao M. Protein as evolvable functionally constrained amorphous matter. J Biosci 2022. [DOI: 10.1007/s12038-022-00313-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
16
Mohanty P, Kapoor U, Sundaravadivelu Devarajan D, Phan TM, Rizuan A, Mittal J. Principles Governing the Phase Separation of Multidomain Proteins. Biochemistry 2022; 61:2443-2455. [PMID: 35802394 PMCID: PMC9669140 DOI: 10.1021/acs.biochem.2c00210] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Indexed: 02/06/2023]
Abstract
A variety of membraneless organelles, often termed "biological condensates", play an important role in the regulation of cellular processes such as gene transcription, translation, and protein quality control. On the basis of experimental and theoretical investigations, liquid-liquid phase separation (LLPS) has been proposed as a possible mechanism for the origin of biological condensates. LLPS requires multivalent macromolecules that template the formation of long-range, intermolecular interaction networks and results in the formation of condensates with defined composition and material properties. Multivalent interactions driving LLPS exhibit a wide range of modes from highly stereospecific to nonspecific and involve both folded and disordered regions. Multidomain proteins serve as suitable macromolecules for promoting phase separation and achieving disparate functions due to their potential for multivalent interactions and regulation. Here, we aim to highlight the influence of the domain architecture and interdomain interactions on the phase separation of multidomain protein condensates. First, the general principles underlying these interactions are illustrated on the basis of examples of multidomain proteins that are predominantly associated with nucleic acid binding and protein quality control and contain both folded and disordered regions. Next, the examples showcase how LLPS properties of folded and disordered regions can be leveraged to engineer multidomain constructs that form condensates with the desired assembly and functional properties. Finally, we highlight the need for improvements in coarse-grained computational models that can provide molecular-level insights into multidomain protein condensates in conjunction with experimental efforts.
Affiliation(s)
- Priyesh Mohanty
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Utkarsh Kapoor
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Tien Minh Phan
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Azamat Rizuan
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Jeetain Mittal
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
17
Tang QY, Ren W, Wang J, Kaneko K. The Statistical Trends of Protein Evolution: A Lesson from AlphaFold Database. Mol Biol Evol 2022; 39:msac197. [PMID: 36108094 PMCID: PMC9550990 DOI: 10.1093/molbev/msac197] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Abstract
The recent development of artificial intelligence provides us with new and powerful tools for studying the mysterious relationship between organism evolution and protein evolution. In this work, based on the AlphaFold Protein Structure Database (AlphaFold DB), we perform comparative analyses of the proteins of different organisms. The statistics of AlphaFold-predicted structures show that, for organisms with higher complexity, the constituent proteins statistically tend to have larger radii of gyration, higher coil fractions, and slower vibrations. By conducting normal mode analysis and scaling analyses, we demonstrate that higher organismal complexity correlates with lower fractal dimensions in both the structure and dynamics of the constituent proteins, suggesting that higher functional specialization is associated with higher organismal complexity. We also uncover the topology and sequence bases of these correlations. As organismal complexity increases, the residue contact networks of the constituent proteins become more assortative, and these proteins show a higher degree of hydrophilic-hydrophobic segregation in their sequences. Furthermore, by comparing the statistical structural proximity across the proteomes with the phylogenetic tree of homologous proteins, we show that statistical structural proximity across the proteomes may indirectly reflect phylogenetic proximity, indicating a statistical trend of protein evolution in parallel with organism evolution. This study provides new insights into how the diversity in the functionality of proteins increases and how the dimensionality of the manifold of protein dynamics reduces during evolution, contributing to the understanding of the origin and evolution of life.
Affiliation(s)
- Qian-Yuan Tang
- Laboratory for Neural Computation and Adaptation, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0106, Japan
- Weitong Ren
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Jun Wang
- School of Physics, National Laboratory of Solid State Microstructure, and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, People’s Republic of China
- Kunihiko Kaneko
- Center for Complex Systems Biology, Universal Biology Institute, University of Tokyo, Komaba, Meguro, Tokyo 153-8902, Japan
- The Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, Copenhagen 2100-DK, Denmark
18
Romei M, Sapriel G, Imbert P, Jamay T, Chomilier J, Lecointre G, Carpentier M. Protein folds as synapomorphies of the tree of life. Evolution 2022; 76:1706-1719. [PMID: 35765784 PMCID: PMC9541633 DOI: 10.1111/evo.14550] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/06/2021] [Revised: 05/17/2022] [Accepted: 05/31/2022] [Indexed: 01/22/2023]
Abstract
Several studies have shown that the distribution of folds (the topology of protein secondary structures) in proteomes may be a global proxy for building phylogenies. If so, some folds should be synapomorphies (derived characters exclusively shared among taxa). However, previous studies used methods that did not allow synapomorphy identification, which requires congruence analysis of folds as individual characters. Here, we map SCOP folds onto a sample of 210 species across the tree of life (TOL). Congruence is assessed using the retention index of each fold for the TOL, and principal component analysis for deeper branches. Using a bicluster mapping approach, we define synapomorphic blocks of folds (SBFs) sharing similar presence/absence patterns. Among the 1232 folds, 20% are universally present in our TOL, whereas 54% are reliable synapomorphies. These results are similar using the CATH and ECOD databases. Eukaryotes are characterized by a large number of synapomorphies, and several SBFs clearly support nested eukaryotic clades (divergence times from 1100 to 380 mya). Although clearly separated, the three superkingdoms reveal a strong mosaic pattern. This pattern is consistent with the dual origin of eukaryotes and reflects secondary endosymbiosis in their photosynthetic clades. Our study shows that the direct analysis of fold synapomorphies provides key characters for unraveling the evolutionary history of species.
Affiliation(s)
- Martin Romei
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France; IMPMC (UMR 7590), BiBiP, Sorbonne Université, CNRS, MNHN, Paris, France
- Guillaume Sapriel
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France; UFR des sciences de la santé, Université Versailles-St-Quentin, Versailles, France
- Pierre Imbert
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Théo Jamay
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Guillaume Lecointre
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
19
Kuang D, Issakova D, Kim J. Learning Proteome Domain Folding Using LSTMs in an Empirical Kernel Space. J Mol Biol 2022; 434:167686. [PMID: 35716781 DOI: 10.1016/j.jmb.2022.167686] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/19/2022] [Revised: 06/08/2022] [Accepted: 06/10/2022] [Indexed: 11/30/2022]
Abstract
The recognition of protein structural folds is the starting point for protein function inference and for many structural prediction tools. We previously introduced the idea of using empirical comparisons to create a data-augmented feature space called PESS (Protein Empirical Structure Space) [1] as a novel approach for protein structure prediction. Here, we extend the previous approach by generating the PESS feature space over fixed-length subsequences of query peptides and applying a sequential neural network model, with one long short-term memory cell layer followed by a fully connected layer. Using this approach, we show that only a small group of domains is needed as a training set to achieve near state-of-the-art accuracy on fold recognition. Our method improves on the previous approach by reducing the training set required and improving the model's ability to generalize across species, which will help fold prediction for newly discovered proteins.
Affiliation(s)
- Da Kuang
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA.
- Dina Issakova
- Department of Biology, University of Pennsylvania, Philadelphia, USA.
- Junhyong Kim
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA; Department of Biology, University of Pennsylvania, Philadelphia, USA.
20
Zhao R, Pei S, Yau SST. New Genome Sequence Detection via Natural Vector Convex Hull Method. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1782-1793. [PMID: 33237867 DOI: 10.1109/tcbb.2020.3040706] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]
Abstract
It remains challenging to find existing but undiscovered genome sequence mutations, or to predict potential genome sequence mutations, from real sequence data. Motivated by this, we develop approaches to detect new, undiscovered genome sequences. Because discovering new genome sequences through biological experiments is resource-intensive, we aim to achieve the new genome sequence detection task mathematically. However, little literature addresses how to detect new, undiscovered genome sequence mutations mathematically. We build a new framework based on the natural vector convex hull method, which conducts alignment-free sequence analysis. Our two newly developed approaches, Random-permutation Algorithm with Penalty (RAP) and Random-permutation Algorithm with Penalty and COnstrained Search (RAPCOS), use the geometric properties captured by natural vectors. In our experiment, we discover a mathematically new human immunodeficiency virus (HIV) genome sequence using some real HIV genome sequences. Significantly, the proposed methods are applicable to the new genome sequence detection challenge and have many good properties, such as robustness, rapid convergence, and fast computation.
21
Suraweera CD, Banjara S, Hinds MG, Kvansakul M. Metazoans and Intrinsic Apoptosis: An Evolutionary Analysis of the Bcl-2 Family. Int J Mol Sci 2022; 23:ijms23073691. [PMID: 35409052 PMCID: PMC8998228 DOI: 10.3390/ijms23073691] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/08/2022] [Revised: 03/24/2022] [Accepted: 03/25/2022] [Indexed: 01/12/2023]
Abstract
The B-cell lymphoma-2 (Bcl-2) family is a group of genes regulating intrinsic apoptosis, a process controlling events such as development, homeostasis and the innate and adaptive immune responses in metazoans. In higher organisms, Bcl-2 proteins coordinate intrinsic apoptosis through their regulation of the integrity of the mitochondrial outer membrane; this function appears to have originated in the basal metazoans. Bcl-2 genes predate the cnidarian-bilaterian split and have been identified in porifera, placozoans and cnidarians but not ctenophores and some nematodes. The Bcl-2 family is composed of two groups of proteins, one with an α-helical Bcl-2 fold that has been identified in porifera, placozoans, cnidarians, and almost all higher bilaterians. The second group of proteins, the BH3-only group, has little sequence conservation and less well-defined structures and is found in cnidarians and most bilaterians, but not porifera or placozoans. Here we examine the evolutionary relationships between Bcl-2 proteins. We show that the structures of the Bcl-2-fold proteins are highly conserved over evolutionary time. Some metazoans such as the urochordate Oikopleura dioica have lost all Bcl-2 family members. This gene loss indicates that Bcl-2 regulated apoptosis is not an absolute requirement in metazoans, a finding mirrored in recent gene deletion studies in mice. Sequence analysis suggests that at least some Bcl-2 proteins lack the ability to bind BH3-only antagonists and therefore potentially have other non-apoptotic functions. By examining the foundations of the Bcl-2 regulated apoptosis, functional relationships may be clarified that allow us to understand the role of specific Bcl-2 proteins in evolution and disease.
Affiliation(s)
- Chathura D. Suraweera
- Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, VIC 3086, Australia
- Suresh Banjara
- Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, VIC 3086, Australia
- Mark G. Hinds
- Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC 3052, Australia
- Correspondence: (M.G.H.); (M.K.)
- Marc Kvansakul
- Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, VIC 3086, Australia
- Correspondence: (M.G.H.); (M.K.)
22
Li DD, Wang JL, Liu Y, Li YZ, Zhang Z. Expanded analyses of the functional correlations within structural classifications of glycoside hydrolases. Comput Struct Biotechnol J 2021; 19:5931-5942. [PMID: 34849197 PMCID: PMC8602953 DOI: 10.1016/j.csbj.2021.10.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Revised: 10/30/2021] [Accepted: 10/30/2021] [Indexed: 01/01/2023]
Abstract
Glycoside hydrolases (GHs) are highly diverse in sequence and function, but systematic studies of GH relationships based on structural information are lacking. Here, we report that GHs have multiple evolutionary origins and are structurally derived from 27 homologous superfamilies and 16 folds, but they are strongly biased toward a few superfamilies and folds. Six of these superfamilies are widely encoded by archaea, bacteria, and eukaryotes, indicating that they may be the most ancient in origin. Most superfamilies vary in enzyme function, and some, such as the superfamilies of (β/α)8-barrel and (α/α)6-barrel structures, exhibit extreme functional diversity; this is highly positively correlated with sequence diversity. More than one-third of glycosidase activities show convergent evolution, especially the polysaccharide-degrading functions of GHs. The GHs of most superfamilies have relatively narrow environmental distributions, normally with the highest abundance in host-associated environments and a distribution preference for moderately low-temperature and acidic environments. Overall, our expanded analysis facilitates an understanding of complex GH sequence-structure-function relationships and may guide the screening and engineering of GHs.
Affiliation(s)
- Dan-Dan Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Jin-Lan Wang
- National Administration of Health Data, Jinan 250002, China
- Ya Liu
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Yue-Zhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China; Suzhou Research Institute, Shandong University, Suzhou 215123, China
23
Rose GD. Protein folding - seeing is deceiving. Protein Sci 2021; 30:1606-1616. [PMID: 33938055 PMCID: PMC8284583 DOI: 10.1002/pro.4096] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/15/2021] [Revised: 04/24/2021] [Accepted: 04/30/2021] [Indexed: 11/13/2022]
Abstract
This Perspective is intended to raise questions about the conventional interpretation of protein folding. According to the conventional interpretation, developed over many decades, a protein population can visit a vast number of conformations under unfolding conditions, but a single dominant native population emerges under folding conditions. Accordingly, folding comes with a substantial loss of conformational entropy. How is this price paid? The conventional answer is that favorable interactions between and among the side chains can compensate for entropy loss, and moreover, these interactions are responsible for the structural particulars of the native conformation. Challenging this interpretation, the Perspective introduces a proposal that high energy (i.e., unfavorable) excluding interactions winnow the accessible population substantially under physical-chemical conditions that favor folding. Both steric clash and unsatisfied hydrogen bond donors and acceptors are classified as excluding interactions, so called because conformers with such disfavored interactions will be largely excluded from the thermodynamic population. Both excluding interactions and solvent factors that induce compactness are somewhat nonspecific, yet together they promote substantial chain organization. Moreover, proteins are built on a backbone scaffold consisting of α-helices and strands of β-sheet, where the number of hydrogen bond donors and acceptors is exactly balanced. These repetitive secondary structural elements are the only two conformers that can be both completely hydrogen-bond satisfied and extended indefinitely without encountering a steric clash. Consequently, the number of fundamental folds is limited to no more than ~10,000 for a protein domain. Once excluding interactions are taken into account, the issue of "frustration" is largely eliminated and the Levinthal paradox is resolved. 
Putting the "bottom line" at the top: it is likely that hydrogen-bond satisfaction represents a largely under-appreciated parameter in protein folding models.
Affiliation(s)
- George D. Rose
- T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
24
Lakshmanan Mangalath D, Hassan Mohammed SA. Ligand Binding Domain of Estrogen Receptor Alpha Preserve a Conserved Structural Architecture Similar to Bacterial Taxis Receptors. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.681913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
It remains a mystery why estrogen hormone receptors (ERs), which are highly specific toward their endogenous hormones, are responsive to chemically distinct exogenous agents. Does this indicate that ERs are environmentally regulated? Here, we speculate that ERs may share some structural features with prokaryotic taxis receptors, which respond to environmental signals. This study addresses the low specificity and high responsiveness of ERs toward chemically distinct exogenous substances from an evolutionary point of view. We compared the ligand binding domain (LBD) of ER alpha (α) with the LBDs of prokaryotic taxis receptors to check whether the LBDs share any structural similarity. Interestingly, a high degree of similarity in the domain structural fold architecture of ERα and bacterial taxis receptors was observed. Pharmacophore modeling focused on the ligand molecules of both receptors suggests that these ligands share common pharmacophore features. Molecular docking studies suggest that the natural ligands of bacterial chemotaxis receptors also interact strongly with human ER. Phylogenetic analysis showed that these proteins are unrelated; they may therefore have evolved independently, suggesting the possibility of convergent molecular evolution. Nevertheless, a remarkable sequence divergence was seen between these proteins even when they shared common domain structural folds and common ligand-based pharmacophore features, suggesting that protein architecture is conserved for a specific function irrespective of sequence identity.
25
Frutiger A, Tanno A, Hwu S, Tiefenauer RF, Vörös J, Nakatsuka N. Nonspecific Binding-Fundamental Concepts and Consequences for Biosensing Applications. Chem Rev 2021; 121:8095-8160. [PMID: 34105942 DOI: 10.1021/acs.chemrev.1c00044] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Indexed: 12/15/2022]
Abstract
Nature achieves differentiation of specific and nonspecific binding in molecular interactions through precise control of biomolecules in space and time. Artificial systems such as biosensors that rely on distinguishing specific molecular binding events in a sea of nonspecific interactions have struggled to overcome this issue. Despite the numerous technological advancements in biosensor technologies, nonspecific binding has remained a critical bottleneck due to the lack of a fundamental understanding of the phenomenon. To date, the identity, cause, and influence of nonspecific binding remain topics of debate within the scientific community. In this review, we discuss the evolution of the concept of nonspecific binding over the past five decades based upon the thermodynamic, intermolecular, and structural perspectives to provide classification frameworks for biomolecular interactions. Further, we introduce various theoretical models that predict the expected behavior of biosensors in physiologically relevant environments to calculate the theoretical detection limit and to optimize sensor performance. We conclude by discussing existing practical approaches to tackle the nonspecific binding challenge in vitro for biosensing platforms and how we can both address and harness nonspecific interactions for in vivo systems.
Affiliation(s)
- Andreas Frutiger
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Alexander Tanno
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Stephanie Hwu
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Raphael F Tiefenauer
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- János Vörös
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
- Nako Nakatsuka
- Laboratory of Biosensors and Bioelectronics, Institute for Biomedical Engineering, ETH Zürich, Zürich CH-8092, Switzerland
26
Gao C, Ma C, Wang H, Zhong H, Zang J, Zhong R, He F, Yang D. Intrinsic disorder in protein domains contributes to both organism complexity and clade-specific functions. Sci Rep 2021; 11:2985. [PMID: 33542394 PMCID: PMC7862400 DOI: 10.1038/s41598-021-82656-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/09/2020] [Accepted: 01/22/2021] [Indexed: 11/09/2022]
Abstract
Some protein domains are intrinsically disordered (intrinsically disordered domains, IDDs), and the degree of disorder of the same domain may differ between contexts. However, the evolutionary causes and biological significance of these phenomena are unclear. Here, we address these issues through genome-wide analyses of the evolutionary and functional features of IDDs in 1,870 species across the three superkingdoms. We find a significant positive correlation between the proportion of IDDs and organism complexity, with some interesting exceptions. These phenomena may be due to the high disorder of clade-specific domains and to the different degrees of disorder of domains shared between clades. The functions of IDDs are clade-specific, and their higher proportion of post-translational modification sites may contribute to their complex functions. Compared with metazoans, fungi have more IDDs with a consecutive disorder region but a low disorder ratio, which reflects their different functional requirements. Disorder variation is greater for a domain across different proteins than within the same protein. Some clade-specific 'no-variation' or 'high-variation' domains are involved in clade-specific functions. In sum, intrinsic domain disorder is related to both organism complexity and clade-specific functions. These results deepen the understanding of the evolution and function of IDDs.
Affiliation(s)
- Chao Gao
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, 38 Science Park Road, Changping District, Beijing, 102206, China
- Chong Ma
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, 38 Science Park Road, Changping District, Beijing, 102206, China
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China
- Huqiang Wang
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, 38 Science Park Road, Changping District, Beijing, 102206, China
- Haolin Zhong
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, 38 Science Park Road, Changping District, Beijing, 102206, China
- Jiayin Zang
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, 38 Science Park Road, Changping District, Beijing, 102206, China
- Rugang Zhong
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, 100124, China
- Fuchu He
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, 38 Science Park Road, Changping District, Beijing, 102206, China
- Dong Yang
- State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, 38 Science Park Road, Changping District, Beijing, 102206, China
27
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
28
Abstract
Background:
Proteins are the basic building blocks of the body. A protein is a complex system whose structure plays a key role in activation, catalysis, messaging and disease states. Careful investigation of protein structure is therefore necessary for disease diagnosis and drug design. Protein structure is described at different levels of complexity: primary (chain), secondary (e.g., helices and sheets), tertiary (3D), and quaternary structure. The complex 3D structure of a protein is difficult to analyze directly, but it can be treated as a network of interconnected components, in which amino acids are nodes and the interconnections between them are edges.
Objective:
Many studies have shown that the small-world network concept provides new opportunities to investigate biological networks. The objective of this paper is to analyze protein structure using the small-world concept.
Methods:
Proteins are analyzed using the small-world network concept, focusing on the extreme case in which the degree distribution follows a power law. To verify the proposed approach, a dataset of the Oncogene protein structure is analyzed in Python.
Results:
The protein structure is represented as a network of amino acids (a Residue Interaction Graph, RIG) built from the inter-residue distance matrix with a given distance threshold. Various centrality measures (degree distribution, degree-betweenness correlation, and betweenness-closeness correlation) are then calculated for 1323 nodes and plotted.
Conclusion:
The analysis concludes that hubs with high degree centrality exist but are few in number, and that they are expected to be robust toward the harmful effects of mutations with new functions.
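The Results step describes turning a structure into a Residue Interaction Graph by thresholding a distance matrix and then computing centralities. The following Python sketch illustrates only the graph construction and the degree-based measures, under stated assumptions: one coordinate per residue (for example the C-alpha atom), a hypothetical 8 Å cutoff, and toy coordinates rather than the study's Oncogene dataset. Betweenness and closeness, which the study also computed, are omitted; a graph library such as NetworkX provides them (`betweenness_centrality`, `closeness_centrality`).

```python
import math
from itertools import combinations

# Hypothetical toy input: one 3D point (angstroms) per residue, e.g. C-alpha atoms,
# placed 3.8 A apart along a straight line.
TOY_COORDS = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
              (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]


def build_rig(coords, threshold=8.0):
    """Residue Interaction Graph as adjacency sets.

    Residues i and j are connected when their distance is <= threshold.
    The 8.0 A default is an assumption, not the study's value.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def degree_distribution(adj):
    """Return {degree k: number of residues with degree k}, sorted by k."""
    counts = {}
    for neighbours in adj.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


def top_hubs(adj, top=3):
    """Residues with the highest degree; ties keep residue order."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:top]


adj = build_rig(TOY_COORDS)
print(degree_distribution(adj))  # {2: 2, 3: 2, 4: 1}
print(top_hubs(adj))             # [2, 1, 3]
```

With real data the coordinates would come from a parsed structure file; the all-pairs loop is quadratic in the number of residues but remains practical for a graph of about 1,300 nodes.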
Affiliation(s)
- Neetu Kumari
- Department of Computer Science, Banaras Hindu University, Varanasi, India
- Anshul Verma
- Department of Computer Science, Banaras Hindu University, Varanasi, India
29
Abstract
An accurate estimation of the size of Protein Space, in light of the factors that govern it, is a long-standing problem of paramount importance in evolutionary biology, since it determines the nature of protein evolvability. A simple analysis enables us, first, to reduce an unrealistic Protein Space size of ~10^130 sequences for a 100-residue polypeptide chain to ~10^9 functional proteins and, second, to estimate a robust average mutation rate per amino acid (ξ ~ 1.23) and infer from it, in light of protein marginal stability, that only a fraction of the sequence will be available at any one time for a functional protein to evolve. Although this result does not solve the Protein Space vastness problem, it frames it in a more rational way and illustrates the impact of marginal stability on protein evolvability.
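The ~10^130 figure corresponds to the number of possible 100-residue sequences over the 20 standard amino acids, 20^100. The short Python check that follows is a sketch assuming exactly 20 residue types at every position; it recomputes that size and expresses the abstract's ~10^9 functional-protein estimate as a fraction of it. The 10^9 value is taken from the abstract, not derived here.

```python
import math

ALPHABET_SIZE = 20           # standard amino acids (assumption: no modified residues)
CHAIN_LENGTH = 100           # residues, as in the abstract
FUNCTIONAL_ESTIMATE = 10**9  # the abstract's estimate of functional proteins

# Exact size of sequence space; Python integers have arbitrary precision.
n_sequences = ALPHABET_SIZE ** CHAIN_LENGTH
log10_space = CHAIN_LENGTH * math.log10(ALPHABET_SIZE)

# Fraction of sequence space that the functional estimate represents, in log10 units.
log10_fraction = math.log10(FUNCTIONAL_ESTIMATE) - log10_space

print(len(str(n_sequences)))    # 131 digits, i.e. about 1.27 x 10^130
print(f"{log10_space:.2f}")     # 130.10
print(f"{log10_fraction:.2f}")  # -121.10
```

Working in log10 units avoids overflow for the fraction, although Python could also hold the exact integer.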
30
Agamah FE, Mazandu GK, Hassan R, Bope CD, Thomford NE, Ghansah A, Chimusa ER. Computational/in silico methods in drug target and lead prediction. Brief Bioinform 2020; 21:1663-1675. [PMID: 31711157 PMCID: PMC7673338 DOI: 10.1093/bib/bbz103] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 05/21/2019] [Revised: 07/17/2019] [Accepted: 07/18/2019] [Indexed: 01/10/2023]
Abstract
Drug-like compounds are often denied approval and use owing to unexpected clinical side effects and cross-reactivity observed during clinical trials. These unexpected outcomes, which significantly increase the attrition rate, center on the selected drug targets. These targets may be disease candidate proteins or genes, biological pathways, disease-associated microRNAs, disease-related biomarkers, abnormal molecular phenotypes, crucial nodes of biological networks or molecular functions. This is generally linked to several factors, including incomplete knowledge of the drug targets and unpredicted pharmacokinetic behavior upon target interaction or off-target effects. Target identification, especially for polygenic diseases, is essential and constitutes a major bottleneck in drug development, the fundamental stage being the identification and validation of drug targets of interest for further downstream processes. Thus, various computational methods have been developed to complement experimental approaches in drug discovery. Here, we present an overview of various computational methods and tools applied in predicting or validating drug targets and drug-like molecules. We outline their advantages and compare these methods to identify those most likely to lead to optimal results. We also explore major sources of drug failure, considering the challenges and opportunities involved. This review may guide researchers in selecting the most efficient approach or technique during the computational drug discovery process.
Affiliation(s)
- Francis E Agamah
- Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
- Gaston K Mazandu
- Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
- African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Radia Hassan
- Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
- Christian D Bope
- Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
- Faculty of Sciences, University of Kinshasa, Kinshasa, Democratic Republic of Congo
- Nicholas E Thomford
- Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
- School of Medical Sciences, University of Cape Coast, PMB, Cape Coast, Ghana
- Anita Ghansah
- Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, PO Box LG 581, Legon, Ghana
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, University of Cape Town, Observatory 7925, South Africa
31
Czubat B, Minias A, Brzostek A, Żaczek A, Struś K, Zakrzewska-Czerwińska J, Dziadek J. Functional Disassociation Between the Protein Domains of MSMEG_4305 of Mycolicibacterium smegmatis (Mycobacterium smegmatis) in vivo. Front Microbiol 2020; 11:2008. [PMID: 32973726 PMCID: PMC7466739 DOI: 10.3389/fmicb.2020.02008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/06/2020] [Accepted: 07/29/2020] [Indexed: 12/02/2022]
Abstract
MSMEG_4305 is a two-domain protein of Mycolicibacterium smegmatis (Mycobacterium smegmatis). The N-terminal domain of MSMEG_4305 encodes an RNase H type I. The C-terminal domain is a presumed CobC, predicted to be involved in the aerobic synthesis of vitamin B12. The two domains reach their maximum activity at distinct pH values, approximately 8.5 and 4.5, respectively. In the homolog Rv2228c, the presence of the CobC domain influenced RNase activity in vitro. Here, we analyzed the role of MSMEG_4305 in vitamin B12 synthesis and the functional association between both domains in vivo in M. smegmatis. We used a knock-out mutant of M. smegmatis deficient in MSMEG_4305. Whole-cell lysates of the mutant strain contained a lower concentration of vitamin B12, as determined by immunoenzymatic assay. We observed growth deficits, related to vitamin B12 production, on media containing sulfamethazine and propionate. Removal of the CobC domain of MSMEG_4305 in a ΔrnhA background hardly affected the growth rate of M. smegmatis in vivo. The strain carrying the truncation showed no fitness deficit in the competitive assay and did not show an increased level of RNA/DNA hybrids in its genome. We show that homologs of MSMEG_4305 are present only in the Actinomycetales phylogenetic branch (according to the old classification system). The domains of MSMEG_4305 homologs accumulate mutations at different rates, while the linker region is highly variable. We conclude that MSMEG_4305 is a multidomain protein that was most probably fixed in the phylogenetic tree of life due to genetic drift.
Affiliation(s)
- Bożena Czubat
- Department of Experimental and Clinical Pharmacology, University of Rzeszów, Rzeszów, Poland
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Alina Minias
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Anna Brzostek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
- Anna Żaczek
- Institute of Medical Sciences, Medical College of Rzeszów University, Rzeszów, Poland
- Katarzyna Struś
- Department of Bioenergetics, Food Analysis and Microbiology, Institute of Food Technology and Nutrition, University of Rzeszów, Rzeszów, Poland
- Jarosław Dziadek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Łódź, Poland
32
Prabha A, Balaji PV. Characterization of left-handed beta helix-domains, and identification and functional annotation of proteins containing such domains. Proteins 2020; 89:6-20. [PMID: 32748987 DOI: 10.1002/prot.25990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2019] [Revised: 05/12/2020] [Accepted: 07/26/2020] [Indexed: 11/12/2022]
Abstract
Only about 0.3% of the entries in the UniProt database have manually curated annotation. Annotation at the molecular level often relies on a low-throughput, one-protein-at-a-time approach. Computational methods bridge this gap by assigning function based on sequence and/or fold similarity. A left-handed beta helix (LbH) consists of three repeating beta-strands, typically of six residues each, that form an 18-residue (18-mer) turn of the helix. Analysis of LbH-domains showed variations in the number of residues in a beta-strand (5-7, 6 being the most common), the number of turns of the helix (4-10), insertions of one or more loops of variable length (0-36 residues), and the location of loop insertion. An 18-mer HMM profile was created that identifies LbH-domain-containing proteins using sequence as the only input; it produced zero false positives when tested on proteins with known 3D structures. In total, 136 474 entries of the TrEMBL database were found to contain an LbH-domain. Rules developed by analyzing LbH-domain-containing acyltransferases, gamma-class carbonic anhydrases, and nucleotidyltransferases have led to the annotation of 17 389 TrEMBL entries that currently have no functional tag.
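The helix geometry in this abstract reduces to simple arithmetic: each turn holds three strands, so three six-residue strands give the 18-mer turn, and the reported ranges (5-7 residues per strand, 4-10 turns) bound the size of the helical core. A minimal Python sketch, assuming every strand in a domain has the same length and ignoring the 0-36-residue loop insertions; `lbh_core_length` is a hypothetical helper, not part of the authors' HMM pipeline.

```python
STRANDS_PER_TURN = 3  # LbH: three beta-strands per helical turn


def lbh_core_length(strand_len: int, n_turns: int) -> int:
    """Residues in the LbH helical core (hypothetical helper).

    Assumes uniform strand length and ignores loop insertions.
    """
    return STRANDS_PER_TURN * strand_len * n_turns


print(lbh_core_length(6, 1))   # 18: one canonical turn, the 18-mer
print(lbh_core_length(5, 4))   # 60: shortest strands, fewest turns
print(lbh_core_length(7, 10))  # 210: longest strands, most turns
```

Real domains mix strand lengths and carry loop insertions, so actual domain lengths can fall outside this uniform-strand range.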
Affiliation(s)
- Anu Prabha
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Petety V Balaji
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
33
Chu XY, Zhang HY. Cofactors as Molecular Fossils To Trace the Origin and Evolution of Proteins. Chembiochem 2020; 21:3161-3168. [PMID: 32515532 DOI: 10.1002/cbic.202000027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/19/2020] [Revised: 06/03/2020] [Indexed: 12/16/2022]
Abstract
Due to their early origin and extreme conservation, cofactors are valuable molecular fossils for tracing the origin and evolution of proteins. First, as the order of cofactor-binding protein folds roughly coincides with protein-fold chronology, cofactors are considered to have facilitated the origin of primitive proteins by selecting them from pools of random amino acid sequences. Second, cofactors continued to play an important role in the subsequent evolution of proteins. More interestingly, as metallic cofactors evolved with geochemical variations, some geochemical events left imprints in the chronology of protein architecture; this provides further evidence supporting the coevolution of biochemistry and geochemistry. In this paper, we review the molecular fossils used to trace the origin and evolution of proteins, with a special focus on cofactors.
Affiliation(s)
- Xin-Yi Chu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
34
Abstract
Darwin's theory of evolution emphasized that positive selection of functional proficiency provides the fitness that ultimately determines the structure of life, a view that has dominated biochemical thinking, in which enzymes are regarded as perfectly optimized for their specific functions. The 20th-century modern synthesis, structural biology, and the central dogma explained the machinery of evolution, and nearly neutral theory explained how selection competes with random fixation dynamics that produce molecular clocks, which are essential, e.g., for dating evolutionary histories. However, quantitative proteomics revealed that selection pressures not related to optimal function play much larger roles than previously thought, acting perhaps most importantly via protein expression levels. This paper first summarizes recent progress in the 21st century toward recovering this universal selection pressure. The paper then argues that proteome cost minimization is the dominant, underlying 'non-function' selection pressure controlling most of the evolution of already functionally adapted living systems. A theory of proteome cost minimization is described and argued to have consequences for understanding evolutionary trade-offs, aging, cancer, and neurodegenerative protein-misfolding diseases.
35
Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform 2020; 21:566-583. [PMID: 30776072 DOI: 10.1093/bib/bbz017] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 10/31/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 01/03/2025]
Abstract
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. Predicting essential genes and their products (essential proteins) is of great value for exploring the mechanisms of complex diseases, studying the minimal genome required for living cells and developing new drug targets. Because laboratory methods are often complicated, costly and time-consuming, a great many computational methods have been proposed to identify essential genes/proteins at the network level, driven by an in-depth understanding of network biology and the rapid development of biotechnologies. By analyzing the topological characteristics of essential genes/proteins in protein-protein interaction networks (PINs), integrating biological information and considering the dynamic features of PINs, network-based methods have proved effective in identifying essential genes/proteins. In this paper, we survey advanced methods for network-based prediction of essential genes/proteins and present challenges and directions for future research.
Affiliation(s)
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Wenkai Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
36
Narunsky A, Kessel A, Solan R, Alva V, Kolodny R, Ben-Tal N. On the evolution of protein-adenine binding. Proc Natl Acad Sci U S A 2020; 117:4701-4709. [PMID: 32079721 PMCID: PMC7060716 DOI: 10.1073/pnas.1911349117] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/16/2022]
Abstract
Proteins' interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein-adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein-adenine interactions in the Watson-Crick edge of adenine and shows that all of adenine's edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments ("themes") that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.
Affiliation(s)
- Aya Narunsky
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
- Amit Kessel
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
- Ron Solan
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Mount Carmel, 3498838 Haifa, Israel
- Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Ramat Aviv, Israel
37
Lespinats S, De Clerck O, Colange B, Gorelova V, Grando D, Maréchal E, Van Der Straeten D, Rébeillé F, Bastien O. Phylogeny and Sequence Space: A Combined Approach to Analyze the Evolutionary Trajectories of Homologous Proteins. The Case Study of Aminodeoxychorismate Synthase. Acta Biotheor 2020; 68:139-156. [PMID: 31312977 DOI: 10.1007/s10441-019-09352-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/08/2019] [Accepted: 07/10/2019] [Indexed: 11/27/2022]
Abstract
During evolution, the sequence of a protein varies continuously, but this variation is limited by the need to maintain the protein's structural and functional integrity. Deciphering the evolutionary path of a protein is thus of fundamental interest. With the development of new methods to visualize high-dimensional spaces and the improvement of phylogenetic analysis tools, it is possible to study the evolutionary trajectories of proteins in sequence space. Using the data-driven high-dimensional scaling method, we show that it is possible to predict and represent potential evolutionary trajectories by mapping phylogenetic trees onto a 3D projection of the sequence space. Taking the case of aminodeoxychorismate synthase, an enzyme involved in folate synthesis, we show that this representation raises interesting questions about the complexity of the evolution of a given biological function, in particular concerning its capacity to explore the sequence space.
Affiliation(s)
- Olivier De Clerck
- Department of Biology, Phycology Research Group, Ghent University, Krijgslaan 281, 9000, Ghent, Belgium
- Benoît Colange
- Univ. Grenoble Alpes, INES, 73375, Le Bourget du Lac, France
- Vera Gorelova
- Department of Biology, Laboratory of Functional Plant Biology, Ghent University, K.L. Ledeganckstraat 35, 9000, Ghent, Belgium
- Department of Botany and Plant Biology, Laboratory of Plant Biochemistry and Physiology, University of Geneva, Quai E. Ansermet 30, 1211, Geneva, Switzerland
- Delphine Grando
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
- Eric Maréchal
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
- Dominique Van Der Straeten
- Department of Biology, Laboratory of Functional Plant Biology, Ghent University, K.L. Ledeganckstraat 35, 9000, Ghent, Belgium
- Fabrice Rébeillé
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
- Olivier Bastien
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
- Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, CEA Grenoble, UMR 5168, CNRS-CEA-INRA-Université J. Fourier, 17 rue des Martyrs, 38054, Grenoble Cedex 09, France
38
Li X, Li J, Zhang B, Gu Y, Li Q, Gu G, Xiong J, Li Y, Yang X, Qian Z. Comparative peptidome profiling reveals critical roles for peptides in the pathology of pancreatic cancer. Int J Biochem Cell Biol 2020; 120:105687. [PMID: 31927104 DOI: 10.1016/j.biocel.2020.105687] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/02/2019] [Revised: 12/05/2019] [Accepted: 01/08/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND/AIMS Pancreatic cancer is a digestive system tumour disease with a notably poor prognosis and a 5-year survival rate of less than 10 %. In recent years, peptide drugs have shown great clinical value in antitumour applications. We aimed to identify differentially expressed peptides using peptidomics techniques to explore the mechanisms involved in the development and pathology of pancreatic cancer. METHODS We performed peptidomic analysis of pancreatic cancer and paired paracancerous tissues using iTRAQ labelling and conducted in-depth bioinformatics analysis and functional studies on differentially expressed peptides. RESULTS A total of 2,881 peptides were identified, of which 133 were differentially expressed (116 upregulated and 17 downregulated). GO analysis showed that the differentially expressed peptides were closely related to the tumour microenvironment and extracellular matrix. KEGG enrichment analysis revealed that the precursor proteins were closely related to the T2DM and RAS signalling pathways. The endogenous peptide P1DG significantly inhibited the proliferation, migration and invasion of pancreatic cancer cells. CONCLUSION P1DG and its precursor GAPDH may be closely related to the proliferation, migration and invasion of pancreatic cancer. Peptidomics can aid in understanding the pathogenesis of pancreatic cancer more comprehensively.
Affiliation(s)
- Xingxing Li
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Jingyun Li
- Nanjing Maternal and Child Health Medical Institute, Women's Hospital of Nanjing Medical University (Nanjing Maternity and Child Health Care Hospital), 123rd Tianfei Street, Mochou Road, Nanjing, 210004, China
- Bin Zhang
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yuqing Gu
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Qian Li
- Nanjing Maternal and Child Health Medical Institute, Women's Hospital of Nanjing Medical University (Nanjing Maternity and Child Health Care Hospital), 123rd Tianfei Street, Mochou Road, Nanjing, 210004, China
- Guangliang Gu
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Jiageng Xiong
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Yanan Li
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Xiaojun Yang
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Zhuyin Qian
- Pancreas Center, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, China
39
Hernandez-Guerrero R, Galán-Vásquez E, Pérez-Rueda E. The protein architecture in Bacteria and Archaea identifies a set of promiscuous and ancient domains. PLoS One 2019; 14:e0226604. [PMID: 31856202 PMCID: PMC6922389 DOI: 10.1371/journal.pone.0226604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/02/2019] [Accepted: 11/29/2019] [Indexed: 11/19/2022]
Abstract
In this work, we describe a systematic comparative genomic analysis of promiscuous domains in genomes of Bacteria and Archaea. A quantitative measure of domain promiscuity, the weighted domain architecture score (WDAS), was applied to 1317 domains in 1320 genomes of Bacteria and Archaea. A functional analysis associated with the WDAS per genome showed that 18 of 50 functional categories were significantly enriched in the promiscuous domains; in particular, small-molecule binding domains, transferase domains, DNA binding domains (transcription factors), and signal transduction domains were identified as promiscuous. In contrast, non-promiscuous domains were associated with 6 of 50 functional categories, and the category Function unknown was enriched. In addition, the WDASs of 52 domains correlated with genome size, i.e., WDAS values decreased as the genome size increased, suggesting that the number of combinations at larger domains increases, including domains in the superfamilies Winged helix-turn-helix and P-loop-containing nucleoside triphosphate hydrolases. Finally, based on classification of the domains according to their ancestry, we determined that the set of 52 promiscuous domains is also ancient and abundant among all the genomes, in contrast to the non-promiscuous domains. In summary, we consider that the association between these two classes of protein domains (promiscuous and non-promiscuous) provides bacterial and archaeal cells with the ability to respond to diverse environmental challenges.
Affiliation(s)
- Rafael Hernandez-Guerrero
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
- Edgardo Galán-Vásquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Ciudad Universitaria, Universidad Nacional Autónoma de México, Ciudad de México, México
- Ernesto Pérez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
40
Kaitoh K, Nakatsu A, Mori S, Kagechika H, Hashimoto Y, Fujii S. Design, Synthesis and Biological Evaluation of Novel Nonsteroidal Progesterone Receptor Antagonists Based on Phenylamino-1,3,5-triazine Scaffold. Chem Pharm Bull (Tokyo) 2019; 67:566-575. [PMID: 31155562 DOI: 10.1248/cpb.c19-00094] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
We report here the development of phenylamino-1,3,5-triazine derivatives as novel nonsteroidal progesterone receptor (PR) antagonists. PR plays key roles in various physiological systems, including the female reproductive system, and PR antagonists are promising candidates for clinical treatment of multiple diseases. By using the phenylamino-1,3,5-triazine scaffold as a template structure, we designed and synthesized a series of 4-cyanophenylamino-1,3,5-triazine derivatives. The synthesized compounds exhibited PR antagonistic activity, and among them, compound 12n was the most potent (IC50 = 0.30 µM); it also showed significant binding affinity to the PR ligand-binding domain. Docking simulation supported the design rationale of the compounds. Our results suggest that the phenylamino-1,3,5-triazine scaffold is a versatile template for development of nonsteroidal PR antagonists and that the developed compounds are promising lead compounds for further structural development of nonsteroidal PR antagonists.
Affiliation(s)
- Kazuma Kaitoh
- Institute for Quantitative Biosciences, The University of Tokyo
- Aki Nakatsu
- Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University
- Shuichi Mori
- Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University
- Hiroyuki Kagechika
- Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University
- Shinya Fujii
- Institute for Quantitative Biosciences, The University of Tokyo
- Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University
41
Heller D, Szklarczyk D, Mering CV. Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies. BMC Bioinformatics 2019; 20:228. [PMID: 31060495 PMCID: PMC6501302 DOI: 10.1186/s12859-019-2828-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/19/2018] [Accepted: 04/17/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large-scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. RESULTS: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sample. Unlike previous approaches, subsampling the protein space lets us avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method in a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. CONCLUSION: The presented consistency pipeline shows that, despite previously reported limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show performance comparable to or better than that of previous pipelines. The code is available on GitHub at https://github.com/meringlab/og_consistency_pipeline.
Affiliation(s)
- Davide Heller
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
- Damian Szklarczyk
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
- Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
42
Kovács IA, Luck K, Spirohn K, Wang Y, Pollis C, Schlabach S, Bian W, Kim DK, Kishore N, Hao T, Calderwood MA, Vidal M, Barabási AL. Network-based prediction of protein interactions. Nat Commun 2019; 10:1240. [PMID: 30886144 PMCID: PMC6423278 DOI: 10.1038/s41467-019-09177-y] [Citation(s) in RCA: 187] [Impact Index Per Article: 31.2] [Received: 09/27/2018] [Accepted: 02/22/2019] [Indexed: 12/15/2022]
Abstract
Despite exceptional experimental efforts to map out the human interactome, continued data incompleteness limits our ability to understand the molecular roots of human disease. Computational tools offer a promising alternative, helping to identify biologically significant yet unmapped protein-protein interactions (PPIs). While link prediction methods connect proteins on the basis of biological or network-based similarity, interacting proteins are not necessarily similar and similar proteins do not necessarily interact. Here, we offer structural and evolutionary evidence that proteins interact not if they are similar to each other, but if one of them is similar to the other's partners. This approach, which mathematically relies on network paths of length three (L3), significantly outperforms all existing link prediction methods. Given its high accuracy, we show that L3 can offer mechanistic insights into disease and can complement future experimental efforts to complete the human interactome.
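The L3 principle ("similar to the other's partners") can be made concrete with a small sketch. The abstract gives no formula; this follows the degree-normalised form of the L3 score as we understand it from the paper, in which each path X-U-V-Y contributes 1/sqrt(k_U k_V), where k is node degree. Treat that normalisation as an assumption. The function name `l3_scores` and the toy network of proteins P1 to P5 are hypothetical.

```python
import math
from itertools import combinations


def l3_scores(edges):
    """Score unlinked node pairs by degree-normalised paths of length 3.

    p(X, Y) = sum over paths X-U-V-Y of 1 / sqrt(k_U * k_V).
    Higher scores suggest a candidate (missing) interaction.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already an observed interaction
        score = 0.0
        for u in adj[x]:
            for v in adj[u]:
                if y in adj[v]:  # v closes the path x-u-v-y
                    score += 1.0 / math.sqrt(len(adj[u]) * len(adj[v]))
        if score > 0:
            scores[(x, y)] = score
    # highest-scoring candidate interactions first
    return sorted(scores.items(), key=lambda item: -item[1])


# Hypothetical toy network: a simple chain P1-P2-P3-P4-P5.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
for pair, score in l3_scores(toy_edges):
    print(pair, round(score, 3))
```

In the chain, only the pairs three steps apart (P1-P4 and P2-P5) receive a score, whereas a similarity-based (length-two path) method would instead favour P1-P3, P2-P4 and P3-P5.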
Affiliation(s)
- István A Kovács
- Network Science Institute and Department of Physics, Northeastern University, Boston, MA, 02115, USA
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Wigner Research Centre for Physics, Institute for Solid State Physics and Optics, H-1525, Budapest, P.O.Box 49, Hungary
- Katja Luck
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Yang Wang
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Carl Pollis
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Sadie Schlabach
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Wenting Bian
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Dae-Kyum Kim
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Nishka Kishore
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Albert-László Barabási
- Network Science Institute and Department of Physics, Northeastern University, Boston, MA, 02115, USA
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Division of Network Medicine and Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Network and Data Science, Central European University, Budapest, H-1051, Hungary
43
Yu Y, Wu M, Petropoulos E, Zhang J, Nie J, Liao Y, Li Z, Lin X, Feng Y. Responses of paddy soil bacterial community assembly to different long-term fertilizations in southeast China. Sci Total Environ 2019; 656:625-633. [PMID: 30529966 DOI: 10.1016/j.scitotenv.2018.11.359] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 09/04/2018] [Revised: 11/23/2018] [Accepted: 11/24/2018] [Indexed: 06/09/2023]
Abstract
Recent work has shown that long-term fertilization has a critical influence on soil microbial communities; however, the underlying ecological assembly of the microbial community, as well as its links with soil fertility and crop yield, remains poorly understood. In this study, using high-throughput sequencing of 16S rRNA gene amplicons, we investigate mean pairwise phylogenetic distance (MPD), nearest relative index (NRI), taxonomic composition and network topological properties to evaluate the assembly of the soil microbial community in soils fertilized for 30 years. The phylogenetic signal indicates that environmental filtering was a more important process structuring the microbial community than stochastic processes. Increases in soil fertility indexes, such as cation exchange capacity (CEC), soil organic matter (SOM) and available P (AP), driven by balanced fertilization and straw returning, result in a decrease in environmental filtering on bacterial community assembly. Network parameters show that straw returning provides more niches, which leads to more complex phylotype co-occurrence. The increase in crop yield under balanced fertilization might be due to an increase in soil microbial functional traits, which is associated with a decreasing influence of environmental filtering. The bacterial genera Candidatus Koribacter, Candidatus Solibacter, and Fimbriimonas, which increased significantly in straw returning treatments, might be the key species in the competition caused by long-term environmental filtering. These results are helpful for a unified understanding of the ecological processes shaping microbial communities in differently fertilized agroecosystems and for the development of sustainable agriculture.
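The phylogenetic metrics named above can be illustrated on a toy tree. The sketch computes an unweighted MPD and a standardized effect size commonly used for NRI (often expanded as the net relatedness index): NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null), where positive values indicate phylogenetic clustering, the pattern usually read as environmental filtering. The abstract does not state which null model or abundance weighting the authors used; a simple taxa-shuffle null drawn from the regional pool is assumed here. Function names, the `symmetric` helper and the distances are hypothetical.

```python
import random
import statistics
from itertools import combinations


def mpd(dist, taxa):
    """Mean pairwise phylogenetic distance among `taxa` (presence/absence)."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def nri(dist, community, pool, n_null=999, seed=0):
    """Standardized MPD effect size under a taxa-shuffle null model.

    NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null); positive values mean
    the community is more closely related than random draws from `pool`.
    """
    rng = random.Random(seed)
    observed = mpd(dist, community)
    null = [mpd(dist, rng.sample(pool, len(community))) for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)


def symmetric(pairs):
    """Hypothetical helper: nested distance dict from {(a, b): d} entries."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


# Toy tree with two clades {t1, t2} and {t3, t4}; distances are made up.
toy_dist = symmetric({
    ("t1", "t2"): 1.0, ("t3", "t4"): 1.0,
    ("t1", "t3"): 10.0, ("t1", "t4"): 10.0,
    ("t2", "t3"): 10.0, ("t2", "t4"): 10.0,
})
pool = ["t1", "t2", "t3", "t4"]
print(mpd(toy_dist, ["t1", "t2"]), mpd(toy_dist, pool))
print(nri(toy_dist, ["t1", "t2"], pool) > 0)  # same-clade community: clustered
```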
Affiliation(s)
- Yongjie Yu
- College of Applied Meteorology, Nanjing University of Information Science and Technology, Nanjing 210044, PR China
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, PR China
- School of Civil Engineering and Geosciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Meng Wu
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, PR China
- Evangelos Petropoulos
- School of Civil Engineering and Geosciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Jianwei Zhang
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, PR China
- Jun Nie
- Soil and Fertilizer Institute of Hunan Province, Changsha 410125, PR China
- Key Field Monitoring Experimental Station for Reddish Paddy Soil Eco-Environment in Wangcheng, Ministry of Agriculture of China, Changsha 410125, PR China
- Yulin Liao
- Soil and Fertilizer Institute of Hunan Province, Changsha 410125, PR China
- Key Field Monitoring Experimental Station for Reddish Paddy Soil Eco-Environment in Wangcheng, Ministry of Agriculture of China, Changsha 410125, PR China
- Zhongpei Li
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, PR China
- Xiangui Lin
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, PR China
- Youzhi Feng
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, PR China
44
Abstract
Genomes appear similar to natural language texts, and protein domains can be treated as analogs of words. To investigate the linguistic properties of genomes further, we calculated the complexity of the “protein languages” in all major branches of life and identified a nearly universal value of information gain associated with the transition from a random domain arrangement to the current protein domain architecture. An exploration of the evolutionary relationship of the protein languages identified the domain combinations that discriminate between the major branches of cellular life. We conclude that there exists a “quasi-universal grammar” of protein domains and that the nearly constant information gain we identified corresponds to the minimal complexity required to maintain a functional cell.
From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the “proteome grammar”, that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of “protein languages” in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life.
The results of this analysis bolster the analogy between genomes and natural language and show that a “quasi-universal grammar” underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.
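The "information gain between observed domain architectures and random domain combinations" can be illustrated with the simplest n-gram case, bigrams of adjacent domains. The sketch below is one possible formulation, not the authors' exact procedure: it compares observed bigram frequencies with those expected if left and right neighbours were paired at random (product of their positional frequencies), which makes the result equal to the mutual information between neighbouring domains. The function name and toy architectures are hypothetical, and toy data will not reproduce the ∼1.2 bits reported above.

```python
import math
from collections import Counter


def bigram_information_gain(architectures):
    """Relative entropy (bits) of observed domain bigrams versus random pairing.

    Null model (an assumption): left and right domains are paired
    independently, using their observed positional frequencies.
    """
    pairs = Counter()
    for arch in architectures:
        pairs.update(zip(arch, arch[1:]))  # adjacent domain pairs
    total = sum(pairs.values())
    if total == 0:
        return 0.0  # no multi-domain proteins, nothing to measure
    left, right = Counter(), Counter()
    for (a, b), n in pairs.items():
        left[a] += n
        right[b] += n
    gain = 0.0
    for (a, b), n in pairs.items():
        p = n / total  # observed bigram probability
        q = (left[a] / total) * (right[b] / total)  # random-arrangement expectation
        gain += p * math.log2(p / q)
    return gain


# Strict "grammar": A is always followed by B and C by D.
print(bigram_information_gain([["A", "B"], ["C", "D"]]))  # 1.0
# Every combination equally common: no information gain.
print(bigram_information_gain([["A", "B"], ["A", "D"], ["C", "B"], ["C", "D"]]))  # 0.0
```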
45
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly affects which domain architectures can be formed in different species. Studies relating domain family size to frequency of occurrence have shown that family sizes generally follow power-law distributions, both within genomes and across larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architectures and the conclusions about domain architecture evolution mechanisms that can be drawn from them. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or to have evolved convergently (polyphyly). We end with a discussion of some available tools for the computational analysis or exploitation of protein domain architectures and their evolution.
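To make "power-law distributions of domain family size" concrete, the sketch below estimates the exponent alpha of P(x) proportional to x^(-alpha) with the discrete-data maximum-likelihood approximation of Clauset, Shalizi and Newman, alpha = 1 + n / sum(ln(x_i / (x_min - 1/2))). This is a standard estimator chosen for illustration, not necessarily the method used in the studies the chapter reviews. The lower cutoff `x_min` is taken as given, and the family sizes are hypothetical.

```python
import math


def powerlaw_alpha(sizes, x_min=1):
    """Approximate MLE of alpha for P(x) ~ x**(-alpha), x >= x_min.

    Discrete-data approximation: alpha = 1 + n / sum(ln(x / (x_min - 0.5))).
    Choosing x_min from the data is a separate step not shown here.
    """
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    return 1.0 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


# Hypothetical domain-family sizes (number of proteins per family).
family_sizes = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
print(round(powerlaw_alpha(family_sizes), 2))
```

A heavier tail (a few very large families) lowers the estimated exponent, which is the usual way such distributions are compared across genomes.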
46
Corominas-Murtra B, Seoane LF, Solé R. Zipf's Law, unbounded complexity and open-ended evolution. J R Soc Interface 2018; 15:20180395. [PMID: 30958235 PMCID: PMC6303796 DOI: 10.1098/rsif.2018.0395] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/29/2018] [Accepted: 11/19/2018] [Indexed: 11/12/2022]
Abstract
A major problem for evolutionary theory is understanding the so-called open-ended nature of evolutionary change, from its definition to its origins. Open-ended evolution (OEE) refers to the unbounded increase in complexity that seems to characterize evolution on multiple scales. This property seems to be a characteristic feature of biological and technological evolution and is strongly tied to the generative potential associated with combinatorics, which allows a system to grow and expand its available state space. Interestingly, many complex systems presumably displaying OEE, from language to proteins, share a common statistical property: the presence of Zipf's Law. Given an inventory of basic items (such as words or protein domains) required to build more complex structures (sentences or proteins), Zipf's Law tells us that most of these elements are rare whereas a few of them are extremely common. Using algorithmic information theory, in this paper we provide a fundamental definition of open-endedness, which can be understood as a set of postulates. Its statistical counterpart, based on standard Shannon information theory, has the structure of a variational problem, which is shown to lead to Zipf's Law as the expected consequence of an evolutionary process displaying OEE. We further explore the problem of information conservation through an OEE process and conclude that statistical information (standard Shannon information) is not conserved, resulting in the paradoxical situation in which the increase of information content has the effect of erasing itself. We prove that this paradox is solved if we consider non-statistical forms of information. This last result implies that standard information theory may not be a suitable theoretical framework to explore the persistence and increase of the information content in OEE systems.
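A quick way to check an inventory of items for the Zipf pattern described above is to sort item frequencies by rank and fit a line in log-log space; a slope near -1 is the classic Zipf signature. The sketch below does exactly that and nothing more: it is a rough diagnostic, not the algorithmic-information framework of the paper, and log-log regression is known to be a crude estimator compared with maximum-likelihood fits. The function name and the toy inventory are hypothetical.

```python
import math
from collections import Counter


def zipf_slope(items):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(items).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct items")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Toy inventory whose counts are exactly 60/rank (60, 30, 20, 15, 12).
tokens = ["d1"] * 60 + ["d2"] * 30 + ["d3"] * 20 + ["d4"] * 15 + ["d5"] * 12
print(round(zipf_slope(tokens), 6))  # -1.0
```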
Affiliation(s)
- Luís F. Seoane
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- UPF-PRBB, ICREA-Complex Systems Lab, Dr Aiguader 88, 08003 Barcelona, Spain
- Institute Evolutionary Biology, UPF-CSIC, Pg Maritim Barceloneta 37, 08003 Barcelona, Spain
- Ricard Solé
- UPF-PRBB, ICREA-Complex Systems Lab, Dr Aiguader 88, 08003 Barcelona, Spain
- Institute Evolutionary Biology, UPF-CSIC, Pg Maritim Barceloneta 37, 08003 Barcelona, Spain
- Santa Fe Institute, 1399 Hyde Park Road, 87501 Santa Fe, NM, USA
47
Allosteric landscapes of eukaryotic cytoplasmic Hsp70s are shaped by evolutionary tuning of key interfaces. Proc Natl Acad Sci U S A 2018; 115:11970-11975. [PMID: 30397123 DOI: 10.1073/pnas.1811105115] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 12/14/2022]
Abstract
The 70-kDa heat shock proteins (Hsp70s) are molecular chaperones that perform a wide range of critical cellular functions. They assist in the folding of newly synthesized proteins, facilitate assembly of specific protein complexes, shepherd proteins across membranes, and prevent protein misfolding and aggregation. Hsp70s perform these functions by a conserved mechanism that relies on allosteric cycles of nucleotide-modulated binding and release of client proteins. Current models for Hsp70 allostery have come from extensive study of the bacterial Hsp70, DnaK. Extending our understanding to eukaryotic Hsp70s is extremely important, not only because it would provide a likely common mechanistic framework but also because of their central roles in cellular physiology. In this study, we examined the allosteric behaviors of the eukaryotic cytoplasmic Hsp70s, HspA1 and Hsc70, and found significant differences from that of DnaK. We found that HspA1 and Hsc70 favor a state in which the nucleotide-binding domain (NBD) and substrate-binding domain (SBD) are intimately docked, significantly more so than DnaK does. Past work established that the NBD-SBD interface and the helical lid-β-SBD interface govern the allosteric landscape of DnaK. Here, we identified sites on these interfaces that differ between eukaryotic cytoplasmic Hsp70s and DnaK. Our mutational analysis has revealed key evolutionary variations that account for the population shifts between the docked and undocked conformations. These results underline the tunability of Hsp70 functions by modulation of allosteric interfaces through evolutionary diversification and also suggest sites where the binding of small-molecule modulators could influence Hsp70 function.
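The "population shifts between the docked and undocked conformations" can be pictured with a generic two-state equilibrium, in which the free-energy difference between the states sets the docked fraction through K = [docked]/[undocked] = exp(-ΔG/RT). This sketch is a textbook Boltzmann model chosen for illustration, not the authors' analysis; the function name and the ΔG values in the example are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def docked_fraction(delta_g, temperature=298.15):
    """Fraction of molecules in the docked state of a two-state equilibrium.

    delta_g = G(docked) - G(undocked) in kJ/mol; negative values favour docking.
    K = exp(-delta_g / RT) and fraction = K / (1 + K) = 1 / (1 + exp(delta_g / RT)).
    """
    return 1.0 / (1.0 + math.exp(delta_g / (R_KJ * temperature)))


# Hypothetical free-energy differences (kJ/mol), for illustration only.
for dg in (-5.0, 0.0, 5.0):
    print(dg, round(docked_fraction(dg), 3))
```

In this picture, an interface mutation that stabilizes docking by a few kJ/mol is enough to move most of the population into the docked state.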
48
Kulkarni P, Uversky VN. Intrinsically Disordered Proteins: The Dark Horse of the Dark Proteome. Proteomics 2018; 18:e1800061. [DOI: 10.1002/pmic.201800061] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 05/24/2018] [Revised: 09/07/2018] [Indexed: 12/27/2022]
Affiliation(s)
- Prakash Kulkarni
- Department of Medical Oncology and Therapeutics Research; City of Hope National Medical Center; Duarte CA 91010 USA
- Vladimir N. Uversky
- Department of Molecular Medicine; Morsani College of Medicine; University of South Florida; Tampa FL 33612 USA
- Laboratory of New methods in Biology; Institute for Biological Instrumentation; Russian Academy of Sciences; Pushchino Moscow Region 142290 Russia
49
Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573 PMCID: PMC6184454 DOI: 10.1093/bioinformatics/bty370] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/01/2017] [Revised: 03/27/2018] [Accepted: 05/03/2018] [Indexed: 01/27/2023]
Abstract
Motivation: Protein evolution spans time scales, and its effects span the length of an organism. A web app named ProteomeVis has been developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Among the properties in ProteomeVis, protein abundance correlates most strongly with ER, with Spearman correlations of -0.49 (P-value < 10⁻¹⁰) and -0.46 (P-value < 10⁻¹⁰) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant. Availability and implementation: ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
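The Spearman correlations reported above are rank correlations. A minimal dependency-free sketch follows: it ranks both variables (tied values share the mean of their positions) and takes the Pearson correlation of the ranks. In practice `scipy.stats.spearmanr` is the usual library choice; the plain-Python version is only meant to show what is being computed. The abundance and evolutionary-rate values are hypothetical, not ProteomeVis data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) for constant input


# Hypothetical abundance and evolutionary-rate values for five proteins.
abundance = [120, 15, 300, 42, 8]
evo_rate = [0.10, 0.35, 0.05, 0.20, 0.30]
print(round(spearman(abundance, evo_rate), 2))
```

A negative value, as in the abstract, means that more abundant proteins tend to evolve more slowly.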
Affiliation(s)
- Rostam M Razban
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Amy I Gilson
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Niamh Durfee
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hendrik Strobelt
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Kasper Dinkla
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Jeong-Mo Choi
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hanspeter Pfister
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Eugene I Shakhnovich
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
50
Franklin MW, Nepomnyachiy S, Feehan R, Ben-Tal N, Kolodny R, Slusky JSG. Efflux Pumps Represent Possible Evolutionary Convergence onto the β-Barrel Fold. Structure 2018; 26:1266-1274.e2. [PMID: 30057025 PMCID: PMC6125174 DOI: 10.1016/j.str.2018.06.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/19/2018] [Revised: 05/17/2018] [Accepted: 06/20/2018] [Indexed: 11/22/2022]
Abstract
There are around 100 varieties of outer membrane proteins in each Gram-negative bacterium. All of these proteins have the same fold: an up-down β-barrel. It has been suggested that all membrane β-barrels excluding lysins are homologous. Here we suggest that the β-barrels of efflux pumps have converged on this fold as well. By grouping structurally solved outer membrane β-barrels (OMBBs) by sequence, we find that the membrane environment may have led to convergent evolution of the barrel fold. Specifically, the lack of sequence linkage to other barrels, coupled with distinctive structural differences such as differences in strand tilt and barrel radius, suggests that the outer membrane factor of efflux pumps evolutionarily converged on the barrel. Rather than being related to other OMBBs, the outer membrane factor of efflux pumps shows sequence and structural similarity in its periplasmic region that suggests an evolutionary link to the periplasmic subunit of the same pump complex.
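Strand tilt and barrel radius, the structural features compared above, are linked in the idealised β-barrel model of Murzin, Lesk and Chothia to the number of strands n and the shear number S: tan(alpha) = S·a / (n·b) and R = b / (2·sin(π/n)·cos(alpha)). The sketch below evaluates these relations; it is an illustration of that classic geometric model, not the method used in the paper. The constants a = 3.3 Å (rise per residue) and b = 4.4 Å (interstrand distance) are the commonly quoted values and are treated here as assumptions, as are the (n, S) combinations in the example.

```python
import math

A_RISE = 3.3     # Å, assumed rise per residue along a strand
B_SPACING = 4.4  # Å, assumed distance between neighbouring strands


def barrel_geometry(n_strands, shear):
    """Strand tilt (degrees) and radius (Å) of an idealised β-barrel.

    tan(alpha) = S * a / (n * b)
    R = b / (2 * sin(pi / n) * cos(alpha))
    """
    alpha = math.atan(shear * A_RISE / (n_strands * B_SPACING))
    radius = B_SPACING / (2 * math.sin(math.pi / n_strands) * math.cos(alpha))
    return math.degrees(alpha), radius


# Hypothetical (n, S) combinations, for illustration only.
for n, s in [(8, 10), (12, 14), (16, 20)]:
    tilt, radius = barrel_geometry(n, s)
    print(f"n={n:2d} S={s:2d}  tilt={tilt:5.1f} deg  radius={radius:5.1f} Å")
```

Because tilt and radius follow from (n, S) in this model, barrels with unusual tilt for their size stand out, which is the kind of structural difference the abstract uses as evidence for convergence.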
Affiliation(s)
- Sergey Nepomnyachiy
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Department of Computer Science, University of Haifa, Mount Carmel, Haifa 3498838, Israel
- Ryan Feehan
- Center for Computational Biology, University of Kansas, Lawrence, KS 66045, USA
- Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Mount Carmel, Haifa 3498838, Israel
- Joanna S G Slusky
- Center for Computational Biology, University of Kansas, Lawrence, KS 66045, USA
- Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045, USA