1
Zhang C, Freddolino L. A large-scale assessment of sequence database search tools for homology-based protein function prediction. Brief Bioinform 2024; 25:bbae349. [PMID: 39038936 PMCID: PMC11262835 DOI: 10.1093/bib/bbae349] [Received: 03/04/2024] [Revised: 06/03/2024] [Accepted: 07/05/2024] [Indexed: 07/24/2024]
Abstract
Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND (one of the most popular tools for function prediction), under default search parameters. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO predictions from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, Department of Biological Chemistry, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, United States
- Lydia Freddolino
- Department of Computational Medicine and Bioinformatics, Department of Biological Chemistry, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, United States
2
Zhang H, Li Y, Ling J, Zhao J, Li Y, Mao Z, Cheng X, Xie B. NRPS-like ATRR in Plant-Parasitic Nematodes Involved in Glycine Betaine Metabolism to Promote Parasitism. Int J Mol Sci 2024; 25:4275. [PMID: 38673861 PMCID: PMC11050029 DOI: 10.3390/ijms25084275] [Received: 03/08/2024] [Revised: 04/01/2024] [Accepted: 04/09/2024] [Indexed: 04/28/2024]
Abstract
Plant-parasitic nematodes (PPNs) are among the most serious phytopathogens and cause widespread and serious damage in major crops. In this study, using a genome mining method, we identified nonribosomal peptide synthetase (NRPS)-like enzymes in genomes of plant-parasitic nematodes, which are conserved with two consecutive reducing domains at the N-terminus (A-T-R1-R2) and homologous to fungal NRPS-like ATRR. We experimentally investigated the roles of the NRPS-like enzyme (MiATRR) in nematode (Meloidogyne incognita) parasitism. Heterologous expression of Miatrr in Saccharomyces cerevisiae can overcome the growth inhibition caused by high concentrations of glycine betaine. RT-qPCR detection shows that Miatrr is significantly upregulated at the early parasitic life stage (J2s in plants) of M. incognita. Host-derived Miatrr RNA interference (RNAi) in Arabidopsis thaliana can significantly decrease the number of galls and egg masses of M. incognita, as well as retard development and reduce the body size of the nematode. Although exogenous glycine betaine and choline have no obvious impact on the survival of free-living M. incognita J2s (pre-parasitic J2s), they impact the performance of the nematode in planta, especially in Miatrr-RNAi plants. Following application of exogenous glycine betaine and choline in the rhizosphere soil of A. thaliana, the numbers of galls and egg masses were obviously reduced by glycine betaine but increased by choline. Based on the knowledge about the function of fungal NRPS-like ATRR and the roles of glycine betaine in host plants and nematodes, we suggest that MiATRR is involved in nematode-plant interaction by acting as a glycine betaine reductase, converting glycine betaine to choline. This may be a universal strategy in plant-parasitic nematodes utilizing NRPS-like ATRR to promote their parasitism on host plants.
Affiliation(s)
- Hongxia Zhang
- College of Horticulture, Hunan Agricultural University, Changsha 410128, China
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flower, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yanlin Li
- College of Horticulture, Hunan Agricultural University, Changsha 410128, China
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flower, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Jian Ling
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flower, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Jianlong Zhao
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flower, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yan Li
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flower, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Zhenchuan Mao
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flower, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xinyue Cheng
- College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Bingyan Xie
- State Key Laboratory of Vegetable Biobreeding, Institute of Vegetables and Flower, Chinese Academy of Agricultural Sciences, Beijing 100081, China
3
Kearney SK, Berger A, Baker E. Aon: a service to augment Alliance Genome Resource data with additional species. BMC Res Notes 2023; 16:297. [PMID: 37891644 PMCID: PMC10604687 DOI: 10.1186/s13104-023-06577-8] [Received: 02/14/2023] [Accepted: 10/16/2023] [Indexed: 10/29/2023]
Abstract
OBJECTIVE: Cross-species comparative genomics requires access to accurate homology data across the entire range of annotated genes. The Alliance of Genome Resources (AGR) provides an open-source and comprehensive database of homology data calculated using a wide array of algorithms at differing stringencies to elucidate orthologous relationships. However, the current AGR application program interface (API) is limited to five homology endpoints for nine species. While AGR provides a robust resource for several canonical species, its utility can be greatly enhanced by increased filtering and data processing options and incorporating additional species. RESULTS: Here, we describe a novel API tool, AON, that expands access to the AGR orthology resource by creating a data structure that supports 50 additional endpoints. More importantly, it provides users with a framework for adding bespoke endpoints, custom species, and additional orthology data. We demonstrate AON's functionality by incorporating the service into the GeneWeaver ecosystem for supporting cross-species data analysis.
Affiliation(s)
- Sophie K Kearney
- Department of Computer Science, Baylor University, One Bear Place Box 97356, Waco, 76798, USA
- Erich Baker
- Department of Computer Science, Baylor University, One Bear Place Box 97356, Waco, 76798, USA.
4
Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023; 98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]
Abstract
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.
Affiliation(s)
- Janan Sykes
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Barbara R Holland
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Michael A Charleston
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
5
Garcia-Moreno FM, Gutiérrez-Naranjo MA. ALLERDET: A novel web app for prediction of protein allergenicity. J Biomed Inform 2022; 135:104217. [DOI: 10.1016/j.jbi.2022.104217] [Received: 07/12/2022] [Revised: 09/21/2022] [Accepted: 09/30/2022] [Indexed: 10/31/2022]
6
Peters LM, Howard J, Leeb T, Mevissen M, Graf R, Reding Graf T. Identification of regenerating island-derived protein 3E in dogs. Front Vet Sci 2022; 9:1010809. [PMID: 36387376 PMCID: PMC9650133 DOI: 10.3389/fvets.2022.1010809] [Received: 08/03/2022] [Accepted: 10/12/2022] [Indexed: 11/29/2022]
Abstract
Regenerating islet-derived protein (REG) 1A (aka pancreatic stone protein) and REG3A (aka pancreatitis-associated protein) are upregulated in humans with sepsis, pancreatitis, and gastrointestinal diseases, but little is known about this protein family in dogs. Our aim was to identify REG1 and REG3 family members in dogs. REG-family genes were computationally annotated in the canine genome and proteome, with verification of gene expression using publicly available RNA-seq data. The presence of the protein in canine pancreatic tissue and plasma was investigated with Western blot and immunohistochemistry, using anti-human REG1A and REG3A antibodies. Protein identity was confirmed with mass spectrometry. Two members of the REG3 subfamily were found in the canine genome, REG3E1 and REG3E2, both encoding for the same 176 AA protein, subsequently named REG3E. Anti-human REG3A antibodies demonstrated cross-reactivity with the canine REG3E protein in pancreas homogenates. In canine plasma, a protein band of approximately 17 kDa was apparent. Mass spectrometry confirmed this protein to be the product of the two annotated REG3E genes. Strong immunoreactivity to anti-human REG3A antibodies was found in sections of canine pancreas affected with acute pancreatitis, but it was weak in healthy pancreatic tissue. Recombinant canine REG3E protein underwent a selective trypsin digestion as described in other species. No evidence for the presence of a homolog of REG1A in dogs was found in any of the investigations. In conclusion, dogs express REG3E in the pancreas, whose role as biomarker merits further investigations. Homologs to human REG1A are not likely to exist in dogs.
Affiliation(s)
- Laureen M. Peters
- Department of Clinical Veterinary Medicine, Clinical Diagnostic Laboratory, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- *Correspondence: Laureen M. Peters
- Judith Howard
- Department of Clinical Veterinary Medicine, Clinical Diagnostic Laboratory, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Tosso Leeb
- Department of Clinical Research and Veterinary Public Health, Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Meike Mevissen
- Division of Veterinary Pharmacology and Toxicology, Department of Clinical Research and Veterinary Public Health, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Rolf Graf
- Department of Surgery and Transplantation, Pancreas Research Laboratory, University Hospital Zürich, University of Zürich, Zürich, Switzerland
- Theresia Reding Graf
- Department of Surgery and Transplantation, Pancreas Research Laboratory, University Hospital Zürich, University of Zürich, Zürich, Switzerland
7
Roles of Species-Specific Legumains in Pathogenicity of the Pinewood Nematode Bursaphelenchus xylophilus. Int J Mol Sci 2022; 23:ijms231810437. [PMID: 36142347 PMCID: PMC9499627 DOI: 10.3390/ijms231810437] [Received: 06/29/2022] [Revised: 08/24/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Peptidases are very important to parasites and have central roles in parasite biology and pathogenesis. In this study, by comparative genome analysis, genome-wide peptidase diversities among plant-parasitic nematodes are estimated. We find that genes encoding cysteine peptidases in family C13 (legumain) are significantly more abundant in the genomes of pine wood nematodes (Bursaphelenchus) than in those of other plant-parasitic nematodes. By phylogenetic analysis, a clade of B. xylophilus-specific legumains is identified. RT-qPCR detection shows that these genes are highly expressed at an early stage of the nematode infection process. Utilizing transgene technology, cDNAs of three species-specific legumains were introduced into the Arabidopsis γvpe mutant. Functional complementation assay shows that these B. xylophilus legumains can fully complement the activity of Arabidopsis γVPE to mediate plant cell death triggered by the fungal toxin FB1. Secretory activities of these legumains are experimentally validated. By comparative transcriptome analysis, genes involved in plant cell death mediated by legumains are identified, which are enriched in GO terms related to ubiquitin protein transferase activity in the molecular function category and response to stimuli in the biological process category. Our results suggest that B. xylophilus-specific legumains have potential as effectors to be involved in nematode-plant interaction and can be related to host cell death.
8
Ho CT, Huang YW, Chen TR, Lo CH, Lo WC. Discovering the Ultimate Limits of Protein Secondary Structure Prediction. Biomolecules 2021; 11:1627. [PMID: 34827624 PMCID: PMC8615938 DOI: 10.3390/biom11111627] [Received: 09/21/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 12/29/2022]
Abstract
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. There have been ~300 algorithms published in the past seven decades with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; after that, it has long stayed at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy had been estimated to be 88%. Thus, SSP is now generally considered not challenging or too challenging to improve. However, we found that the limit of three-state SSP might be underestimated. Besides, there is still much room for improving segment-based and eight-state SSPs, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84-87%. Several proposals for improving future SSP algorithms are made based on our results. We hope that these findings will help move forward the development of SSP and all its applications.
Affiliation(s)
- Chia-Tzu Ho
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Yu-Wei Huang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Chia-Hua Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (C.-T.H.); (Y.-W.H.); (T.-R.C.); (C.-H.L.)
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
9
Marques AT, Tanoeiro L, Duarte A, Gonçalves L, Vítor JMB, Vale FF. Genomic Analysis of Prophages from Klebsiella pneumoniae Clinical Isolates. Microorganisms 2021; 9:2252. [PMID: 34835377 PMCID: PMC8617712 DOI: 10.3390/microorganisms9112252] [Received: 09/15/2021] [Revised: 10/15/2021] [Accepted: 10/25/2021] [Indexed: 12/15/2022]
Abstract
Klebsiella pneumoniae is an increasing threat to public health and represents one of the most concerning pathogens involved in life-threatening infections. The resistance and virulence determinants are coded by mobile genetic elements which can easily spread between bacterial populations and co-evolve with their genomic host. In this study, we present the full genomic sequences, insertion sites and phylogenetic analysis of 150 prophages found in 40 K. pneumoniae clinical isolates obtained from an outbreak in a Portuguese hospital. All strains harbored at least one prophage and we identified 104 intact prophages (69.3%). The prophage size ranges from 29.7 to 50.6 kbp, coding between 32 and 78 putative genes. The prophage GC content is 51.2%, lower than the average GC content of 57.1% in K. pneumoniae. Complete prophages were classified into three families in the order Caudovirales: Myoviridae (59.6%), Siphoviridae (38.5%) and Podoviridae (1.9%). In addition, an alignment and phylogenetic analysis revealed nine distinct clusters. Evidence of recombination was detected within the genome of some prophages but, in most cases, proteins involved in viral structure, transcription, replication and regulation (lysogenic/lysis) were maintained. These results support the knowledge that prophages are diverse and widely disseminated in K. pneumoniae genomes, contributing to the evolution of this species and conferring additional phenotypes. Moreover, we identified in K. pneumoniae prophages a set of endolysin genes, which were found to code for proteins with lysozyme activity, cleaving the β-1,4 linkages between N-acetylmuramic acid and N-acetyl-D-glucosamine residues in the peptidoglycan network and thus representing genes with the potential for lysin phage therapy.
Affiliation(s)
- Andreia T. Marques
- Pathogen Genome Bioinformatics and Computational Biology, Research Institute for Medicines (iMed-ULisboa), Faculty of Pharmacy, Universidade de Lisboa, 1649-003 Lisboa, Portugal; (L.T.); (J.M.B.V.)
- Luís Tanoeiro
- Pathogen Genome Bioinformatics and Computational Biology, Research Institute for Medicines (iMed-ULisboa), Faculty of Pharmacy, Universidade de Lisboa, 1649-003 Lisboa, Portugal; (L.T.); (J.M.B.V.)
- Aida Duarte
- Faculty of Pharmacy, Universidade de Lisboa, Av. Gama Pinto, 1649-003 Lisboa, Portugal;
- Centro de Investigação Interdisciplinar Egas Moniz, Instituto Universitário Egas Moniz, 2829-511 Monte da Caparica, Portugal
- Luisa Gonçalves
- Clinical Pathology Unit, Hospital SAMS, Cidade de Gabela, 1849-017 Lisboa, Portugal;
- Jorge M. B. Vítor
- Pathogen Genome Bioinformatics and Computational Biology, Research Institute for Medicines (iMed-ULisboa), Faculty of Pharmacy, Universidade de Lisboa, 1649-003 Lisboa, Portugal; (L.T.); (J.M.B.V.)
- Filipa F. Vale
- Pathogen Genome Bioinformatics and Computational Biology, Research Institute for Medicines (iMed-ULisboa), Faculty of Pharmacy, Universidade de Lisboa, 1649-003 Lisboa, Portugal; (L.T.); (J.M.B.V.)
10
Hsin KT, Yang TJ, Lee YH, Cheng YS. Phylogenetic and Structural Analysis of NIN-Like Proteins With a Type I/II PB1 Domain That Regulates Oligomerization for Nitrate Response. Front Plant Sci 2021; 12:672035. [PMID: 34135927 PMCID: PMC8200828 DOI: 10.3389/fpls.2021.672035] [Received: 02/25/2021] [Accepted: 05/05/2021] [Indexed: 06/12/2023]
Abstract
Absorption of macronutrients such as nitrogen is a critical process for land plants. There is little information available on the correlation between the root evolution of land plants and the protein regulation of nitrogen absorption and responses. NIN-like protein (NLP) transcription factors contain a Phox and Bem1 (PB1) domain, which may regulate nitrate-response genes and seem to be involved in the adaptation to growing on land in terms of plant root development. In this report, we reveal the NLP phylogeny in land plants and the origin of NLP genes that may be involved in the nitrate-signaling pathway. Our NLP phylogeny showed that duplication of NLP genes occurred before the divergence of chlorophytes and land plants. Duplicated NLP genes may have been lost in most chlorophyte lineages. The NLP genes of bryophytes were initially monophyletic, but this was followed by divergence of lycophyte NLP genes and then angiosperm NLP genes. Among the identified NLP genes, PB1, a protein-protein interaction domain, was identified across our phylogeny. To understand how protein-protein interactions are mediated via the PB1 domain, we examined the PB1 domain of Arabidopsis thaliana NLP7 (AtNLP7) as a representative in terms of its molecular oligomerization and function. Based on the structure of the PB1 domain, determined using small-angle X-ray scattering (SAXS) and site-directed mutagenesis, we found that the NLP7 PB1 protein forms oligomers and that several key residues (K867 and D909/D911/E913/D922 in the OPCA motif) play a pivotal role in the oligomerization of NLP7 proteins. The fact that these residues are all conserved across land plant lineages means that this oligomerization may have evolved after the common ancestor of extant land plants colonized the land. It would then have rapidly become established across land-plant lineages in order to mediate protein-protein interactions in the nitrate-signaling pathway.
Affiliation(s)
- Kuan-Ting Hsin
- Department of Life Science, College of Life Science, National Taiwan University, Taipei, Taiwan
- Tzu-Jing Yang
- Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Institute of Biochemical Sciences, College of Life Science, National Taiwan University, Taipei, Taiwan
- Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, Taiwan
- Yu-Hsuan Lee
- Department of Life Science, College of Life Science, National Taiwan University, Taipei, Taiwan
- Yi-Sheng Cheng
- Department of Life Science, College of Life Science, National Taiwan University, Taipei, Taiwan
- Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, Taiwan
- Genome and Systems Biology Degree Program, College of Life Science, National Taiwan University, Taipei, Taiwan
11
Liu C, Du MX, Abuduaini R, Yu HY, Li DH, Wang YJ, Zhou N, Jiang MZ, Niu PX, Han SS, Chen HH, Shi WY, Wu L, Xin YH, Ma J, Zhou Y, Jiang CY, Liu HW, Liu SJ. Enlightening the taxonomy darkness of human gut microbiomes with a cultured biobank. Microbiome 2021; 9:119. [PMID: 34020714 PMCID: PMC8140505 DOI: 10.1186/s40168-021-01064-3] [Received: 01/21/2021] [Accepted: 03/30/2021] [Indexed: 05/07/2023]
Abstract
BACKGROUND: In gut microbiome studies, the cultured gut microbial resource plays essential roles, such as helping to unravel gut microbial functions and host-microbe interactions. Although several major studies have been performed to elucidate the cultured human gut microbiota, up to 70% of the Unified Human Gastrointestinal Genome species have not been cultured to date. Large-scale gut microbial isolation and identification as well as availability to the public are imperative for gut microbial studies and further characterizing human gut microbial functions. RESULTS: In this study, we constructed a human Gut Microbial Biobank (hGMB; homepage: hgmb.nmdc.cn) through the cultivation of 10,558 isolates from 31 sample mixtures of 239 fresh fecal samples from healthy Chinese volunteers, and deposited 1170 strains representing 400 different species in culture collections of the International Depository Authority for long-term preservation and public access worldwide. Following the rules of the International Code of Nomenclature of Prokaryotes, 102 new species were characterized and denominated, while 28 new genera and 3 new families were proposed. hGMB represented over 80% of the common and dominant human gut microbial genera and species characterized from global human gut 16S rRNA gene amplicon data (n = 11,647) and cultured 24 "most-wanted" and "medium priority" taxa proposed by the Human Microbiome Project. In total, we sequenced 115 genomes representing 102 novel taxa and 13 previously known species. Further in silico analysis revealed that the newly sequenced hGMB genomes represented 22 previously uncultured species in the Unified Human Gastrointestinal Genome (UHGG) and contributed 24 representatives of potentially "dark taxa" that had not been discovered by UHGG. The nonredundant gene catalogs generated from the hGMB genomes covered over 50% of the functionally known genes (KEGG orthologs) in the largest global human gut gene catalogs and approximately 10% of the "most wanted" functionally unknown proteins in the FUnkFams database. CONCLUSIONS: A publicly accessible human Gut Microbial Biobank (hGMB) was established that contains 1170 strains and represents 400 human gut microbial species. hGMB expands the gut microbial resources and genomic repository by adding 102 novel species, 28 new genera, 3 new families, and 115 new genomes of human gut microbes.
Affiliation(s)
- Chang Liu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China.
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China.
- Meng-Xuan Du
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Rexiding Abuduaini
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Hai-Ying Yu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Dan-Hua Li
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
- Yu-Jing Wang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Nan Zhou
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
- Min-Zhi Jiang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Peng-Xia Niu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
- Shan-Shan Han
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Hong-He Chen
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
| | - Wen-Yu Shi
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Microbial Resources and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
| | - Linhuan Wu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Microbial Resources and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
| | - Yu-Hua Xin
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- China General Microorganism Culture Collection, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
| | - Juncai Ma
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Microbial Resources and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
| | - Yuguang Zhou
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- China General Microorganism Culture Collection, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
| | - Cheng-Ying Jiang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Hong-Wei Liu
- University of Chinese Academy of Sciences, Beijing, 100049, China
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, China
| | - Shuang-Jiang Liu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, PR China.
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichenxi Road, Chaoyang District, Beijing, 100101, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
|
12
|
Villegas-Morcillo A, Makrodimitris S, van Ham RCHJ, Gomez AM, Sanchez V, Reinders MJT. Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function. Bioinformatics 2021; 37:162-170. [PMID: 32797179 PMCID: PMC8055213 DOI: 10.1093/bioinformatics/btaa701] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 04/08/2020] [Revised: 07/10/2020] [Accepted: 08/12/2020] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require a lot of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available. RESULTS: We applied an existing deep sequence model that had been pretrained in an unsupervised setting to the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining. AVAILABILITY AND IMPLEMENTATION: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Amelia Villegas-Morcillo
- Department of Signal Theory, Telematics and Communications, University of Granada, 18071 Granada, Spain
| | - Stavros Makrodimitris
- Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands
- Keygene N.V., 6708PW Wageningen, The Netherlands
| | - Roeland C H J van Ham
- Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands
- Keygene N.V., 6708PW Wageningen, The Netherlands
| | - Angel M Gomez
- Department of Signal Theory, Telematics and Communications, University of Granada, 18071 Granada, Spain
| | - Victoria Sanchez
- Department of Signal Theory, Telematics and Communications, University of Granada, 18071 Granada, Spain
| | - Marcel J T Reinders
- Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, 2333ZC Leiden, The Netherlands
|
13
|
Vancura A, Lanzós A, Bosch-Guiteras N, Esteban MT, Gutierrez AH, Haefliger S, Johnson R. Cancer LncRNA Census 2 (CLC2): an enhanced resource reveals clinical features of cancer lncRNAs. NAR Cancer 2021; 3:zcab013. [PMID: 34316704 PMCID: PMC8210278 DOI: 10.1093/narcan/zcab013] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/15/2020] [Revised: 03/12/2021] [Accepted: 03/17/2021] [Indexed: 01/28/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) play key roles in cancer and are at the vanguard of precision therapeutic development. These efforts depend on large and high-confidence collections of cancer lncRNAs. Here, we present the Cancer LncRNA Census 2 (CLC2). With 492 cancer lncRNAs, CLC2 is 4-fold greater in size than its predecessor, without compromising on strict criteria of confident functional/genetic roles and inclusion in the GENCODE annotation scheme. This increase was enabled by leveraging high-throughput transposon insertional mutagenesis screening data, yielding 92 novel cancer lncRNAs. CLC2 makes a valuable addition to existing collections: it is amongst the largest, contains numerous unique genes (not found in other databases) and carries functional labels (oncogene/tumour suppressor). Analysis of this dataset reveals that cancer lncRNAs are impacted by germline variants, somatic mutations and changes in expression consistent with inferred disease functions. Furthermore, we show how clinical/genomic features can be used to vet prospective gene sets from high-throughput sources. The combination of size and quality makes CLC2 a foundation for precision medicine, demonstrating cancer lncRNAs’ evolutionary and clinical significance.
Affiliation(s)
- Adrienne Vancura
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Andrés Lanzós
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Núria Bosch-Guiteras
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Mònica Torres Esteban
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Alejandro H Gutierrez
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Simon Haefliger
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Rory Johnson
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
|
14
|
Wang C, Kurgan L. Survey of Similarity-Based Prediction of Drug-Protein Interactions. Curr Med Chem 2021; 27:5856-5886. [PMID: 31393241 DOI: 10.2174/0929867326666190808154841] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/07/2017] [Revised: 04/16/2018] [Accepted: 10/23/2018] [Indexed: 12/20/2022]
Abstract
The therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on the therapeutic protein targets, while the knowledge of the off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods that were published in high-impact venues and that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods utilize to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss the contents, impact and relationships between these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit and often combine three types of similarities that consider drug structures, drug profiles, and target sequences. We review the predictive architectures of these methods and their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of the current resources and predictive tools. Finally, we provide several recommendations concerning the future development of the related databases and methods.
Affiliation(s)
- Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
|
15
|
Médigue C, Calteau A, Cruveiller S, Gachet M, Gautreau G, Josso A, Lajus A, Langlois J, Pereira H, Planel R, Roche D, Rollin J, Rouy Z, Vallenet D. MicroScope-an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data. Brief Bioinform 2020; 20:1071-1084. [PMID: 28968784 PMCID: PMC6931091 DOI: 10.1093/bib/bbx113] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 05/31/2017] [Revised: 07/17/2017] [Indexed: 12/11/2022] Open
Abstract
The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way that provides practical guidance and help to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original, recently developed method (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to maintaining accurate resources.
|
16
|
Whole genome sequence analysis reveals the broad distribution of the RtxA type 1 secretion system and four novel putative type 1 secretion systems throughout the Legionella genus. PLoS One 2020; 15:e0223033. [PMID: 31935215 PMCID: PMC6959600 DOI: 10.1371/journal.pone.0223033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/09/2019] [Accepted: 12/17/2019] [Indexed: 01/18/2023] Open
Abstract
Type 1 secretion systems (T1SSs) are broadly distributed among bacteria and translocate effectors with diverse functions across the bacterial cell membrane. Legionella pneumophila, the species most commonly associated with legionellosis, encodes a T1SS at the lssXYZABD locus, which is responsible for the secretion of the virulence factor RtxA. Many investigations have failed to detect lssD, the gene encoding the membrane fusion protein of the RtxA T1SS, in non-pneumophila Legionella, which has led to the assumption that this system is a virulence factor exclusively possessed by L. pneumophila. Here we discovered RtxA and its associated T1SS in a novel Legionella taurinensis strain, leading us to question whether this system may be more widespread than previously thought. Through a bioinformatic analysis of publicly available data, we classified and determined the distribution of the RtxA T1SS and four novel T1SSs among diverse Legionella spp. The ABC transporter of the novel Legionella repeat protein secretion system shares structural similarity with those of diverse T1SS families, including the alkaline protease T1SS in Pseudomonas aeruginosa. The Legionella bacteriocin secretion systems 1-3 (LB1SS, LB2SS and LB3SS) are novel putative bacteriocin-transporting T1SSs, as their ABC transporters include C-39 peptidase domains in their N-terminal regions, with LB2SS and LB3SS likely constituting nitrile hydratase leader peptide transport T1SSs. The LB1SS is more closely related to the colicin V T1SS in Escherichia coli. Of 45 Legionella spp. whole genomes examined, 19 (42%) were determined to possess lssB and lssD homologs. Of these 19, only 7 (37%) are known pathogens. There was no difference in the proportions of disease-associated and non-disease-associated species that possessed the RtxA T1SS (p = 0.4), contrary to the current consensus regarding the RtxA T1SS. These results call into question the nature of RtxA and its T1SS as a singular virulence factor. Future studies should investigate mechanistic explanations for the association of RtxA with virulence.
|
17
|
Liu C, Zhou N, Du MX, Sun YT, Wang K, Wang YJ, Li DH, Yu HY, Song Y, Bai BB, Xin Y, Wu L, Jiang CY, Feng J, Xiang H, Zhou Y, Ma J, Wang J, Liu HW, Liu SJ. The Mouse Gut Microbial Biobank expands the coverage of cultured bacteria. Nat Commun 2020; 11:79. [PMID: 31911589 PMCID: PMC6946648 DOI: 10.1038/s41467-019-13836-5] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 02/11/2019] [Accepted: 11/25/2019] [Indexed: 02/07/2023] Open
Abstract
Mice are widely used as experimental models for gut microbiome (GM) studies, yet the majority of mouse GM members remain uncharacterized. Here, we report the construction of a mouse gut microbial biobank (mGMB) that contains 126 species, represented by 244 strains that have been deposited in the China General Microorganism Culture Collection. We sequence and phenotypically characterize 77 potential new species and propose their nomenclatures. The mGMB includes 22 and 17 species that are significantly enriched in ob/ob and wild-type C57BL/6J mouse cecal samples, respectively. The genomes of the 126 species in the mGMB cover 52% of the metagenomic nonredundant gene catalog (sequence identity ≥ 60%) and represent 93-95% of the KEGG-Orthology-annotated functions of the sampled mouse GMs. The microbial and genome data assembled in the mGMB enlarge the taxonomic characterization of mouse GMs and represent a useful resource for studies of host-microbe interactions and of GM functions associated with host health and diseases.
Affiliation(s)
- Chang Liu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Nan Zhou
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Meng-Xuan Du
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Yu-Tong Sun
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Kai Wang
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Yu-Jing Wang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
| | - Dan-Hua Li
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Hai-Ying Yu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Yuqin Song
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Bing-Bing Bai
- CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Yuhua Xin
- Microbial Resources and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Linhuan Wu
- Microbial Resources and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Cheng-Ying Jiang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
| | - Jie Feng
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Hua Xiang
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Yuguang Zhou
- Microbial Resources and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Juncai Ma
- Microbial Resources and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Jun Wang
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China
- CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China
| | - Hong-Wei Liu
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China.
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China.
| | - Shuang-Jiang Liu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China.
- Environmental Microbiology Research Center, Institute of Microbiology, Chinese Academy of Sciences, No. 1 Beichenxi Road, Chaoyang District, Beijing, 100101, P. R. China.
- University of Chinese Academy of Sciences, Beijing, 100049, P. R. China.
|
18
|
Konaté MM, Plata G, Park J, Usmanova DR, Wang H, Vitkup D. Molecular function limits divergent protein evolution on planetary timescales. eLife 2019; 8:e39705. [PMID: 31532392 PMCID: PMC6750897 DOI: 10.7554/elife.39705] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/29/2018] [Accepted: 08/07/2019] [Indexed: 01/25/2023] Open
Abstract
Functional conservation is known to constrain protein evolution. Nevertheless, the long-term divergence patterns of proteins maintaining the same molecular function and the possible limits of this divergence have not been explored in detail. We investigate these fundamental questions by characterizing the divergence between ancient protein orthologs with conserved molecular function. Our results demonstrate that the decline of sequence and structural similarities between such orthologs significantly slows down after ~1-2 billion years of independent evolution. As a result, the sequence and structural similarities between ancient orthologs have not substantially decreased for the past billion years. The effective divergence limit (>25% sequence identity) is not primarily due to protein sites universally conserved in all lineages. Instead, fewer than four amino acid types are accepted, on average, per site across orthologous protein sequences. Our analysis also reveals different divergence patterns for protein sites with experimentally determined small and large fitness effects of mutations. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
Affiliation(s)
- Mariam M Konaté
- Department of Systems Biology, Columbia University, New York, United States
- Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, United States
| | - Germán Plata
- Department of Systems Biology, Columbia University, New York, United States
| | - Jimin Park
- Department of Systems Biology, Columbia University, New York, United States
- Department of Pathology and Cell Biology, Columbia University, New York, United States
| | - Dinara R Usmanova
- Department of Systems Biology, Columbia University, New York, United States
| | - Harris Wang
- Department of Systems Biology, Columbia University, New York, United States
- Department of Pathology and Cell Biology, Columbia University, New York, United States
| | - Dennis Vitkup
- Department of Systems Biology, Columbia University, New York, United States
- Department of Biomedical Informatics, Columbia University, New York, United States
|
19
|
Transcriptional and physiological responses to inorganic nutrition in a tropical Pacific strain of Alexandrium minutum: Implications for nutrient uptakes and assimilation. Gene 2019; 711:143950. [PMID: 31255736 DOI: 10.1016/j.gene.2019.143950] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/19/2018] [Revised: 06/25/2019] [Accepted: 06/26/2019] [Indexed: 11/22/2022]
Abstract
The marine dinoflagellate Alexandrium minutum is known to produce saxitoxins that cause paralytic shellfish poisoning in humans worldwide through consumption of contaminated shellfish. Despite numerous studies on the growth physiology and saxitoxin production of this species, knowledge of the molecular basis of nutrient uptake in relation to toxin production in this species is limited. In this study, relative expressions of the high-affinity transporter genes of nitrate, ammonium, and phosphate (AmNrt2, AmAmt1 and AmPiPT1) and the assimilation genes, nitrate reductase (AmNas), glutamine synthase (AmGSIII) and carbamoyl phosphate synthase (AmCPSII) from A. minutum were studied in batch clonal culture conditions with two nitrogen sources (nitrate: NO3- or ammonium: NH4+) under different N:P ratios (high-P: N:P of 14 and 16, and low-P: N:P of 155). The expression of AmAmt1 was suppressed under excess-NH4+ growth conditions, but no such suppression was observed for AmNrt2 and AmNas. Expressions of AmAmt1, AmNrt2, AmNas, AmGSIII, AmCPSII, and AmPiPT1 were high under P-deficient conditions, showing that A. minutum is likely to take up nutrients for growth under P-stress conditions. Conversely, relative expression of AmCPSII was incongruent with cell growth, but was well correlated with toxin quota, suggesting that the gene might be involved in arginine metabolism and the related toxin production pathway. The expression of AmGSIII was found to coincide with higher toxin production, and the gene is believed to be involved in a mechanism to detoxify the cells from excess ammonium stress. The gene regulation observed in this study provides better insights into the ecophysiology of A. minutum in relation to its adaptive strategies in unfavorable environments.
|
20
|
Marchant A, Cisneros AF, Dubé AK, Gagnon-Arsenault I, Ascencio D, Jain H, Aubé S, Eberlein C, Evans-Yamamoto D, Yachie N, Landry CR. The role of structural pleiotropy and regulatory evolution in the retention of heteromers of paralogs. eLife 2019; 8:e46754. [PMID: 31454312 PMCID: PMC6711710 DOI: 10.7554/elife.46754] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 03/11/2019] [Accepted: 08/11/2019] [Indexed: 01/07/2023] Open
Abstract
Gene duplication is a driver of the evolution of new functions. The duplication of genes encoding homomeric proteins leads to the formation of homomers and heteromers of paralogs, creating new complexes after a single duplication event. The loss of these heteromers may be required for the two paralogs to evolve independent functions. Using yeast as a model, we find that heteromerization is frequent among duplicated homomers and correlates with functional similarity between paralogs. Using in silico evolution, we show that for homomers and heteromers sharing binding interfaces, mutations in one paralog can have structural pleiotropic effects on both interactions, resulting in highly correlated responses of the complexes to selection. Therefore, heteromerization could be preserved indirectly due to selection for the maintenance of homomers, thus slowing down functional divergence between paralogs. We suggest that paralogs can overcome the obstacle of structural pleiotropy by regulatory evolution at the transcriptional and post-translational levels.
Affiliation(s)
- Axelle Marchant
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Angel F Cisneros
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
| | - Alexandre K Dubé
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Isabelle Gagnon-Arsenault
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Diana Ascencio
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Honey Jain
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Department of Biological Sciences, Birla Institute of Technology and Sciences, Pilani, India
| | - Simon Aubé
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
| | - Chris Eberlein
- PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Daniel Evans-Yamamoto
- Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan; Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Graduate School of Media and Governance, Keio University, Fujisawa, Japan
| | - Nozomu Yachie
- Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan; Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Graduate School of Media and Governance, Keio University, Fujisawa, Japan; Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, Japan
| | - Christian R Landry
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| |
|
21
|
A blue light receptor that mediates RNA binding and translational regulation. Nat Chem Biol 2019; 15:1085-1092. [PMID: 31451761 PMCID: PMC6811359 DOI: 10.1038/s41589-019-0346-y] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 11/30/2018] [Accepted: 07/15/2019] [Indexed: 01/30/2023]
Abstract
Sensory photoreceptor proteins underpin light-dependent adaptations in nature and enable the optogenetic control of organismal behavior and physiology. We identified the bacterial light-oxygen-voltage (LOV) photoreceptor PAL that sequence-specifically binds short RNA stem loops with around 20 nM affinity in blue light and weaker than 1 μM in darkness. A crystal structure rationalizes the unusual receptor architecture of PAL with C-terminal LOV photosensor and N-terminal effector units. The light-activated PAL:RNA interaction can be harnessed to regulate gene expression at the RNA level as a function of light in both bacteria and mammalian cells. The present results elucidate a new signal-transduction paradigm in LOV receptors and conjoin RNA biology with optogenetic regulation, thereby paving the way towards hitherto inaccessible optoribogenetic modalities.
|
22
|
Nair P, Mall M, Sharma P, Khan F, Nagegowda DA, Rout PK, Gupta MM, Pandey A, Shasany AK, Gupta AK, Shukla AK. Characterization of a class III peroxidase from Artemisia annua: relevance to artemisinin metabolism and beyond. Plant Molecular Biology 2019; 100:527-541. [PMID: 31093899 DOI: 10.1007/s11103-019-00879-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/22/2018] [Accepted: 05/04/2019] [Indexed: 05/25/2023]
Abstract
A class III peroxidase from Artemisia annua has been shown to indicate the possibility of cellular localization-based role diversity, which may have implications in artemisinin catabolism as well as lignification. Artemisia annua derives its importance from the antimalarial artemisinin. The -O-O- linkage in artemisinin makes peroxidases relevant to its metabolism. Earlier, we identified three peroxidase-coding genes from A. annua, whereby Aa547 showed higher expression in the low-artemisinin plant stage whereas Aa528 and Aa540 showed higher expression in the artemisinin-rich plant stage. Here we carried out tertiary structure homology modelling of the peroxidases for docking studies. Maximum binding affinity for artemisinin was shown by Aa547. Further, Aa547 showed greater binding affinity for the post-artemisinin metabolite deoxyartemisinin than for the pre-artemisinin metabolites (dihydroartemisinic hydroperoxide, artemisinic acid, dihydroartemisinic acid). It also showed significant binding affinity for the monolignol coniferyl alcohol. Moreover, Aa547 expression was related inversely to artemisinin content and directly to total lignin content, as indicated by its transient silencing and overexpression in A. annua. An artemisinin reduction assay also indicated an inverse relationship between Aa547 expression and artemisinin content. Subcellular localization using GFP fusion suggested that Aa547 is peroxisomal. Nevertheless, dual localization (intracellular/extracellular) of Aa547 could not be ruled out, given its effect on both artemisinin and lignin. Taken together, this indicates the possibility of localization-based role diversity for Aa547, which may have implications in artemisinin catabolism as well as lignification in A. annua.
Affiliation(s)
- Priya Nair
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Maneesha Mall
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Pooja Sharma
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Feroz Khan
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Dinesh A Nagegowda
- CSIR-Central Institute of Medicinal and Aromatic Plants, Research Centre, Bengaluru, Karnataka, 560065, India
| | - Prasant K Rout
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Madan M Gupta
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Alok Pandey
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Ajit K Shasany
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Anil K Gupta
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India
| | - Ashutosh K Shukla
- CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, U.P., 226015, India.
| |
|
23
|
Kacsoh BZ, Barton S, Jiang Y, Zhou N, Mooney SD, Friedberg I, Radivojac P, Greene CS, Bosco G. New Drosophila Long-Term Memory Genes Revealed by Assessing Computational Function Prediction Methods. G3 (Bethesda, Md.) 2019; 9:251-267. [PMID: 30463884 PMCID: PMC6325913 DOI: 10.1534/g3.118.200867] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/11/2018] [Accepted: 11/20/2018] [Indexed: 01/26/2023]
Abstract
A major bottleneck to our understanding of the genetic and molecular foundation of life lies in the ability to assign function to a gene and, subsequently, a protein. Traditional molecular and genetic experiments can provide the most reliable forms of identification, but are generally low-throughput, making such discovery and assignment a daunting task. The bottleneck has led to an increasing role for computational approaches. The Critical Assessment of Functional Annotation (CAFA) effort seeks to measure the performance of computational methods. In CAFA3, we performed selected screens, including an effort focused on long-term memory. We used homology and previous CAFA predictions to identify 29 key Drosophila genes, which we tested via a long-term memory screen. We identify 11 novel genes that are involved in long-term memory formation and show a high level of connectivity with previously identified learning and memory genes. Our study provides the first higher-order behavioral assay and organism screen used for CAFA assessments and reveals previously uncharacterized roles of multiple genes as possible regulators of neuronal plasticity at the boundary of information acquisition and memory formation.
Affiliation(s)
- Balint Z Kacsoh
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
| | - Stephen Barton
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
| | - Yuxiang Jiang
- Department of Computer Science, Indiana University, Bloomington, IN
| | - Naihui Zhou
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, Iowa 50011
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA
| | - Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, Iowa 50011
| | - Predrag Radivojac
- College of Computer and Information Science, Northeastern University, Boston, MA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, 19104
| | - Giovanni Bosco
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
| |
|
24
|
Zhang Z, Wang J, Gong Y, Li Y. Contributions of substitutions and indels to the structural variations in ancient protein superfamilies. BMC Genomics 2018; 19:771. [PMID: 30355304 PMCID: PMC6201574 DOI: 10.1186/s12864-018-5178-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/02/2018] [Accepted: 10/16/2018] [Indexed: 11/10/2022] Open
Abstract
Background: Quantitative evaluation of protein structural evolution is important for our understanding of protein biological functions and their evolutionary adaptation, and is useful in guiding protein engineering. However, compared to the models for sequence evolution, the quantitative models for protein structural evolution received less attention. Ancient protein superfamilies are often considered versatile, allowing genetic and functional diversifications during long-term evolution. In this study, we investigated the quantitative impacts of sequence variations on the structural evolution of homologues in 68 ancient protein superfamilies that exist widely in sequenced eukaryotic, bacterial and archaeal genomes. Results: We found that the accumulated structural variations within ancient superfamilies could be explained largely by a bilinear model that simultaneously considers amino acid substitution and insertion/deletion (indel). Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those ancient superfamilies with high bilinear multiple correlation coefficients, the influence of each unit of substitution or indel on structural variations is almost constant within each superfamily, but varies greatly among different superfamilies. The influence of each unit indel on structural variations is always larger than that of each unit substitution within each superfamily, but the accumulated contributions of indels to structural variations are lower than those of substitutions in most superfamilies. The total contributions of sequence indels and substitutions (46% and 54%, respectively) to the structural variations that result from sequence variations are slightly different in ancient superfamilies. Conclusions: Structural variations within ancient protein superfamilies accumulated under the significantly bilinear influence of amino acid substitutions and indels in sequences. Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those structural variations resulting from sequence variations, the total contribution of indels is slightly lower than that of amino acid substitutions. The regular clock exists not only in protein sequences, but also probably in protein structures. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5178-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
| | - Jinlan Wang
- Physical Examination Office of Shandong Province, Health and Family Planning Commission of Shandong Province, Jinan, 250014, China
| | - Ya Gong
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
| | - Yuezhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China.
| |
|
25
|
Han X, Wei Q, Kihara D. Protein 3D Structure and Electron Microscopy Map Retrieval Using 3D-SURFER2.0 and EM-SURFER. Curr Protoc Bioinformatics 2017; 60:3.14.1-3.14.15. [PMID: 29220075 DOI: 10.1002/cpbi.37] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
With the rapid growth in the number of solved protein structures stored in the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB), it is essential to develop tools to perform real-time structure similarity searches against the entire structure database. Since conventional structure alignment methods need to sample different orientations of proteins in the three-dimensional space, they are time consuming and unsuitable for rapid, real-time database searches. To this end, we have developed 3D-SURFER and EM-SURFER, which utilize 3D Zernike descriptors (3DZD) to conduct high-throughput protein structure comparison, visualization, and analysis. Taking an atomic structure or an electron microscopy map of a protein or a protein complex as input, the 3DZD of a query protein is computed and compared with the 3DZD of all other proteins in PDB or EMDB. In addition, local geometrical characteristics of a query protein can be analyzed using VisGrid and LIGSITEcsc in 3D-SURFER. This article describes how to use 3D-SURFER and EM-SURFER to carry out protein surface shape similarity searches, local geometric feature analysis, and interpretation of the search results. © 2017 by John Wiley & Sons, Inc.
Affiliation(s)
- Xusi Han
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Qing Wei
- Department of Computer Science, Purdue University, West Lafayette, Indiana
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana.,Department of Computer Science, Purdue University, West Lafayette, Indiana
| |
|
26
|
Kikuchi A, Okuyama M, Kato K, Osaki S, Ma M, Kumagai Y, Matsunaga K, Klahan P, Tagami T, Yao M, Kimura A. A novel glycoside hydrolase family 97 enzyme: Bifunctional β-L-arabinopyranosidase/α-galactosidase from Bacteroides thetaiotaomicron. Biochimie 2017; 142:41-50. [DOI: 10.1016/j.biochi.2017.08.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/21/2017] [Accepted: 08/07/2017] [Indexed: 10/19/2022]
|
27
|
Monzon AM, Zea DJ, Marino-Buslje C, Parisi G. Homology modeling in a dynamical world. Protein Sci 2017; 26:2195-2206. [PMID: 28815769 DOI: 10.1002/pro.3274] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/06/2017] [Revised: 08/09/2017] [Accepted: 08/09/2017] [Indexed: 12/31/2022]
Abstract
A key concept in template-based modeling (TBM) is the high correlation between sequence and structural divergence, with the practical consequence that homologous proteins that are similar at the sequence level will also be similar at the structural level. However, conformational diversity of the native state will reduce the correlation between structural and sequence divergence, because structural variation can appear without sequence diversity. In this work, we explore the impact that conformational diversity has on the relationship between structural and sequence divergence. We find that the extent of conformational diversity can be as high as the maximum structural divergence among families. Also, as expected, conformational diversity impairs the well-established correlation between sequence and structural divergence, which is noisier than previously suggested. However, we found that this noise can be resolved using a priori information coming from the structure-function relationship. We show that protein families with low conformational diversity show a well-correlated relationship between sequence and structural divergence, which is severely reduced in proteins with larger conformational diversity. This lack of correlation could impair TBM results in highly dynamical proteins. Finally, we also find that the presence of order/disorder can provide useful prior information for better TBM performance.
Affiliation(s)
- Alexander Miguel Monzon
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, B1876BXD, Bernal, Argentina
| | - Diego Javier Zea
- Structural Bioinformatics Unit, Fundación Instituto Leloir, CONICET, C1405BWE Ciudad Autónoma de Buenos Aires, Argentina
| | - Cristina Marino-Buslje
- Structural Bioinformatics Unit, Fundación Instituto Leloir, CONICET, C1405BWE Ciudad Autónoma de Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, B1876BXD, Bernal, Argentina
| |
|
28
|
The Role of Evolutionary Selection in the Dynamics of Protein Structure Evolution. Biophys J 2017; 112:1350-1365. [PMID: 28402878 DOI: 10.1016/j.bpj.2017.02.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/31/2016] [Revised: 02/16/2017] [Accepted: 02/22/2017] [Indexed: 02/05/2023] Open
Abstract
Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
|
29
|
Myosin B of Plasmodium falciparum (PfMyoB): in silico prediction of its three-dimensional structure and its possible interaction with MTIP. Parasitol Res 2017; 116:1373-1382. [PMID: 28265752 DOI: 10.1007/s00436-017-5417-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/17/2017] [Accepted: 02/21/2017] [Indexed: 10/24/2022]
Abstract
The mobility and invasion strategy of Plasmodium falciparum is governed by a protein complex known as the glideosome, which contains an actin-myosin motor. It has been shown that myosin A of the parasite (PfMyoA) is the myosin of the glideosome, and the interaction of PfMyoA with myosin tail domain interacting protein (MTIP) determines its correct location and its ability to function in the complex. Because PfMyoA and myosin B of P. falciparum (PfMyoB) share high sequence identity, are both small proteins without a tail domain, belong to the class XIV myosins, and are expressed in late schizonts and merozoites, we suspect that these myosins may have similar or redundant functions. Therefore, this work examined the structural similarity between PfMyoA and PfMyoB and performed a molecular docking between PfMyoB and MTIP. Three-dimensional (3D) models obtained for PfMyoA and PfMyoB achieved high scores in the structural validation programs used, and their superimposition revealed high structural similarity, supporting the hypothesis of possible similar functions for these two proteins. The 3D interaction models obtained and energy values found suggested that interaction between PfMyoB and MTIP is possible. Given the apparent abundance of PfMyoA relative to PfMyoB in the parasite, we believe that the interaction between PfMyoB and MTIP would only be detectable in specific cellular environments because under normal circumstances, it would be masked by the interaction between PfMyoA and MTIP.
|
30
|
Abstract
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches.
Affiliation(s)
- Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
| | - David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
| |
|
31
|
Fused Regression for Multi-source Gene Regulatory Network Inference. PLoS Comput Biol 2016; 12:e1005157. [PMID: 27923054 PMCID: PMC5140053 DOI: 10.1371/journal.pcbi.1005157] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 04/22/2016] [Accepted: 09/20/2016] [Indexed: 12/03/2022] Open
Abstract
Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms) and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method’s utility in learning from data collected on different experimental platforms.
Gene regulatory networks describing related biological processes are thought to share conserved interaction structure. This assumption motivates a great deal of work in model systems, where discovery of gene regulation may be more experimentally tractable, but is difficult to directly evaluate using existing methods. The presence of shared structure in a well-studied model system or process should make the problem of network inference in a related process easier, but this information is not often applied to the discovery of global gene regulatory networks. Further, to be able to successfully translate findings between different organisms, it is important to be able to identify where regulatory structure is different. We provide a method based on penalized fused regression for inferring gene regulatory networks given prior knowledge about the similarity of interactions in each network. This method is demonstrated on synthetic data, and applied to the problem of inferring networks in distantly related bacterial organisms. We then introduce an extension of the method to deal with the condition of uncertainty over the degree of regulatory conservation by simultaneously inferring gene conservation and interaction weights.
|
32
|
Das A, Srinivasan M, Ghosh TS, Mande SS. Xenobiotic Metabolism and Gut Microbiomes. PLoS One 2016; 11:e0163099. [PMID: 27695034 PMCID: PMC5047465 DOI: 10.1371/journal.pone.0163099] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 12/13/2015] [Accepted: 09/03/2016] [Indexed: 12/27/2022] Open
Abstract
Humans are exposed to numerous xenobiotics, a majority of which are in the form of pharmaceuticals. Apart from human enzymes, recent studies have indicated the role of the gut bacterial community (microbiome) in metabolizing xenobiotics. However, little is known about the contribution of the diverse gut microbiome to xenobiotic metabolism. The present study reports the results of analyses on xenobiotic metabolizing enzymes in various human gut microbiomes. A total of 397 available gut metagenomes from individuals of varying age groups from 8 nationalities were analyzed. Based on the diversities and abundances of the xenobiotic metabolizing enzymes, various bacterial taxa were classified into three groups, namely, least versatile, intermediately versatile and highly versatile xenobiotic metabolizers. Most interestingly, specific relationships were observed between the overall drug consumption profile and the abundance and diversity of the xenobiotic metabolizing repertoire in various geographies. The obtained differential abundance patterns of xenobiotic metabolizing enzymes and the bacterial genera harboring them suggest their links to pharmacokinetic variations among individuals. Additional analyses of a few well-studied classes of drug modifying enzymes (DMEs) also indicate geographic as well as age-specific trends.
Affiliation(s)
- Anubhav Das
- TCS Research, Tata Consultancy Services Ltd., Pune, Maharashtra, India
| | - Meenakshi Srinivasan
- Manipal College of Pharmaceutical Sciences, Manipal University, Manipal, Karnataka, India
| | | | - Sharmila S. Mande
- TCS Research, Tata Consultancy Services Ltd., Pune, Maharashtra, India
| |
|
33
|
Pan ST, Xue D, Li ZL, Zhou ZW, He ZX, Yang Y, Yang T, Qiu JX, Zhou SF. Computational Identification of the Paralogs and Orthologs of Human Cytochrome P450 Superfamily and the Implication in Drug Discovery. Int J Mol Sci 2016; 17:E1020. [PMID: 27367670 PMCID: PMC4964396 DOI: 10.3390/ijms17071020] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 02/24/2016] [Revised: 04/02/2016] [Accepted: 06/07/2016] [Indexed: 12/31/2022] Open
Abstract
The human cytochrome P450 (CYP) superfamily consisting of 57 functional genes is the most important group of Phase I drug metabolizing enzymes that oxidize a large number of xenobiotics and endogenous compounds, including therapeutic drugs and environmental toxicants. The CYP superfamily has been shown to expand itself through gene duplication, and some of its members become pseudogenes due to gene mutations. Orthologs and paralogs are homologous genes resulting from speciation or duplication, respectively. To explore the evolutionary and functional relationships of human CYPs, we conducted this bioinformatic study to identify their corresponding paralogs, homologs, and orthologs. The functional implications and the implications for drug discovery and evolutionary biology were then discussed. GeneCards and Ensembl were used to identify the paralogs of human CYPs. We have used a panel of online databases to identify the orthologs of human CYP genes: NCBI, Ensembl Compara, GeneCards, OMA ("Orthologous MAtrix") Browser, PANTHER, TreeFam, EggNOG, and Roundup. The results show that each human CYP has various numbers of paralogs and orthologs using GeneCards and Ensembl. For example, the paralogs of CYP2A6 include CYP2A7, 2A13, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 2F1, 2J2, 2R1, 2S1, 2U1, and 2W1; CYP11A1 has 6 paralogs including CYP11B1, 11B2, 24A1, 27A1, 27B1, and 27C1; CYP51A1 has only three paralogs: CYP26A1, 26B1, and 26C1; while CYP20A1 has no paralog. The majority of human CYPs are well conserved from plants, amphibians, fishes, or mammals to humans due to their important functions in physiology and xenobiotic disposition. The data from different approaches are also cross-validated and validated when experimental data are available. These findings facilitate our understanding of the evolutionary relationships and functional implications of the human CYP superfamily in drug discovery.
Affiliation(s)
- Shu-Ting Pan
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Nanchang University, Nanchang 330003, China.
- Danfeng Xue
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Nanchang University, Nanchang 330003, China.
- Zhi-Ling Li
- Department of Pharmacy, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China.
- Zhi-Wei Zhou
- Department of Pharmaceutical Sciences, School of Pharmacy, Texas Tech University Health Sciences Center, Amarillo, TX 79106, USA.
- Zhi-Xu He
- Guizhou Provincial Key Laboratory for Regenerative Medicine, Stem Cell and Tissue Engineering Research Center & Sino-US Joint Laboratory for Medical Sciences, Guizhou Medical University, Guiyang 550004, China.
- Yinxue Yang
- Department of Colorectal Surgery, General Hospital of Ningxia Medical University, Yinchuan 750004, China.
- Tianxin Yang
- Department of Internal Medicine, University of Utah and Salt Lake Veterans Affairs Medical Center, Salt Lake City, UT 84132, USA.
- Jia-Xuan Qiu
- Department of Oral and Maxillofacial Surgery, the First Affiliated Hospital of Nanchang University, Nanchang 330003, China.
- Shu-Feng Zhou
- Department of Chemical and Pharmaceutical Engineering, College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian, China.
|
34
|
Lee SB, Kim JA, Lim HS. Metabolic pathway of 3,6-anhydro-D-galactose in carrageenan-degrading microorganisms. Appl Microbiol Biotechnol 2016; 100:4109-21. [DOI: 10.1007/s00253-016-7346-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/16/2015] [Revised: 01/18/2016] [Accepted: 01/22/2016] [Indexed: 10/22/2022]
|
35
|
Shin WH, Bures MG, Kihara D. PatchSurfers: Two methods for local molecular property-based binding ligand prediction. Methods 2016; 93:41-50. [PMID: 26427548 PMCID: PMC4718779 DOI: 10.1016/j.ymeth.2015.09.026] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/02/2015] [Revised: 09/27/2015] [Accepted: 09/28/2015] [Indexed: 01/09/2023]
Abstract
Protein function prediction is an active area of research in computational biology. Function prediction can help biologists make hypotheses for characterization of genes and help interpret biological assays, and thus is a productive area for collaboration between experimental and computational biologists. Among various function prediction methods, predicting binding ligand molecules for a target protein is an important class because ligand binding events for a protein are usually closely intertwined with the protein's biological function, and also because predicted binding ligands can often be directly tested by biochemical assays. Binding ligand prediction methods can be classified into two types: those based on protein-protein (or pocket-pocket) comparison, and those that compare a target pocket directly to ligands. Recently, our group proposed two computational binding ligand prediction methods: Patch-Surfer, which is a pocket-pocket comparison method, and PL-PatchSurfer, which compares a pocket to ligand molecules. The two programs apply surface patch-based descriptions to calculate similarity or complementarity between molecules. A surface patch is characterized by physicochemical properties such as shape, hydrophobicity, and electrostatic potentials. These properties on the surface are represented using three-dimensional Zernike descriptors (3DZD), which are based on a series expansion of a three-dimensional function. Utilizing 3DZD for describing the physicochemical properties has two main advantages: (1) rotational invariance and (2) fast comparison. Here, we introduce Patch-Surfer and PL-PatchSurfer with an emphasis on PL-PatchSurfer, which is more recently developed. Illustrative examples of PL-PatchSurfer performance on binding ligand prediction as well as virtual drug screening are also provided.
Affiliation(s)
- Woong-Hee Shin
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Mark Gregory Bures
- Discovery Chemistry Research and Technologies, Eli Lilly and Company, Indianapolis, IN 46285, USA
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA; Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA.
|
36
|
Sinha R, Clarke J, Benson AK. Alignment behaviors of short peptides provide a roadmap for functional profiling of metagenomic data. BMC Genomics 2015; 16:1080. [PMID: 26691573 PMCID: PMC4687345 DOI: 10.1186/s12864-015-2272-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/15/2015] [Accepted: 12/03/2015] [Indexed: 12/03/2022]
Abstract
Background: Functional assignments for short-read metagenomic data pose a significant computational challenge due to perceived unpredictability of alignment behavior and the inability to infer useful functional information from translated protein-fragments/peptides. To address this problem, we have examined the predictability of short peptide alignments by systematically studying alignment behavior of large sets of short peptides generated from well-characterized proteins as well as hypothetical proteins in the KEGG database. Results: Using test sets of peptides modeling the length and phylogenetic distributions of short-read metagenomic data, we observed that peptides from well-characterized proteins had indistinguishable alignments to proteins from the same orthologous family and proteins from different families. Nonetheless, the patterns contained remarkable phylogenetic and structural signals, with alignments of even very short peptides naturally restricted to their orthologous family and/or proteins having similar structural folds. In stark contrast, peptides from “hypothetical proteins” had only sparse hit patterns with low frequencies and much lower identities. By weighting the structure-driven alignments and filtering peptides with behaviors similar to those derived from “hypothetical proteins”, we demonstrate that the accuracy of abundance predictions of protein families is dramatically improved. Conclusions: Evolutionary processes have dispersed protein folds across multiple protein families, precluding accurate functional assignment to short peptides, whose alignment behavior is non-random and driven by structure. Algorithms that filter sparse peptides and weight hit patterns of peptides from “known space” dramatically improve quantification of functions from diverse mixtures of peptides and should substantially improve applications of metagenomic analyses requiring accurate quantitative measures of functional families.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2272-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rohita Sinha
- Department of Food Science and Technology, University of Nebraska, 256 Food Innovation Complex, Lincoln, NE, 68588-6205, USA.
- Jennifer Clarke
- Department of Food Science and Technology, University of Nebraska, 256 Food Innovation Complex, Lincoln, NE, 68588-6205, USA; Department of Statistics, University of Nebraska, Lincoln, NE, 68583, USA; Quantitative Life Sciences Initiative, University of Nebraska, Lincoln, NE, 68583, USA.
- Andrew K Benson
- Department of Food Science and Technology, University of Nebraska, 256 Food Innovation Complex, Lincoln, NE, 68588-6205, USA.
|
37
|
orthoFind Facilitates the Discovery of Homologous and Orthologous Proteins. PLoS One 2015; 10:e0143906. [PMID: 26624019 PMCID: PMC4666658 DOI: 10.1371/journal.pone.0143906] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/08/2015] [Accepted: 10/07/2015] [Indexed: 11/19/2022]
Abstract
Finding homologous and orthologous protein sequences is often the first step in evolutionary studies, annotation projects, and experiments of functional complementation. Despite all currently available computational tools, there is a requirement for easy-to-use tools that provide functional information. Here, a new web application called orthoFind is presented, which allows a quick search for homologous and orthologous proteins given one or more query sequences, allowing a recurrent and exhaustive search against reference proteomes, and being able to include user databases. It addresses the protein multidomain problem, searching for homologs with the same domain architecture, and gives a simple functional analysis of the results to help in the annotation process. orthoFind is easy to use and has been proven to provide accurate results with different datasets. Availability: http://www.bioinfocabd.upo.es/orthofind/.
|
38
|
Della Corte D, Wildberg A, Schröder GF. Protein structure refinement with adaptively restrained homologous replicas. Proteins 2015; 84 Suppl 1:302-13. [PMID: 26441154 DOI: 10.1002/prot.24939] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/28/2015] [Revised: 09/02/2015] [Accepted: 09/29/2015] [Indexed: 12/27/2022]
Abstract
A novel protein refinement protocol is presented which utilizes molecular dynamics (MD) simulations of an ensemble of adaptively restrained homologous replicas. This approach adds evolutionary information to the force field and reduces random conformational fluctuations by coupling of several replicas. It is shown that this protocol refines the majority of models from the CASP11 refinement category and that larger conformational changes of the starting structure are possible than with current state of the art methods. The performance of this protocol in the CASP11 experiment is discussed. We found that the quality of the refined model is correlated with the structural variance of the coupled replicas, which therefore provides a good estimator of model quality. Furthermore, some remarkable refinement results are discussed in detail. Proteins 2016; 84(Suppl 1):302-313. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Dennis Della Corte
- Institute of Complex Systems (ICS-6), Forschungszentrum Jülich, Jülich, 52425, Germany
- André Wildberg
- Institute of Complex Systems (ICS-6), Forschungszentrum Jülich, Jülich, 52425, Germany
- Gunnar F Schröder
- Institute of Complex Systems (ICS-6), Forschungszentrum Jülich, Jülich, 52425, Germany; Physics Department, University of Düsseldorf, Düsseldorf, 40225, Germany.
|
39
|
Liu Z, Hu J. Mislocalization-related disease gene discovery using gene expression based computational protein localization prediction. Methods 2015; 93:119-27. [PMID: 26416496 DOI: 10.1016/j.ymeth.2015.09.022] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/02/2015] [Revised: 09/17/2015] [Accepted: 09/21/2015] [Indexed: 01/09/2023]
Abstract
Protein sorting is an important mechanism for transporting proteins to their target subcellular locations after their synthesis. Mutations in genes may disrupt the well-regulated protein sorting process, leading to a variety of mislocalization-related diseases. This paper proposes a methodology to discover such disease genes based on gene expression data and computational protein localization prediction. A kernel logistic regression based algorithm is used to identify several candidate cancer genes that may cause cancers through their mislocalization within the cell. Our results also showed that, compared to the gene co-expression network defined on Pearson correlation coefficients, the nonlinear Maximum Correlation Coefficients (MIC) based co-expression network gives better results for subcellular localization prediction.
Affiliation(s)
- Zhonghao Liu
- Department of Computer Science & Engineering, University of South Carolina, 301 Main Street, Columbia, SC 29208, United States
- Jianjun Hu
- Department of Computer Science & Engineering, University of South Carolina, 301 Main Street, Columbia, SC 29208, United States.
|
40
|
Lee SB. Unusual metabolism of 3,6-anhydro-L-galactose in Vibrio sp. EJY3 and in E. coli containing two Vibrio sp. EJY3 genes. Biotechnol Bioproc E 2015. [DOI: 10.1007/s12257-015-0440-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
|
41
|
|
42
|
Identification and characterization of 2-keto-3-deoxy-l-rhamnonate dehydrogenase belonging to the MDR superfamily from the thermoacidophilic bacterium Sulfobacillus thermosulfidooxidans: implications to l-rhamnose metabolism in archaea. Extremophiles 2015; 19:469-78. [DOI: 10.1007/s00792-015-0731-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 11/12/2014] [Accepted: 01/09/2015] [Indexed: 10/24/2022]
|
43
|
Xu R, Zhou J, Liu B, He Y, Zou Q, Wang X, Chou KC. Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach. J Biomol Struct Dyn 2014; 33:1720-30. [PMID: 25252709 DOI: 10.1080/07391102.2014.968624] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 10/24/2022]
Abstract
DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desired to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Owing to the fact that all biological species have developed beginning from a very limited number of ancestral species, it is important to take into account the evolutionary information in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating the evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. It was observed by comparing the new predictor with the existing methods via both jackknife test and independent data-set test that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach to extract evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
Affiliation(s)
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen 518055, Guangdong, China
|
44
|
Cuadrat RRC, da Serra Cruz SM, Tschoeke DA, Silva E, Tosta F, Jucá H, Jardim R, Campos MLM, Mattoso M, Dávila AMR. An orthology-based analysis of pathogenic protozoa impacting global health: an improved comparative genomics approach with prokaryotes and model eukaryote orthologs. OMICS 2014; 18:524-38. [PMID: 24960463 DOI: 10.1089/omi.2013.0172] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
A key focus in 21st-century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluating similarities and differences among species, and thus allowing the transfer of annotation from known/curated proteins to new/non-annotated ones. Here we used a sensitive profile-based methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs) databases, permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was applied to five protozoan proteomes, showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The inferred core protozoan proteome comprised 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) using COG, 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis) and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones); (ii) using KOG, 21.45% (151/704) belong to category J and 13.92% (98/704) to category O. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself to applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
Affiliation(s)
- Rafael R C Cuadrat
- Computational and Systems Biology Laboratory, Computational and Systems Biology Pole, Oswaldo Cruz Institute, Fiocruz, Brazil
|
45
|
Gao L, Gao F, Jiang X, Zhang C, Zhang D, Wang L, Wu G, Chen S. Biochemical characterization of a new β-glucosidase (Cel3E) from Penicillium piceum and its application in boosting lignocelluloses bioconversion and forming disaccharide inducers: New insights into the role of β-glucosidase. Process Biochem 2014. [DOI: 10.1016/j.procbio.2014.02.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
|
46
|
Feinstein WP, Brylinski M. eFindSite: Enhanced Fingerprint-Based Virtual Screening Against Predicted Ligand Binding Sites in Protein Models. Mol Inform 2014; 33:135-50. [PMID: 27485570 DOI: 10.1002/minf.201300143] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 09/13/2013] [Accepted: 12/06/2013] [Indexed: 12/26/2022]
Abstract
A standard practice for lead identification in drug discovery is ligand virtual screening, which utilizes computing technologies to detect small compounds that likely bind to target proteins prior to experimental screens. A high accuracy is often achieved when the target protein has a resolved crystal structure; however, using protein models still renders significant challenges. Towards this goal, we recently developed eFindSite that predicts ligand binding sites using a collection of effective algorithms, including meta-threading, machine learning and reliable confidence estimation systems. Here, we incorporate fingerprint-based virtual screening capabilities in eFindSite in addition to its flagship role as a ligand binding pocket predictor. Virtual screening benchmarks using the enhanced Directory of Useful Decoys demonstrate that eFindSite significantly outperforms AutoDock Vina as assessed by several evaluation metrics. Importantly, this holds true regardless of the quality of target protein structures. As a first genome-wide application of eFindSite, we conduct large-scale virtual screening of the entire proteome of Escherichia coli with encouraging results. In the new approach to fingerprint-based virtual screening using remote protein homology, eFindSite demonstrates its compelling proficiency offering a high ranking accuracy and low susceptibility to target structure deformations. The enhanced version of eFindSite is freely available to the academic community at http://www.brylinski.org/efindsite.
Affiliation(s)
- Wei P Feinstein
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA.
|
47
|
Muñoz-Mérida A, Viguera E, Claros MG, Trelles O, Pérez-Pulido AJ. Sma3s: a three-step modular annotator for large sequence datasets. DNA Res 2014; 21:341-53. [PMID: 24501397 PMCID: PMC4131829 DOI: 10.1093/dnares/dsu001] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 01/21/2023]
Abstract
Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes.
Affiliation(s)
- Antonio Muñoz-Mérida
- Integrated Bioinformatics, National Institute for Bioinformatics, University of Málaga, Campus de Teatinos, Spain
- Enrique Viguera
- Cellular Biology, Genetics and Physiology Department, University of Málaga, Campus de Teatinos, Spain
- M Gonzalo Claros
- Molecular Biology and Biochemistry Department, University of Málaga, Campus de Teatinos, Spain
- Oswaldo Trelles
- Integrated Bioinformatics, National Institute for Bioinformatics, University of Málaga, Campus de Teatinos, Spain; Computer Architecture Department, University of Málaga, Campus de Teatinos, Spain
- Antonio J Pérez-Pulido
- Centro Andaluz de Biología del Desarrollo (CABD, UPO-CSIC-JA), Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, Sevilla 41013, Spain
|
48
|
Oeffner RD, Bunkóczi G, McCoy AJ, Read RJ. Improved estimates of coordinate error for molecular replacement. Acta Crystallogr D Biol Crystallogr 2013; 69:2209-15. [PMID: 24189232 PMCID: PMC3817694 DOI: 10.1107/s0907444913023512] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 05/16/2013] [Accepted: 08/21/2013] [Indexed: 11/10/2022]
Abstract
The estimate of the root-mean-square deviation (r.m.s.d.) in coordinates between the model and the target is an essential parameter for calibrating likelihood functions for molecular replacement (MR). Good estimates of the r.m.s.d. lead to good estimates of the variance term in the likelihood functions, which increases signal to noise and hence success rates in the MR search. Phaser has hitherto used an estimate of the r.m.s.d. that only depends on the sequence identity between the model and target and which was not optimized for the MR likelihood functions. Variance-refinement functionality was added to Phaser to enable determination of the effective r.m.s.d. that optimized the log-likelihood gain (LLG) for a correct MR solution. Variance refinement was subsequently performed on a database of over 21,000 MR problems that sampled a range of sequence identities, protein sizes and protein fold classes. Success was monitored using the translation-function Z-score (TFZ), where a TFZ of 8 or over for the top peak was found to be a reliable indicator that MR had succeeded for these cases with one molecule in the asymmetric unit. Good estimates of the r.m.s.d. are correlated with the sequence identity and the protein size. A new estimate of the r.m.s.d. that uses these two parameters in a function optimized to fit the mean of the refined variance is implemented in Phaser and improves MR outcomes. Perturbing the initial estimate of the r.m.s.d. from the mean of the distribution in steps of standard deviations of the distribution further increases MR success rates.
Affiliation(s)
- Robert D. Oeffner
- Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England
- Gábor Bunkóczi
- Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England
- Airlie J. McCoy
- Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England
- Randy J. Read
- Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England
|
49
|
Wang HX, Xiao H, Zhong L, Tao K, Li YJ, Huang SF, Wen JP, Feng WL. Cell-penetrating fusion peptides OD1 and OD2 interact with Bcr-Abl and influence the growth and apoptosis of K562 cells. Mol Cell Biochem 2013; 385:311-8. [PMID: 24091918 DOI: 10.1007/s11010-013-1841-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/28/2013] [Accepted: 09/26/2013] [Indexed: 11/26/2022]
Abstract
The Bcr-Abl oncoprotein is the cause of chronic myelogenous leukemia (CML). Crystal structure analysis suggests that Bcr30-63 is the core of the Bcr-Abl oligomerization interface for aberrant kinase activity; however, the precise role of the other residues of Bcr1-72, excluding Bcr30-63, has not been evaluated. In this study, Bcr30-63 was named OD2 and the other residues of Bcr1-72 were named OD1. Cytoplasmic transduction peptide (CTP) was used to carry molecules into the cytoplasm. CTP-OD1 and CTP-OD2 fusion peptides were expressed from a cold-inducible expression system. Our results demonstrated that both fusion peptides could localize to the cytoplasm, specifically interact with the Bcr-Abl protein, and further inhibit growth, induce apoptosis, and decrease the phosphorylation of Bcr-Abl in K562 cells. However, the viability of THP-1, a Bcr-Abl negative cell line, was unaffected. These results suggest that CTP-OD1 and CTP-OD2 may be attractive therapeutic options for inhibiting the activation of Bcr-Abl kinase in CML.
Affiliation(s)
- Hai-Xia Wang
- Key Laboratory of Laboratory Medical Diagnostics Designated by the Ministry of Education, Department of Clinical Hematology, Chongqing Medical University, Chongqing, 400016, People's Republic of China
|
50
|
Murakami Y, Kinoshita K, Kinjo AR, Nakamura H. Exhaustive comparison and classification of ligand-binding surfaces in proteins. Protein Sci 2013; 22:1379-91. [PMID: 23934772 PMCID: PMC3795496 DOI: 10.1002/pro.2329] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/15/2013] [Revised: 07/29/2013] [Accepted: 08/05/2013] [Indexed: 12/03/2022]
Abstract
Many proteins function by interacting with other small molecules (ligands). Identification of ligand-binding sites (LBSs) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, a similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand-binding protein sequences and functions. Consequently, we classified the patches into ∼2000 well-characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross-fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes.
Affiliation(s)
- Yoichi Murakami
- Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramaki-aza-aoba, Aoba-ku, Sendai, Miyagi, 982-0036, Japan
|