1
Utriainen M, Morris JH. clusterMaker2: a major update to clusterMaker, a multi-algorithm clustering app for Cytoscape. BMC Bioinformatics 2023; 24:134. [PMID: 37020209 PMCID: PMC10074866 DOI: 10.1186/s12859-023-05225-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/16/2022] [Accepted: 03/11/2023] [Indexed: 04/07/2023] Open
Abstract
BACKGROUND: Since the initial publication of clusterMaker, the need for tools to analyze large biological datasets has only increased. New datasets are significantly larger than a decade ago, and new experimental techniques such as single-cell transcriptomics continue to drive the need for clustering or classification techniques to focus on portions of datasets of interest. While many libraries and packages exist that implement various algorithms, there remains the need for clustering packages that are easy to use, integrated with visualization of the results, and integrated with other commonly used tools for biological data analysis. clusterMaker2 has added several new algorithms, including two entirely new categories of analyses: node ranking and dimensionality reduction. Furthermore, many of the new algorithms have been implemented using the Cytoscape jobs API, which provides a mechanism for executing remote jobs from within Cytoscape. Together, these advances facilitate meaningful analyses of modern biological datasets despite their ever-increasing size and complexity.
RESULTS: The use of clusterMaker2 is exemplified by reanalyzing the yeast heat shock expression experiment that was included in our original paper; however, here we explored this dataset in significantly more detail. Combining this dataset with the yeast protein-protein interaction network from STRING, we were able to perform a variety of analyses and visualizations from within clusterMaker2, including Leiden clustering to break the entire network into smaller clusters, hierarchical clustering to look at the overall expression dataset, dimensionality reduction using UMAP to find correlations between our hierarchical visualization and the UMAP plot, fuzzy clustering, and cluster ranking. Using these techniques, we were able to explore the highest-ranking cluster and determine that it represents a strong contender for proteins working together in response to heat shock. We found a series of clusters that, when re-explored as fuzzy clusters, provide a better presentation of mitochondrial processes.
CONCLUSIONS: clusterMaker2 represents a significant advance over the previously published version, and most importantly, provides an easy-to-use tool to perform clustering and to visualize clusters within the Cytoscape network context. The new algorithms should be welcome to the large population of Cytoscape users, particularly the new dimensionality reduction and fuzzy clustering techniques.
Affiliation(s)
- John H Morris
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA.
2
Sim M, Lee J, Kwon D, Lee D, Park N, Wy S, Ko Y, Kim J. Reference-based read clustering improves the de novo genome assembly of microbial strains. Comput Struct Biotechnol J 2022; 21:444-451. [PMID: 36618978 PMCID: PMC9804104 DOI: 10.1016/j.csbj.2022.12.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 12/17/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022] Open
Abstract
Constructing accurate microbial genome assemblies is necessary to understand genetic diversity in microbial genomes and its functional consequences. However, this remains a challenging task, especially when only short-read sequencing technologies are used. Here, we present a new read-clustering algorithm, called RBRC, for improving de novo microbial genome assembly by accurately estimating read proximity using multiple reference genomes. The performance of RBRC was confirmed by simulation-based evaluation in terms of assembly contiguity and the number of misassemblies, and RBRC was successfully applied to existing fungal and bacterial genomes, improving the quality of the assemblies without using additional sequencing data. RBRC is a read-clustering algorithm that can be used (i) for generating high-quality genome assemblies of microbial strains when genome assemblies of related strains are available, and (ii) for upgrading existing microbial genome assemblies when the generation of additional sequencing data, such as long reads, is difficult.
Affiliation(s)
- Mikang Sim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Jongin Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Daehong Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Daehwan Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Nayoung Park
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Suyeon Wy
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Younhee Ko
- Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do 17035, Republic of Korea
- Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea (corresponding author)
3
Rocafort M, Bowen JK, Hassing B, Cox MP, McGreal B, de la Rosa S, Plummer KM, Bradshaw RE, Mesarich CH. The Venturia inaequalis effector repertoire is dominated by expanded families with predicted structural similarity, but unrelated sequence, to avirulence proteins from other plant-pathogenic fungi. BMC Biol 2022; 20:246. [PMID: 36329441 PMCID: PMC9632046 DOI: 10.1186/s12915-022-01442-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/29/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND: Scab, caused by the biotrophic fungus Venturia inaequalis, is the most economically important disease of apples worldwide. During infection, V. inaequalis occupies the subcuticular environment, where it secretes virulence factors, termed effectors, to promote host colonization. Consistent with other plant-pathogenic fungi, many of these effectors are expected to be non-enzymatic proteins, some of which can be recognized by corresponding host resistance proteins to activate plant defences, thus acting as avirulence determinants. To develop durable control strategies against scab, a better understanding of the roles that these effector proteins play in promoting subcuticular growth by V. inaequalis, as well as in activating, suppressing, or circumventing resistance protein-mediated defences in apple, is required.
RESULTS: We generated the first comprehensive RNA-seq transcriptome of V. inaequalis during colonization of apple. Analysis of this transcriptome revealed five temporal waves of gene expression that peaked during early, mid, or mid-late infection. While the number of genes encoding secreted, non-enzymatic proteinaceous effector candidates (ECs) varied in each wave, most belonged to waves that peaked in expression during mid-late infection. Spectral clustering based on sequence similarity determined that the majority of ECs belonged to expanded protein families. To gain insights into function, the tertiary structures of ECs were predicted using AlphaFold2. Strikingly, despite an absence of sequence similarity, many ECs were predicted to have structural similarity to avirulence proteins from other plant-pathogenic fungi, including members of the MAX, LARS, ToxA and FOLD effector families. In addition, several other ECs, including an EC family with sequence similarity to the AvrLm6 avirulence effector from Leptosphaeria maculans, were predicted to adopt a KP6-like fold. Thus, proteins with a KP6-like fold represent another structural family of effectors shared among plant-pathogenic fungi.
CONCLUSIONS: Our study reveals the transcriptomic profile underpinning subcuticular growth by V. inaequalis and provides an enriched list of ECs that can be investigated for roles in virulence and avirulence. Furthermore, our study supports the idea that numerous sequence-unrelated effectors across plant-pathogenic fungi share common structural folds. In doing so, our study gives weight to the hypothesis that many fungal effectors evolved from ancestral genes through duplication, followed by sequence diversification, to produce sequence-unrelated but structurally similar proteins.
Affiliation(s)
- Mercedes Rocafort
- Laboratory of Molecular Plant Pathology/Bioprotection Aotearoa, School of Agriculture and Environment, Massey University, Private Bag 11222, Palmerston North, 4442, New Zealand
- Joanna K Bowen
- The New Zealand Institute for Plant and Food Research Limited, Mount Albert Research Centre, Auckland, 1025, New Zealand
- Berit Hassing
- Laboratory of Molecular Plant Pathology/Bioprotection Aotearoa, School of Agriculture and Environment, Massey University, Private Bag 11222, Palmerston North, 4442, New Zealand
- Murray P Cox
- Bioprotection Aotearoa, School of Natural Sciences, Massey University, Private Bag 11222, Palmerston North, 4442, New Zealand
- Brogan McGreal
- The New Zealand Institute for Plant and Food Research Limited, Mount Albert Research Centre, Auckland, 1025, New Zealand
- Silvia de la Rosa
- Laboratory of Molecular Plant Pathology/Bioprotection Aotearoa, School of Agriculture and Environment, Massey University, Private Bag 11222, Palmerston North, 4442, New Zealand
- Kim M Plummer
- Department of Animal, Plant and Soil Sciences, La Trobe University, AgriBio, Centre for AgriBiosciences, La Trobe University, Bundoora, Victoria, 3086, Australia
- Rosie E Bradshaw
- Bioprotection Aotearoa, School of Natural Sciences, Massey University, Private Bag 11222, Palmerston North, 4442, New Zealand
- Carl H Mesarich
- Laboratory of Molecular Plant Pathology/Bioprotection Aotearoa, School of Agriculture and Environment, Massey University, Private Bag 11222, Palmerston North, 4442, New Zealand.
4
He K. Pharmacological affinity fingerprints derived from bioactivity data for the identification of designer drugs. J Cheminform 2022; 14:35. [PMID: 35672835 PMCID: PMC9171973 DOI: 10.1186/s13321-022-00607-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/02/2022] [Accepted: 05/05/2022] [Indexed: 12/15/2022] Open
Abstract
Facing the continuous emergence of new psychoactive substances (NPS) and their threat to public health, more effective methods for NPS prediction and identification are critical. In this study, the pharmacological affinity fingerprints (Ph-fp) of NPS compounds were predicted by Random Forest classification models using bioactivity data from the ChEMBL database. The binary Ph-fp is the vector consisting of a compound's activity against a list of molecular targets reported to be responsible for the pharmacological effects of NPS. The performance of Ph-fp in similarity searching and unsupervised clustering was assessed and compared to that of the 2D structure fingerprints Morgan and MACCS (the 1024-bit ECFP4 and 166-bit SMARTS-based MACCS implementations of RDKit). The performance in retrieving compounds according to their pharmacological categorizations is influenced by the predicted active assay counts in Ph-fp and the choice of similarity metric. Overall, the comparative unsupervised clustering analysis suggests the use of a classification model with Morgan fingerprints as input for the construction of Ph-fp. This combination gives satisfactory clustering performance based on external and internal clustering validation indices.
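As an illustration of similarity searching over binary fingerprints, the sketch below ranks a toy library against a query using the Tanimoto coefficient. The abstract does not state which similarity metrics were compared, so the choice of Tanimoto is an assumption, and the fingerprints and compound names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two equal-length binary vectors."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints, and bits set in at least one of them.
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints share no active bits; treat similarity as 0.
    return both / either if either else 0.0


# Hypothetical 6-target fingerprints: 1 = predicted active, 0 = predicted inactive.
query = [1, 0, 1, 1, 0, 0]
library = {
    "compound_A": [1, 0, 1, 0, 0, 0],
    "compound_B": [0, 1, 0, 0, 1, 1],
}

# Rank library compounds from most to least similar to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
```

Because the score depends only on which bits are set, a fingerprint with very few predicted active targets can shift rankings noticeably, which is consistent with the abstract's remark that active assay counts influence retrieval.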
Affiliation(s)
- Kedan He
- Physical Sciences, Eastern Connecticut State University, 83 Windham St, Willimantic, CT, 06226, USA.
5
Karatzas E, Gkonta M, Hotova J, Baltoumas FA, Kontou PI, Bobotsis CJ, Bagos PG, Pavlopoulos GA. VICTOR: A visual analytics web application for comparing cluster sets. Comput Biol Med 2021; 135:104557. [PMID: 34139436 DOI: 10.1016/j.compbiomed.2021.104557] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/22/2021] [Revised: 06/04/2021] [Accepted: 06/04/2021] [Indexed: 01/21/2023]
Abstract
Clustering is the process of grouping different data objects based on similar properties. Clustering has applications in various case studies from several fields such as graph theory, image analysis, pattern recognition, statistics and others. Nowadays, there are numerous algorithms and tools able to generate clustering results. However, different algorithms or parameterizations may produce quite dissimilar cluster sets. As a result, the user is often forced to manually filter and compare these results in order to decide which of them generates the ideal clusters. To automate this process, in this study, we present VICTOR, the first fully interactive and dependency-free visual analytics web application which allows the visual comparison of the results of various clustering algorithms. VICTOR can handle multiple cluster set results simultaneously and compare them using ten different metrics. Clustering results can be filtered and compared to each other with the use of data tables or interactive heatmaps, bar plots, correlation networks, and Sankey and Circos plots. We demonstrate VICTOR's functionality using three examples. In the first case, we compare five different network clustering algorithms on a yeast protein-protein interaction dataset, whereas in the second example, we test four different parameters of the MCL clustering algorithm on the same dataset. Finally, as a third example, we compare four different meta-analyses with hierarchically clustered differentially expressed genes found to be involved in myocardial infarction. VICTOR is available at http://victor.pavlopouloslab.info or http://bib.fleming.gr:3838/VICTOR.
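To make the idea of comparing two cluster sets concrete, here is a minimal sketch of one pairwise agreement measure, the Rand index. The abstract does not list VICTOR's ten metrics, so this metric is chosen purely for illustration, and the label vectors are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings place the two
    items together, or both place them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one item: nothing to disagree about
    agree = sum(
        1
        for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agree / len(pairs)


# Hypothetical cluster labels for the same five items from two runs.
run_1 = [0, 0, 1, 1, 2]
run_2 = [1, 1, 0, 0, 0]
score = rand_index(run_1, run_2)
```

The measure ignores the label values themselves, so two runs that number their clusters differently but group items identically still score 1.0.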
Affiliation(s)
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece.
- Maria Gkonta
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece; Department of Biology, University of Athens, Greece
- Joana Hotova
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece; Department of Biology, University of Athens, Greece
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Panagiota I Kontou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
6
Orb-weaving spider Araneus ventricosus genome elucidates the spidroin gene catalogue. Sci Rep 2019; 9:8380. [PMID: 31182776 PMCID: PMC6557832 DOI: 10.1038/s41598-019-44775-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 02/11/2019] [Accepted: 05/22/2019] [Indexed: 02/02/2023] Open
Abstract
Members of the family Araneidae are common orb-weaving spiders, and they produce several types of silks throughout their behaviors and lives, from reproduction to foraging. Egg sac, prey capture thread, and dragline silks each possess characteristic mechanical properties, and this variability makes silk a highly attractive material for ecological, evolutionary, and industrial fields. However, the complete set of constituents of silks produced by a single species is still unclear, and novel spidroin genes as well as other proteins are still being found. Here, we present the first genome in the genus Araneus, together with the full set of spidroin genes, determined using unamplified long reads and confirmed with the transcriptome of the silk glands and proteome analysis of the dragline silk. The catalogue includes the first full-length sequence of a paralog of major ampullate spidroin, MaSp3, and several spider silk-constituting elements designated SpiCE. Family-wide phylogenomic analysis of Araneidae suggests the relatively recent acquisition of these genes, and multiple-omics analyses demonstrate that these proteins are critical components in the abdominal spidroin gland and dragline silk, contributing to the outstanding mechanical properties of silk in this group of species.
7
Firrincieli A, Presentato A, Favoino G, Marabottini R, Allevato E, Stazi SR, Scarascia Mugnozza G, Harfouche A, Petruccioli M, Turner RJ, Zannoni D, Cappelletti M. Identification of Resistance Genes and Response to Arsenic in Rhodococcus aetherivorans BCP1. Front Microbiol 2019; 10:888. [PMID: 31133997 PMCID: PMC6514093 DOI: 10.3389/fmicb.2019.00888] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 01/07/2019] [Accepted: 04/08/2019] [Indexed: 11/28/2022] Open
Abstract
Arsenic (As) ranks among the priority metal(loid)s that are of public health concern. In the environment, arsenic is present in different forms, organic or inorganic, characterized by various toxicity levels. Bacteria have developed different strategies to deal with this toxicity involving different resistance genetic determinants. Bacterial strains of the genus Rhodococcus, and more generally of the phylum Actinobacteria, have the ability to cope with high concentrations of toxic metalloids, although little is known about the molecular and genetic bases of these metabolic features. Here we show that Rhodococcus aetherivorans BCP1, an extremophilic actinobacterial strain able to tolerate high concentrations of organic solvents and toxic metalloids, can grow in the presence of high concentrations of As(V) (up to 240 mM) under aerobic growth conditions using glucose as sole carbon and energy source. Notably, BCP1 cells improved their growth performance as well as their capacity to reduce As(V) into As(III) when the concentration of As(V) was within 30–100 mM. Genomic analysis of BCP1 compared to other actinobacterial strains revealed the presence of three gene clusters responsible for organic and inorganic arsenic resistance. In particular, two adjacent and divergently oriented ars gene clusters include three arsenate reductase genes (arsC1/2/3) involved in resistance mechanisms against As(V). A sequence similarity network (SSN) and phylogenetic analysis of these arsenate reductase genes indicated that two of them (ArsC2/3) are functionally related to the thioredoxin (Trx)/thioredoxin reductase (TrxR)-dependent class and one of them (ArsC1) to the mycothiol (MSH)/mycoredoxin (Mrx)-dependent class. A targeted transcriptomic analysis performed by RT-qPCR indicated that the arsenate reductase genes as well as other genes included in the ars gene cluster (possible regulator gene, arsR, and arsenite extrusion genes, arsA, acr3, and arsD) were transcriptionally induced when BCP1 cells were exposed to As(V) supplied at two different sub-lethal concentrations. This work provides for the first time insights into the arsenic resistance mechanisms of a Rhodococcus strain, revealing some of the unique metabolic requirements for the environmental persistence of this bacterial genus and its possible use in bioremediation procedures of toxic metal contaminated sites.
Affiliation(s)
- Andrea Firrincieli
- Department for the Innovation in Biological Systems, Agro-Food and Forestry, University of Tuscia, Viterbo, Italy
- Alessandro Presentato
- Department of Biotechnology, University of Verona, Verona, Italy; Department of Biological Sciences, University of Calgary, Calgary, AB, Canada
- Giusi Favoino
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rosita Marabottini
- Department for the Innovation in Biological Systems, Agro-Food and Forestry, University of Tuscia, Viterbo, Italy
- Enrica Allevato
- Department for the Innovation in Biological Systems, Agro-Food and Forestry, University of Tuscia, Viterbo, Italy
- Silvia Rita Stazi
- Department for the Innovation in Biological Systems, Agro-Food and Forestry, University of Tuscia, Viterbo, Italy
- Giuseppe Scarascia Mugnozza
- Department for the Innovation in Biological Systems, Agro-Food and Forestry, University of Tuscia, Viterbo, Italy
- Antoine Harfouche
- Department for the Innovation in Biological Systems, Agro-Food and Forestry, University of Tuscia, Viterbo, Italy
- Maurizio Petruccioli
- Department for the Innovation in Biological Systems, Agro-Food and Forestry, University of Tuscia, Viterbo, Italy
- Raymond J Turner
- Department of Biological Sciences, University of Calgary, Calgary, AB, Canada
- Davide Zannoni
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Martina Cappelletti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
8
Early Diverging Insect-Pathogenic Fungi of the Order Entomophthorales Possess Diverse and Unique Subtilisin-Like Serine Proteases. G3-GENES GENOMES GENETICS 2018; 8:3311-3319. [PMID: 30111619 PMCID: PMC6169396 DOI: 10.1534/g3.118.200656] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 01/15/2023]
Abstract
Insect-pathogenic fungi use subtilisin-like serine proteases (SLSPs) to degrade chitin-associated proteins in the insect procuticle. Most insect-pathogenic fungi in the order Hypocreales (Ascomycota) are generalist species with a broad host range, and most species possess a high number of SLSPs. The other major clade of insect-pathogenic fungi is part of the subphylum Entomophthoromycotina (Zoopagomycota, formerly Zygomycota), which consists of highly host-specific insect-pathogenic fungi that naturally infect only a single or very few host species. The extent to which insect-pathogenic fungi in the order Entomophthorales rely on SLSPs is unknown. Here we take advantage of recently available transcriptomic and genomic datasets from four genera within Entomophthoromycotina: the saprobic or opportunistic pathogens Basidiobolus meristosporus, Conidiobolus coronatus, C. thromboides, C. incongruus, and the host-specific insect pathogens Entomophthora muscae and Pandora formicae, specific pathogens of house flies (Musca domestica) and wood ants (Formica polyctena), respectively. In total, 154 SLSPs from six fungi in the subphylum Entomophthoromycotina were identified: E. muscae (n = 22), P. formicae (n = 6), B. meristosporus (n = 60), C. thromboides (n = 18), C. coronatus (n = 36), and C. incongruus (n = 12). A unique group of 11 SLSPs that loosely resembles bacillopeptidase F-like SLSPs was discovered in the genomes of the obligate biotrophic fungi E. muscae and P. formicae and the saprobic human pathogen C. incongruus. Phylogenetics and protein domain analysis show this class represents a unique group of SLSPs so far only observed among Bacteria, Oomycetes and early diverging fungi such as Cryptomycota, Microsporidia, and Entomophthoromycotina. This group of SLSPs is missing in the sister fungal lineage Kickxellomycotina and in the fungal phyla Mucoromycota, Ascomycota and Basidiomycota, suggesting interesting gene loss patterns.
9
Whole genome sequence and comparative analysis of Borrelia burgdorferi MM1. PLoS One 2018; 13:e0198135. [PMID: 29889842 PMCID: PMC5995427 DOI: 10.1371/journal.pone.0198135] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/09/2018] [Accepted: 05/14/2018] [Indexed: 11/21/2022] Open
Abstract
Lyme disease is caused by spirochaetes of the Borrelia burgdorferi sensu lato genospecies. Complete genome assemblies are available for fewer than ten strains of Borrelia burgdorferi sensu stricto, the primary cause of Lyme disease in North America. MM1 is a sensu stricto strain originally isolated in the midwestern United States. Aside from a small number of genes, the complete genome sequence of this strain has not been reported. Here we present the complete genome sequence of MM1 in relation to other sensu stricto strains and in terms of its Multi Locus Sequence Typing. Our results indicate that MM1 is a new sequence type which contains a conserved main chromosome and 15 plasmids. Our results include the first contiguous 28.5 kb assembly of lp28-8, a linear plasmid carrying the vls antigenic variation system, from a Borrelia burgdorferi sensu stricto strain.
10
Keel BN, Deng B, Moriyama EN. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks. Bioinformatics 2018; 34:1270-1277. [PMID: 29186344 DOI: 10.1093/bioinformatics/btx755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/31/2016] [Accepted: 11/23/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation: Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to generating complexities in protein structures and, consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure.
Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families.
Availability and implementation: MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot.
Contact: emoriyama2@unl.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Brittney N Keel
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA; Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Bo Deng
- Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Etsuko N Moriyama
- School of Biological Sciences and Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
11
Pavlopoulos GA, Kontou PI, Pavlopoulou A, Bouyioukos C, Markou E, Bagos PG. Bipartite graphs in systems biology and medicine: a survey of methods and applications. Gigascience 2018; 7:1-31. [PMID: 29648623 PMCID: PMC6333914 DOI: 10.1093/gigascience/giy014] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 07/09/2017] [Revised: 01/15/2018] [Accepted: 02/13/2018] [Indexed: 11/14/2022] Open
Abstract
Advances in high-throughput techniques during the past decade have allowed the systems biology field to expand significantly. Today, the focus of biologists has shifted from the study of individual biological components to the study of complex biological systems and their dynamics at a larger scale. Through the discovery of novel bioentity relationships, researchers reveal new information about biological functions and processes. Graphs are widely used to represent bioentities such as proteins, genes, small molecules, ligands, and others as nodes, and their connections as edges, within a network. In this review, special focus is given to the usability of bipartite graphs and their impact on the field of network biology and medicine. Furthermore, their topological properties and how these can be applied to certain biological case studies are discussed. Finally, available methodologies and software are presented, and useful insights on how bipartite graphs can shape the path toward the solution of challenging biological problems are provided.
Affiliation(s)
- Georgios A Pavlopoulos
- Lawrence Berkeley Labs, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Panagiota I Kontou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Athanasia Pavlopoulou
- Izmir International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylül University, 35340, Turkey
- Costas Bouyioukos
- Université Paris Diderot, Sorbonne Paris Cité, Epigenetics and Cell Fate, UMR7216, CNRS, France
- Evripides Markou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Pantelis G Bagos
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
12
Nagata S, Imai J, Makino G, Tomita M, Kanai A. Evolutionary Analysis of HIV-1 Pol Proteins Reveals Representative Residues for Viral Subtype Differentiation. Front Microbiol 2017; 8:2151. [PMID: 29163435 PMCID: PMC5666293 DOI: 10.3389/fmicb.2017.02151] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/08/2017] [Accepted: 10/20/2017] [Indexed: 11/15/2022] Open
Abstract
RNA viruses have been used as model systems to understand the patterns and processes of molecular evolution because they have high mutation rates and are genetically diverse. Human immunodeficiency virus 1 (HIV-1), the etiological agent of acquired immune deficiency syndrome, is highly genetically diverse, and is classified into several groups and subtypes. However, it has been difficult to use its diverse sequences to establish the overall phylogenetic relationships of different strains or the trends in sequence conservation with the construction of phylogenetic trees. Our aims were to systematically characterize HIV-1 subtype evolution and to identify the regions responsible for HIV-1 subtype differentiation at the amino acid level in the Pol protein, which is often used to classify the HIV-1 subtypes. In this study, we systematically characterized the mutation sites in 2,052 Pol proteins from HIV-1 group M (144 subtype A; 1,528 subtype B; 380 subtype C), using sequence similarity networks. We also used spectral clustering to group the sequences based on the network graph structures. A stepwise analysis of the cluster hierarchies allowed us to estimate a possible evolutionary pathway for the Pol proteins. The subtype A sequences also clustered according to when and where the viruses were isolated, whereas both the subtype B and C sequences remained as single clusters. Because the Pol protein has several functional domains, we identified the regions that are discriminative by comparing the structures of the domain-based networks. Our results suggest that sequence changes in the RNase H domain and the reverse transcriptase (RT) connection domain are responsible for the subtype classification. By analyzing the different amino acid compositions at each site in both domain sequences, we found that a few specific amino acid residues (i.e., M357 in the RT connection domain and Q480, Y483, and L491 in the RNase H domain) represent the differences among the subtypes. 
These residues were located on the surface of the RT structure and in the vicinity of the amino acid sites responsible for RT enzymatic activity or function.
Affiliation(s)
- Shohei Nagata
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan
| | - Junnosuke Imai
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
| | - Gakuto Makino
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
| | - Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan; Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
| | - Akio Kanai
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Faculty of Environment and Information Studies, Keio University, Fujisawa, Japan; Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa, Japan
|
13
|
Chowdhary J, Löffler FE, Smith JC. Community detection in sequence similarity networks based on attribute clustering. PLoS One 2017; 12:e0178650. [PMID: 28738060 PMCID: PMC5524321 DOI: 10.1371/journal.pone.0178650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2017] [Accepted: 05/16/2017] [Indexed: 11/18/2022] Open
Abstract
Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments.
Affiliation(s)
- Janamejaya Chowdhary
- Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- University of Tennessee-Oak Ridge National Laboratory, Joint Institute for Biological Sciences and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
| | - Frank E. Löffler
- Department of Microbiology, Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Tennessee, United States of America
- Center for Environmental Biotechnology, University of Tennessee, Knoxville, Tennessee, United States of America
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee, United States of America
| | - Jeremy C. Smith
- Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- University of Tennessee-Oak Ridge National Laboratory, Joint Institute for Biological Sciences and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee, United States of America
|
14
|
Empirical Comparison of Visualization Tools for Larger-Scale Network Analysis. Adv Bioinformatics 2017; 2017:1278932. [PMID: 28804499 PMCID: PMC5540468 DOI: 10.1155/2017/1278932] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 02/22/2017] [Revised: 05/14/2017] [Accepted: 06/04/2017] [Indexed: 12/19/2022] Open
Abstract
Gene expression, signal transduction, protein/chemical interactions, biomedical literature co-occurrences, and other concepts are often captured in biological network representations in which nodes represent bioentities and edges represent the connections between them. While many tools to manipulate, visualize, and interactively explore such networks already exist, only a few of them can scale up to keep pace with today's rapid information growth. In this review, we briefly catalog the available network visualization tools and, from a user-experience point of view, identify four candidate tools suitable for larger-scale network analysis, visualization, and exploration. We comment on their strengths and weaknesses and empirically discuss their scalability, user friendliness, and postvisualization capabilities.
|
15
|
Deng CH, Plummer KM, Jones DAB, Mesarich CH, Shiller J, Taranto AP, Robinson AJ, Kastner P, Hall NE, Templeton MD, Bowen JK. Comparative analysis of the predicted secretomes of Rosaceae scab pathogens Venturia inaequalis and V. pirina reveals expanded effector families and putative determinants of host range. BMC Genomics 2017; 18:339. [PMID: 28464870 PMCID: PMC5412055 DOI: 10.1186/s12864-017-3699-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 12/09/2016] [Accepted: 04/11/2017] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Fungal plant pathogens belonging to the genus Venturia cause damaging scab diseases of members of the Rosaceae. In terms of economic impact, the most important of these are V. inaequalis, which infects apple, and V. pirina, which is a pathogen of European pear. Given that Venturia fungi colonise the sub-cuticular space without penetrating plant cells, it is assumed that effectors that contribute to virulence and determination of host range will be secreted into this plant-pathogen interface. Thus the predicted secretomes of a range of isolates of Venturia with distinct host-ranges were interrogated to reveal putative proteins involved in virulence and pathogenicity. RESULTS Genomes of Venturia pirina (one European pear scab isolate) and Venturia inaequalis (three apple scab, and one loquat scab, isolates) were sequenced and the predicted secretomes of each isolate identified. RNA-Seq was conducted on the apple-specific V. inaequalis isolate Vi1 (in vitro and infected apple leaves) to highlight virulence and pathogenicity components of the secretome. Genes encoding over 600 small secreted proteins (candidate effectors) were identified, most of which are novel to Venturia, with expansion of putative effector families a feature of the genus. Numerous genes with similarity to Leptosphaeria maculans AvrLm6 and the Verticillium spp. Ave1 were identified. Candidates for avirulence effectors with cognate resistance genes involved in race-cultivar specificity were identified, as were putative proteins involved in host-species determination. Candidate effectors were found, on average, to be in regions of relatively low gene-density and in closer proximity to repeats (e.g. transposable elements), compared with core eukaryotic genes. CONCLUSIONS Comparative secretomics has revealed candidate effectors from Venturia fungal plant pathogens that attack pome fruit. 
Effectors that are putative determinants of host range were identified, including those that may be involved in race-cultivar specificity and those that may be involved in host-species specificity. Since many of the effector candidates are in close proximity to repetitive sequences, this may point to a possible mechanism for the observed effector gene family expansion and a route to diversification via transposition and repeat-induced point mutation.
Affiliation(s)
- Cecilia H. Deng
- The New Zealand Institute for Plant & Food Research Limited (PFR), Auckland, New Zealand
| | - Kim M. Plummer
- Animal, Plant & Soil Sciences Department, AgriBio Centre for AgriBioscience, La Trobe University, Melbourne, Victoria Australia
- Plant Biosecurity Cooperative Research Centre, Bruce, ACT Australia
| | - Darcy A. B. Jones
- Animal, Plant & Soil Sciences Department, AgriBio Centre for AgriBioscience, La Trobe University, Melbourne, Victoria Australia
- Present Address: The Centre for Crop and Disease Management, Curtin University, Bentley, Australia
| | - Carl H. Mesarich
- The New Zealand Institute for Plant & Food Research Limited (PFR), Auckland, New Zealand
- The School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Present Address: Institute of Agriculture & Environment, Massey University, Palmerston North, New Zealand
| | - Jason Shiller
- Animal, Plant & Soil Sciences Department, AgriBio Centre for AgriBioscience, La Trobe University, Melbourne, Victoria Australia
- Present Address: INRA-Angers, Beaucouzé, Cedex, France
| | - Adam P. Taranto
- Animal, Plant & Soil Sciences Department, AgriBio Centre for AgriBioscience, La Trobe University, Melbourne, Victoria Australia
- Plant Sciences Division, Research School of Biology, The Australian National University, Canberra, Australia
| | - Andrew J. Robinson
- Animal, Plant & Soil Sciences Department, AgriBio Centre for AgriBioscience, La Trobe University, Melbourne, Victoria Australia
- Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative (VLSCI), Victoria, Australia
| | - Patrick Kastner
- Animal, Plant & Soil Sciences Department, AgriBio Centre for AgriBioscience, La Trobe University, Melbourne, Victoria Australia
| | - Nathan E. Hall
- Animal, Plant & Soil Sciences Department, AgriBio Centre for AgriBioscience, La Trobe University, Melbourne, Victoria Australia
- Life Sciences Computation Centre, Victorian Life Sciences Computation Initiative (VLSCI), Victoria, Australia
| | - Matthew D. Templeton
- The New Zealand Institute for Plant & Food Research Limited (PFR), Auckland, New Zealand
- The School of Biological Sciences, University of Auckland, Auckland, New Zealand
| | - Joanna K. Bowen
- The New Zealand Institute for Plant & Food Research Limited (PFR), Auckland, New Zealand
|
16
|
Berg J, Järvisalo M. Cost-optimal constrained correlation clustering via weighted partial Maximum Satisfiability. Artif Intell 2017. [DOI: 10.1016/j.artint.2015.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
|
17
|
Tamimi A, Ashhab Y, Tamimi H. Accelerating Information Retrieval from Profile Hidden Markov Model Databases. PLoS One 2016; 11:e0166358. [PMID: 27875548 PMCID: PMC5119741 DOI: 10.1371/journal.pone.0166358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/09/2016] [Accepted: 10/27/2016] [Indexed: 11/18/2022] Open
Abstract
The Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to representing protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is increasing interest in improving the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have focused on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for batch query searching, strongly motivates further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41%, and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
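The search-space reduction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the profiles are stood in for by hypothetical two-dimensional feature vectors, cosine similarity replaces profile-HMM scoring, and the names `clustered_search`, `top_clusters`, and `profA` to `profF` are invented for illustration. The query is scored against one representative per cluster first, and only the members of the best-matching cluster(s) are then compared.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def clustered_search(query, clusters, top_clusters=1):
    """Score the query against cluster representatives first, then only
    against members of the best-matching clusters.

    clusters: list of (representative_vector, {name: vector}) pairs.
    Returns (best_name, best_score, number_of_comparisons).
    """
    # Rank clusters by similarity of the query to each representative.
    ranked = sorted(clusters, key=lambda c: cosine(query, c[0]), reverse=True)
    comparisons = len(clusters)  # one comparison per representative
    best_name, best_score = None, -1.0
    for _, members in ranked[:top_clusters]:
        for name, vec in members.items():
            comparisons += 1
            score = cosine(query, vec)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score, comparisons

# Toy database: two clusters of hypothetical profile feature vectors.
clusters = [
    ((1.0, 0.0), {"profA": (1.0, 0.1), "profB": (0.9, 0.0), "profC": (1.0, 0.2)}),
    ((0.0, 1.0), {"profD": (0.1, 1.0), "profE": (0.0, 0.9), "profF": (0.2, 1.0)}),
]
hit, score, n = clustered_search((0.1, 1.0), clusters)
print(hit, n)  # 5 comparisons instead of 6 for an exhaustive scan
```

Raising `top_clusters` trades speed for sensitivity, which loosely mirrors the role of the overlapping step mentioned in the abstract.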
Affiliation(s)
- Ahmad Tamimi
- College of Information Technology and Computer Engineering, Palestine Polytechnic University, Hebron, Palestine
| | - Yaqoub Ashhab
- Palestine-Korea Biotechnology Center, Palestine Polytechnic University, Hebron, Palestine
| | - Hashem Tamimi
- College of Information Technology and Computer Engineering, Palestine Polytechnic University, Hebron, Palestine
- Palestine-Korea Biotechnology Center, Palestine Polytechnic University, Hebron, Palestine
|
18
|
Papanikolaou N, Pavlopoulos GA, Theodosiou T, Vizirianakis IS, Iliopoulos I. DrugQuest - a text mining workflow for drug association discovery. BMC Bioinformatics 2016; 17 Suppl 5:182. [PMID: 27295093 PMCID: PMC4905607 DOI: 10.1186/s12859-016-1041-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Text mining and data integration methods are gaining ground in the health sciences due to the exponential growth of biomedical literature and of the information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. RESULTS Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank “Description”, “Indication”, “Pharmacodynamics” and “Mechanism of Action” text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways, and diseases, and we use the TextQuest algorithm to find additional biologically significant words. Using a variety of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. CONCLUSIONS DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.
Affiliation(s)
- Nikolas Papanikolaou
- Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece
| | - Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece
| | - Theodosios Theodosiou
- Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece
| | - Ioannis S Vizirianakis
- School of Pharmacy, Laboratory of Pharmacology, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
| | - Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece.
|
19
|
Shehu A, Barbará D, Molloy K. A Survey of Computational Methods for Protein Function Prediction. Big Data Analytics in Genomics 2016:225-298. [DOI: 10.1007/978-3-319-41279-5_7] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 01/03/2025]
|
20
|
Hamashima K, Tomita M, Kanai A. Expansion of Noncanonical V-Arm-Containing tRNAs in Eukaryotes. Mol Biol Evol 2015; 33:530-40. [DOI: 10.1093/molbev/msv253] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022] Open
|
21
|
Sharma R, Xia X, Riess K, Bauer R, Thines M. Comparative Genomics Including the Early-Diverging Smut Fungus Ceraceosorus bombacis Reveals Signatures of Parallel Evolution within Plant and Animal Pathogens of Fungi and Oomycetes. Genome Biol Evol 2015; 7:2781-98. [PMID: 26314305 PMCID: PMC4607519 DOI: 10.1093/gbe/evv162] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/07/2023] Open
Abstract
Ceraceosorus bombacis is an early-diverging lineage of smut fungi and a pathogen of cotton trees (Bombax ceiba). To study the evolutionary genomics of smut fungi in comparison with other fungal and oomycete pathogens, the genome of C. bombacis was sequenced and comparative genomic analyses were performed. The 26.09 Mb genome encodes 8,024 proteins, of which 576 are putative secreted effector proteins (PSEPs). Orthology analysis revealed 30 orthologous PSEPs among six Ustilaginomycotina genomes, the largest groups of which are lytic enzymes, such as aspartic peptidase and glycoside hydrolase. Positive selection analyses revealed the highest percentage of positively selected PSEPs in C. bombacis compared with other Ustilaginomycotina genomes. Metabolic pathway analyses revealed the absence of genes encoding nitrite and nitrate reductase in the genome of the human skin pathogen Malassezia globosa, whereas these enzymes are present in the sequenced plant-pathogenic smut fungi. Interestingly, these genes are also absent in cultivable oomycete animal pathogens, while nitrate reductase has been lost in cultivable oomycete plant pathogens. Similar patterns were also observed for obligate biotrophic and hemi-biotrophic fungal and oomycete pathogens. Furthermore, it was found that both fungal and oomycete animal pathogen genomes lack cutinases and pectinesterases. Overall, these findings highlight the parallel evolution of certain genomic traits, revealing potential common evolutionary trajectories among fungal and oomycete pathogens that shape the pathogen genomes according to their lifestyle.
Affiliation(s)
- Rahul Sharma
- Biodiversity and Climate Research Centre (BiK-F), Frankfurt (Main), Germany; Department for Biological Sciences, Institute of Ecology, Evolution and Diversity, Goethe University, Frankfurt (Main), Germany; Senckenberg Gesellschaft für Naturforschung, Frankfurt (Main), Germany; Cluster for Integrative Fungal Research (IPF), Frankfurt (Main), Germany
| | - Xiaojuan Xia
- Biodiversity and Climate Research Centre (BiK-F), Frankfurt (Main), Germany; Department for Biological Sciences, Institute of Ecology, Evolution and Diversity, Goethe University, Frankfurt (Main), Germany; Senckenberg Gesellschaft für Naturforschung, Frankfurt (Main), Germany
| | - Kai Riess
- Plant Evolutionary Ecology, Institute of Evolution and Ecology, University of Tübingen, Germany
| | - Robert Bauer
- Plant Evolutionary Ecology, Institute of Evolution and Ecology, University of Tübingen, Germany
| | - Marco Thines
- Biodiversity and Climate Research Centre (BiK-F), Frankfurt (Main), Germany; Department for Biological Sciences, Institute of Ecology, Evolution and Diversity, Goethe University, Frankfurt (Main), Germany; Senckenberg Gesellschaft für Naturforschung, Frankfurt (Main), Germany; Cluster for Integrative Fungal Research (IPF), Frankfurt (Main), Germany
|
22
|
Ahmed HA, Bhattacharyya DK, Kalita JK. Core and peripheral connectivity based cluster analysis over PPI network. Comput Biol Chem 2015; 59 Pt B:32-41. [PMID: 26362299 DOI: 10.1016/j.compbiolchem.2015.08.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 04/07/2015] [Revised: 07/31/2015] [Accepted: 08/18/2015] [Indexed: 10/23/2022]
Abstract
A number of methods have been proposed in the protein-protein interaction (PPI) network analysis literature for detecting clusters in a network. These methods identify clusters using various graph-theoretic criteria. Most of them have been found to be time consuming because they involve preprocessing and postprocessing tasks. In addition, they do not achieve high precision and recall consistently and simultaneously. Moreover, the existing methods do not effectively exploit the core-periphery structural pattern of protein complexes to extract clusters. In this paper, we introduce a clustering method named CPCA, based on a recent observation that a protein complex in a PPI network is arranged as a relatively dense core region with additional proteins weakly connected to the core. CPCA uses two connectivity criterion functions to identify the core and peripheral regions of a cluster. To locate the initial node of a cluster, we introduce a measure called the DNQ (Degree based Neighborhood Qualification) index, which evaluates the tendency of a node to be part of a cluster. CPCA performs well when compared with well-known counterparts. Along with protein complex gold standards, a co-localization dataset has also been used to validate the results.
|
23
|
Bernardes JS, Vieira FRJ, Costa LMM, Zaverucha G. Evaluation and improvements of clustering algorithms for detecting remote homologous protein families. BMC Bioinformatics 2015; 16:34. [PMID: 25651949 PMCID: PMC4339679 DOI: 10.1186/s12859-014-0445-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 06/02/2014] [Accepted: 11/26/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An important problem in computational biology is the automatic detection of protein families (groups of homologous sequences). Clustering sequences into families is at the heart of most comparative studies dealing with protein evolution, structure, and function. Many methods have been developed for this task, and they perform reasonably well (over 0.88 of F-measure) when grouping proteins with high sequence identity. However, for highly diverged proteins the performance of these methods can be much lower, mainly because a common evolutionary origin is not deduced directly from sequence similarity. To the best of our knowledge, a systematic evaluation of clustering methods over distant homologous proteins is still lacking. RESULTS We performed a comparative assessment of four clustering algorithms: Markov Clustering (MCL), Transitive Clustering (TransClust), Spectral Clustering of Protein Sequences (SCPS), and High-Fidelity clustering of protein sequences (HiFix), considering several datasets with different levels of sequence similarity. Two types of similarity measures, required by the clustering sequence methods, were used to evaluate the performance of the algorithms: the standard measure obtained from sequence-sequence comparisons, and a novel measure based on profile-profile comparisons, used here for the first time. CONCLUSIONS The results reveal low clustering performance for the highly divergent datasets when the standard measure was used. However, the novel measure based on profile-profile comparisons substantially improved the performance of the four methods, especially when very low sequence identity datasets were evaluated. We also performed a parameter optimization step to determine the best configuration for each clustering method. We found that TransClust clearly outperformed the other methods for most datasets. 
This work also provides guidelines for the practical application of sequence clustering methods aimed at accurately detecting groups of related protein sequences.
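The abstract reports clustering quality as an F-measure but does not state which variant was used. As a minimal sketch, assuming the common pairwise definition (a pair of sequences is a true positive when it shares both a predicted cluster and a reference family), the score can be computed as follows. The function name and the toy labels are hypothetical.

```python
from itertools import combinations

def pairwise_f_measure(predicted, truth):
    """Pairwise precision, recall and F-measure of a clustering.

    predicted, truth: dicts mapping each sequence id to a cluster/family label.
    A pair counts as positive when both members share a label.
    """
    ids = sorted(truth)
    tp = fp = fn = 0
    for a, b in combinations(ids, 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1      # pair grouped together in both
        elif same_pred:
            fp += 1      # grouped by the method, not by the reference
        elif same_true:
            fn += 1      # grouped by the reference, missed by the method
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f

# Toy example: five sequences, two reference families.
truth = {"s1": "famA", "s2": "famA", "s3": "famA", "s4": "famB", "s5": "famB"}
predicted = {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 1}
p, r, f = pairwise_f_measure(predicted, truth)
print(p, r, f)
```

In this toy case one family A member is misplaced into the family B cluster, which costs both precision and recall.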
Affiliation(s)
- Juliana S Bernardes
- Programa de Engenharia de Sistemas e Computação, COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil; Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Biologie Computationnelle et Quantitative, Paris, France.
| | - Fabio R J Vieira
- Programa de Engenharia de Sistemas e Computação, COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
| | - Lygia M M Costa
- Engenharia de Computação e Informação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
| | - Gerson Zaverucha
- Programa de Engenharia de Sistemas e Computação, COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
|
24
|
Gan S, Cosgrove DA, Gardiner EJ, Gillet VJ. Investigation of the use of spectral clustering for the analysis of molecular data. J Chem Inf Model 2014; 54:3302-19. [PMID: 25379955 DOI: 10.1021/ci500480b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
Spectral clustering involves placing objects into clusters based on the eigenvectors and eigenvalues of an associated matrix. The technique was first applied to molecular data by Brewer [J. Chem. Inf. Model. 2007, 47, 1727-1733] who demonstrated its use on a very small dataset of 125 COX-2 inhibitors. We have determined suitable parameters for spectral clustering using a wide variety of molecular descriptors and several datasets of a few thousand compounds and compared the results of clustering using a nonoverlapping version of Brewer's use of Sarker and Boyer's algorithm with that of Ward's and k-means clustering. We then replaced the exact eigendecomposition method with two different approximate methods and concluded that Singular Value Decomposition is the most appropriate method for clustering larger compound collections of up to 100,000 compounds. We have also used spectral clustering with the Tversky coefficient to generate two sets of clusters linked by a common set of eigenvalues and have used this novel approach to cluster sets of fragments such as those used in fragment-based drug design.
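The first sentence of this abstract, clustering from the eigenvectors and eigenvalues of an associated matrix, can be illustrated with a minimal two-cluster sketch. It assumes an unnormalized graph Laplacian built from a symmetric similarity matrix and splits objects by the sign of the eigenvector belonging to the second-smallest eigenvalue (the Fiedler vector). This is a generic textbook variant, not the Sarker and Boyer algorithm or the authors' parameterization, and the toy similarity values are invented.

```python
import numpy as np

def spectral_bipartition(similarity):
    """Split objects into two clusters using the sign of the Fiedler vector."""
    w = np.asarray(similarity, dtype=float)
    degree = w.sum(axis=1)
    laplacian = np.diag(degree) - w  # unnormalized graph Laplacian L = D - W
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]     # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy similarity matrix: objects 0-2 and 3-5 form two tight groups
# (similarity 0.9 within a group, 0.05 between groups, zero diagonal).
similarity = np.full((6, 6), 0.05)
similarity[:3, :3] = 0.9
similarity[3:, 3:] = 0.9
np.fill_diagonal(similarity, 0.0)

labels = spectral_bipartition(similarity)
print(labels)  # which group gets label 0 or 1 depends on the eigenvector sign
```

For more than two clusters, the usual extension is to take several leading eigenvectors and cluster their rows, for example with k-means.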
Affiliation(s)
- Sonny Gan
- Information School, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom
|
25
|
Weyenberg G, Huggins PM, Schardl CL, Howe DK, Yoshida R. kdetrees: Non-parametric estimation of phylogenetic tree distributions. Bioinformatics 2014; 30:2280-7. [PMID: 24764459 PMCID: PMC4176058 DOI: 10.1093/bioinformatics/btu258] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 02/27/2014] [Revised: 04/04/2014] [Accepted: 04/22/2014] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. RESULTS We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. AVAILABILITY AND IMPLEMENTATION Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
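The general idea of flagging outlying items through a non-parametric density estimate can be sketched as follows. This is not the kdetrees R package or its algorithm: it assumes a precomputed pairwise distance matrix, a Gaussian kernel with a fixed bandwidth, and a simple cutoff at a fraction of the median score, all of which are illustrative choices. Points on a line stand in for trees. Scoring every item against every other item is quadratic in the number of items, consistent with the complexity quoted in the abstract.

```python
from math import exp
from statistics import median

def density_scores(dist, bandwidth=1.0):
    """Unnormalized Gaussian kernel density score for each item,
    computed from a symmetric pairwise distance matrix."""
    n = len(dist)
    return [
        sum(exp(-0.5 * (dist[i][j] / bandwidth) ** 2) for j in range(n) if j != i)
        for i in range(n)
    ]

def find_outliers(dist, bandwidth=1.0, fraction=0.5):
    """Indices whose density score falls below fraction * median score.
    The cutoff rule is an assumption made for this sketch."""
    scores = density_scores(dist, bandwidth)
    cutoff = fraction * median(scores)
    return [i for i, s in enumerate(scores) if s < cutoff]

# Stand-in data: four similar items and one distant item.
positions = [0.0, 0.1, 0.2, 0.15, 5.0]
dist = [[abs(a - b) for b in positions] for a in positions]
outliers = find_outliers(dist)
print(outliers)
```

With real trees, the distance matrix would come from a tree metric, and the bandwidth would need to be chosen or estimated rather than fixed.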
Affiliation(s)
- Grady Weyenberg
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Peter M Huggins
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Christopher L Schardl
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Daniel K Howe
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Ruriko Yoshida
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
|
26
|
Papanikolaou N, Pavlopoulos GA, Pafilis E, Theodosiou T, Schneider R, Satagopam VP, Ouzounis CA, Eliopoulos AG, Promponas VJ, Iliopoulos I. BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery. Bioinformatics 2014; 30:3249-56. [PMID: 25100685 DOI: 10.1093/bioinformatics/btu524] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]
Abstract
SUMMARY The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. AVAILABILITY The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. CONTACT g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nikolas Papanikolaou, Georgios A Pavlopoulos, Evangelos Pafilis, Theodosios Theodosiou, Reinhard Schneider, Venkata P Satagopam, Christos A Ouzounis, Aristides G Eliopoulos, Vasilis J Promponas, Ioannis Iliopoulos
  - Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
|
27
|
Gabaldón T, Martin T, Marcet-Houben M, Durrens P, Bolotin-Fukuhara M, Lespinet O, Arnaise S, Boisnard S, Aguileta G, Atanasova R, Bouchier C, Couloux A, Creno S, Almeida Cruz J, Devillers H, Enache-Angoulvant A, Guitard J, Jaouen L, Ma L, Marck C, Neuvéglise C, Pelletier E, Pinard A, Poulain J, Recoquillay J, Westhof E, Wincker P, Dujon B, Hennequin C, Fairhead C. Comparative genomics of emerging pathogens in the Candida glabrata clade. BMC Genomics 2013; 14:623. [PMID: 24034898 PMCID: PMC3847288 DOI: 10.1186/1471-2164-14-623] [Citation(s) in RCA: 147] [Impact Index Per Article: 12.3] [Received: 04/08/2013] [Accepted: 07/31/2013] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Candida glabrata follows C. albicans as the second or third most prevalent cause of candidemia worldwide. These two pathogenic yeasts are distantly related, C. glabrata being part of the Nakaseomyces, a group more closely related to Saccharomyces cerevisiae. Although C. glabrata was thought to be the only pathogenic Nakaseomyces, two new pathogens have recently been described within this group: C. nivariensis and C. bracarensis. To gain insight into the genomic changes underlying the emergence of virulence, we sequenced the genomes of these two, and three other non-pathogenic Nakaseomyces, and compared them to other sequenced yeasts. RESULTS Our results indicate that the two new pathogens are more closely related to the non-pathogenic N. delphensis than to C. glabrata. We uncover duplications and accelerated evolution that specifically affected genes in the lineage preceding the group containing N. delphensis and the three pathogens, which may provide clues to the higher propensity of this group to infect humans. Finally, the number of Epa-like adhesins is specifically enriched in the pathogens, particularly in C. glabrata. CONCLUSIONS Remarkably, some features thought to be the result of adaptation of C. glabrata to a pathogenic lifestyle, are present throughout the Nakaseomyces, indicating these are rather ancient adaptations to other environments. Phylogeny suggests that human pathogenesis evolved several times, independently within the clade. The expansion of the EPA gene family in pathogens establishes an evolutionary link between adhesion and virulence phenotypes. Our analyses thus shed light onto the relationships between virulence and the recent genomic changes that occurred within the Nakaseomyces. 
SEQUENCE ACCESSION NUMBERS Nakaseomyces delphensis: CAPT01000001 to CAPT01000179; Candida bracarensis: CAPU01000001 to CAPU01000251; Candida nivariensis: CAPV01000001 to CAPV01000123; Candida castellii: CAPW01000001 to CAPW01000101; Nakaseomyces bacillisporus: CAPX01000001 to CAPX01000186.
Affiliation(s)
- Toni Gabaldón
  - Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
  - Comparative Genomics Group, CRG-Centre for Genomic Regulation, Doctor Aiguader, 88, Barcelona, 08003, Spain
- Tiphaine Martin
  - Université de Bordeaux 1, LaBRI, INRIA Bordeaux Sud-Ouest (MAGNOME), Talence, F-33405, France
- Marina Marcet-Houben
  - Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
- Pascal Durrens
  - Université de Bordeaux 1, LaBRI, INRIA Bordeaux Sud-Ouest (MAGNOME), Talence, F-33405, France
- Monique Bolotin-Fukuhara
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Olivier Lespinet
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Sylvie Arnaise
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Stéphanie Boisnard
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Gabriela Aguileta
  - Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
- Ralitsa Atanasova
  - APHP, Hôpital St Antoine, Service de Parasitologie-Mycologie, and UMR S945, Inserm, Université P. M. Curie, Paris, France
- Christiane Bouchier
  - Département Génomes et Génétique, Institut Pasteur, Plate-forme Génomique, rue du Dr. Roux, Paris, F-75015, France
- Arnaud Couloux
  - CEA, IG, DSV, Genoscope, 2 rue Gaston Crémieux, Evry Cedex, 91057, France
- Sophie Creno
  - Département Génomes et Génétique, Institut Pasteur, Plate-forme Génomique, rue du Dr. Roux, Paris, F-75015, France
- Jose Almeida Cruz
  - Architecture et Réactivité de l'ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, Strasbourg Cedex, F-67084, France
  - Present address: Champalimaud Foundation, Av. Brasília, Lisboa, 1400-038, Portugal
- Hugo Devillers
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Adela Enache-Angoulvant
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
  - APHP, Hôpital Bicêtre, Service de Microbiologie, Paris, France
- Juliette Guitard
  - APHP, Hôpital St Antoine, Service de Parasitologie-Mycologie, and UMR S945, Inserm, Université P. M. Curie, Paris, France
- Laure Jaouen
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Laurence Ma
  - Département Génomes et Génétique, Institut Pasteur, Plate-forme Génomique, rue du Dr. Roux, Paris, F-75015, France
- Christian Marck
  - Institut de biologie et technologies de Saclay (iBiTec-S), Gif-sur-Yvette cedex, 91191, France
- Eric Pelletier
  - CEA, IG, DSV, Genoscope, 2 rue Gaston Crémieux, Evry Cedex, 91057, France
- Amélie Pinard
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Julie Poulain
  - CEA, IG, DSV, Genoscope, 2 rue Gaston Crémieux, Evry Cedex, 91057, France
- Julien Recoquillay
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
- Eric Westhof
  - Architecture et Réactivité de l'ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, Strasbourg Cedex, F-67084, France
- Patrick Wincker
  - CEA, IG, DSV, Genoscope, 2 rue Gaston Crémieux, Evry Cedex, 91057, France
- Bernard Dujon
  - Institut Pasteur, Unité de Génétique moléculaires des levures, UMR3525 CNRS, UFR927, Université P. M. Curie, 25 rue du Docteur Roux, Paris Cedex15, F75724, France
- Christophe Hennequin
  - APHP, Hôpital St Antoine, Service de Parasitologie-Mycologie, and UMR S945, Inserm, Université P. M. Curie, Paris, France
- Cécile Fairhead
  - Institut de Génétique et Microbiologie, UMR8621 CNRS-Université Paris Sud, Bât 400, UFR des Sciences, Orsay Cedex, F 91405, France
|
28
|
Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences. PLoS One 2013; 8:e75458. [PMID: 24069417 PMCID: PMC3771926 DOI: 10.1371/journal.pone.0075458] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/08/2013] [Accepted: 08/19/2013] [Indexed: 11/19/2022] Open
Abstract
Identifying shared sequence segments along amino acid sequences generally requires a collection of closely related proteins, most often curated manually from the sequence datasets to suit the purpose at hand. Currently developed statistical methods are strained, however, when the collection contains remote sequences with poor alignment to the rest, or sequences containing multiple domains. In this paper, we propose a completely unsupervised and automated method to identify the shared sequence segments observed in a diverse collection of protein sequences including those present in a smaller fraction of the sequences in the collection, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. Since shared sequence fragments often imply conserved functional or structural attributes, the method produces a table of associations between the sequences and the identified conserved regions that can reveal previously unknown protein families as well as new members to existing ones. We evaluated the biological relevance of the method by clustering the proteins in gold standard datasets and assessing the clustering performance in comparison with previous methods from the literature. We have then applied the proposed method to a genome wide dataset of 17793 human proteins and generated a global association map to each of the 4753 identified conserved regions. Investigations on the major conserved regions revealed that they corresponded strongly to annotated structural domains. This suggests that the method can be useful in predicting novel domains on protein sequences.
|
29
|
Matsui M, Tomita M, Kanai A. Comprehensive computational analysis of bacterial CRP/FNR superfamily and its target motifs reveals stepwise evolution of transcriptional networks. Genome Biol Evol 2013; 5:267-82. [PMID: 23315382 PMCID: PMC3590769 DOI: 10.1093/gbe/evt004] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 12/22/2022] Open
Abstract
The cAMP receptor protein (CRP)/fumarate and nitrate reduction regulatory protein (FNR)-type transcription factors (TFs) are members of a well-characterized global TF family in bacteria and have two conserved domains: the N-terminal ligand-binding domain for small molecules (e.g., cAMP, NO, or O2) and the C-terminal DNA-binding domain. Although the CRP/FNR-type TFs recognize very similar consensus DNA target sequences, they can regulate different sets of genes in response to environmental signals. To clarify the evolution of the CRP/FNR-type TFs throughout the bacterial kingdom, we undertook a comprehensive computational analysis of a large number of annotated CRP/FNR-type TFs and the corresponding bacterial genomes. Based on the amino acid sequence similarities among 1,455 annotated CRP/FNR-type TFs, spectral clustering classified the TFs into 12 representative groups, and stepwise clustering allowed us to propose a possible process of protein evolution. Although each cluster mainly consists of functionally distinct members (e.g., CRP, NTC, FNR-like protein, and FixK), FNR-related TFs are found in several groups and are distributed in a wide range of bacterial phyla in the sequence similarity network. This result suggests that the CRP/FNR-type TFs originated from an ancestral FNR protein, involved in nitrogen fixation. Furthermore, a phylogenetic profiling analysis showed that combinations of TFs and their target genes have fluctuated dynamically during bacterial evolution. A genome-wide analysis of TF-binding sites also suggested that the diversity of the transcriptional regulatory system was derived by the stepwise adaptation of TF-binding sites to the evolution of TFs.
Affiliation(s)
- Motomu Matsui
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
|
30
|
Pretsch K, Kemen A, Kemen E, Geiger M, Mendgen K, Voegele R. The rust transferred proteins-a new family of effector proteins exhibiting protease inhibitor function. Mol Plant Pathol 2013; 14:96-107. [PMID: 22998218 PMCID: PMC6638633 DOI: 10.1111/j.1364-3703.2012.00832.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 05/08/2023]
Abstract
Only a few fungal effectors have been described as being delivered into the host cell during obligate biotrophic interactions. RTP1p, from the rust fungi Uromyces fabae and U. striatus, was the first fungal protein for which localization within the host cytoplasm could be demonstrated directly. We investigated the occurrence of RTP1 homologues in rust fungi and examined the structural and biochemical characteristics of the corresponding gene products. The analysis of 28 homologues showed that members of the RTP family are most likely to occur ubiquitously in rust fungi and to be specific to the order Pucciniales. Sequence analyses indicated that the structure of the RTPp effectors is bipartite, consisting of a variable N-terminus and a conserved and structured C-terminus. The characterization of Uf-RTP1p mutants showed that four conserved cysteine residues sustain structural stability. Furthermore, the C-terminal domain exhibits similarities to that of cysteine protease inhibitors, and it was shown that Uf-RTP1p and Us-RTP1p are able to inhibit proteolytic activity in Pichia pastoris culture supernatants. We conclude that the RTP1p homologues constitute a rust fungi-specific family of modular effector proteins comprising an unstructured N-terminal domain and a structured C-terminal domain, which exhibit protease inhibitory activity possibly associated with effector function during biotrophic interactions.
Affiliation(s)
- Klara Pretsch
  - Phytopathologie, Fachbereich Biologie, Universität Konstanz, 78457, Konstanz, Germany
|
31
|
|
32
|
Sasidharan R, Nepusz T, Swarbreck D, Huala E, Paccanaro A. GFam: a platform for automatic annotation of gene families. Nucleic Acids Res 2012; 40:e152. [PMID: 22790981 PMCID: PMC3479161 DOI: 10.1093/nar/gks631] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Abstract
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach achieves sequence coverage 7.2 percentage points higher and residue coverage 14.6 percentage points higher than the best single-constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
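The connected component step named in this abstract can be illustrated with a small union-find sketch. This is a generic illustration rather than GFam's own code: the region labels and similarity pairs are made-up inputs, and `connected_components` is a hypothetical helper name. It assumes that some upstream sequence comparison has already produced the pairs of un-annotated regions considered similar.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find.

    `nodes` is an iterable of hashable labels; `edges` is an iterable of
    (a, b) pairs meaning "a and b are similar". Returns a sorted list of
    sorted groups.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical un-annotated regions and pairwise similarity hits.
regions = ["geneA:1-80", "geneB:10-95", "geneC:200-290", "geneD:5-60", "geneE:1-40"]
hits = [("geneA:1-80", "geneB:10-95"), ("geneB:10-95", "geneC:200-290")]
families = connected_components(regions, hits)
```

Each resulting group can be treated as a candidate novel domain shared by its member regions; regions with no hits remain singleton groups.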
Affiliation(s)
- Rajkumar Sasidharan
  - Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA 90095, USA.
|
33
|
Santi-Rocca J, Smith S, Weber C, Pineda E, Hon CC, Saavedra E, Olivos-García A, Rousseau S, Dillies MA, Coppée JY, Guillén N. Endoplasmic reticulum stress-sensing mechanism is activated in Entamoeba histolytica upon treatment with nitric oxide. PLoS One 2012; 7:e31777. [PMID: 22384074 PMCID: PMC3286455 DOI: 10.1371/journal.pone.0031777] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 09/20/2011] [Accepted: 01/18/2012] [Indexed: 12/14/2022] Open
Abstract
The endoplasmic reticulum (ER) stores calcium and is a site of protein synthesis and modification. Changes in ER homeostasis lead to stress responses with an activation of the unfolded protein response (UPR). The Entamoeba histolytica endomembrane system is simple compared to those of higher eukaryotes, as a canonical ER is not observed. During amoebiasis, an infection of the human intestine and liver by E. histolytica, nitric oxide (NO) triggers an apoptotic-like event preceded by an impairment of energy production and a loss of important parasite pathogenic features. We address the question of how this ancient eukaryote responds to stress induced by immune components (i.e. NO) and whether stress leads to ER changes and subsequently to a UPR. Gene expression analysis suggested that NO triggers stress responses marked by (i) dramatic up-regulation of hsp genes although a bona fide UPR is absent; (ii) induction of DNA repair and redox gene expression; and (iii) up-regulation of glycolysis-related gene expression. Enzymology approaches demonstrate that NO directly inhibits glycolysis and enhances cysteine synthase activity. Using live imaging and confocal microscopy we found that NO provokes extensive ER fragmentation. ER fission in E. histolytica appears as a protective response against stress, as has recently been proposed for neuron self-defense during neurologic disorders. Chronic ER stress is also involved in metabolic diseases including diabetes, where NO production reduces ER calcium levels and activates cell death. Our data highlight unique cellular responses of interest for understanding the mechanisms of parasite death during amoebiasis.
Affiliation(s)
- Julien Santi-Rocca
- Institut Pasteur, Unité Biologie Cellulaire du Parasitisme, Paris, France.
|
34
|
Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 2011; 12:436. [PMID: 22070249 PMCID: PMC3262844 DOI: 10.1186/1471-2105-12-436] [Citation(s) in RCA: 443] [Impact Index Per Article: 31.6] [Received: 08/08/2011] [Accepted: 11/09/2011] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. RESULTS Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section.
CONCLUSIONS The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.
Affiliation(s)
- John H Morris
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA.
|
35
|
Papanikolaou N, Pafilis E, Nikolaou S, Ouzounis CA, Iliopoulos I, Promponas VJ. BioTextQuest: a web-based biomedical text mining suite for concept discovery. Bioinformatics 2011; 27:3327-8. [PMID: 21994227 DOI: 10.1093/bioinformatics/btr564] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
SUMMARY BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive, user-friendly visualization. The terms labeling each document cluster are shown as a tag cloud and semantically annotated according to biological entity, and a list of document titles enables users to simultaneously compare the terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms and exclusion of documents/significant terms, to better match the biological question addressed. AVAILABILITY http://biotextquest.biol.ucy.ac.cy CONTACT vprobon@ucy.ac.cy; iliopj@med.uoc.gr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nikolas Papanikolaou
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, CY 1678, Nicosia, Cyprus, Greece
|
36
|
Wang K, Fan W, Cai L, Huang B, Lu C. Genetic analysis of the capsular polysaccharide synthesis locus in 15 Streptococcus suis serotypes. FEMS Microbiol Lett 2011; 324:117-24. [PMID: 22092812 DOI: 10.1111/j.1574-6968.2011.02394.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 04/08/2011] [Revised: 08/22/2011] [Accepted: 08/22/2011] [Indexed: 11/29/2022] Open
Abstract
The capsular polysaccharide (CPS) synthesis locus of 13 Streptococcus suis serotypes (serotypes 1, 3, 4, 5, 7, 8, 9, 10, 14, 19, 23, 25 and 1/2) was sequenced and compared with those of serotypes 2 and 16. The CPS synthesis loci of these 15 serotypes fall into two genetic groups. The locus is located on the chromosome between orfZ and aroA. All the translated proteins in the CPS synthesis locus were clustered into 127 homology groups using the TribeMCL algorithm. The general organization of the locus suggested that the CPS of S. suis could be synthesized by the Wzy-dependent pathway. The capsule of serotypes 3, 4, 5, 7, 9, 10, 19 and 23 was predicted to be amino-polysaccharide. Sialic acid was predicted to be present in the capsule of serotypes 1, 2, 14, 16 and 1/2. The characteristics of the CPS synthesis locus suggest that some genes may have been imported into S. suis (or its ancestors) on multiple occasions from different and unknown sources.
Affiliation(s)
- Kaicheng Wang
- Key Lab Animal Disease Diagnostic & Immunology, Ministry of Agriculture, Nanjing Agricultural University, Nanjing, China
|
37
|
Lees JG, Heriche JK, Morilla I, Ranea JA, Orengo CA. Systematic computational prediction of protein interaction networks. Phys Biol 2011; 8:035008. [PMID: 21572181 DOI: 10.1088/1478-3975/8/3/035008] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 01/29/2023]
Abstract
Determining the network of physical protein associations is an important first step in developing mechanistic evidence for elucidating biological pathways. Despite rapid advances in high-throughput experiments to determine protein interactions, the majority of associations remain unknown. Here we describe computational methods for significantly expanding protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher-quality predictions, and we compare the major publicly available resources that experimentalists can use.
Affiliation(s)
- J G Lees
- Research Department of Structural & Molecular Biology, University College London, London, UK.
|
38
|
Is There a Best Quality Metric for Graph Clusters? Machine Learning and Knowledge Discovery in Databases 2011. [DOI: 10.1007/978-3-642-23780-5_13] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 01/07/2023]
|