1
Danis-Wlodarczyk KM, Wozniak DJ, Abedon ST. Treating Bacterial Infections with Bacteriophage-Based Enzybiotics: In Vitro, In Vivo and Clinical Application. Antibiotics (Basel) 2021; 10:1497. [PMID: 34943709 PMCID: PMC8698926 DOI: 10.3390/antibiotics10121497] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 10/23/2021] [Revised: 11/23/2021] [Accepted: 11/29/2021] [Indexed: 12/14/2022] Open
Abstract
Over the past few decades, we have witnessed a worldwide surge in the emergence of antibiotic-resistant bacteria. This global health threat arose mainly due to the overuse and misuse of antibiotics as well as a relative lack of new drug classes in development pipelines. Innovative antibacterial therapeutics and strategies are therefore urgently needed. For the last twenty years, antimicrobial enzymes encoded by bacteriophages, viruses that can lyse and kill bacteria, have gained tremendous interest. There are two classes of these phage-derived enzymes, also referred to as enzybiotics: peptidoglycan hydrolases (lysins), which degrade the bacterial peptidoglycan layer, and polysaccharide depolymerases, which target extracellular or surface polysaccharides, i.e., bacterial capsules, slime layers, biofilm matrix, or lipopolysaccharides. Their features include distinctive modes of action, high efficiency, pathogen specificity, diversity in structure and activity, low possibility of bacterial resistance development, and no observed cross-resistance with currently used antibiotics. Additionally, and unlike antibiotics, enzybiotics can target metabolically inactive persister cells. These phage-derived enzymes have been tested in various animal models to combat both Gram-positive and Gram-negative bacteria, and in recent years peptidoglycan hydrolases have entered clinical trials. Here, we review the testing and clinical use of these enzymes.
Affiliation(s)
- Daniel J. Wozniak
- Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH 43210, USA
- Department of Microbiology, The Ohio State University, Columbus, OH 43210, USA
- Stephen T. Abedon
- Department of Microbiology, The Ohio State University, Columbus, OH 43210, USA
2
Yang YS, Fernandez B, Lagorce A, Aloin V, De Guillen KM, Boyer JB, Dedieu A, Confalonieri F, Armengaud J, Roumestand C. Prioritizing targets for structural biology through the lens of proteomics: the archaeal protein TGAM_1934 from Thermococcus gammatolerans. Proteomics 2015; 15:114-23. [PMID: 25359407 DOI: 10.1002/pmic.201300535] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/03/2013] [Revised: 10/01/2014] [Accepted: 10/24/2014] [Indexed: 11/09/2022]
Abstract
ORFans are hypothetical proteins lacking any significant sequence similarity with other proteins. Here, we highlighted by quantitative proteomics the TGAM_1934 ORFan from the hyperradioresistant Thermococcus gammatolerans archaeon as one of the most abundant hypothetical proteins. This protein has been selected as a priority target for structure determination on the basis of its abundance in three cellular conditions. Its solution structure has been determined using multidimensional heteronuclear NMR spectroscopy. TGAM_1934 displays an original fold, although sharing some similarities with the 3D structure of the bacterial ortholog of frataxin, CyaY, a protein conserved in bacteria and eukaryotes and involved in iron-sulfur cluster biogenesis. These results highlight the potential of structural proteomics in prioritizing ORFan targets for structure determination based on quantitative proteomics data. The proteomic data and structure coordinates have been deposited to the ProteomeXchange with identifier PXD000402 (http://proteomecentral.proteomexchange.org/dataset/PXD000402) and Protein Data Bank under the accession number 2mcf, respectively.
Affiliation(s)
- Yin-Shan Yang
- Centre de Biochimie Structurale, Universités de Montpellier, Montpellier, France
3
Milani L, Ghiselli F, Guerra D, Breton S, Passamonti M. A comparative analysis of mitochondrial ORFans: new clues on their origin and role in species with doubly uniparental inheritance of mitochondria. Genome Biol Evol 2013; 5:1408-34. [PMID: 23824218 PMCID: PMC3730352 DOI: 10.1093/gbe/evt101] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 12/14/2022] Open
Abstract
Despite numerous comparative mitochondrial genomics studies revealing that animal mitochondrial genomes are highly conserved in terms of gene content, supplementary genes are sometimes found, often arising from gene duplication. Mitochondrial ORFans (ORFs having no detectable homology and unknown function) were found in bivalve molluscs with Doubly Uniparental Inheritance (DUI) of mitochondria. In DUI animals, two mitochondrial lineages are present: one transmitted through females (F-type) and the other through males (M-type), each showing a specific and conserved ORF. The analysis of 34 mitochondrial major Unassigned Regions of Musculista senhousia F- and M-mtDNA allowed us to verify the presence of novel mitochondrial ORFs in this species and to compare them with ORFs from other species with ascertained DUI, with other bivalves and with animals showing new mitochondrial elements. Overall, 17 ORFans from nine species were analyzed for structure and function. Many clues suggest that the analyzed ORFans arose from endogenization of viral genes. The co-option of such novel genes by viral hosts may have determined some evolutionary aspects of host life cycle, possibly involving mitochondria. The structure similarity of DUI ORFans within evolutionary lineages may also indicate that they originated from independent events. If these novel ORFs are in some way linked to DUI establishment, a multiple origin of DUI has to be considered. These putative proteins may have a role in the maintenance of sperm mitochondria during embryo development, possibly masking them from the degradation processes that normally affect sperm mitochondria in species with strictly maternal inheritance.
Affiliation(s)
- Liliana Milani
- Dipartimento di Scienze Biologiche, Geologiche ed Ambientali, University of Bologna, Bologna, Italy.
4
Faure G, Callebaut I. Comprehensive repertoire of foldable regions within whole genomes. PLoS Comput Biol 2013; 9:e1003280. [PMID: 24204229 PMCID: PMC3812050 DOI: 10.1371/journal.pcbi.1003280] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 03/26/2013] [Accepted: 08/15/2013] [Indexed: 11/30/2022] Open
Abstract
To obtain a comprehensive repertoire of foldable domains within whole proteomes, including orphan domains, we developed a novel procedure called SEG-HCA. From only the information of a single amino acid sequence, SEG-HCA automatically delineates segments possessing high densities in hydrophobic clusters, as defined by Hydrophobic Cluster Analysis (HCA). These hydrophobic clusters mainly correspond to regular secondary structures, which together form structured or foldable regions. Genome-wide analyses revealed that SEG-HCA is the opposite of disorder predictors, the two addressing distinct structural states. Interestingly, there is nevertheless an overlap between the two predictions, which includes small segments of disordered sequences that undergo coupled folding and binding. SEG-HCA thus gives access to these specific domains, which are generally poorly represented in domain databases. Comparison of the whole set of SEG-HCA predictions with the Conserved Domain Database (CDD) also highlighted a large proportion of predicted large (length >50 amino acids) segments that are CDD orphans. These orphan sequences may either correspond to highly divergent members of already known families or belong to new families of domains. Their comprehensive description thus opens new avenues to investigate new functional and/or structural features, which have so far remained unexplored. Altogether, the data described here provide new insights into the protein architecture and organization throughout the three kingdoms of life.
Spontaneous or induced folding into a specific 3D structure is a key property of proteins to perform their biological functions. Folded 3D structures of proteins perform specific functions, including interactions with other proteins. Intrinsically disordered regions also mediate interaction, gaining structure only when bound to a target protein. In both cases, hydrophobicity generally plays a major role in the protein segment “foldability”. Here, we developed an original procedure to identify foldable segments from only the information of a single amino acid sequence and to explore protein structures at a proteomic scale. Our approach goes beyond the simple consideration of mean hydrophobicity, by including the secondary structure information through the use of a two-dimensional transposition of the sequence. The developed procedure, combined with disorder predictors, may facilitate the specific identification of small segments that undergo coupled folding and binding. Combined with the analysis of specific domain databases, it also highlights orphan foldable segments, which remain as yet uncharacterized.
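The idea of delineating hydrophobic clusters from a single sequence can be illustrated with a heavily simplified, one-dimensional sketch. This is not the SEG-HCA procedure itself: the abstract describes a two-dimensional transposition of the sequence, which is omitted here. The hydrophobic alphabet (`VILFMYW`), the rule that four or more non-hydrophobic residues or a proline separate clusters, and the function name `hydrophobic_clusters` are assumptions made for illustration.

```python
# Simplified 1D illustration of hydrophobic clustering (NOT the published
# SEG-HCA algorithm, which works on a 2D helical representation).
HYDROPHOBIC = set("VILFMYW")  # assumed hydrophobic alphabet


def hydrophobic_clusters(seq, max_gap=3):
    """Return (start, end) index pairs, 0-based and end-exclusive.

    Hydrophobic residues separated by at most `max_gap` non-hydrophobic
    residues are grouped together; a proline always closes the current
    cluster (assumed cluster-breaker rule).
    """
    clusters = []
    start = None  # start index of the cluster being built
    last = None   # index of the last hydrophobic residue seen in it
    for i, aa in enumerate(seq.upper()):
        if aa == "P":
            if start is not None:
                clusters.append((start, last + 1))
                start = None
            continue
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
            elif i - last - 1 > max_gap:
                # Gap too long: close the old cluster, open a new one.
                clusters.append((start, last + 1))
                start = i
            last = i
    if start is not None:
        clusters.append((start, last + 1))
    return clusters
```

A segment-level predictor in the spirit of the abstract would then look for stretches where such clusters are dense; how that density is thresholded is not stated in the abstract and is left out here.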
Affiliation(s)
- Guilhem Faure
- CNRS, UPMC Univ Paris 6, IMPMC, UMR7590 - IUC, Paris, France
5
Faure G, Callebaut I. Identification of hidden relationships from the coupling of hydrophobic cluster analysis and domain architecture information. Bioinformatics 2013; 29:1726-33. [PMID: 23677940 DOI: 10.1093/bioinformatics/btt271] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
Motivation: Describing domain architecture is a critical step in the functional characterization of proteins. However, some orphan domains do not match any profile stored in dedicated domain databases and are thereby difficult to analyze. Results: We present here a novel approach, called TREMOLO-HCA, for the analysis of orphan domain sequences, inspired by our experience in the use of Hydrophobic Cluster Analysis (HCA). Hidden relationships between protein sequences can be more easily identified from the PSI-BLAST results, using information on domain architecture, HCA plots and the conservation degree of amino acids that may participate in the protein core. This can reveal remote relationships with known families of domains, as illustrated here with the identification of a hidden Tudor tandem in the human BAHCC1 protein and a hidden ET domain in the Saccharomyces cerevisiae Taf14p and human AF9 proteins. The results obtained in this way are consistent with those provided by HHPRED, based on pairwise comparisons of HMMs. Our approach can, however, be applied even in the absence of domain profiles or known 3D structures for the identification of novel families of domains. It can also be used in a reverse way for refining domain profiles, by starting from known protein domain families and identifying highly divergent members, hitherto considered as orphan. Availability: We provide a possible integration of this approach in an open TREMOLO-HCA package, which is fully implemented in Python v2.7 and is available on request. Instructions are available at http://www.impmc.upmc.fr/~callebau/tremolohca.html. Contact: isabelle.callebaut@impmc.upmc.fr Supplementary information: Supplementary Data are available at Bioinformatics online.
Affiliation(s)
- Guilhem Faure
- IMPMC, UMR7590, CNRS, Université Pierre et Marie Curie-Paris6, Paris Cedex 05, France
6
Zhu JY, Fu ZQ, Chen L, Xu H, Chrzas J, Rose J, Wang BC. Structure of the Archaeoglobus fulgidus orphan ORF AF1382 determined by sulfur SAD from a moderately diffracting crystal. Acta Crystallogr D Biol Crystallogr 2012; 68:1242-52. [PMID: 22948926 PMCID: PMC3489105 DOI: 10.1107/s0907444912026212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/09/2012] [Accepted: 06/09/2012] [Indexed: 12/22/2022]
Abstract
The crystal structure of the 11.14 kDa orphan ORF 1382 from Archaeoglobus fulgidus (AF1382) has been determined by sulfur SAD phasing using a moderately diffracting crystal and 1.9 Å wavelength synchrotron X-rays. AF1382 was selected as a structural genomics target by the Southeast Collaboratory for Structural Genomics (SECSG) since sequence analyses showed that it did not belong to the Pfam-A database and thus could represent a novel fold. The structure was determined by exploiting longer wavelength X-rays and data redundancy to increase the anomalous signal in the data. AF1382 is a 95-residue protein containing five S atoms associated with four methionine residues and a single cysteine residue that yields a calculated Bijvoet ratio (ΔF(anom)/F) of 1.39% for 1.9 Å wavelength X-rays. Coupled with an average Bijvoet redundancy of 25 (two 360° data sets), this produced an excellent electron-density map that allowed 69 of the 95 residues to be automatically fitted. The S-SAD model was then manually completed and refined (R = 23.2%, R(free) = 26.8%) to 2.3 Å resolution (PDB entry 3o3k). High-resolution data were subsequently collected from a better diffracting crystal using 0.97 Å wavelength synchrotron X-rays and the S-SAD model was refined (R = 17.9%, R(free) = 21.4%) to 1.85 Å resolution (PDB entry 3ov8). AF1382 has a winged-helix-turn-helix structure common to many DNA-binding proteins and most closely resembles the N-terminal domain (residues 1-82) of the Rio2 kinase from A. fulgidus, which has been shown to bind DNA, and a number of MarR-family transcriptional regulators, suggesting a similar DNA-binding function for AF1382. The analysis also points out the advantage gained from carrying out data reduction and structure determination on-site while the crystal is still available for further data collection.
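A calculated Bijvoet ratio of the kind quoted in this abstract is commonly estimated with the Hendrickson-Teeter approximation, ⟨ΔF⟩/⟨F⟩ ≈ √2 · √N_A · f″ / (√N_P · Z_eff). The abstract does not say which expression the authors used, so the following is only a minimal sketch under that assumption; the function name and the default `z_eff=6.7` (a conventional effective atomic number for protein atoms) are illustrative choices.

```python
from math import sqrt


def bijvoet_ratio(n_anomalous, f_double_prime, n_protein_atoms, z_eff=6.7):
    """Estimate <dF_anom>/<F> with the Hendrickson-Teeter approximation.

    n_anomalous     -- number of anomalous scatterers (e.g. S atoms)
    f_double_prime  -- imaginary scattering factor f'' at the chosen wavelength
    n_protein_atoms -- number of non-hydrogen protein atoms
    z_eff           -- effective atomic number of protein atoms (assumed 6.7)
    """
    if n_anomalous < 0 or n_protein_atoms <= 0 or z_eff <= 0:
        raise ValueError("counts and z_eff must be positive")
    return (sqrt(2.0) * sqrt(n_anomalous) * f_double_prime
            / (sqrt(n_protein_atoms) * z_eff))
```

For AF1382 one would pass `n_anomalous=5`, the five S atoms stated in the abstract. The value of f″ for sulfur at 1.9 Å and the number of non-hydrogen protein atoms are not given in the abstract and would have to come from tabulated scattering factors and the sequence.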
Affiliation(s)
- Jin-Yi Zhu
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Zheng-Qing Fu
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Southeast Regional Collaborative Access Team (SER-CAT), Advanced Photon Source, Argonne National Laboratory, Argonne, Illinois, USA
- Lirong Chen
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Hao Xu
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- John Chrzas
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Southeast Regional Collaborative Access Team (SER-CAT), Advanced Photon Source, Argonne National Laboratory, Argonne, Illinois, USA
- John Rose
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Southeast Regional Collaborative Access Team (SER-CAT), Advanced Photon Source, Argonne National Laboratory, Argonne, Illinois, USA
- Bi-Cheng Wang
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Southeast Regional Collaborative Access Team (SER-CAT), Advanced Photon Source, Argonne National Laboratory, Argonne, Illinois, USA
7
Ancient origin of the divergent forms of leucyl-tRNA synthetases in the Halobacteriales. BMC Evol Biol 2012; 12:85. [PMID: 22694720 PMCID: PMC3436685 DOI: 10.1186/1471-2148-12-85] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 10/04/2011] [Accepted: 04/27/2012] [Indexed: 02/01/2023] Open
Abstract
Background: Horizontal gene transfer (HGT) has greatly impacted the genealogical history of many lineages, particularly for prokaryotes, with genes frequently moving in and out of a line of descent. Many genes that were acquired by a lineage in the past likely originated from ancestral relatives that have since gone extinct. During the course of evolution, HGT has played an essential role in the origin and dissemination of genetic and metabolic novelty. Results: Three divergent forms of leucyl-tRNA synthetase (LeuRS) exist in the archaeal order Halobacteriales, commonly known as haloarchaea. Few haloarchaeal genomes have the typical archaeal form of this enzyme and phylogenetic analysis indicates it clusters within the Euryarchaeota as expected. The majority of sequenced halobacterial genomes possess a bacterial form of LeuRS. Phylogenetic reconstruction puts this larger group of haloarchaea at the base of the bacterial domain. The most parsimonious explanation is that an ancient transfer of LeuRS took place from an organism related to the ancestor of the bacterial domain to the haloarchaea. The bacterial form of LeuRS further underwent gene duplications and/or gene transfers within the haloarchaea, with some genomes possessing two distinct types of bacterial LeuRS. The cognate tRNALeu also reveals two distinct clusters for the haloarchaea; however, these tRNALeu clusters do not coincide with the groupings found in the LeuRS tree, revealing that LeuRS evolved independently of its cognate tRNA. Conclusions: The study of leucyl-tRNA synthetase in haloarchaea illustrates the importance of gene transfer originating in lineages that went extinct since the transfer occurred. The haloarchaeal LeuRS and tRNALeu did not co-evolve.
8
Abstract
Gene evolution has long been thought to be primarily driven by duplication and rearrangement mechanisms. However, every evolutionary lineage harbours orphan genes that lack homologues in other lineages and whose evolutionary origin is only poorly understood. Orphan genes might arise from duplication and rearrangement processes followed by fast divergence; however, de novo evolution out of non-coding genomic regions is emerging as an important additional mechanism. This process appears to provide raw material continuously for the evolution of new gene functions, which can become relevant for lineage-specific adaptations.
9
Domitrovic T, Kozlov G, Freire JCG, Masuda CA, da Silva Almeida M, Montero-Lomeli M, Atella GC, Matta-Camacho E, Gehring K, Kurtenbach E. Structural and functional study of YER067W, a new protein involved in yeast metabolism control and drug resistance. PLoS One 2010; 5:e11163. [PMID: 20567505 PMCID: PMC2887356 DOI: 10.1371/journal.pone.0011163] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 10/21/2009] [Accepted: 04/21/2010] [Indexed: 11/19/2022] Open
Abstract
The genome of Saccharomyces cerevisiae is arguably the best studied eukaryotic genome, and yet, it contains approximately 1000 genes that are still relatively uncharacterized. As the majority of these ORFs have no homologs with characterized sequence or protein structure, traditional sequence-based approaches cannot be applied to deduce their biological function. Here, we characterize YER067W, a conserved gene of unknown function that is strongly induced in response to many stress conditions and repressed in drug resistant yeast strains. Gene expression patterns of YER067W and its paralog YIL057C suggest an involvement in energy metabolism. We show that yeast lacking YER067W display altered levels of reserve carbohydrates and a growth deficiency in media that require aerobic metabolism. Impaired mitochondrial function and overall reduction of ergosterol content in the YER067W-deleted strain explained the observed 2- and 4-fold increase in resistance to the drugs fluconazole and amphotericin B, respectively. Cell fractionation and immunofluorescence microscopy revealed that Yer067w is associated with cellular membranes despite the absence of a transmembrane domain in the protein. Finally, the 1.7 Å resolution crystal structure of Yer067w shows an alpha-beta fold with low similarity to known structures and a putative functional site. YER067W's involvement with aerobic energetic metabolism suggests the assignment of the gene name RGI1, standing for respiratory growth induced 1. Altogether, the results shed light on a previously uncharacterized protein family and provide a basis for further studies of its apparent role in energy metabolism control and drug resistance.
Affiliation(s)
- Tatiana Domitrovic
- Programa de Biologia Molecular e Estrutural, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
10
Yomtovian I, Teerakulkittipong N, Lee B, Moult J, Unger R. Composition bias and the origin of ORFan genes. Bioinformatics 2010; 26:996-9. [PMID: 20231229 PMCID: PMC2853687 DOI: 10.1093/bioinformatics/btq093] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/23/2022]
Abstract
Motivation: Intriguingly, sequence analysis of genomes reveals that a large number of genes are unique to each organism. The origin of these genes, termed ORFans, is not known. Here, we explore the origin of ORFan genes by defining a simple measure called ‘composition bias’, based on the deviation of the amino acid composition of a given sequence from the average composition of all proteins of a given genome. Results: For a set of 47 prokaryotic genomes, we show that the amino acid composition bias of real proteins, random ‘proteins’ (created by using the nucleotide frequencies of each genome) and ‘proteins’ translated from intergenic regions are distinct. For ORFans, we observed a correlation between their composition bias and their relative evolutionary age. Recent ORFan proteins have compositions more similar to those of random ‘proteins’, while the compositions of more ancient ORFan proteins are more similar to those of the set of all proteins of the organism. This observation is consistent with an evolutionary scenario wherein ORFan genes emerged and underwent a large number of random mutations and selection, eventually adapting to the composition preference of their organism over time. Contact: ron@biocoml.ls.biu.ac.il Supplementary information: Supplementary data are available at Bioinformatics online.
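The ‘composition bias’ measure described above can be sketched as the distance between a sequence's amino acid frequencies and the genome-wide average. The abstract does not give the exact formula, so the Euclidean distance used here, the pooling of all residues to obtain the genome average, and the function names are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20 amino acid frequencies of `seq`, in AMINO_ACIDS order."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def genome_composition(proteins):
    """Average composition of a genome, pooling all residues (assumption)."""
    return composition("".join(proteins))


def composition_bias(seq, background):
    """Euclidean deviation of `seq` composition from `background` (assumed metric)."""
    freqs = composition(seq)
    return sqrt(sum((f - b) ** 2 for f, b in zip(freqs, background)))
```

Under this sketch, a sequence whose composition matches the genome average scores 0, and larger values indicate a more atypical composition; comparing such scores between ORFans of different ages is the kind of analysis the abstract reports.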
Affiliation(s)
- Inbal Yomtovian
- Department of Computer Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
11
Spriggs RV, Jones S. RNA-binding residues in sequence space: conservation and interaction patterns. Comput Biol Chem 2009; 33:397-403. [PMID: 19700370 DOI: 10.1016/j.compbiolchem.2009.07.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 03/12/2009] [Revised: 07/14/2009] [Accepted: 07/18/2009] [Indexed: 10/20/2022]
Abstract
RNA-binding proteins (RBPs) perform fundamental and diverse functions within the cell. Approximately 15% of protein sequences are annotated as RNA-binding, but with a significant number of proteins without functional annotation, many RBPs are yet to be identified. A percentage of uncharacterised proteins can be annotated by transferring functional information from proteins sharing significant sequence homology. However, genomes contain a significant number of orphan open reading frames (ORFs) that do not share significant sequence similarity to other ORFs, but correspond to functional proteins. Hence methods for protein function annotation that go beyond sequence homology are essential. One method of annotation is the identification of ligands that bind to proteins, through the characterisation of binding site residues. In the current work RNA-binding residues (RBRs) are characterised in terms of their evolutionary conservation and the patterns they form in sequence space. The potential for such characteristics to be used to identify RBPs from sequence is then evaluated. The conservation of residues in 261 RBPs is compared for (a) RBRs vs. non-RBR surface residues, and for (b) specific and non-specific RBRs. The analysis shows that RBRs are more conserved than other surface residues, and RBRs hydrogen-bonded to the RNA backbone are more conserved than those making hydrogen bonds to RNA bases. This observed conservation of RBRs was then used to inform the construction of RBR sequence patterns from known protein-RNA structures. A series of RBR patterns were generated for a case study protein, aspartyl-tRNA synthetase bound to tRNA, and used to differentiate between RNA-binding and non-RNA-binding protein sequences. Six sequence patterns performed with high precision values of >80% and recall values 7 times that of a homology search. When the method was expanded to the complete dataset of 261 proteins, many patterns were of poor predictive value, as they had not been manipulated on a family-specific basis. However, two patterns with precision values ≥85% were used to make function predictions for a set of hypothetical proteins. This revealed a number of potential RBPs that require experimental verification.
Affiliation(s)
- Ruth V Spriggs
- Department of Chemistry and Biochemistry, School of Life Sciences, John Maynard-Smith Building, University of Sussex, Falmer, Brighton, BN1 9QG, UK
12
Comparative genomics using microarrays reveals divergence and loss of virulence-associated genes in host-specific strains of the insect pathogen Metarhizium anisopliae. Eukaryot Cell 2009; 8:888-98. [PMID: 19395664 DOI: 10.1128/ec.00058-09] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 01/05/2023]
Abstract
Many strains of Metarhizium anisopliae have broad host ranges, but others are specialists and adapted to particular hosts. Patterns of gene duplication, divergence, and deletion in three generalist and three specialist strains were investigated by heterologous hybridization of genomic DNA to genes from the generalist strain Ma2575. As expected, major life processes are highly conserved, presumably due to purifying selection. However, up to 7% of Ma2575 genes were highly divergent or absent in specialist strains. Many of these sequences are conserved in other fungal species, suggesting that there has been rapid evolution and loss in specialist Metarhizium genomes. Some poorly hybridizing genes in specialists were functionally coordinated, indicative of reductive evolution. These included several involved in toxin biosynthesis and sugar metabolism in root exudates, suggesting that specialists are losing genes required to live in alternative hosts or as saprophytes. Several components of mobile genetic elements were also highly divergent or lost in specialists. Exceptionally, the genome of the specialist cricket pathogen Ma443 contained extra insertion elements that might play a role in generating evolutionary novelty. This study throws light on the abundance of orphans in genomes, as 15% of orphan sequences were found to be rapidly evolving in the Ma2575 lineage.
13
Abstract
ORFan genes can constitute a large fraction of a bacterial genome, but due to their lack of homologs, their functions have remained largely unexplored. To determine if particular features of ORFan-encoded proteins promote their presence in a genome, we analyzed properties of ORFans that originated over a broad evolutionary timescale. We also compared ORFan genes to another class of acquired genes, heterogeneous occurrence in prokaryotes (HOPs), which have homologs in other bacteria. A total of 54 ORFan and HOP genes selected from different phylogenetic depths in the Escherichia coli lineage were cloned, expressed, purified, and subjected to circular dichroism (CD) spectroscopy. A majority of genes could be expressed, but only 18 yielded sufficient soluble protein for spectral analysis. Of these, half were significantly alpha-helical, three were predominantly beta-sheet, and six were of intermediate/indeterminate structure. Although a higher proportion of HOPs yielded soluble proteins with resolvable secondary structures, ORFans resembled HOPs with regard to most of the other features tested. Overall, we found that those ORFan and HOP genes that have persisted in the E. coli lineage were more likely to encode soluble and folded proteins, more likely to display environmental modulation of their gene expression, and by extrapolation, are more likely to be functional.
Affiliation(s)
- Hema Prasad Narra
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, AZ, USA
- Matthew H. J. Cordes
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, AZ, USA
- Howard Ochman
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, AZ, USA
14
Genomes and knowledge - a questionable relationship? Trends Microbiol 2008; 16:512-9. [PMID: 18819801 DOI: 10.1016/j.tim.2008.08.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 07/04/2008] [Revised: 08/15/2008] [Accepted: 08/21/2008] [Indexed: 11/22/2022]
Abstract
The availability of bacterial genome sequences has ushered in an era of post-genomic research - accelerating and often enabling molecular genetic analyses. For bacteriologists focussing on an individual bacterium, comparing genomes has also led to a greater understanding of their favoured organism through contextualization. But how does the value of such contextualization vary with the number of available genomes? It seems that for most genome metrics, comparison against approximately 100 genomes is sufficient, with comparison against further genomes not considerably affecting the contextual knowledge gained. It appears that quality, rather than quantity, might be the most important factor when comparing genomes.
15
Luhua S, Ciftci-Yilmaz S, Harper J, Cushman J, Mittler R. Enhanced tolerance to oxidative stress in transgenic Arabidopsis plants expressing proteins of unknown function. Plant Physiol 2008; 148:280-92. [PMID: 18614705 PMCID: PMC2528079 DOI: 10.1104/pp.108.124875] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.2] [Received: 06/16/2008] [Accepted: 07/02/2008] [Indexed: 05/19/2023]
Abstract
Over one-quarter of all plant genes encode proteins of unknown function that can be further classified as proteins with obscure features (POFs), which lack currently defined motifs or domains, or proteins with defined features, which contain at least one previously defined domain or motif. Although empirical data in the form of transcriptome and proteome profiling suggest that many of these proteins play important roles in plants, their functional characterization remains one of the main challenges in modern biology. To begin the functional annotation of proteins with unknown function, which are involved in the oxidative stress response of Arabidopsis (Arabidopsis thaliana), we generated transgenic Arabidopsis plants that constitutively expressed 23 different POFs (four of which were specific to Arabidopsis) and 18 different proteins with defined features. All were previously found to be expressed in response to oxidative stress in Arabidopsis. Transgenic plants were tested for their tolerance to oxidative stress imposed by paraquat or t-butyl hydroperoxide, or were subjected to osmotic, salinity, cold, and heat stresses. More than 70% of all expressed proteins conferred tolerance to oxidative stress. In contrast, >90% of the expressed proteins did not confer enhanced tolerance to the other abiotic stresses tested, and approximately 50% rendered plants more susceptible to osmotic or salinity stress. Two Arabidopsis-specific POFs, and an Arabidopsis and Brassica-specific protein of unknown function, conferred enhanced tolerance to oxidative stress. Our findings suggest that tolerance to oxidative stress involves mechanisms and pathways that are unknown at present, including some that are specific to Arabidopsis or the Brassicaceae.
Affiliation(s)
- Song Luhua
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
|
16
|
Koharudin LMI, Viscomi AR, Jee JG, Ottonello S, Gronenborn AM. The evolutionarily conserved family of cyanovirin-N homologs: structures and carbohydrate specificity. Structure 2008; 16:570-84. [PMID: 18400178 DOI: 10.1016/j.str.2008.01.015] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 11/27/2007] [Revised: 01/09/2008] [Accepted: 01/11/2008] [Indexed: 11/28/2022]
Abstract
Solution structures for three members of the recently discovered cyanovirin-N (CV-N) homolog family of lectins have been determined. Cyanovirin-N homologs (CVNHs) from Tuber borchii, Ceratopteris richardii, and Neurospora crassa, representing each of the three phylogenetic groups, were selected. All proteins exhibit the same fold, and the overall structures resemble that of the founding member of the family, CV-N, albeit with noteworthy differences in loop conformation and detailed local structure. Since no data are available regarding the proteins' function or their natural ligands, extensive carbohydrate-binding studies were conducted. We delineated ligand-binding sites on all three proteins by nuclear magnetic resonance and identified which sugars interact by array screening. The number and location of binding sites vary for the three proteins, and different ligand specificities exist. Potential physiological roles for two family members, TbCVNH and NcCVNH, were probed in nutrition deprivation experiments that suggest a possible involvement of these proteins in lifestyle-related responses.
Affiliation(s)
- Leonardus M I Koharudin
- Department of Structural Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15260, USA
|
17
|
|
18
|
Smith DG, Gawryluk RM, Spencer DF, Pearlman RE, Siu KM, Gray MW. Exploring the Mitochondrial Proteome of the Ciliate Protozoon Tetrahymena thermophila: Direct Analysis by Tandem Mass Spectrometry. J Mol Biol 2007; 374:837-63. [DOI: 10.1016/j.jmb.2007.09.051] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.7] [Received: 07/10/2007] [Revised: 09/18/2007] [Accepted: 09/19/2007] [Indexed: 11/27/2022]
|
19
|
Gollery M, Harper J, Cushman J, Mittler T, Mittler R. POFs: what we don't know can hurt us. Trends Plant Sci 2007; 12:492-496. [PMID: 17928258 DOI: 10.1016/j.tplants.2007.08.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 06/08/2007] [Revised: 08/07/2007] [Accepted: 08/14/2007] [Indexed: 05/25/2023]
Abstract
Over a quarter of all eukaryotic genes encode proteins with obscure features that lack currently defined motifs or domains (POFs). Interestingly, most of the differences in gene repertoire among species were recently found to be attributed to POFs. A comparison of the Arabidopsis, rice and poplar genomes reveals that Arabidopsis contains 5069 POFs, of which 2045 have no obvious homologs in rice or poplar and are likely to be involved in species- or phylogenetic-specific functions in Arabidopsis. The study of POFs is an important endeavor that will shed much needed light on the genetic properties that make any given plant species unique. Furthermore, with respect to many species-specific features, such studies show that we seem to be limited in what we can expect to learn from a model plant such as Arabidopsis.
Affiliation(s)
- Martin Gollery
- TimeLogic - a Division of Active Motif, Incline Village, NV 89451, USA
- Jeff Harper
- Department of Biochemistry and Molecular Biology, MS200, University of Nevada, Reno, NV 89557, USA
- John Cushman
- Department of Biochemistry and Molecular Biology, MS200, University of Nevada, Reno, NV 89557, USA
- Taliah Mittler
- Department of Biochemistry and Molecular Biology, MS200, University of Nevada, Reno, NV 89557, USA
- Ron Mittler
- Department of Biochemistry and Molecular Biology, MS200, University of Nevada, Reno, NV 89557, USA; Department of Plant Science, Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, Israel.
|
20
|
Fujishima K, Komasa M, Kitamura S, Suzuki H, Tomita M, Kanai A. Proteome-wide prediction of novel DNA/RNA-binding proteins using amino acid composition and periodicity in the hyperthermophilic archaeon Pyrococcus furiosus. DNA Res 2007; 14:91-102. [PMID: 17573465 PMCID: PMC2779898 DOI: 10.1093/dnares/dsm011] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches thus is becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we used the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, using amino acid composition and periodicities as feature vectors. We defined this value as the composition score (CO) and periodicity score (PD). The P. furiosus proteins were classified into three classes (I–III) on the basis of the two-dimensional correlation analysis of CO score and PD score. As a result, approximately 87% of the functionally known proteins categorized as class I proteins (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. Applying the two-dimensional correlation analysis to the 994 hypothetical proteins in P. furiosus, a total of 151 proteins were predicted to be novel DNA/RNA-binding protein candidates. DNA/RNA-binding activities of randomly chosen hypothetical proteins were experimentally verified. Six out of seven candidate proteins in class I possessed DNA/RNA-binding activities, supporting the efficacy of our method.
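The feature and thresholding steps mentioned in this abstract can be illustrated with a minimal sketch. It only computes an amino acid composition vector and applies the class I cutoff quoted above (CO score + PD score > 0.6). The SVM training, the periodicity features, and the class II and III boundaries are not described in the abstract and are not reproduced here. The function names `amino_acid_composition` and `is_class_one` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Characters outside the 20 standard residues (e.g. X, *) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def is_class_one(co_score, pd_score, threshold=0.6):
    """Apply the class I cutoff quoted in the abstract: CO + PD > 0.6.

    The scores are assumed to come from classifiers trained elsewhere;
    this function does not compute them.
    """
    return co_score + pd_score > threshold
```

A composition vector produced this way could serve as one input to a classifier such as an SVM, but how the authors scaled or combined the features is an assumption left open here.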
Affiliation(s)
- Kosuke Fujishima
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Mizuki Komasa
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Sayaka Kitamura
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Haruo Suzuki
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, Fujisawa 252-8520, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-8520, Japan
- Akio Kanai
- Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0017, Japan
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-8520, Japan
- To whom correspondence should be addressed. Tel. +81 235-29-0524. Fax. +81 235-29-0525.
|
21
|
Gollery M, Harper J, Cushman J, Mittler T, Girke T, Zhu JK, Bailey-Serres J, Mittler R. What makes species unique? The contribution of proteins with obscure features. Genome Biol 2006; 7:R57. [PMID: 16859532 PMCID: PMC1779552 DOI: 10.1186/gb-2006-7-7-r57] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.2] [Received: 03/06/2006] [Revised: 04/28/2006] [Accepted: 06/27/2006] [Indexed: 11/23/2022]
Abstract
An analysis of proteins with obscure features in ten eukaryotic genomes revealed that the majority are species-specific. Background: Proteins with obscure features (POFs), which lack currently defined motifs or domains, represent between 18% and 38% of a typical eukaryotic proteome. To evaluate the contribution of this class of proteins to the diversity of eukaryotes, we performed a comparative analysis of the predicted proteomes derived from 10 different sequenced genomes, including budding and fission yeast, worm, fly, mosquito, Arabidopsis, rice, mouse, rat, and human. Results: Only 1,650 protein groups were found to be conserved among these proteomes (BLAST E-value threshold of 10^-6). Of these, only three were designated as POFs. Surprisingly, we found that, on average, 60% of the POFs identified in these 10 proteomes (44,236 in total) were species specific. In contrast, only 7.5% of the proteins with defined features (PDFs) were species specific (17,554 in total). As a group, POFs appear similar to PDFs in their relative contribution to biological functions, as indicated by their expression, participation in protein-protein interactions and association with mutant phenotypes. However, POFs have more predicted disordered structure than PDFs, implying that they may exhibit preferential involvement in species-specific regulatory and signaling networks. Conclusion: Because the majority of eukaryotic POFs are not well conserved, and by definition do not have defined domains or motifs upon which to formulate a functional working hypothesis, understanding their biochemical and biological functions will require species-specific investigations.
Affiliation(s)
- Martin Gollery
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Jeff Harper
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- John Cushman
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Taliah Mittler
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Thomas Girke
- Center for Plant Cell Biology, University of California, Riverside, CA 92521, USA
- Jian-Kang Zhu
- Center for Plant Cell Biology, University of California, Riverside, CA 92521, USA
- Julia Bailey-Serres
- Center for Plant Cell Biology, University of California, Riverside, CA 92521, USA
- Ron Mittler
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
|
22
|
Koike R, Kinoshita K, Kidera A. Probabilistic alignment detects remote homology in a pair of protein sequences without homologous sequence information. Proteins 2007; 66:655-63. [PMID: 17152080 DOI: 10.1002/prot.21240] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Dynamic programming (DP) and its heuristic algorithms are the most fundamental methods for similarity searches of amino acid sequences. Their detection power has been improved by including supplemental information, such as homologous sequences in the profile method. Here, we describe a method, probabilistic alignment (PA), that gives improved detection power, but similarly to the original DP, uses only a pair of amino acid sequences. Receiver operating characteristic (ROC) analysis demonstrated that the PA method is far superior to BLAST, and that its sensitivity and selectivity approach those of PSI-BLAST. Particularly for orphan proteins having few homologues in the database, PA exhibits much better performance than PSI-BLAST. On the basis of this observation, we applied the PA method to a homology search of two orphan proteins, Latexin and Resuscitation-promoting factor domain. Their molecular functions have been described based on structural similarities, but sequence homologues have not been identified by PSI-BLAST. PA successfully detected sequence homologues for the two proteins and confirmed that the observed structural similarities are the result of an evolutionary relationship.
Affiliation(s)
- Ryotaro Koike
- Global Scientific Information and Computing Center, Tokyo Institute of Technology, Ookayama, Tokyo 152-8550, Japan
|
23
|
Abstract
Background: Bacterial genomes develop new mechanisms to tide them over the imposing conditions they encounter during the course of their evolution. Acquisition of new genes by lateral gene transfer may be one of the dominant ways of adaptation in bacterial genome evolution. Lateral gene transfer provides the bacterial genome with a new set of genes that help it to explore and adapt to new ecological niches. Methods: A maximum likelihood analysis was done on the five sequenced corynebacterial genomes to model the rates of gene insertions/deletions at various depths of the phylogeny. Results: The study shows that most of the laterally acquired genes are transient and the inferred rates of gene movement are higher on the external branches of the phylogeny and decrease as the phylogenetic depth increases. The newly acquired genes are under relaxed selection and evolve faster than their older counterparts. Analysis of some of the functionally characterised LGTs in each species has indicated that they may have a possible adaptive role. Conclusion: The five corynebacterial genomes sequenced to date have evolved by acquiring between 8% and 14% of their genomes by LGT and some of these genes may have a role in adaptation.
Affiliation(s)
- Pradeep Reddy Marri
- Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Weilong Hao
- Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- G Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario L8S 4K1, Canada
|
24
|
Renesto P, Abergel C, Decloquement P, Moinier D, Azza S, Ogata H, Fourquet P, Gorvel JP, Claverie JM. Mimivirus giant particles incorporate a large fraction of anonymous and unique gene products. J Virol 2006; 80:11678-85. [PMID: 16971431 PMCID: PMC1642625 DOI: 10.1128/jvi.00940-06] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.1] [Indexed: 01/20/2023]
Abstract
Acanthamoeba polyphaga mimivirus is the largest known virus in both particle size and genome complexity. Its 1.2-Mb genome encodes 911 proteins, among which only 298 have predicted functions. The composition of purified isolated virions was analyzed by using a combined electrophoresis/mass spectrometry approach allowing the identification of 114 proteins. Besides the expected major structural components, the viral particle packages 12 proteins unambiguously associated with transcriptional machinery, 3 proteins associated with DNA repair, and 2 topoisomerases. Other main functional categories represented in the virion include oxidative pathways and protein modification. More than half of the identified virion-associated proteins correspond to anonymous genes of unknown function, including 45 "ORFans." As demonstrated by both Western blotting and immunogold staining, some of these "ORFans," which lack any convincing similarity in the sequence databases, are endowed with antigenic properties. Thus, anonymous and unique genes constituting the majority of the mimivirus gene complement encode bona fide proteins that are likely to participate in well-integrated processes.
Affiliation(s)
- Patricia Renesto
- Unité des Rickettsies, CNRS UMR 6020, IFR-48, Faculté de Médecine, 27 Boulevard Jean Moulin, 13385 Marseille, France.
|
25
|
Yin Y, Fischer D. On the origin of microbial ORFans: quantifying the strength of the evidence for viral lateral transfer. BMC Evol Biol 2006; 6:63. [PMID: 16914045 PMCID: PMC1559721 DOI: 10.1186/1471-2148-6-63] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Received: 04/10/2006] [Accepted: 08/16/2006] [Indexed: 11/10/2022]
Abstract
Background: The origin of microbial ORFans, ORFs having no detectable homology to other ORFs in the databases, is one of the unexplained puzzles of the post-genomic era. Several hypotheses on the origin of ORFans have been suggested in the last few years, most of which are based on selected, relatively small, subsets of ORFans. One of the hypotheses for the origin of ORFans is that they have been acquired through lateral transfer from viruses. Here we carry out a comprehensive, genome-wide study on the origins of ORFans to quantify the strength of current evidence supporting this hypothesis. Results: We performed similarity searches by querying all current ORFans against the public virus protein database. Surprisingly, we found that only 2.8% of all microbial ORFans have detectable homologs in viruses, while the percentage of non-ORFans with detectable homologs in viruses is 7.9%, a significantly higher figure. This suggests that the current evidence for the origin of ORFans from lateral transfer from viruses is at best weak. However, an analysis of individual genomes revealed a number of organisms with much higher percentages, many of them belonging to the Firmicutes and Gamma-proteobacteria. We provide evidence suggesting that the current virus database may be biased towards those viruses attacking Firmicutes and Gamma-proteobacteria. Conclusion: We conclude that as more viral genomes are sequenced, more microbial ORFans will find homologs in viruses, but this trend may vary much for individual genomes. Thus, lateral transfer from viruses alone is unlikely to explain the origin of the majority of ORFans in the majority of prokaryotes and consequently, other, not necessarily exclusive, mechanisms are likely to better explain the origin of the increasing number of ORFans.
Affiliation(s)
- Yanbin Yin
- Computer Science and Engineering Dept. 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, US
- Daniel Fischer
- Computer Science and Engineering Dept. 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, US
- Bioinformatics/Dept. of Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
|
26
|
Terribilini M, Lee JH, Yan C, Jernigan RL, Honavar V, Dobbs D. Prediction of RNA binding sites in proteins from amino acid sequence. RNA 2006; 12:1450-62. [PMID: 16790841 PMCID: PMC1524891 DOI: 10.1261/rna.2197306] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.3] [Received: 08/18/2005] [Accepted: 05/13/2006] [Indexed: 05/10/2023]
Abstract
RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.)
Affiliation(s)
- Michael Terribilini
- Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa 50010, USA.
|
27
|
Hao W, Golding GB. The fate of laterally transferred genes: life in the fast lane to adaptation or death. Genome Res 2006; 16:636-43. [PMID: 16651664 PMCID: PMC1457040 DOI: 10.1101/gr.4746406] [Citation(s) in RCA: 129] [Impact Index Per Article: 6.8] [Indexed: 11/24/2022]
Abstract
Large-scale genome arrangement plays an important role in bacterial genome evolution. A substantial number of genes can be inserted into, deleted from, or rearranged within genomes during evolution. Detecting or inferring gene insertions/deletions is of interest because such information provides insights into bacterial genome evolution and speciation. However, efficient inference of genome events is difficult because genome comparisons alone do not generally supply enough information to distinguish insertions, deletions, and other rearrangements. In this study, homologous genes from the complete genomes of 13 closely related bacteria were examined. The presence or absence of genes from each genome was cataloged, and a maximum likelihood method was used to infer insertion/deletion rates according to the phylogenetic history of the taxa. It was found that whole gene insertions/deletions in genomes occur at rates comparable to or greater than the rate of nucleotide substitution and that higher insertion/deletion rates are often inferred to be present at the tips of the phylogeny with lower rates on more ancient interior branches. Recently transferred genes are under faster and relaxed evolution compared with more ancient genes. Together, this implies that many of the lineage-specific insertions are lost quickly during evolution and that perhaps a few of the genes inserted by lateral transfer are niche specific.
Affiliation(s)
- Weilong Hao
- Department of Biology, McMaster University, Hamilton, Ontario, Canada L8S 4K1
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario, Canada L8S 4K1
- Corresponding author. Fax (905) 522-6066
|
28
|
Skowronek KJ, Kosinski J, Bujnicki JM. Theoretical model of restriction endonuclease HpaI in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis. Proteins 2006; 63:1059-68. [PMID: 16498623 DOI: 10.1002/prot.20920] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
Type II restriction enzymes are commercially important deoxyribonucleases and very attractive targets for protein engineering of new specificities. At the same time they are a very challenging test bed for protein structure prediction methods. Typically, enzymes that recognize different sequences show little or no amino acid sequence similarity to each other and to other proteins. Based on crystallographic analyses that revealed the same PD-(D/E)XK fold for more than a dozen case studies, they were nevertheless considered to be related until the combination of bioinformatics and mutational analyses demonstrated that some of these proteins belong to other, unrelated folds: PLD, HNH, and GIY-YIG. As a part of a large-scale project aiming at identification of a three-dimensional fold for all type II REases with known sequences (currently approximately 1000 proteins), we carried out preliminary structure prediction and selected candidates for experimental validation. Here, we present the analysis of HpaI REase, an ORFan with no detectable homologs, for which we detected a structural template by protein fold recognition, constructed a model using the FRankenstein monster approach and identified a number of residues important for DNA binding and catalysis. These predictions were confirmed by site-directed mutagenesis and in vitro analysis of the mutant proteins. The experimentally validated model of HpaI will serve as a low-resolution structural platform for evolutionary considerations in the subgroup of blunt-cutting REases with different specificities. The research protocol developed in the course of this work represents a streamlined version of the previously used techniques and can be used in a high-throughput fashion to build and validate models for other enzymes, especially ORFans that exhibit no sequence similarity to any other protein in the database.
|
29
|
Marsden RL, Lee D, Maibaum M, Yeats C, Orengo CA. Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space. Nucleic Acids Res 2006; 34:1066-80. [PMID: 16481312 PMCID: PMC1373602 DOI: 10.1093/nar/gkj494] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Indexed: 11/22/2022]
Abstract
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
30
|
Renesto P, Azza S, Dolla A, Fourquet P, Vestris G, Gorvel JP, Raoult D. Proteome analysis of Rickettsia conorii by two-dimensional gel electrophoresis coupled with mass spectrometry. FEMS Microbiol Lett 2005; 245:231-8. [PMID: 15837377 DOI: 10.1016/j.femsle.2005.03.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Received: 01/28/2005] [Revised: 03/04/2005] [Accepted: 03/04/2005] [Indexed: 10/25/2022]
Abstract
The availability of the genome sequence offers the opportunity to further expand our knowledge about proteins expressed by Rickettsia conorii, a strictly intracellular bacterium responsible for Mediterranean spotted fever. Using two-dimensional polyacrylamide gel electrophoresis combined with MALDI-TOF mass spectrometry, we established the first reference map of the R. conorii proteome. This approach also allowed identification of GroEL as the major antigen recognized by rabbit serum and sera of infected patients. Altogether, this work opens the way to characterize the proteome of R. conorii, to compare protein profiles of different isolates or of bacteria maintained under different experimental conditions and to identify immunogenic proteins as potential vaccine targets.
Affiliation(s)
- Patricia Renesto
- Unité des Rickettsies, CNRS UMR 6020, IFR-48, Faculté de Médecine, 27 Boulevard Jean Moulin, 13385 Marseille, France.
|
31
|
Doolittle RF. Evolutionary aspects of whole-genome biology. Curr Opin Struct Biol 2005; 15:248-53. [PMID: 15963888 DOI: 10.1016/j.sbi.2005.04.001] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.5] [Received: 02/08/2005] [Revised: 02/08/2005] [Accepted: 04/12/2005] [Indexed: 11/28/2022]
Abstract
A decade of access to whole-genome sequences has been increasingly revealing about the informational network relating all living organisms. Although at one point there was concern that extensive horizontal gene transfer might hopelessly muddle phylogenies, it has not proved a severe hindrance. The melding of sequence and structural information is being used to great advantage, and the prospect exists that some of the earliest aspects of life on Earth can be reconstructed, including the invention of biosynthetic and metabolic pathways. Still, some fundamental phylogenetic problems remain, including determining the root--if there is one--of the historical relationship between Archaea, Bacteria and Eukarya.
Affiliation(s)
- Russell F Doolittle
- Department of Chemistry & Biochemistry, University of California San Diego, La Jolla, CA 92093-0314, USA.
|
32
|
Saunders NFW, Goodchild A, Raftery M, Guilhaus M, Curmi PMG, Cavicchioli R. Predicted roles for hypothetical proteins in the low-temperature expressed proteome of the Antarctic archaeon Methanococcoides burtonii. J Proteome Res 2005; 4:464-72. [PMID: 15822923 DOI: 10.1021/pr049797+] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
Using liquid chromatography-mass spectrometry, 528 proteins were identified that are expressed during growth at 4 °C in the cold-adapted archaeon Methanococcoides burtonii. Of those, 135 were annotated previously as unique or conserved hypothetical proteins. We have performed a comprehensive, integrated analysis of the latter proteins using threading, InterProScan, predicted subcellular localization and visualization of conserved gene context across multiple prokaryotic genomes. Functional information was obtained for 55 proteins, providing new insight into the physiology of M. burtonii. Many of the proteins were predicted to be involved in DNA/RNA binding or modification and cell signaling, suggesting a complex, uncharacterized regulatory network controlling cellular processes during growth at low temperature. Novel enzymatic functions were predicted for several proteins, including a putative candidate gene for the posttranslational modification of the key methanogenesis enzyme coenzyme M methyl reductase. A bacterial-like CRISPR locus was identified as a strong candidate for archaeal-bacterial lateral gene transfer. Gene context analysis proved a valuable augmentation to the other predictive methods in several cases, by revealing conserved gene associations and annotations in other microbial genomes. Our results underscore the importance of addressing the "hypothetical protein problem" for a complete understanding of cell physiology.
Collapse
Affiliation(s)
- Neil F W Saunders
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, 2052, NSW, Australia
|
33
|
Siew N, Saini HK, Fischer D. A putative novel alpha/beta hydrolase ORFan family in Bacillus. FEBS Lett 2005; 579:3175-82. [PMID: 15922334 DOI: 10.1016/j.febslet.2005.04.030] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 12/30/2004] [Revised: 03/25/2005] [Accepted: 04/11/2005] [Indexed: 10/25/2022]
Abstract
A large number of sequences in each newly sequenced genome correspond to lineage- and species-specific proteins, also known as ORFans. Amongst these ORFans, a large number are sequences with unknown structures and functions. We have identified a family of sequences, annotated as hypothetical proteins, which are specific to Bacillus, and have carried out a computational study aimed at characterizing this family. Fold-recognition methods predict that these sequences belong to the alpha/beta hydrolase fold. We suggest possible catalytic triads for the ORFans and propose a hypothesis regarding the possible families within the alpha/beta hydrolase superfamily to which they may belong.
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Ben Gurion University, Beer-Sheva 84105, Israel
|
34
|
Todd AE, Marsden RL, Thornton JM, Orengo CA. Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures. J Mol Biol 2005; 348:1235-60. [PMID: 15854658 DOI: 10.1016/j.jmb.2005.03.037] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.7] [Received: 12/23/2004] [Revised: 02/28/2005] [Accepted: 03/15/2005] [Indexed: 11/27/2022]
Abstract
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.
Affiliation(s)
- Annabel E Todd
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK.
|
35
|
Ekman D, Björklund AK, Frey-Skött J, Elofsson A. Multi-domain Proteins in the Three Kingdoms of Life: Orphan Domains and Other Unassigned Regions. J Mol Biol 2005; 348:231-43. [PMID: 15808866 DOI: 10.1016/j.jmb.2005.02.007] [Citation(s) in RCA: 169] [Impact Index Per Article: 8.5] [Received: 09/27/2004] [Revised: 01/31/2005] [Accepted: 02/02/2005] [Indexed: 11/17/2022]
Abstract
Comparative studies of the proteomes from different organisms have provided valuable information about protein domain distribution in the kingdoms of life. Earlier studies have been limited by the fact that only about 50% of the proteomes could be matched to a domain. Here, we have extended these studies by including less well-defined domain definitions, Pfam-B and clustered domains, MAS, in addition to Pfam-A and SCOP domains. It was found that a significant fraction of these domain families are homologous to Pfam-A or SCOP domains. Further, we show that all regions that do not match a Pfam-A or SCOP domain contain a significantly higher fraction of disordered structure. These unstructured regions may be contained within orphan domains or function as linkers between structured domains. Using several different definitions we have re-estimated the number of multi-domain proteins in different organisms and found that several methods all predict that eukaryotes have approximately 65% multi-domain proteins, while the prokaryotes consist of approximately 40% multi-domain proteins. However, these numbers are strongly dependent on the exact choice of cut-off for domains in unassigned regions. In conclusion, all eukaryotes have similar fractions of multi-domain proteins and disorder, whereas a high fraction of repeating domain is distinguished only in multicellular eukaryotes. This implies a role for repeats in cell-cell contacts while the other two features are important for intracellular functions.
Affiliation(s)
- Diana Ekman
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
|