1
Koenig N, Baa-Puyoulet P, Lafont A, Lorenzo-Colina I, Navratil V, Leprêtre M, Sugier K, Delorme N, Garnero L, Queau H, Gaillard JC, Kielbasa M, Ayciriex S, Calevro F, Chaumot A, Charles H, Armengaud J, Geffard O, Degli Esposti D. Proteogenomic reconstruction of organ-specific metabolic networks in an environmental sentinel species, the amphipod Gammarus fossarum. Comp Biochem Physiol Part D Genomics Proteomics 2024; 52:101323. [PMID: 39276751 DOI: 10.1016/j.cbd.2024.101323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/03/2024] [Accepted: 09/06/2024] [Indexed: 09/17/2024]
Abstract
Environmental contaminants affect metabolic pathways, and this underlies a large variability of toxic effects across species. However, the systematic reconstruction of metabolic pathways remains limited in environmental sentinel species because genomic data are lacking for many animal taxa. In this study, we used a multi-omics approach to reconstruct the most comprehensive map of metabolic pathways for a crustacean model in biomonitoring, the amphipod Gammarus fossarum, in order to improve knowledge of the metabolism of this sentinel species. We revisited the assembly of RNA-seq data by de novo approaches to reduce RNA contaminants and transcript redundancy. We also acquired extensive mass spectrometry shotgun proteomic data on several organs from a reference population of G. fossarum males and females to identify organ-specific metabolic profiles. The G. fossarum metabolic pathway reconstruction (available through the metabolic database GamfoCyc) was performed by adapting the genomic tool CycADS. We identified 377 pathways representing 7630 annotated enzymes and 2610 enzymatic reactions, and the expression of 858 enzymes was experimentally validated by proteomics. To our knowledge, our analysis provides for the first time a systematic metabolic pathway reconstruction and the proteome profiles of these pathways at the organ level in this sentinel species. As an example, we show an elevated abundance of enzymes involved in ATP biosynthesis and fatty acid beta-oxidation, indicative of the high energy requirement of the gills, and the key anabolic and detoxification role of the hepatopancreatic caeca, as exemplified by the specific expression of the retinoid biosynthetic pathways and glutathione synthesis.
In conclusion, the multi-omics data integration performed in this study provides new resources to investigate metabolic processes in crustacean amphipods and their role in mediating the effects of environmental contaminant exposures in sentinel species. SYNOPSIS: This study provides the first evidence that it is possible to combine multiple omics data to exhaustively describe the metabolic network of a model species in ecotoxicology, Gammarus fossarum, for which a reference genome is not yet available.
Affiliation(s)
- Natacha Koenig
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Amélie Lafont
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Isis Lorenzo-Colina
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Vincent Navratil
- PRABI, Rhône-Alpes Bioinformatics Center, Université Lyon 1, Villeurbanne, France, UMS 3601, Institut Français de Bioinformatique, IFB-Core, Évry, France
- Maxime Leprêtre
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Kevin Sugier
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Nicolas Delorme
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Laura Garnero
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Hervé Queau
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Jean-Charles Gaillard
- Université Paris-Saclay, Département Médicaments et Technologies pour la Santé (DMTS), CEA, INRAE, SPI-Li2D, F-30207 Bagnols-sur-Cèze, France
- Mélodie Kielbasa
- Université Paris-Saclay, Département Médicaments et Technologies pour la Santé (DMTS), CEA, INRAE, SPI-Li2D, F-30207 Bagnols-sur-Cèze, France
- Sophie Ayciriex
- University of Lyon, CNRS, Institut des Sciences Analytiques, UMR 5280, 5 rue de la Doua, F-69100 Villeurbanne, France
- Arnaud Chaumot
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Hubert Charles
- INRAE, INSA Lyon, BF2I, UMR203, 69621 Villeurbanne, France
- Jean Armengaud
- Université Paris-Saclay, Département Médicaments et Technologies pour la Santé (DMTS), CEA, INRAE, SPI-Li2D, F-30207 Bagnols-sur-Cèze, France
- Olivier Geffard
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France
- Davide Degli Esposti
- INRAE, UR RiverLy, Ecotoxicology Team, Centre de Lyon-Grenoble Auvergne Rhône Alpes, 5 rue de la Doua CS 20244, 69625 Villeurbanne, France.
2
Gong X, Zhang J, Gan Q, Teng Y, Hou J, Lyu Y, Liu Z, Wu Z, Dai R, Zou Y, Wang X, Zhu D, Zhu H, Liu T, Yan Y. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv 2024; 74:108399. [PMID: 38925317 DOI: 10.1016/j.biotechadv.2024.108399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
Microbial cell factories (MCFs) have been leveraged to construct sustainable platforms for value-added compound production. To optimize metabolism and reach optimal productivity, synthetic biology has developed various genetic devices to engineer microbial systems by gene editing, high-throughput protein engineering, and dynamic regulation. However, current synthetic biology methodologies still rely heavily on manual design, laborious testing, and exhaustive analysis. The emerging interdisciplinary field of artificial intelligence (AI) and biology has become pivotal in addressing the remaining challenges. AI-aided microbial production harnesses the power of processing, learning, and predicting vast amounts of biological data within seconds, providing outputs with high probability. With well-trained AI models, the conventional Design-Build-Test (DBT) cycle has been transformed into a multidimensional Design-Build-Test-Learn-Predict (DBTLP) workflow, leading to significantly improved operational efficiency and reduced labor consumption. Here, we comprehensively review the main components and recent advances in AI-aided microbial production, focusing on genome annotation, AI-aided protein engineering, artificial functional protein design, and AI-enabled pathway prediction. Finally, we discuss the challenges of integrating novel AI techniques into biology and propose the potential of large language models (LLMs) in advancing microbial production.
Affiliation(s)
- Xinyu Gong
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jianli Zhang
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Qi Gan
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Yuxi Teng
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jixin Hou
- School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Yanjun Lyu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Zhengliang Liu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
- Zihao Wu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
- Runpeng Dai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yusong Zou
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Xianqiao Wang
- School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Dajiang Zhu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tianming Liu
- School of Computing, The University of Georgia, Athens, GA 30602, USA
- Yajun Yan
- School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA.
3
Erban T, Sopko B, Bodrinova M, Talacko P, Chalupnikova J, Markovic M, Kamler M. Proteomic insight into the interaction of Paenibacillus larvae with honey bee larvae before capping collected from an American foulbrood outbreak: Pathogen proteins within the host, lysis signatures and interaction markers. Proteomics 2023; 23:e2200146. [PMID: 35946602 DOI: 10.1002/pmic.202200146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/11/2022] [Revised: 07/17/2022] [Accepted: 07/25/2022] [Indexed: 01/05/2023]
Abstract
American foulbrood (AFB) is a devastating disease of honey bees. There remains a gap in the understanding of the interactions between the causative agent and host, so we used shotgun proteomics to gain new insights. Nano-LC-MS/MS analysis preceded visual description and Paenibacillus larvae identification in the same individual sample. A further critical part of our methodology was that larvae before capping were used as the model stage. The identification of the virulence factors SplA, PlCBP49, enolase, and DnaK in all P. larvae-positive samples was consistent with previous studies. Furthermore, the results were consistent with the array of virulence factors identified in an in vitro study of P. larvae exoprotein fractions. Although an S-layer protein and a putative bacteriocin were highlighted as important, the microbial collagenase ColA and InhA were not found in our samples. The most important virulence factor identified was an isoform of neutral metalloproteinase (UniProt: V9WB82), a major protein marker responsible for the shift in the PCA biplot. This protein is associated with larval decay and, together with other virulence factors (bacteriocin), can play a key role in protection against secondary invaders. Overall, this study provides new knowledge on host-pathogen interactions and a new methodical approach to study the disease.
Affiliation(s)
- Tomas Erban
- Proteomics and Metabolomics Laboratory, Crop Research Institute, Prague, Czechia
- Bruno Sopko
- Proteomics and Metabolomics Laboratory, Crop Research Institute, Prague, Czechia
- Miroslava Bodrinova
- Proteomics and Metabolomics Laboratory, Crop Research Institute, Prague, Czechia
- Pavel Talacko
- Proteomics Core Facility, Faculty of Science, Charles University, Prague, Czechia
- Julie Chalupnikova
- Proteomics and Metabolomics Laboratory, Crop Research Institute, Prague, Czechia
- Martin Markovic
- Proteomics and Metabolomics Laboratory, Crop Research Institute, Prague, Czechia
- Martin Kamler
- Bee Research Institute at Dol, Libcice nad Vltavou, Czechia
4
Rajczewski AT, Jagtap PD, Griffin TJ. An overview of technologies for MS-based proteomics-centric multi-omics. Expert Rev Proteomics 2022; 19:165-181. [PMID: 35466851 PMCID: PMC9613604 DOI: 10.1080/14789450.2022.2070476] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/07/2023]
Abstract
INTRODUCTION: Mass spectrometry-based proteomics reveals dynamic molecular signatures underlying phenotypes reflecting normal and perturbed conditions in living systems. Although valuable on its own, the proteome provides only one level of molecular information, with the genome, epigenome, transcriptome, and metabolome all providing complementary information. Multi-omic analysis integrating information from one or more of these other domains with proteomic information provides a more complete picture of molecular contributors to dynamic biological systems. AREAS COVERED: Here, we discuss the improvements to mass spectrometry-based technologies, focused on peptide-based, bottom-up approaches that have enabled deep, quantitative characterization of complex proteomes. These advances are facilitating the integration of proteomics data with other 'omic information, providing a more complete picture of living systems. We also describe the current state of bioinformatics software and approaches for integrating proteomics and other 'omics data, critical for enabling new discoveries driven by multi-omics. EXPERT COMMENTARY: Multi-omics, centered on the integration of proteomics information with other 'omic information, has tremendous promise for biological and biomedical studies. Continued advances in approaches for generating deep, reliable proteomic data and bioinformatics tools aimed at integrating data across 'omic domains will ensure the discoveries offered by these multi-omic studies continue to increase.
Affiliation(s)
- Andrew T. Rajczewski
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
- Pratik D. Jagtap
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
- Timothy J. Griffin
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
5
Lau WW, Hardt M, Zhang YH, Freire M, Ruhl S. The Human Salivary Proteome Wiki: A Community-Driven Research Platform. J Dent Res 2021; 100:1510-1519. [PMID: 34032471 DOI: 10.1177/00220345211014432] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/30/2022] Open
Abstract
Saliva has become an attractive body fluid for on-site, remote, and real-time monitoring of oral and systemic health. At the same time, the scientific community needs a saliva-centered information platform that keeps pace with the rapid accumulation of new data and knowledge by annotating, refining, and updating the salivary proteome catalog. We developed the Human Salivary Proteome (HSP) Wiki as a public data platform for researching and retrieving custom-curated data and knowledge on the saliva proteome. The HSP Wiki is dynamically compiled and updated based on published saliva proteome studies and up-to-date protein reference records. It integrates a wide range of available information by funneling in data from established external protein, genome, transcriptome, and glycome databases. In addition, the HSP Wiki incorporates data from human disease-related studies. Users can explore the proteome of saliva simply by browsing the database, querying the available data, performing comparisons of data sets, and annotating existing protein entries using a simple, intuitive interface. The annotation process includes both user feedback and curator committee review to ensure the quality and validity of each entry. Here, we present the first overview of features and functions the HSP Wiki offers. As a saliva proteome-centric, publicly accessible database, the HSP Wiki will advance the knowledge of saliva composition and function in health and disease for users across a wide range of disciplines. As a community-based data- and knowledgebase, the HSP Wiki will serve as a worldwide platform to exchange salivary proteome information, inspire novel research ideas, and foster cross-discipline collaborations. The HSP Wiki will pave the way for harnessing the full potential of the salivary proteome for diagnosis, risk prediction, therapy of oral and systemic diseases, and preparedness for emerging infectious diseases. Database URL: https://salivaryproteome.nidcr.nih.gov/.
Affiliation(s)
- W W Lau
- Office of Intramural Research, Center for Information Technology, National Institutes of Health, Bethesda, MD, USA; Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- M Hardt
- Forsyth Center for Salivary Diagnostics, Department of Applied Oral Sciences, The Forsyth Institute, Cambridge, MA, USA; Department of Developmental Biology, Harvard School of Dental Medicine, Boston, MA, USA
- Y H Zhang
- Department of Bioscience Research, College of Dentistry, The University of Tennessee Health Science Center, Memphis, TN, USA
- M Freire
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA, USA; Department of Infectious Diseases and Global Health, School of Medicine, University of California San Diego, La Jolla, CA, USA
- S Ruhl
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, Buffalo, NY, USA
6
Muthu M, Deenadayalan A, Ramachandran D, Paul D, Gopal J, Chun S. A state-of-art review on the agility of quantitative proteomics in tuberculosis research. Trends Analyt Chem 2018. [DOI: 10.1016/j.trac.2018.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
7
Zheng J, Chen L, Liu L, Li H, Liu B, Zheng D, Liu T, Dong J, Sun L, Zhu Y, Yang J, Zhang X, Jin Q. Proteogenomic Analysis and Discovery of Immune Antigens in Mycobacterium vaccae. Mol Cell Proteomics 2017; 16:1578-1590. [PMID: 28733429 DOI: 10.1074/mcp.m116.065813] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/27/2016] [Revised: 07/05/2017] [Indexed: 11/06/2022] Open
Abstract
Tuberculosis (TB) is one of the leading causes of death worldwide, especially in developing countries. Neonatal BCG vaccination occurs in various regions, but the level of protection varies in different populations. Recently, Mycobacterium vaccae has been found to be an immunomodulating therapeutic agent that could confer a significant level of protection against TB. It is the only vaccine in a phase III trial from WHO to assess its efficacy and safety in preventing TB disease in people with latent TB infection. However, the mechanism of immunotherapy of M. vaccae remains poorly understood. In this study, the full genome of M. vaccae was obtained by next-generation sequencing technology, and a proteogenomic approach was successfully applied to further perform genome annotation using high resolution and high accuracy MS data. A total of 3,387 proteins (22,508 unique peptides) were identified, and 581 proteins annotated as hypothetical proteins in the genome database were confirmed. Furthermore, 38 novel protein products not annotated at the genome level were detected and validated. Additionally, the translational start sites of 445 proteins were confirmed, and 98 proteins were validated through extension of their translational start sites based on N terminus-derived peptides. The physicochemical characteristics of the identified proteins were determined. Thirty-five immunogenic proteins of M. vaccae were identified by immunoproteomic analysis, and 20 of them were selected to be expressed and validated by Western blot for immunoreactivity to serum from patients infected with M. tuberculosis. The results revealed that eight of them showed strong specific reactive signals on the immunoblots. Furthermore, cellular immune response was further examined and one protein displayed a higher cellular immune level in pulmonary TB patients. Twelve identified immunogenic proteins have orthologs in H37Rv and BCG. This is the first study to obtain the full genome and annotation of M. vaccae using a proteogenomic approach, and some immunogenic proteins that were validated by immunoproteomic analysis could contribute to the understanding of the mechanism of M. vaccae immunotherapy.
Affiliation(s)
- Jianhua Zheng
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Lihong Chen
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Liguo Liu
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Haifeng Li
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Bo Liu
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Dandan Zheng
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Tao Liu
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jie Dong
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Lilian Sun
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yafang Zhu
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jian Yang
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiaobing Zhang
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Qi Jin
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, and Centre for Tuberculosis, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
8
Yu JF, Guo J, Liu QB, Hou Y, Xiao K, Chen QL, Wang JH, Sun X. A hybrid strategy for comprehensive annotation of the protein coding genes in prokaryotic genome. Genes Genomics 2015. [DOI: 10.1007/s13258-014-0263-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
9
Cuadrat RRC, da Serra Cruz SM, Tschoeke DA, Silva E, Tosta F, Jucá H, Jardim R, Campos MLM, Mattoso M, Dávila AMR. An orthology-based analysis of pathogenic protozoa impacting global health: an improved comparative genomics approach with prokaryotes and model eukaryote orthologs. OMICS 2014; 18:524-38. [PMID: 24960463 DOI: 10.1089/omi.2013.0172] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
A key focus in 21st-century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluating similarities and differences among species, and thus allowing the transfer of annotation from known/curated proteins to new/non-annotated ones. Here we used a sensitive profile-based methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was applied to five protozoan proteomes, showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The inferred core protozoa proteome comprised 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) using COG, 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis) and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones); (ii) using KOG, 21.45% (151/704) belong to category J and 13.92% (98/704) to category O. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself to applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
Affiliation(s)
- Rafael R C Cuadrat
- Computational and Systems Biology Laboratory, Computational and Systems Biology Pole, Oswaldo Cruz Institute, Fiocruz, Brazil
10
Alam CM, Singh AK, Sharfuddin C, Ali S. Incidence, complexity and diversity of simple sequence repeats across potexvirus genomes. Gene 2014; 537:189-96. [DOI: 10.1016/j.gene.2014.01.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 09/17/2013] [Revised: 11/15/2013] [Accepted: 01/04/2014] [Indexed: 01/18/2023]
11
In-silico analysis of simple and imperfect microsatellites in diverse tobamovirus genomes. Gene 2013; 530:193-200. [DOI: 10.1016/j.gene.2013.08.046] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 05/13/2013] [Revised: 08/10/2013] [Accepted: 08/13/2013] [Indexed: 11/20/2022]
12
Lyne M, Smith RN, Lyne R, Aleksic J, Hu F, Kalderimis A, Stepan R, Micklem G. metabolicMine: an integrated genomics, genetics and proteomics data warehouse for common metabolic disease research. Database (Oxford) 2013; 2013:bat060. [PMID: 23935057 PMCID: PMC4438919 DOI: 10.1093/database/bat060] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/31/2023]
Abstract
Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. 
The web interface makes it accessible to first-time users, whereas the Application Programming Interface (API) and web services provide convenient data access and tools for bioinformaticians. metabolicMine is freely available online. Database URL: http://www.metabolicmine.org
Affiliation(s)
- Mike Lyne
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1QR, UK
13
Abstract
Background: Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally functional ontologies have been applied to individual proteins, rather than families of related domains and supra-domains. We expect, however, that functional signals can to some extent be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results: Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor' can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to be assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes.
This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era.
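The enrichment analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test, which is a common choice for term enrichment; the abstract does not state which statistic dcGO uses, and the function name and all counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    total     -- number of domains in the background (assumed input)
    annotated -- background domains associated with the term
    sample    -- number of domains found in the genome of interest
    hits      -- sampled domains associated with the term
    """
    if not (0 <= annotated <= total and 0 <= sample <= total):
        raise ValueError("counts must satisfy 0 <= annotated, sample <= total")
    upper = min(annotated, sample)
    denominator = comb(total, sample)
    # Sum the probability of every outcome at least as extreme as `hits`.
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return numerator / denominator

# Hypothetical counts: 40 of 1000 background domains carry a term,
# and 8 of the 50 domains in a genome carry it.
p_value = hypergeom_enrichment_p(1000, 40, 50, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice such p-values would be corrected for multiple testing across all tested terms, which this sketch leaves out.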
Affiliation(s)
- Hai Fang
- Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK.
|
14
|
Systems metabolic engineering in an industrial setting. Appl Microbiol Biotechnol 2013; 97:2319-26. [DOI: 10.1007/s00253-013-4738-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 10/16/2012] [Revised: 01/22/2013] [Accepted: 01/23/2013] [Indexed: 10/27/2022]
|
15
|
Improving N-terminal protein annotation of Plasmodium species based on signal peptide prediction of orthologous proteins. Malar J 2012; 11:375. [PMID: 23153225 PMCID: PMC3529677 DOI: 10.1186/1475-2875-11-375] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/06/2012] [Accepted: 10/31/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Signal peptide is one of the most important motifs involved in protein trafficking and it ultimately influences protein function. Considering the expected functional conservation among orthologs it was hypothesized that divergence in signal peptides within orthologous groups is mainly due to N-terminal protein sequence misannotation. Thus, discrepancies in signal peptide prediction of orthologous proteins were used to identify misannotated proteins in five Plasmodium species. METHODS Signal peptide (SignalP) and orthology (OrthoMCL) were combined in an innovative strategy to identify orthologous groups showing discrepancies in signal peptide prediction among their protein members (Mixed groups). In a comparative analysis, multiple alignments for each of these groups and gene models were visually inspected in search of misannotated proteins and, whenever possible, alternative gene models were proposed. Thresholds for signal peptide prediction parameters were also modified to reduce their impact as a possible source of discrepancy among orthologs. Validation of new gene models was based on RT-PCR (few examples) or on experimental evidence already published (ApiLoc). RESULTS The rate of misannotated proteins was significantly higher in Mixed groups than in Positive or Negative groups, corroborating the proposed hypothesis. A total of 478 proteins were reannotated and change of signal peptide prediction from negative to positive was the most common. Reannotations triggered the conversion of almost 50% of all Mixed groups, which were further reduced by optimization of signal peptide prediction parameters. 
CONCLUSIONS The methodological novelty proposed here, combining orthology and signal peptide prediction, proved to be an effective strategy for identifying proteins with wrongly annotated N-terminal sequences, and it might have an important impact on the data available for genome-wide searches for potential vaccine and drug targets and proteins involved in host/parasite interactions, as demonstrated for five Plasmodium species.
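The grouping step described in this abstract can be sketched as follows. The labels Positive, Negative and Mixed come from the abstract; the function name, the data layout and the protein identifiers are hypothetical, and the boolean calls stand in for SignalP output that the sketch does not compute.

```python
def classify_group(predictions):
    """Label an orthologous group by the agreement of its signal peptide calls.

    predictions maps a protein identifier to True (signal peptide predicted)
    or False (no signal peptide predicted).
    """
    if not predictions:
        raise ValueError("an orthologous group needs at least one member")
    calls = set(predictions.values())
    if calls == {True}:
        return "Positive"
    if calls == {False}:
        return "Negative"
    # Members disagree: these groups are the candidates for manual inspection.
    return "Mixed"

# Hypothetical groups, one per outcome.
groups = {
    "OG_demo_1": {"protA_sp1": True, "protA_sp2": True},
    "OG_demo_2": {"protB_sp1": False, "protB_sp2": False},
    "OG_demo_3": {"protC_sp1": True, "protC_sp2": False},
}
labels = {name: classify_group(members) for name, members in groups.items()}
mixed = [name for name, label in labels.items() if label == "Mixed"]
print(mixed)
```

Only the Mixed groups would then go forward to alignment inspection and gene model revision.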
|
16
|
Abstract
To help define the biological functions of nonessential genes of Francisella novicida, we measured the growth of arrayed members of a comprehensive transposon mutant library under a variety of nutrition and stress conditions. Mutant phenotypes were identified for 37% of the genes, corresponding to ten carbon source utilization pathways, nine amino acid- and nucleotide-biosynthetic pathways, ten intrinsic antibiotic resistance traits, and six other stress resistance traits. The greatest surprise of the analysis was the large number of genotype-phenotype relationships that were not predictable from studies of Escherichia coli and other model species. The study identified candidate genes for a missing glycolysis function (phosphofructokinase), an unusual proline-biosynthetic pathway, parallel outer membrane lipid asymmetry maintenance systems, and novel antibiotic resistance functions. The analysis provides an evaluation of annotation predictions, identifies cases in which fundamental processes differ from those in model species, and helps create an empirical foundation for understanding virulence and other complex processes. The value of genome sequences as foundations for analyzing complex traits in nonmodel organisms is limited by the need to rely almost exclusively on sequence similarities to predict gene functions in annotations. Many genes cannot be assigned functions, and some predictions are incorrect or incomplete. Due to these limitations, genome-scale experimental approaches that test and extend bioinformatics-based predictions are sorely needed. In this study, we describe such an approach based on phenotypic analysis of a comprehensive, sequence-defined transposon mutant library.
|
17
|
Pavey SA, Sutherland BJG, Leong J, Robb A, von Schalburg K, Hamon TR, Koop BF, Nielsen JL. Ecological transcriptomics of lake-type and riverine sockeye salmon (Oncorhynchus nerka). BMC Ecol 2011; 11:31. [PMID: 22136247 PMCID: PMC3295673 DOI: 10.1186/1472-6785-11-31] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/14/2011] [Accepted: 12/02/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND There are a growing number of genomes sequenced with tentative functions assigned to a large proportion of the individual genes. Model organisms in laboratory settings form the basis for the assignment of gene function, and the ecological context of gene function is lacking. This work addresses this shortcoming by investigating expressed genes of sockeye salmon (Oncorhynchus nerka) muscle tissue. We compared morphology and gene expression in natural juvenile sockeye populations related to river and lake habitats. Based on previously documented divergent morphology, feeding strategy, and predation in association with these distinct environments, we expect that burst swimming is favored in riverine population and continuous swimming is favored in lake-type population. In turn we predict that morphology and expressed genes promote burst swimming in riverine sockeye and continuous swimming in lake-type sockeye. RESULTS We found the riverine sockeye population had deep, robust bodies and lake-type had shallow, streamlined bodies. Gene expression patterns were measured using a 16 k microarray, discovering 141 genes with significant differential expression. Overall, the identity and function of these genes was consistent with our hypothesis. In addition, Gene Ontology (GO) enrichment analyses with a larger set of differentially expressed genes found the "biosynthesis" category enriched for the riverine population and the "metabolism" category enriched for the lake-type population. CONCLUSIONS This study provides a framework for understanding sockeye life history from a transcriptomic perspective and a starting point for more extensive, targeted studies determining the ecological context of genes.
Affiliation(s)
- Scott A Pavey
- National Park Service, Katmai National Park; PO Box 7, King Salmon, AK 99613, USA.
|
18
|
Zhao L, Liu L, Leng W, Wei C, Jin Q. A proteogenomic analysis of Shigella flexneri using 2D LC-MALDI TOF/TOF. BMC Genomics 2011; 12:528. [PMID: 22032405 PMCID: PMC3219829 DOI: 10.1186/1471-2164-12-528] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/06/2011] [Accepted: 10/28/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck of knowledge acquisition. RESULTS We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered, which were not predicted in S. flexneri 2a str.301 by any other annotation approaches. The transcripts of four novel ORFs were confirmed by RT-PCR assay. Additionally, most of these novel ORFs were overlapping genes, some even nested within the coding region of other known genes. CONCLUSIONS Our findings demonstrate that current Shigella genome annotation methods are not perfect and need to be improved. Apart from the validation of predicted genes at the protein level, the additional features of proteogenomic tools include revision of annotation errors and discovery of novel ORFs. The complementary dataset could provide more targets for those interested in Shigella to perform functional studies.
Affiliation(s)
- Lina Zhao
- State Key Laboratory for Molecular Virology and Genetic Engineering, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China
|
19
|
Do we need new antibiotics? The search for new targets and new compounds. J Ind Microbiol Biotechnol 2010; 37:1241-8. [PMID: 21086099 DOI: 10.1007/s10295-010-0849-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Received: 06/24/2010] [Accepted: 08/16/2010] [Indexed: 02/01/2023]
Abstract
Resistance to antibiotics and other antimicrobial compounds continues to increase. There are several possibilities for protection against pathogenic microorganisms, for instance, preparation of new vaccines against resistant bacterial strains, use of specific bacteriophages, and searching for new antibiotics. The antibiotic search includes: (1) looking for new antibiotics from nontraditional or less traditional sources, (2) sequencing microbial genomes with the aim of finding genes specifying biosynthesis of antibiotics, (3) analyzing DNA from the environment (metagenomics), (4) re-examining forgotten natural compounds and products of their transformations, and (5) investigating new antibiotic targets in pathogenic bacteria.
|
20
|
Lew JM, Kapopoulou A, Jones LM, Cole ST. TubercuList--10 years after. Tuberculosis (Edinb) 2010; 91:1-7. [PMID: 20980199 DOI: 10.1016/j.tube.2010.09.008] [Citation(s) in RCA: 310] [Impact Index Per Article: 20.7] [Received: 06/21/2010] [Revised: 09/06/2010] [Accepted: 09/30/2010] [Indexed: 01/10/2023]
Abstract
TubercuList (http://tuberculist.epfl.ch/), the relational database that presents genome-derived information about H37Rv, the paradigm strain of Mycobacterium tuberculosis, has been active for ten years and now presents its twentieth release. Here, we describe some of the recent changes that have resulted from manual annotation with information from the scientific literature. Through manual curation, TubercuList strives to provide current gene-based information and is thus distinguished from other online sources of genome sequence data for M. tuberculosis. New, mostly small, genes have been discovered and the coordinates of some existing coding sequences have been changed when bioinformatics or experimental data suggest that this is required. Nucleotides that are polymorphic between different sources of H37Rv are annotated and gene essentiality data have been updated. A host of functional information has been gleaned from the literature and many new activities of proteins and RNAs have been included. To facilitate basic and translational research, TubercuList also provides links to other specialized databases that present diverse datasets such as 3D-structures, expression profiles, drug development criteria and drug resistance information, in addition to direct access to PubMed articles pertinent to particular genes. TubercuList has been and remains a highly valuable tool for the tuberculosis research community with >75,000 visitors per month.
Affiliation(s)
- Jocelyne M Lew
- Ecole Polytechnique Fédérale de Lausanne, Global Health Institute, Station 19, CH-1015 Lausanne, Switzerland.
|
21
|
Horst JA, Samudrala R. A protein sequence meta-functional signature for calcium binding residue prediction. Pattern Recognit Lett 2010; 31:2103-2112. [PMID: 20824111 PMCID: PMC2932634 DOI: 10.1016/j.patrec.2010.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
Abstract
The diversity of characterized protein functions found amongst experimentally interrogated proteins suggests that a vast array of unknown functions remains undiscovered. These protein functions are imparted by specific geometric distributions of amino acid residue chemical moieties, each contributing a functional interaction. We hypothesize that individual residue function contributions are predictable through sequence analytic knowledge based algorithms, and that they can be recombined to understand composite protein function by predicting spatial relation in tertiary structure. We assess the former by training a meta-functional signature algorithm to specifically predict calcium ion binding residues from protein sequence. We estimate the latter by testing for match between predictive contribution of positions in predicted secondary structures and patterns of side chain proximity forced by secondary structure moieties. Specific training for calcium binding results in 83% area under the receiver operator characteristic curve added value over random (AUCoR) and p < 10^-300 significance as measured by Kendall's τ in ten fold cross validation for parallel sets of 811 residues in 336 proteins and 696 residues in 299 proteins. Training for generalized function results in 63% AUCoR and p ≅ 10^-221 for the same tests. Including inference of side chain proximity improves predictive ability by 2% AUCoR consistently. The results demonstrate that protein meta-functional signatures can be trained to predict specific protein functions by considering amino acid identity and structural features accessible from sequence, laying the groundwork for composite sequence based function site prediction.
Affiliation(s)
- Jeremy A Horst
- Department of Oral Biology, School of Dentistry, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Department of Microbiology, School of Medicine, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Ram Samudrala
- Department of Oral Biology, School of Dentistry, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Department of Microbiology, School of Medicine, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
|
22
|
Poptsova MS, Gogarten JP. Using comparative genome analysis to identify problems in annotated microbial genomes. Microbiology (Reading) 2010; 156:1909-1917. [DOI: 10.1099/mic.0.033811-0] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022] Open
Abstract
Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.
Affiliation(s)
- Maria S. Poptsova
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
- J. Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA
|
23
|
Vyas J, Nowling RJ, Meusburger T, Sargeant D, Kadaveru K, Gryk MR, Kundeti V, Rajasekaran S, Schiller MR. MimoSA: a system for minimotif annotation. BMC Bioinformatics 2010; 11:328. [PMID: 20565705 PMCID: PMC2905367 DOI: 10.1186/1471-2105-11-328] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/19/2010] [Accepted: 06/16/2010] [Indexed: 11/11/2022] Open
Abstract
Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. 
MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context.
Affiliation(s)
- Jay Vyas
- Department of Molecular, Microbial, and Structural Biology, University of Connecticut Health Center, 263 Farmington Ave. Farmington, CT 06030-3305, USA
|
24
|
Ansari HR, Raghava GPS. Identification of NAD interacting residues in proteins. BMC Bioinformatics 2010; 11:160. [PMID: 20353553 PMCID: PMC2853471 DOI: 10.1186/1471-2105-11-160] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 06/21/2009] [Accepted: 03/30/2010] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Small molecular cofactors or ligands play a crucial role in the proper functioning of cells. Accurate annotation of their target proteins and binding sites is required for the complete understanding of reaction mechanisms. Nicotinamide adenine dinucleotide (NAD+ or NAD) is one of the most commonly used organic cofactors in living cells, which plays a critical role in cellular metabolism, storage and regulatory processes. In the past, several NAD binding proteins (NADBP) have been reported in the literature, which are responsible for a wide range of activities in the cell. Attempts have been made to derive a rule for the binding of NAD+ to its target proteins. However, so far an efficient model could not be derived due to the time-consuming process of structure determination and the limitations of similarity based approaches. Thus a sequence and non-similarity based method is needed to characterize the NAD binding sites to help in the annotation. In this study attempts have been made to predict NAD binding proteins and their interacting residues (NIRs) from amino acid sequence using bioinformatics tools. RESULTS We extracted 1556 protein chains from 555 NAD binding proteins whose structure is available in Protein Data Bank. Then we removed all redundant protein chains and finally obtained 195 non-redundant NAD binding protein chains, where no two chains have more than 40% sequence identity. In this study all models were developed and evaluated using five-fold cross validation technique on the above dataset of 195 NAD binding proteins. While certain types of residues are preferred (e.g. Gly, Tyr, Thr, His) in NAD interaction, residues like Ala, Glu, Leu, Lys are not preferred. A support vector machine (SVM) based method has been developed using various window lengths of amino acid sequence for predicting NAD interacting residues and obtained maximum Matthews correlation coefficient (MCC) 0.47 with accuracy 74.13% at window length 17. 
We also developed an SVM based method using evolutionary information in the form of position specific scoring matrix (PSSM) and obtained maximum MCC 0.75 with accuracy 87.25%. CONCLUSION For the first time a sequence-based method has been developed for the prediction of NAD binding proteins and their interacting residues, in the absence of any prior structural information. The present model will aid in the understanding of NAD+ dependent mechanisms of action in the cell. To provide service to the scientific community, we have developed a user-friendly web server, which is available from URL http://www.imtech.res.in/raghava/nadbinder/.
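Two pieces of this abstract lend themselves to a short illustration: cutting a sequence into fixed-length windows centred on each residue, and scoring predictions with the Matthews correlation coefficient. The sketch below is an assumption-laden outline rather than the authors' pipeline: the function names, the "X" padding character and the toy sequence are hypothetical, and no SVM is trained here.

```python
from math import sqrt

def residue_windows(sequence, length=17, pad="X"):
    """Return one window per residue, centred on that residue.

    The termini are padded with `pad` (an assumed convention) so that
    every residue gets a window of the same odd length.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("window length must be a positive odd number")
    half = length // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + length] for i in range(len(sequence))]

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denominator

# Toy sequence: each window would be encoded and passed to a classifier.
windows = residue_windows("MKTAYIAKQRQISFVKSHFSRQ")
print(len(windows), windows[0])
```

The MCC ranges from -1 to 1 and stays informative when interacting residues are much rarer than non-interacting ones, which is why it is reported alongside accuracy.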
Affiliation(s)
- Hifzur R Ansari
- Institute of Microbial Technology, Sector 39A, Chandigarh, 160036, India
|
25
|
Lagesen K, Ussery DW, Wassenaar TM. Genome update: the 1000th genome--a cautionary tale. MICROBIOLOGY-SGM 2010; 156:603-608. [PMID: 20093288 DOI: 10.1099/mic.0.038257-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022]
Abstract
There are now more than 1000 sequenced prokaryotic genomes deposited in public databases and available for analysis. Currently, although the sequence databases GenBank, DNA Database of Japan and EMBL are synchronized continually, there are slight differences in content at the genomes level for a variety of logistical reasons, including differences in format and loading errors, such as those caused by file transfer protocol interruptions. This means that the 1000th genome will be different in the various databases. Some of the data on the highly accessed web pages are inaccurate, leading to false conclusions for example about the largest bacterial genome sequenced. Biological diversity is far greater than many have thought. For example, analysis of multiple Escherichia coli genomes has led to an estimate of around 45 000 gene families - more genes than are recognized in the human genome. Moreover, of the 1000 genomes available, not a single protein is conserved across all genomes. Excluding the members of the Archaea, only a total of four genes are conserved in all bacteria: two protein genes and two RNA genes.
Affiliation(s)
- Karin Lagesen
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, NO-0027, Oslo, Norway, and Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316, Oslo, Norway; Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Lyngby, Denmark
- Dave W Ussery
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Lyngby, Denmark
- Trudy M Wassenaar
- Molecular Microbiology and Genomics Consultants, Zotzenheim, Germany; Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Lyngby, Denmark
|
26
|
Know your limits: Assumptions, constraints and interpretation in systems biology. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2009; 1794:1280-7. [DOI: 10.1016/j.bbapap.2009.05.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/12/2009] [Accepted: 05/04/2009] [Indexed: 12/20/2022]
|
27
|
Alternative splicing of transcription factors' genes: beyond the increase of proteome diversity. Comp Funct Genomics 2009:905894. [PMID: 19609452 PMCID: PMC2709715 DOI: 10.1155/2009/905894] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 12/25/2008] [Revised: 04/06/2009] [Accepted: 05/18/2009] [Indexed: 11/29/2022] Open
Abstract
Functional modification of transcription regulators may lead to developmental changes and phenotypical differences between species. In this work, we study the influence of alternative splicing on transcription factors in human and mouse. Our results show that the impact of alternative splicing on transcription factors is similar in both species, meaning that the ways to increase variability should also be similar. However, when looking at the expression patterns of transcription factors, we observe that they tend to diverge regardless of the role of alternative splicing. Finally, we hypothesise that transcription regulation of alternatively spliced transcription factors could play an important role in the phenotypical differences between species, without discarding other phenomena or functional families.
|
28
|
Armengaud J. A perfect genome annotation is within reach with the proteomics and genomics alliance. Curr Opin Microbiol 2009; 12:292-300. [PMID: 19410500 DOI: 10.1016/j.mib.2009.03.005] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Received: 01/12/2009] [Revised: 03/26/2009] [Accepted: 03/26/2009] [Indexed: 11/17/2022]
Abstract
High-throughput identification of proteins and their accurate partial sequencing by shotgun nanoLC-MS/MS are now feasible for any cellular model at a full genomic scale. Proteogenomics is the integration of these data with the genome. Mining microbial proteomes allows validation of predicted orphan genes and correction of genome annotation errors such as discovery of unannotated genes, reversal of reading frames and identification of translational start sites, stop codon read-throughs or programmed frameshifts. Recent advances have been achieved in database searches, N-terminal oriented proteomics and homology-driven proteogenomics. From now on, proteogenomics on newly sequenced model genomes can be carried out at the earliest stage of the genome project as already exemplified by Mycoplasma mobile and Deinococcus deserti genomes. The proteomics and genomics alliance produces almost complete and accurate gene catalogues for small microbial genomes, a comprehensiveness which is essential for efficient systems biology.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, France.
|
29
|
Sadowski MI, Jones DT. The sequence-structure relationship and protein function prediction. Curr Opin Struct Biol 2009; 19:357-62. [PMID: 19406632 DOI: 10.1016/j.sbi.2009.03.008] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 02/03/2009] [Accepted: 03/16/2009] [Indexed: 11/28/2022]
Abstract
An incomplete understanding of protein sequence/structure/function relationships causes many difficulties for prediction methods. The highly complex nature of these relationships is a consequence of the interplay between physics and evolution that has been studied using a wide array of experimental and theoretical techniques. We review recent findings relating to conservation of sequence, structure and function and discuss their use in developing improved prediction methods.
Affiliation(s)
- M I Sadowski
- Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA UK
|
30
|
Talavera D, Laskowski RA, Thornton JM. WSsas: a web service for the annotation of functional residues through structural homologues. Bioinformatics 2009; 25:1192-4. [PMID: 19251774 DOI: 10.1093/bioinformatics/btp116] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION Annotation tools help scientists to traverse the gap between characterized and uncharacterized proteins. Tools for the prediction of protein function include those which predict the function of entire proteins or complexes, those annotating functional domains and those which predict specific residues within the domain. We have developed WSsas, a web service focused on the annotation of essential functional residues. WSsas uses similarity searches and pairwise alignments to transfer functional information about binding, catalytic and protein-protein interaction residues from solved structures to query sequences. In addition, WSsas can supply information about the relevant functional atoms. The web service definition (WSDL) file and a Perl client are freely available at http://www.ebi.ac.uk/thornton-srv/databases/WSsas/.
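The transfer of residue annotations through a pairwise alignment, as described in this abstract, can be sketched in a few lines. This is a generic illustration and not the WSsas client or its API: the function name, the gap character and the toy alignment are hypothetical, and the alignment itself is assumed to have been computed beforehand.

```python
def transfer_annotations(aligned_template, aligned_query, template_sites, gap="-"):
    """Map annotated template residues onto a query via a gapped alignment.

    aligned_template / aligned_query -- equal-length aligned strings
    template_sites -- 1-based positions of functional residues in the
                      ungapped template sequence
    Returns {template_position: (query_position, query_residue)} for every
    annotated template residue that is aligned to a query residue.
    """
    if len(aligned_template) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    sites = set(template_sites)
    template_pos = query_pos = 0
    mapped = {}
    for t_char, q_char in zip(aligned_template, aligned_query):
        # Advance each ungapped coordinate only on a real residue.
        if t_char != gap:
            template_pos += 1
        if q_char != gap:
            query_pos += 1
        if t_char != gap and q_char != gap and template_pos in sites:
            mapped[template_pos] = (query_pos, q_char)
    return mapped

# Toy alignment: template residues 2, 3 and 4 are annotated as functional.
result = transfer_annotations("AC-DE", "ACGD-", {2, 3, 4})
print(result)
```

Template residue 4 is dropped in the toy case because it aligns to a gap in the query; a real service would additionally check residue identity or similarity before accepting a transferred site.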
Affiliation(s)
- David Talavera
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|