1
|
Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K. Spotlight on alternative frame coding: Two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection. iScience 2022; 25:103844. [PMID: 35198897 PMCID: PMC8850804 DOI: 10.1016/j.isci.2022.103844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 10/14/2021] [Accepted: 01/27/2022] [Indexed: 12/13/2022] [Open Access]
Abstract
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.
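To make the idea of an antisense ORF embedded within an annotated gene concrete, here is a minimal Python sketch that reverse-complements a sense-strand sequence and scans it for the longest ATG-to-stop open reading frame. It is only a naive illustration, not the analysis used in the study; the function names, the restriction to ATG start codons, and the toy sequence in the usage note are all assumptions made for this example.

```python
# Illustrative sketch only: a naive ORF scan, not the pipeline used in the study.
# All names here are hypothetical and chosen for this example.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf(seq, start_codons=("ATG",)):
    """Return (frame, start, end) of the longest start-to-stop ORF in seq.

    Coordinates are 0-based on seq; end is exclusive and includes the stop
    codon. Returns (-1, -1, -1) when no complete ORF is found.
    Assumption: only ATG is treated as a start codon by default.
    """
    seq = seq.upper()
    best = (-1, -1, -1)
    best_len = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best_len:
                    best_len = i + 3 - start
                    best = (frame, start, i + 3)
                start = None  # ORF closed; look for the next start
    return best


def longest_antisense_orf(sense_gene):
    """Scan the strand opposite to an annotated gene for its longest ORF."""
    return longest_orf(reverse_complement(sense_gene))
```

For the toy input `longest_antisense_orf("TTAGGGTTTCAT")`, the opposite strand reads `ATGAAACCCTAA`, so the whole 12-nt stretch is reported as one ORF in frame 0. Real candidates would additionally need the expression and selection evidence described in the abstract.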
Affiliation(s)
- Michaela Kreitmeier
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Zachary Ardern
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Miriam Abele
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Christina Ludwig
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Klaus Neuhaus
- Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| |
|
2
|
Proteogenomic Identification of a Novel Protein-Encoding Gene in Bovine Herpesvirus 1 That Is Expressed during Productive Infection. Viruses 2018; 10:v10090499. [PMID: 30223481 PMCID: PMC6164122 DOI: 10.3390/v10090499] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/19/2018] [Revised: 09/07/2018] [Accepted: 09/12/2018] [Indexed: 12/12/2022] [Open Access]
Abstract
Bovine herpesvirus 1 (BoHV-1) is one of several microbes that contribute to the development of bovine respiratory disease (BRD) and can also induce abortions in cattle. Like other members of the Alphaherpesvirinae subfamily, BoHV-1 efficiently replicates in many cell types and subsequently establishes a life-long latent infection in sensory neurons. BoHV-1 encodes more than 70 proteins that are expressed in a well-defined manner during productive infection. However, in silico open reading frame (ORF) prediction of the BoHV-1 genome suggests that the virus may encode more than one hundred proteins. In this study we used mass spectrometry followed by proteogenomic mapping to reveal the existence of 92 peptides that map to previously un-annotated regions of the viral genome. Twenty-one of these newly termed “intergenic peptides” were predicted to have a viable ORF around them. Twelve of these produced an mRNA transcript, as demonstrated by strand-specific RT-PCR. We further characterized the 5′ and 3′ termini of one mRNA transcript, ORF-A, and detected a 55 kDa protein produced during active infection using a custom-synthesized antibody. We conclude that the coding potential of BoHV-1 is underestimated.
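Proteogenomic mapping, as mentioned in this abstract, can be pictured as locating observed peptides in a six-frame translation of the genome. The Python sketch below is a simplified illustration under stated assumptions: it uses the standard genetic code, exact string matching, and hypothetical function names; it does not reproduce the software or scoring used in the study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate DNA codon by codon; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(genome):
    """Return {(strand, frame): protein} for all six reading frames."""
    genome = genome.upper()
    strands = {"+": genome, "-": genome.translate(COMPLEMENT)[::-1]}
    return {
        (strand, frame): translate(seq[frame:])
        for strand, seq in strands.items()
        for frame in range(3)
    }


def map_peptide(peptide, genome):
    """List (strand, frame, protein_offset) for every exact peptide match.

    Hypothetical helper: real pipelines also handle ambiguous residues
    such as I/L and control the false discovery rate.
    """
    hits = []
    for (strand, frame), protein in six_frame_translations(genome).items():
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((strand, frame, pos))
            pos = protein.find(peptide, pos + 1)
    return hits
```

A peptide that matches only outside annotated coding regions would be a candidate "intergenic peptide" in the sense used above; deciding whether a viable ORF surrounds it is a separate step.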
|
3
|
Armengaud J. In Vino Veritas: An Invitation for Ambitious, Collaborative Proteogenomics Campaigns on Plant and Animal Models. Proteomics 2018; 17. [PMID: 28994197 DOI: 10.1002/pmic.201700324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2017] [Revised: 10/02/2017] [Indexed: 11/06/2022]
Abstract
Vitis vinifera has been an emblematic plant for humans since the Neolithic period. Human civilization has been shaped by its domestication as both its medicinal and nutritional values were exploited. It is now cultivated on all habitable continents, and more than 5000 varieties have been developed. A global passion for the art of wine fuels innovation and a profound desire for knowledge on this plant. The genome sequence of a homozygotic cultivar and several RNA-seq datasets on other varieties have been released paving the way to gaining further insight into its biology and tailoring improvements to varieties. However, its genome annotation remains unpolished. In this issue of Proteomics, Chapman and Bellgard (Proteomics 2017, 17, 1700197) discuss how proteogenomics can help improve genome annotation. By mining shotgun proteomics data, they defined new protein-coding genes, refined gene structures, and corrected numerous mRNA splicing events. This stimulating study shows how large international consortia could work together to improve plant and animal genome annotation on a large scale. To achieve this aim, time should be invested to generate comprehensive, high-quality experimental datasets for a wide range of well-defined lineages and exploit them with pipelines capable of handling giant datasets.
Affiliation(s)
- Jean Armengaud
- Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Service de Pharmacologie et Immunoanalyse (SPI), CEA, INRA, France
| |
|
4
|
Gomis-Cebolla J, Scaramal Ricietto AP, Ferré J. A Genomic and Proteomic Approach to Identify and Quantify the Expressed Bacillus thuringiensis Proteins in the Supernatant and Parasporal Crystal. Toxins (Basel) 2018; 10:toxins10050193. [PMID: 29748494 PMCID: PMC5983249 DOI: 10.3390/toxins10050193] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/13/2018] [Revised: 04/30/2018] [Accepted: 05/07/2018] [Indexed: 11/16/2022] [Open Access]
Abstract
The combined analysis of genomic and proteomic data allowed us to determine which cry and vip genes are present in a Bacillus thuringiensis (Bt) isolate and which ones are being expressed. Nine Bt isolates were selected from Spanish collections of Bt based on their vip1 and vip2 gene content. As a first step, the nine isolates were analyzed by PCR to select those that contained genes with the lowest similarity to already described vip1 and vip2 genes (isolates E-SE10.2 and O-V84.2). The two selected isolates were subjected to a combined genomic and proteomic analysis. The results showed that the Bt isolate E-SE10.2 encodes two new vegetative proteins, Vip2Ac-like_1 and Sip1Aa-like_1, that do not show expression differences at 24 h vs. 48 h and are expressed at low levels. The Bt isolate O-V84.2 encodes three new vegetative proteins, Vip4Aa-like_1, Vip4Aa-like_2, and Vip2Ac-like_2, that are marginally expressed. The Vip4Aa-like_1 protein was two-fold more abundant at 24 h vs. 48 h, while Vip4Aa-like_2 was detected only at 24 h. For Vip2Ac-like_2, no differences in expression were found at 24 h vs. 48 h. Moreover, the parasporal crystal of the E-SE10.2 isolate contains a single type of crystal protein, Cry23Aa-like, while the parasporal crystal from O-V84.2 contains three kinds of crystal proteins: 7.0–9.8% by weight of Cry45Aa-like proteins, 35–37% by weight of Cry32-like proteins, and 2.8–4.3% by weight of Cry73-like protein.
Affiliation(s)
- Joaquín Gomis-Cebolla
- ERI de Biotecnología y Biomedicina (BIOTECMED), Department of Genetics, Universitat de València, 46100 Burjassot, Spain.
| | - Ana Paula Scaramal Ricietto
- ERI de Biotecnología y Biomedicina (BIOTECMED), Department of Genetics, Universitat de València, 46100 Burjassot, Spain.
- Departamento de Biologia Geral, Universidade Estadual de Londrina, Londrina 86057-970, Paraná, Brazil.
| | - Juan Ferré
- ERI de Biotecnología y Biomedicina (BIOTECMED), Department of Genetics, Universitat de València, 46100 Burjassot, Spain.
| |
|
5
|
Sandrin TR, Demirev PA. Characterization of microbial mixtures by mass spectrometry. MASS SPECTROMETRY REVIEWS 2018; 37:321-349. [PMID: 28509357 DOI: 10.1002/mas.21534] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 11/23/2016] [Revised: 03/09/2017] [Accepted: 03/09/2017] [Indexed: 05/27/2023]
Abstract
MS applications in microbiology have increased significantly in the past 10 years, due in part to the proliferation of regulator-approved commercial MALDI MS platforms for rapid identification of clinical infections. In parallel, with the expansion of MS technologies in the "omics" fields, novel MS-based research efforts to characterize organismal as well as environmental microbiomes have emerged. Successful characterization of microorganisms found in complex mixtures of other organisms remains a major challenge for researchers and clinicians alike. Here, we review recent MS advances toward addressing that challenge. These include sample preparation methods and protocols, as well as established (e.g., MALDI) and newer (e.g., atmospheric pressure ionization, API) techniques. MALDI mass spectra of intact cells contain predominantly information on the highly expressed house-keeping proteins used as biomarkers. The API methods are applicable to the analysis of small biomolecules (e.g., phospholipids and lipopeptides) and facilitate species differentiation. MS hardware and techniques (e.g., tandem MS), including diverse ion source/mass analyzer combinations, are discussed. Relevant examples of microbial mixture characterization utilizing these combinations are provided. Chemometrics and bioinformatics methods and algorithms, including those applied to large-scale MS data acquisition in microbial metaproteomics and MS imaging of biofilms, are highlighted. Select MS applications for polymicrobial culture analysis in environmental and clinical microbiology are reviewed as well.
Affiliation(s)
- Todd R Sandrin
- School of Mathematical and Natural Sciences, Arizona State University, Phoenix, Arizona
| | - Plamen A Demirev
- Applied Physics Laboratory, Johns Hopkins University, Laurel, Maryland
| |
|
6
|
Sotillo J, Toledo R, Mulvenna J, Loukas A. Exploiting Helminth-Host Interactomes through Big Data. Trends Parasitol 2017; 33:875-888. [PMID: 28734897 DOI: 10.1016/j.pt.2017.06.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 06/02/2017] [Revised: 06/26/2017] [Accepted: 06/28/2017] [Indexed: 12/19/2022]
Abstract
Helminths facilitate their parasitic existence through the production and secretion of different molecules, including proteins. Some helminth proteins can manipulate the host's immune system, a phenomenon that is now being exploited with a view to developing therapeutics for inflammatory diseases. In recent years, hundreds of helminth genomes have been sequenced, but as a community we are still taking baby steps when it comes to identifying proteins that govern host-helminth interactions. The information generated from genomic, immunomic, and proteomic studies, as well as from cutting-edge approaches such as proteogenomics, is leading to a substantial volume of big data that can be utilised to shed light on fundamental biology and provide solutions for the development of bioactive-molecule-based therapeutics.
Affiliation(s)
- Javier Sotillo
- Centre for Biodiscovery and Molecular Development of Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia.
| | - Rafael Toledo
- Departament de Farmacia, Tecnologia Farmacéutica y Parasitologia, Facultat de Farmacia, Universitat de Valencia, Spain
| | - Jason Mulvenna
- QIMR-Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | - Alex Loukas
- Centre for Biodiscovery and Molecular Development of Therapeutics, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia.
| |
|
7
|
Keppanan R, Sivaperumal S, Chadra Kanta D, Akutse KS, Wang L. Molecular docking of protease from Metarhizium anisopliae and their toxic effect against model insect Galleria mellonella. PESTICIDE BIOCHEMISTRY AND PHYSIOLOGY 2017; 138:8-14. [PMID: 28456309 DOI: 10.1016/j.pestbp.2017.01.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/15/2016] [Revised: 01/16/2017] [Accepted: 01/28/2017] [Indexed: 06/07/2023]
Abstract
Fungal virulence has mostly been associated with cuticle-degrading enzymes. The insect cuticle forms the first formidable barrier to pathogens, which pass through certain discrete stages before breaching it. The present study was conducted to extract and purify the extracellular protease from three isolates of Metarhizium anisopliae. The molecular weight of the protease from each isolate was determined using sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) and found to be 35-40 kDa. The partially purified enzymes were tested for their toxic effects against the developmental stages of fourth-instar larvae of Galleria mellonella, and larval mortality was compared among the three isolates. The Tk6 isolate showed an increasing effect after 48 h of exposure, with the highest mortality at 120 h post inoculation. It also showed greater virulence against the model insect than the other strains. The active protein band of the Tk6 isolate was analyzed by MALDI-TOF, and a docking study was carried out to examine the interaction between the fungal and insect proteins.
Affiliation(s)
- Ravindran Keppanan
- Fujian-Taiwan Joint Innovation Centre for Ecological Control of Crop Pests - Vegetable subcenter, Fujian Agriculture and Forestry University, Fuzhou 350002, PR China; Department of Biotechnology and Genetic Engineering, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, India
| | - Sivaramakrishnan Sivaperumal
- Department of Biotechnology and Genetic Engineering, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, India
| | - Dash Chadra Kanta
- Fujian-Taiwan Joint Innovation Centre for Ecological Control of Crop Pests - Vegetable subcenter, Fujian Agriculture and Forestry University, Fuzhou 350002, PR China
| | - Komivi Senyo Akutse
- Key Laboratory of Biopesticide and Chemical Biology, MOE., Faculty of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, PR China
| | - Liande Wang
- Fujian-Taiwan Joint Innovation Centre for Ecological Control of Crop Pests - Vegetable subcenter, Fujian Agriculture and Forestry University, Fuzhou 350002, PR China; Key Laboratory of Biopesticide and Chemical Biology, MOE., Faculty of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, PR China.
| |
|
8
|
Seligmann H. Natural mitochondrial proteolysis confirms transcription systematically exchanging/deleting nucleotides, peptides coded by expanded codons. J Theor Biol 2016; 414:76-90. [PMID: 27899286 DOI: 10.1016/j.jtbi.2016.11.021] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 04/13/2016] [Revised: 11/11/2016] [Accepted: 11/22/2016] [Indexed: 12/19/2022]
Abstract
Protein sequences have higher linguistic complexities than human languages. This indicates undeciphered multilayered, overprinted information/genetic codes. Some superimposed genetic information is revealed by detections of transcripts systematically (a) exchanging nucleotides (nine symmetric, e.g. A<->C, fourteen asymmetric, e.g. A->C->G->A, swinger RNAs) translated according to tri-, tetra- and pentacodons, and (b) deleting mono-, dinucleotides after each trinucleotide (delRNAs). Here analyses of two independent proteomic datasets considering natural proteolysis confirm independently translation of these non-canonical RNAs, also along tetra- and pentacodons, increasing coverage of putative, cryptically encoded proteins. Analyses assuming endoproteinase GluC and elastase digestions (cleavages after residues D, E, and A, L, I, V, respectively) detect additional peptides colocalizing with detected non-canonical RNAs. Analyses detect fewer peptides matching GluC-, elastase- than trypsin-digestions: artificial trypsin-digestion outweighs natural proteolysis. Results suggest occurrences of complete proteins entirely matching non-canonical, superimposed encoding(s). Protein-coding after bijective transformations could explain genetic code symmetries, such as along Rumer's transformation.
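The cleavage rules quoted in this abstract (GluC after D and E; elastase after A, L, I and V) can be illustrated with a small in silico digestion in Python. This is a minimal sketch, not the author's analysis: the function name and example protein are hypothetical, and the trypsin rule (cleavage after K and R) is an assumption added here for comparison, since the abstract does not state it.

```python
# Residues after which each enzyme is assumed to cleave.
CLEAVAGE_RULES = {
    "GluC": set("DE"),        # as stated in the abstract
    "elastase": set("ALIV"),  # as stated in the abstract
    "trypsin": set("KR"),     # assumption: common simplified rule, not from the abstract
}


def digest(protein, enzyme, min_length=1):
    """Split a protein into peptides by cutting after each target residue.

    Simplifications (assumptions of this sketch): complete digestion,
    no missed cleavages, and no exceptions such as proline blocking.
    """
    residues = CLEAVAGE_RULES[enzyme]
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        if aa in residues:
            peptides.append(protein[start:i + 1])  # cut after position i
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing C-terminal fragment
    return [p for p in peptides if len(p) >= min_length]
```

For the toy sequence `"MKDEAGLR"`, the three rule sets give different peptide lists, which is why the choice of assumed protease changes which spectra can be matched to a candidate protein.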
Affiliation(s)
- Hervé Seligmann
- Unité de Recherche sur les Maladies Infectieuses et Tropicales Émergentes, Faculté de Médecine, URMITE CNRS-IRD 198 UMER 6236, IHU (Institut Hospitalo-Universitaire), Aix-Marseille University, Marseille, France.
| |
|
9
|
Prasad TSK, Mohanty AK, Kumar M, Sreenivasamurthy SK, Dey G, Nirujogi RS, Pinto SM, Madugundu AK, Patil AH, Advani J, Manda SS, Gupta MK, Dwivedi SB, Kelkar DS, Hall B, Jiang X, Peery A, Rajagopalan P, Yelamanchi SD, Solanki HS, Raja R, Sathe GJ, Chavan S, Verma R, Patel KM, Jain AP, Syed N, Datta KK, Khan AA, Dammalli M, Jayaram S, Radhakrishnan A, Mitchell CJ, Na CH, Kumar N, Sinnis P, Sharakhov IV, Wang C, Gowda H, Tu Z, Kumar A, Pandey A. Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes. Genome Res 2016; 27:133-144. [PMID: 28003436 PMCID: PMC5204337 DOI: 10.1101/gr.201368.115] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 10/28/2015] [Accepted: 11/10/2016] [Indexed: 01/05/2023]
Abstract
Complementing genome sequence with deep transcriptome and proteome data could enable more accurate assembly and annotation of newly sequenced genomes. Here, we provide a proof-of-concept of an integrated approach for analysis of the genome and proteome of Anopheles stephensi, which is one of the most important vectors of the malaria parasite. To achieve broad coverage of genes, we carried out transcriptome sequencing and deep proteome profiling of multiple anatomically distinct sites. Based on transcriptomic data alone, we identified and corrected 535 events of incomplete genome assembly involving 1196 scaffolds and 868 protein-coding gene models. This proteogenomic approach enabled us to add 365 genes that were missed during genome annotation and identify 917 gene correction events through discovery of 151 novel exons, 297 protein extensions, 231 exon extensions, 192 novel protein start sites, 19 novel translational frames, 28 events of joining of exons, and 76 events of joining of adjacent genes as a single gene. Incorporation of proteomic evidence allowed us to change the designation of more than 87 predicted “noncoding RNAs” to conventional mRNAs coded by protein-coding genes. Importantly, extension of the newly corrected genome assemblies and gene models to 15 other newly assembled Anopheline genomes led to the discovery of a large number of apparent discrepancies in assembly and annotation of these genomes. Our data provide a framework for how future genome sequencing efforts should incorporate transcriptomic and proteomic analysis in combination with simultaneous manual curation to achieve near complete assembly and accurate annotation of genomes.
Affiliation(s)
- T S Keshava Prasad
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore 575018, India.,NIMHANS-IOB Proteomics and Bioinformatics Laboratory, Neurobiology Research Centre, National Institute of Mental Health and Neuro Sciences, Bangalore, Karnataka 560029, India
| | - Ajeet Kumar Mohanty
- National Institute of Malaria Research, Field Station, Goa 403001, India.,Department of Zoology, Goa University, Taleigao Plateau, Goa 403206, India
| | - Manish Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Sreelakshmi K Sreenivasamurthy
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Gourav Dey
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Raja Sekhar Nirujogi
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Centre for Bioinformatics, Pondicherry University, Puducherry 605014, India
| | - Sneha M Pinto
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore 575018, India
| | - Anil K Madugundu
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Centre for Bioinformatics, Pondicherry University, Puducherry 605014, India
| | - Arun H Patil
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Jayshree Advani
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Srikanth S Manda
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Centre for Bioinformatics, Pondicherry University, Puducherry 605014, India
| | - Manoj Kumar Gupta
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Sutopa B Dwivedi
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
| | - Dhanashree S Kelkar
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
| | - Brantley Hall
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
| | - Xiaofang Jiang
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
| | - Ashley Peery
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
| | - Pavithra Rajagopalan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Soujanya D Yelamanchi
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Hitendra S Solanki
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Remya Raja
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
| | - Gajanan J Sathe
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Sandip Chavan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Renu Verma
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Krishna M Patel
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India
| | - Ankit P Jain
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Nazia Syed
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Department of Biochemistry and Molecular Biology, Pondicherry University, Puducherry 605014, India
| | - Keshava K Datta
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Aafaque Ahmed Khan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India
| | - Manjunath Dammalli
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Department of Biotechnology, Siddaganga Institute of Technology, Tumkur, Karnataka 572103, India
| | - Savita Jayaram
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
| | - Aneesha Radhakrishnan
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,Department of Biochemistry and Molecular Biology, Pondicherry University, Puducherry 605014, India
| | - Christopher J Mitchell
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
| | - Chan-Hyun Na
- Department of Neurology, Johns Hopkins University, Baltimore, Maryland 21205, USA
| | - Nirbhay Kumar
- Department of Tropical Medicine, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana 70112, USA
| | - Photini Sinnis
- Malaria Research Institute, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
| | - Igor V Sharakhov
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
| | - Charles Wang
- Center for Genomics and Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, California 92350, USA
| | - Harsha Gowda
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,YU-IOB Center for Systems Biology and Molecular Medicine, Yenepoya University, Mangalore 575018, India
| | - Zhijian Tu
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
| | - Ashwani Kumar
- National Institute of Malaria Research, Field Station, Goa 403001, India
| | - Akhilesh Pandey
- Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka 560066, India.,McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.,Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.,Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.,Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
| |
|
10
|
Abstract
Omics approaches have become popular in biology as powerful discovery tools, and are currently gaining interest for diagnostic applications. Establishing the accurate genome sequence of any organism is easy, but the outcome of its annotation by means of automatic pipelines remains imprecise. Some protein-encoding genes may be missed when they are specific to a given taxon and poorly conserved, even though they are important for explaining the specific traits of the organism. Translational starts are also poorly predicted in a relatively large number of cases, which affects the protein sequence database used in proteomics, comparative genomics, and systems biology. The use of high-throughput proteomics data to improve genome annotation is an attractive option to obtain a more comprehensive molecular picture of a given organism. Here, protocols for reannotating prokaryote genomes are described based on shotgun proteomics and derivatization of protein N-termini with a positively charged reagent coupled to high-resolution tandem mass spectrometry.
|
11
|
Ravindran K, Akutse KS, Sivaramakrishnan S, Wang L. Determination and characterization of destruxin production in Metarhizium anisopliae Tk6 and formulations for Aedes aegypti mosquitoes control at the field level. Toxicon 2016; 120:89-96. [DOI: 10.1016/j.toxicon.2016.07.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/02/2016] [Revised: 07/15/2016] [Accepted: 07/20/2016] [Indexed: 10/21/2022]
|
12
|
Quan M, Xie J, Liu X, Li Y, Rang J, Zhang T, Zhou F, Xia L, Hu S, Sun Y, Ding X. Comparative Analysis of Genomics and Proteomics in the New Isolated Bacillus thuringiensis X022 Revealed the Metabolic Regulation Mechanism of Carbon Flux Following Cu(2+) Treatment. Front Microbiol 2016; 7:792. [PMID: 27303381 PMCID: PMC4882340 DOI: 10.3389/fmicb.2016.00792] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/03/2016] [Accepted: 05/09/2016] [Indexed: 01/23/2023] [Open Access]
Abstract
Bacillus thuringiensis (Bt) X022 is a novel strain isolated from soil in China that showed strong insecticidal activity against several Lepidopteran pests. In this work, we performed whole genome sequencing of this Bt strain using next-generation sequencing technology, and further conducted a comparative analysis with proteomics data from the specific spore-release period obtained by an LC-MS/MS approach. The Bt X022 genome consisted of one circular chromosomal DNA and seven plasmids, which were further functionally annotated using the RAST server. Comparative analysis of insecticidal substances showed that X022 contained genes coding for three Cry proteins (Cry1Ac, Cry1Ia and Cry2Ab) and a vegetative insecticidal protein (Vip3A). However, three insecticidal crystal proteins (ICPs) (Cry1Ca, Cry1Ac and Cry1Da) were detected by proteomics in the spore-release period. Moreover, a putative biosynthetic gene cluster and the metabolic pathway for poly-β-hydroxybutyrate in Bt X022 were deduced based on the comparative analysis of genomic and proteomic data, which revealed the metabolic regulation mechanism of carbon flux correlated with increased production of ICPs caused by Cu2+. Hence, these results provided a deeper understanding of the genetic background and protein expression profile of Bt X022. This study established a foundation for directed genetic modification and further application of this newly isolated Bt strain.
Affiliation(s)
- Meifang Quan
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China; Laboratory of Medicine Engineering, College of Medicine, Hunan Normal University, Changsha, China
- Junyan Xie
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Xuemei Liu
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Yang Li
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Jie Rang
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Tong Zhang
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Fengjuan Zhou
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Liqiu Xia
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Shengbiao Hu
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Yunjun Sun
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
- Xuezhi Ding
- Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, College of Life Science, Hunan Normal University, Changsha, China
|
13
|
Broadbent JA, Broszczak DA, Tennakoon IUK, Huygens F. Pan-proteomics, a concept for unifying quantitative proteome measurements when comparing closely-related bacterial strains. Expert Rev Proteomics 2016; 13:355-65. [PMID: 26889693 DOI: 10.1586/14789450.2016.1155986] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]
Abstract
The comparison of proteomes between genetically heterogeneous bacterial strains may offer valuable insights into physiological diversity and function, particularly where such variation aids in the survival and virulence of clinically-relevant strains. However, reports of such comparisons frequently fail to account for underlying genetic variance. As a consequence, the current knowledge regarding bacterial physiological diversity at the protein level may be incomplete or inaccurate. To address this, greater consideration must be given to the impact of genetic heterogeneity on proteome comparisons. This may be possible through the use of pan-proteomics, an analytical concept that permits the ability to qualitatively and quantitatively compare the proteomes of genetically heterogeneous organisms. Limited examples of this emerging technology highlight currently unmet analytical challenges. In this article we define pan-proteomics, where its value lies in microbiology, and discuss the technical considerations critical to its successful execution and potential future application.
Affiliation(s)
- James A Broadbent
- (a) Tissue Repair and Regeneration Program, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Australia; (b) Molecular Microbiological Pathogenesis Group, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Australia
- Daniel A Broszczak
- (a) Tissue Repair and Regeneration Program, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Australia
- Imalka U K Tennakoon
- (b) Molecular Microbiological Pathogenesis Group, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Australia
- Flavia Huygens
- (b) Molecular Microbiological Pathogenesis Group, Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Australia
|
14
|
Locard-Paulet M, Pible O, Gonzalez de Peredo A, Alpha-Bazin B, Almunia C, Burlet-Schiltz O, Armengaud J. Clinical implications of recent advances in proteogenomics. Expert Rev Proteomics 2016; 13:185-99. [DOI: 10.1586/14789450.2016.1132169] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 01/13/2023]
|
15
|
Abstract
In the past decade, proteogenomics has emerged as a valuable technique that contributes to the state-of-the-art in genome annotation; however, previous proteogenomic studies were limited to bottom-up mass spectrometry and did not take advantage of top-down approaches. We show that top-down proteogenomics allows one to address the problems that remained beyond the reach of traditional bottom-up proteogenomics. In particular, we show that top-down proteogenomics leads to the discovery of previously unannotated genes even in extensively studied bacterial genomes and present SpectroGene, a software tool for genome annotation using top-down tandem mass spectra. We further show that top-down proteogenomics searches (against the six-frame translation of a genome) identify nearly all proteoforms found in traditional top-down proteomics searches (against the annotated proteome). SpectroGene is freely available at http://github.com/fenderglass/SpectroGene .
Affiliation(s)
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, UCSD, 9500 Gilman Drive, La Jolla, CA, USA
- Xiaowen Liu
- Department of BioHealth Informatics, IUPUI, 719 Indiana Ave, Suite 304, Indianapolis, IN, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, UCSD, 9500 Gilman Drive, La Jolla, CA, USA
|
16
|
Abstract
Annotation of protein coding genes in sequenced genomes has been routinely carried out using gene prediction programs guided by available transcript data. The advent of mass spectrometry has enabled the identification of proteins in a high-throughput manner. In addition to searching proteins annotated in public databases, mass spectrometry data can also be searched against conceptually translated genome as well as transcriptome to identify novel protein coding regions. This proteogenomics approach has resulted in the identification of novel protein coding regions in both prokaryotic and eukaryotic genomes. These studies have also revealed that some of the annotated noncoding RNAs and pseudogenes code for proteins. This approach is likely to become a part of most genome annotation workflows in the future. Here we describe a general methodology and approach that can be used for proteogenomics.
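The "conceptually translated genome" that such searches run against can be illustrated with a six-frame translation. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical, the standard genetic code is assumed, and unknown or ambiguous codons are simply rendered as `X`. A real workflow would typically also split each frame at stop codons and filter short stretches before building the search database.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# many bacteria use the same table but with alternative start codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(genome):
    """Map frame labels (+1..+3, -1..-3) to their translated sequences."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

For the toy input `ATGGCCTAA`, frame `+1` reads `MA*` and frame `-1` reads `LGH`; peptides identified from mass spectra would then be matched against all six frames rather than only the annotated proteins.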
Affiliation(s)
- Keshava K Datta
- Institute of Bioinformatics, International Technology Park, Bangalore, 560066, India
- School of Biotechnology, KIIT University, Bhubaneswar, 751024, Odisha, India
- Anil K Madugundu
- Institute of Bioinformatics, International Technology Park, Bangalore, 560066, India
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, 605014, India
- Harsha Gowda
- Institute of Bioinformatics, International Technology Park, Bangalore, 560066, India
- School of Biotechnology, KIIT University, Bhubaneswar, 751024, Odisha, India
|
17
|
Kumar D, Mondal AK, Kutum R, Dash D. Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes. Proteomics 2015; 16:226-40. [PMID: 26773550 DOI: 10.1002/pmic.201500263] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/30/2015] [Revised: 09/18/2015] [Accepted: 09/28/2015] [Indexed: 01/04/2023]
Abstract
Sustainable innovations in sequencing technologies have resulted in a torrent of microbial genome sequencing projects. However, the prokaryotic genomes sequenced so far are unequally distributed along their phylogenetic tree; few phyla contain the majority, the rest only a few representatives. Accurate genome annotation lags far behind genome sequencing. While automated computational prediction, aided by comparative genomics, remains a popular choice for genome annotation, substantial fraction of these annotations are erroneous. Proteogenomics utilizes protein level experimental observations to annotate protein coding genes on a genome wide scale. Benefits of proteogenomics include discovery and correction of gene annotations regardless of their phylogenetic conservation. This not only allows detection of common, conserved proteins but also the discovery of protein products of rare genes that may be horizontally transferred or taxonomy specific. Chances of encountering such genes are more in rare phyla that comprise a small number of complete genome sequences. We collated all bacterial and archaeal proteogenomic studies carried out to date and reviewed them in the context of genome sequencing projects. Here, we present a comprehensive list of microbial proteogenomic studies, their taxonomic distribution, and also urge for targeted proteogenomics of underexplored taxa to build an extensive reference of protein coding genes.
Affiliation(s)
- Dhirendra Kumar
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Anupam Kumar Mondal
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Rintu Kutum
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Debasis Dash
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
|
18
|
Christie-Oleza JA, Armengaud J. Proteomics of the Roseobacter clade, a window to the marine microbiology landscape. Proteomics 2015; 15:3928-42. [DOI: 10.1002/pmic.201500222] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/10/2015] [Revised: 08/24/2015] [Accepted: 09/22/2015] [Indexed: 11/07/2022]
Affiliation(s)
- Jean Armengaud
- CEA; DSV; IBiTec-S; SPI; Li2D; Laboratory “Innovative Technologies for Detection and Diagnostics”; Bagnols-sur-Cèze, France
|
19
|
The Pacific Northwest National Laboratory library of bacterial and archaeal proteomic biodiversity. Sci Data 2015; 2:150041. [PMID: 26306205 PMCID: PMC4540001 DOI: 10.1038/sdata.2015.41] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/17/2015] [Accepted: 07/22/2015] [Indexed: 01/09/2023] Open
Abstract
This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprises 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data has been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations which allows for pathway oriented investigation. We present the data as a freely available community resource. A variety of data re-use options are described for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.
|
20
|
Wang X, Li Y, Xu G, Liu M, Xue L, Liu L, Hu S, Zhang Y, Nie Y, Liang S, Wang B, Ding J. Mechanism study of peptide GMBP1 and its receptor GRP78 in modulating gastric cancer MDR by iTRAQ-based proteomic analysis. BMC Cancer 2015; 15:358. [PMID: 25943993 PMCID: PMC4430905 DOI: 10.1186/s12885-015-1361-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 02/02/2015] [Accepted: 04/23/2015] [Indexed: 12/28/2022] Open
Abstract
Background: Multidrug resistance (MDR) is a major obstacle to the treatment of gastric cancer (GC). Using a phage display approach, we previously obtained the peptide GMBP1, which specifically binds to the surface of MDR gastric cancer cells and is subsequently internalized. Furthermore, GMBP1 was shown to have the potential to reverse the MDR phenotype of gastric cancer cells, and GRP78 was identified as the receptor for this peptide. The present study aimed to investigate the mechanism of peptide GMBP1 and its receptor GRP78 in modulating gastric cancer MDR. Methods: Fluorescence-activated cell sorting (FACS) and immunofluorescence staining were used to investigate the subcellular location and mechanism of GMBP1 internalization. iTRAQ was used to identify the MDR-associated downstream targets of GMBP1. Differentially expressed proteins were identified in GMBP1-treated compared to untreated SGC7901/ADR and SGC7901/VCR cells. GO and KEGG pathway analyses of the differentially expressed proteins revealed the interconnection of these proteins, the majority of which are involved in MDR. Two differentially expressed proteins were selected and validated by western blotting. Results: GMBP1 and its receptor GRP78 were found to be localized in the cytoplasm of GC cells, and GRP78 can mediate the internalization of GMBP1 into MDR cells through the transferrin-related pathway. In total, 3,752 and 3,749 proteins were affected in GMBP1-treated SGC7901/ADR and SGC7901/VCR cells, respectively, involving 38 and 79 KEGG pathways. Two differentially expressed proteins, CTBP2 and EIF4E, were selected and validated by western blotting. Conclusion: This study explored the role and downstream mechanism of GMBP1 in GC MDR, providing insight into the role of endoplasmic reticulum stress protein GRP78 in the MDR of cancer cells.
Affiliation(s)
- Xiaojuan Wang
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Yani Li
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Guanghui Xu
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Muhan Liu
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Lin Xue
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Lijuan Liu
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Sijun Hu
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Ying Zhang
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Yongzhan Nie
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Shuhui Liang
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Biaoluo Wang
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
- Jie Ding
- State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, 127 Changle Western Road, Xi'an, 710032, China
|
21
|
Abstract
The concept of the minimal cell has fascinated scientists for a long time, from both fundamental and applied points of view. This broad concept encompasses extreme reductions of genomes, the last universal common ancestor (LUCA), the creation of semiartificial cells, and the design of protocells and chassis cells. Here we review these different areas of research and identify common and complementary aspects of each one. We focus on systems biology, a discipline that is greatly facilitating the classical top-down and bottom-up approaches toward minimal cells. In addition, we also review the so-called middle-out approach and its contributions to the field with mathematical and computational models. Owing to the advances in genomics technologies, much of the work in this area has been centered on minimal genomes, or rather minimal gene sets, required to sustain life. Nevertheless, a fundamental expansion has been taking place in the last few years wherein the minimal gene set is viewed as a backbone of a more complex system. Complementing genomics, progress is being made in understanding the system-wide properties at the levels of the transcriptome, proteome, and metabolome. Network modeling approaches are enabling the integration of these different omics data sets toward an understanding of the complex molecular pathways connecting genotype to phenotype. We review key concepts central to the mapping and modeling of this complexity, which is at the heart of research on minimal cells. Finally, we discuss the distinction between minimizing the number of cellular components and minimizing cellular complexity, toward an improved understanding and utilization of minimal and simpler cells.
|
22
|
Rang J, He H, Wang T, Ding X, Zuo M, Quan M, Sun Y, Yu Z, Hu S, Xia L. Comparative analysis of genomics and proteomics in Bacillus thuringiensis 4.0718. PLoS One 2015; 10:e0119065. [PMID: 25781161 PMCID: PMC4363619 DOI: 10.1371/journal.pone.0119065] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/18/2014] [Accepted: 01/09/2015] [Indexed: 11/18/2022] Open
Abstract
Bacillus thuringiensis is a widely used biopesticide that produces various insecticidal active substances during its life cycle. Separation and purification of numerous insecticidal active substances have been difficult because of the relatively short half-life of such substances. On the other hand, substances can be synthesized at different times during development, so samples at different stages have to be studied, further complicating the analysis. A dual genomic and proteomic approach, particularly one using mass spectrometry-based proteomic methods, would enhance our ability to identify such substances. The comparative analysis of genomic and proteomic data showed that not all of the products deduced from the annotated genome could be identified among the proteomic data. For instance, genome annotation results showed that 39 coding sequences in the whole genome were related to insect pathogenicity, including five cry genes. However, Cry2Ab, Cry1Ia, Cytotoxin K, Bacteriocin, Exoenzyme C3 and Alveolysin could not be detected in the proteomic data obtained. The sporulation-related proteins were also compared, and the results showed that the great majority of sporulation-related proteins can be detected by mass spectrometry. This analysis revealed that Spo0A~P, SigF, SigE(+), SigK(+) and SigG(+), all known to play an important role in the regulatory network of spore formation, were also present in the proteomic data. Through the comparison of the two data sets, it was possible to infer that some genes were silenced or were expressed at very low levels. For instance, cry2Ab seems to lack a functional promoter, while cry1Ia may not be expressed due to the presence of transposons. With this comparative study a relatively complete database can be constructed and used to transform hereditary material, thereby prompting the high expression of toxic proteins. A theoretical basis is provided for constructing highly virulent engineered bacteria and for promoting the application of proteogenomics in the life sciences.
Affiliation(s)
- Jie Rang
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Hao He
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Ting Wang
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Xuezhi Ding
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- * E-mail: (XZD); (LQX)
- Mingxing Zuo
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Meifang Quan
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Yunjun Sun
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Ziquan Yu
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Shengbiao Hu
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- Liqiu Xia
- College of Life Science, Hunan Normal University, Hunan Provincial Key Laboratory of Microbial Molecular Biology-State Key Laboratory Breeding Base of Microbial Molecular Biology, Changsha, China
- * E-mail: (XZD); (LQX)
|
23
|
Grobbler C, Virdis B, Nouwens A, Harnisch F, Rabaey K, Bond PL. Use of SWATH mass spectrometry for quantitative proteomic investigation of Shewanella oneidensis MR-1 biofilms grown on graphite cloth electrodes. Syst Appl Microbiol 2015; 38:135-9. [DOI: 10.1016/j.syapm.2014.11.007] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 09/04/2014] [Revised: 11/17/2014] [Accepted: 11/19/2014] [Indexed: 11/27/2022]
|
24
|
Nesvizhskii AI. Proteogenomics: concepts, applications and computational strategies. Nat Methods 2014; 11:1114-25. [PMID: 25357241 DOI: 10.1038/nmeth.3144] [Citation(s) in RCA: 543] [Impact Index Per Article: 54.3] [Received: 05/21/2014] [Accepted: 09/22/2014] [Indexed: 12/19/2022]
Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry-based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.
Affiliation(s)
- Alexey I Nesvizhskii
- [1] Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA; [2] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
|
25
|
Kucharova V, Wiker HG. Proteogenomics in microbiology: taking the right turn at the junction of genomics and proteomics. Proteomics 2014; 14:2360-675. [PMID: 25263021 DOI: 10.1002/pmic.201400168] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 04/28/2014] [Revised: 08/18/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022]
Abstract
High-accuracy and high-throughput proteomic methods have completely changed the way we can identify and characterize proteins. MS-based proteomics can now provide a unique supplement to genomic data and add a new level of information to the interpretation of genomic sequences. Proteomics-driven genome annotation has become especially relevant in microbiology where genomes are sequenced on a daily basis and limitations of an in silico driven annotation process are well recognized. In this review paper, we outline different strategies on how one can design a proteogenomic experiment, for example on genome-sequenced (synonymous proteogenomics) versus unsequenced organisms (ortho-proteogenomics) or with the aid of other "omic" data such as RNA-seq. We touch upon many challenges that are encountered during a typical proteogenomic study, mostly concerning bioinformatics methods and downstream data analysis, but also related to creation and use of sequence databases. A large list of proteogenomic case studies of different microorganisms is provided to illustrate the mapping of MS/MS-derived peptide spectra to genomic DNA sequences. These investigations have led to accurate determination of translational initiation sites, pointed out eventual read-throughs or programmed frameshifts, detected signal peptide processing or other protein maturation events, removed questionable annotation assignments, and provided evidence for predicted hypothetical proteins.
Affiliation(s)
- Veronika Kucharova
- Department of Clinical Science, The Gade Research Group for Infection and Immunity, University of Bergen, Norway
|
26
|
Abstract
An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.
|
27
|
Chen H, Xu L, Yin L, Xu Y, Han X, Qi Y, Zhao Y, Liu K, Peng J. iTRAQ-based proteomic analysis of dioscin on human HCT-116 colon cancer cells. Proteomics 2014; 14:51-73. [PMID: 24420967 DOI: 10.1002/pmic.201300101] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 03/12/2013] [Revised: 10/24/2013] [Accepted: 10/28/2013] [Indexed: 12/27/2022]
Abstract
Dioscin shows various pharmacological effects. However, its activity on colorectal cancer is still unknown. The present work showed that dioscin significantly inhibited cell proliferation on human HCT-116 colon cancer cells, and affected Ca(2+) release and ROS generation. The content of nitric oxide (NO) and its producer inducible NO synthase (iNOS) associated with DNA damage and aberrant cell signaling were assayed using the kits. DNA damage and cell apoptosis caused by dioscin were also analyzed through single-cell gel electrophoresis and in situ terminal deoxynucleotidyl transferase dUTP nick-end labeling assays. The results showed that dioscin increased the levels of NO and inducible NO synthase. The comet length in dioscin-treated groups was much longer than that of the control group, and the number of terminal deoxynucleotidyl transferase dUTP nick-end labeling positive cells (apoptotic cells) was significantly increased by the compound (p < 0.01). Furthermore, dioscin caused mitochondrial damage and G2/M cell cycle arrest through transmission electron microscopy and flow cytometry analysis, respectively. To study the cytotoxic mechanism of dioscin, an iTRAQ-based proteomics approach was used. There were 288 significantly different proteins expressed in response to dioscin, which were connected with each other and were involved in different Kyoto Encyclopedia of Genes and Genomes pathways. Then, some differentially expressed proteins involved in oxidative phosphorylation, Wnt, p53, and calcium signaling pathways were validated by Western blotting and quantitative real-time PCR assays. Our work elucidates the molecular mechanism of dioscin-induced cytotoxicity in colon cancer cells, and the identified targets may be useful for treatment of colorectal cancer in future.
Affiliation(s)
- Hao Chen
- College of Pharmacy, Dalian Medical University, Dalian, China
|
28
|
Sakata K, Komatsu S. Plant proteomics: from genome sequencing to proteome databases and repositories. Methods Mol Biol 2014; 1072:29-42. [PMID: 24136512 DOI: 10.1007/978-1-62703-631-3_3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
Proteomic approaches are useful for the identification of functional proteins. These have been enhanced not only by the development of proteomic techniques but also in concert with genome sequencing. In this chapter, 30 databases and Web sites relating to plant proteomics are reviewed and recent technologies relating to data collection and annotation are surveyed.
|
29
|
Sun H, Xing X, Li J, Zhou F, Chen Y, He Y, Li W, Wei G, Chang X, Jia J, Li Y, Xie L. Identification of gene fusions from human lung cancer mass spectrometry data. BMC Genomics 2013; 14 Suppl 8:S5. [PMID: 24564548 PMCID: PMC4042237 DOI: 10.1186/1471-2164-14-s8-s5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/08/2023] Open
Abstract
Background Tandem mass spectrometry (MS/MS) technology has been applied to identify proteins, as an ultimate approach to confirm the original genome annotation. To be able to identify gene fusion proteins, a special database containing peptides that cross over gene fusion breakpoints is needed. Methods It is impractical to construct a database that includes all possible fusion peptides originated from potential breakpoints. Focusing on 6259 reported and predicted gene fusion pairs from ChimerDB 2.0 and Cancer Gene Census, we for the first time created a database CanProFu that comprehensively annotates fusion peptides formed by exon-exon linkage between these pairing genes. Results Applying this database to mass spectrometry datasets of 40 human non-small cell lung cancer (NSCLC) samples and 39 normal lung samples with stringent searching criteria, we were able to identify 19 unique fusion peptides characterizing gene fusion events. Among them 11 gene fusion events were only found in NSCLC samples. And also, 4 alternative splicing events were characterized in cancerous or normal lung samples. Conclusions The database and workflow in this work can be flexibly applied to other MS/MS based human cancer experiments to detect gene fusions as potential disease biomarkers or drug targets.
|
30
|
Pang CNI, Tay AP, Aya C, Twine NA, Harkness L, Hart-Smith G, Chia SZ, Chen Z, Deshpande NP, Kaakoush NO, Mitchell HM, Kassem M, Wilkins MR. Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing. J Proteome Res 2013; 13:84-98. [PMID: 24152167 DOI: 10.1021/pr400820p] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 12/19/2022]
Abstract
Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrated Genome Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from https://github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.
Affiliation(s)
- Chi Nam Ignatius Pang
- Systems Biology Initiative, The University of New South Wales, Sydney, New South Wales 2052, Australia
|
31
|
Kumar D, Yadav AK, Kadimi PK, Nagaraj SH, Grimmond SM, Dash D. Proteogenomic analysis of Bradyrhizobium japonicum USDA110 using GenoSuite, an automated multi-algorithmic pipeline. Mol Cell Proteomics 2013; 12:3388-97. [PMID: 23882027 PMCID: PMC3820949 DOI: 10.1074/mcp.m112.027169] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 01/03/2013] [Revised: 07/19/2013] [Indexed: 11/06/2022] Open
Abstract
We present GenoSuite, an integrated proteogenomic pipeline to validate, refine and discover protein coding genes using high-throughput mass spectrometry (MS) data from prokaryotes. To demonstrate the effectiveness of GenoSuite, we analyzed proteomics data of Bradyrhizobium japonicum (USDA110), a model organism to study agriculturally important rhizobium-legume symbiosis. Our analysis confirmed 31% of known genes, refined 49 gene models for their translation initiation site (TIS) and discovered 59 novel protein coding genes. Notably, a novel protein which redefined the boundary of a crucial cytochrome P450 system related operon was discovered, known to be highly expressed in the anaerobic symbiotic bacteroids. A focused analysis on N-terminally acetylated peptides indicated downstream TIS for gene blr0594. Finally, ortho-proteogenomic analysis revealed three novel genes in recently sequenced B. japonicum USDA6(T) genome. The discovery of large number of missing genes and correction of gene models have expanded the proteomic landscape of B. japonicum and presents an unparalleled utility of proteogenomic analyses and versatility of GenoSuite for annotating prokaryotic genomes including pathogens.
Affiliation(s)
- Dhirendra Kumar
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi 110025, India
| | - Amit Kumar Yadav
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi 110025, India
| | - Puneet Kumar Kadimi
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi 110025, India
| | - Shivashankar H. Nagaraj
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Sean M. Grimmond
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Debasis Dash
- G.N. Ramachandran Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi 110025, India
|
32
|
Castellana NE, Shen Z, He Y, Walley JW, Cassidy CJ, Briggs SP, Bafna V. An automated proteogenomic method uses mass spectrometry to reveal novel genes in Zea mays. Mol Cell Proteomics 2013; 13:157-67. [PMID: 24142994 DOI: 10.1074/mcp.m113.031260] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Indexed: 11/06/2022] Open
Abstract
New technologies in genomics and proteomics have influenced the emergence of proteogenomics, a field at the confluence of genomics, transcriptomics, and proteomics. First generation proteogenomic toolkits employ peptide mass spectrometry to identify novel protein coding regions. We extend first generation proteogenomic tools to achieve greater accuracy and enable the analysis of large, complex genomes. We apply our pipeline to Zea mays, which has a genome comparable in size to human. Our pipeline begins with the comparison of mass spectra to a putative translation of the genome. We select novel peptides, those that match a region of the genome that was not previously known to be protein coding, for grouping into refinement events. We present a novel, probabilistic framework for evaluating the accuracy of each event. Our calculated event probability, or eventProb, considers the number of supporting peptides and spectra, and the quality of each supporting peptide-spectrum match. Our pipeline predicts 165 novel protein-coding genes and proposes updated models for 741 additional genes.
Affiliation(s)
- Natalie E Castellana
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92092
|
33
|
Armengaud J, Hartmann EM, Bland C. Proteogenomics for environmental microbiology. Proteomics 2013; 13:2731-42. [PMID: 23636904 DOI: 10.1002/pmic.201200576] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 12/19/2012] [Revised: 03/06/2013] [Accepted: 04/09/2013] [Indexed: 11/09/2022]
Abstract
Proteogenomics sensu stricto refers to the use of proteomic data to refine the annotation of genomes from model organisms. Because of the limitations of automatic annotation pipelines, a relatively high number of errors occur during the structural annotation of genes coding for proteins. Whether putative orphan sequences or short genes encoding low-molecular-weight proteins really exist is still frequently a mystery. Whether start codons are well defined is also an open debate. These problems are exacerbated for genomes of microorganisms belonging to poorly documented genera, as related sequences are not always available for homology-guided annotation. The functional annotation of a significant proportion of genes is also another well-known issue when annotating environmental microorganisms. High-throughput shotgun proteomics has recently greatly evolved, allowing the exploration of the proteome from any microorganism at an unprecedented depth. The structural and functional annotation process may be usefully complemented with experimental data. Indeed, proteogenomic mapping has been successfully performed for a wide variety of organisms. Specific approaches devoted to systematically establishing the N-termini of a large set of proteins are being developed. N-terminomics is giving rise to datasets of experimentally proven translational start codons as well as validated peptide signals for secreted proteins. By extension, combining genomic and proteomic data is becoming routine in many research projects. The proteomic analysis of organisms with unfinished genome sequences, the so-called composite proteomics, and the search for microbial biomarkers by bottom-up and top-down combined approaches are some examples of proteogenomic-flavored studies. They illustrate the advent of a new era of environmental microbiology where proteomics and genomics are intimately integrated to answer key biological questions.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, France
|
34
|
Costa EP, Menschaert G, Luyten W, De Grave K, Ramon J. PIUS: peptide identification by unbiased search. Bioinformatics 2013; 29:1913-4. [DOI: 10.1093/bioinformatics/btt298] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
|
35
|
Blein-Nicolas M, Albertin W, Valot B, Marullo P, Sicard D, Giraud C, Huet S, Bourgais A, Dillmann C, de Vienne D, Zivy M. Yeast proteome variations reveal different adaptive responses to grape must fermentation. Mol Biol Evol 2013; 30:1368-83. [PMID: 23493259 DOI: 10.1093/molbev/mst050] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 01/14/2023] Open
Abstract
Saccharomyces cerevisiae and S. uvarum are two domesticated species of the Saccharomyces sensu stricto clade that diverged around 100 Ma after whole-genome duplication. Both have retained many duplicated genes associated with glucose fermentation and are characterized by the ability to achieve grape must fermentation. Nevertheless, these two species differ for many other traits, indicating that they underwent different evolutionary histories. To determine how the evolutionary histories of S. cerevisiae and S. uvarum are mirrored on the proteome, we analyzed the genetic variability of the proteomes of domesticated strains of these two species by quantitative mass spectrometry. Overall, 445 proteins were quantified. Massive variations of protein abundances were found that clearly differentiated the two species. Abundance variations in specific metabolic pathways could be related to phenotypic traits known to discriminate the two species. In addition, proteins encoded by duplicated genes were shown to be differently recruited in each species. Comparing the strain differentiation based on the proteome variability to those based on the phenotypic and genetic variations further revealed that the strains of S. uvarum and some strains of S. cerevisiae displayed similar fermentative performances despite strong proteomic and genomic differences. Altogether, these results indicate that the ability of S. cerevisiae and S. uvarum to complete grape must fermentation arose through different evolutionary roads, involving different metabolic pathways and duplicated genes.
|
36
|
Top-Down Characterization of the Post-Translationally Modified Intact Periplasmic Proteome from the Bacterium Novosphingobium aromaticivorans. International Journal of Proteomics 2013; 2013:279590. [PMID: 23555055 PMCID: PMC3608174 DOI: 10.1155/2013/279590] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 10/07/2012] [Revised: 01/31/2013] [Accepted: 02/04/2013] [Indexed: 11/17/2022]
Abstract
The periplasm of Gram-negative bacteria is a dynamic and physiologically important subcellular compartment where the constant exposure to potential environmental insults amplifies the need for proper protein folding and modifications. Top-down proteomics analysis of the periplasmic fraction at the intact protein level provides unrestricted characterization and annotation of the periplasmic proteome, including the post-translational modifications (PTMs) on these proteins. Here, we used single-dimension ultra-high pressure liquid chromatography coupled with the Fourier transform mass spectrometry (FTMS) to investigate the intact periplasmic proteome of Novosphingobium aromaticivorans. Our top-down analysis provided the confident identification of 55 proteins in the periplasm and characterized their PTMs including signal peptide removal, N-terminal methionine excision, acetylation, glutathionylation, pyroglutamate, and disulfide bond formation. This study provides the first experimental evidence for the expression and periplasmic localization of many hypothetical and uncharacterized proteins and the first unrestrictive, large-scale data on PTMs in the bacterial periplasm.
|
37
|
Abstract
Historically, many genome annotation strategies have lacked experimental evidence at the protein level and have instead relied heavily on ab initio gene prediction tools, which has resulted in many incorrectly annotated genomic sequences. Proteogenomics aims to address these issues by combining mass spectrometry (MS)-based proteomics with genomic mapping and by providing statistical significance measures, such as false discovery rates (FDRs), to validate the mapped peptides. Presented here is a tool capable of meeting this goal, the UCSD proteogenomic pipeline, which maps peptide-spectrum matches (PSMs) to the genome using the Inspect MS/MS database search tool and uses a target-decoy search approach to assign an estimated FDR to each match. This pipeline also provides the option of using a more reliable approach to proteogenomics by determining the precise false-positive rates (FPRs) and p-values of each PSM by calculating their spectral probabilities and rescoring each PSM accordingly. In addition to the protein prediction challenges in the rapidly growing number of sequenced plant genomes, it is difficult to extract high-quality protein samples from many plant species. For that reason, this chapter contains methods for protein extraction and trypsin digestion that reliably produce samples suitable for proteogenomic analysis.
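The target-decoy idea named in this abstract can be illustrated with a minimal sketch. This is not the UCSD pipeline's code: the function name `fdr_at_threshold`, the toy scores, and the simple decoy-to-target ratio are all assumptions chosen for illustration, and real tools may use different estimators.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR among PSMs scoring >= threshold.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    The estimate used here is (#decoy hits) / (#target hits), one common
    convention (an assumption, not taken from the cited chapter).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target PSMs, so nothing to call false
    return decoys / targets


# Hypothetical toy scores, not data from any cited study.
toy_psms = [
    (9.1, False), (8.7, False), (8.2, True), (7.9, False),
    (7.5, False), (6.0, True), (5.5, False),
]
```

Raising the threshold trades accepted PSMs for a lower estimated FDR; with the toy list above, a cutoff of 7.0 keeps four target hits and one decoy hit.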
|
38
|
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, Indiana 47405, USA
|
39
|
Blakeley P, Overton IM, Hubbard SJ. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J Proteome Res 2012; 11:5221-34. [PMID: 23025403 PMCID: PMC3703792 DOI: 10.1021/pr300411q] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Indexed: 11/29/2022]
Abstract
Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exists five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. 
These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
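To make concrete why a six-frame database holds five unlikely targets for every real one, the sketch below translates a DNA string in all six reading frames. It is an illustration only, assuming the standard genetic code and an uppercase/lowercase ACGT alphabet; the names `CODON_TABLE` and `six_frame_translate` are hypothetical and are not taken from the cited work.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translate(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings.

    Stop codons are written as '*'; codons with non-ACGT characters become 'X'.
    """
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement)):
        for offset in range(3):
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(
                CODON_TABLE.get(codon, "X") for codon in codons
            )
    return frames
```

For a genuine coding region at most one of the six returned strings is the real protein, which is the database inflation the abstract describes.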
Affiliation(s)
- Paul Blakeley
- Faculty of Life Sciences, The University of Manchester, Manchester M13 9PT, UK
|
40
|
Pawar H, Sahasrabuddhe NA, Renuse S, Keerthikumar S, Sharma J, Kumar GSS, Venugopal A, Sekhar NR, Kelkar DS, Nemade H, Khobragade SN, Muthusamy B, Kandasamy K, Harsha HC, Chaerkady R, Patole MS, Pandey A. A proteogenomic approach to map the proteome of an unsequenced pathogen - Leishmania donovani. Proteomics 2012; 12:832-44. [DOI: 10.1002/pmic.201100505] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Affiliation(s)
- Harsh Pawar
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Rajiv Gandhi University of Health Sciences; Bangalore Karnataka India
| | - Nandini A. Sahasrabuddhe
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Manipal University; Madhav Nagar Manipal Karnataka India
- McKusick-Nathans Institute of Genetic Medicine; Johns Hopkins University School of Medicine; Baltimore MD USA
- Department of Biological Chemistry; Johns Hopkins University School of Medicine; Baltimore MD USA
| | - Santosh Renuse
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- McKusick-Nathans Institute of Genetic Medicine; Johns Hopkins University School of Medicine; Baltimore MD USA
- Department of Biological Chemistry; Johns Hopkins University School of Medicine; Baltimore MD USA
- Department of Biotechnology; Amrita Vishwa Vidyapeetham; Kollam Kerala India
| | | | - Jyoti Sharma
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Manipal University; Madhav Nagar Manipal Karnataka India
| | - Ghantasala. S. Sameer Kumar
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Department of Biotechnology; Kuvempu University; Shimoga Karnataka India
| | - Abhilash Venugopal
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Department of Biotechnology; Kuvempu University; Shimoga Karnataka India
| | - Nirujogi Raja Sekhar
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Bioinformatics Centre; School of Life Sciences; Pondicherry University; Puducherry India
| | - Dhanashree S. Kelkar
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Department of Biotechnology; Amrita Vishwa Vidyapeetham; Kollam Kerala India
| | - Harshal Nemade
- National Centre for Cell Sciences; Pune Maharashtra India
| | | | - Babylakshmi Muthusamy
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- Bioinformatics Centre; School of Life Sciences; Pondicherry University; Puducherry India
| | - Kumaran Kandasamy
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
| | - H. C. Harsha
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
| | - Raghothama Chaerkady
- Institute of Bioinformatics; International Technology Park; Bangalore Karnataka India
- McKusick-Nathans Institute of Genetic Medicine; Johns Hopkins University School of Medicine; Baltimore MD USA
- Department of Biological Chemistry; Johns Hopkins University School of Medicine; Baltimore MD USA
| | | | - Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine; Johns Hopkins University School of Medicine; Baltimore MD USA
- Department of Biological Chemistry; Johns Hopkins University School of Medicine; Baltimore MD USA
- Department of Oncology; Johns Hopkins University School of Medicine; Baltimore MD USA
- Department of Pathology; Johns Hopkins University School of Medicine; Baltimore MD USA
|
41
|
Nagaraj SH, Harsha H, Reverter A, Colgrave ML, Sharma R, Andronicos N, Hunt P, Menzies M, Lees MS, Sekhar NR, Pandey A, Ingham A. Proteomic analysis of the abomasal mucosal response following infection by the nematode, Haemonchus contortus, in genetically resistant and susceptible sheep. J Proteomics 2012; 75:2141-52. [PMID: 22285630 DOI: 10.1016/j.jprot.2012.01.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 09/16/2011] [Revised: 12/21/2011] [Accepted: 01/09/2012] [Indexed: 10/14/2022]
|
42
|
Schrimpe-Rutledge AC, Jones MB, Chauhan S, Purvine SO, Sanford JA, Monroe ME, Brewer HM, Payne SH, Ansong C, Frank BC, Smith RD, Peterson SN, Motin VL, Adkins JN. Comparative omics-driven genome annotation refinement: application across Yersiniae. PLoS One 2012; 7:e33903. [PMID: 22479471 PMCID: PMC3313959 DOI: 10.1371/journal.pone.0033903] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 10/24/2011] [Accepted: 02/19/2012] [Indexed: 02/03/2023] Open
Abstract
Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. The annotation process is now performed almost exclusively in an automated fashion to balance the large number of sequences generated. One possible way of reducing errors inherent to automated computational annotations is to apply data from omics measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. Here, the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species. Transcriptomic and proteomic data derived from highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis Pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 incorrect (i.e., observed frameshifts, extended start sites, and translated pseudogenes) protein-coding sequences within the three current genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes, including the insertion-ablated argD, underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, a transcriptional regulator, and many hypothetical proteins that were missed during annotation.
Affiliation(s)
| | - Marcus B. Jones
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Sadhana Chauhan
- University of Texas Medical Branch, Galveston, Texas, United States of America
| | - Samuel O. Purvine
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - James A. Sanford
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - Matthew E. Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - Heather M. Brewer
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - Samuel H. Payne
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - Charles Ansong
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - Bryan C. Frank
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Richard D. Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
| | - Scott N. Peterson
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Vladimir L. Motin
- University of Texas Medical Branch, Galveston, Texas, United States of America
| | - Joshua N. Adkins
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
|
43
|
Christie-Oleza JA, Miotello G, Armengaud J. High-throughput proteogenomics of Ruegeria pomeroyi: seeding a better genomic annotation for the whole marine Roseobacter clade. BMC Genomics 2012; 13:73. [PMID: 22336032 PMCID: PMC3305630 DOI: 10.1186/1471-2164-13-73] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 07/25/2011] [Accepted: 02/15/2012] [Indexed: 11/10/2022] Open
Abstract
Background: The structural and functional annotation of genomes is now heavily based on data obtained using automated pipeline systems. The key for an accurate structural annotation consists of blending similarities between closely related genomes with biochemical evidence of the genome interpretation. In this work we applied high-throughput proteogenomics to Ruegeria pomeroyi, a member of the Roseobacter clade, an abundant group of marine bacteria, as a seed for the annotation of the whole clade.
Results: A large dataset of peptides from R. pomeroyi was obtained after searching over 1.1 million MS/MS spectra against a six-frame translated genome database. We identified 2006 polypeptides, of which thirty-four were encoded by open reading frames (ORFs) that had not previously been annotated. From the pool of 'one-hit-wonders', i.e. those ORFs specified by only one peptide detected by tandem mass spectrometry, we could confirm the probable existence of five additional new genes after proving that the corresponding RNAs were transcribed. We also identified the most-N-terminal peptide of 486 polypeptides, of which sixty-four had originally been wrongly annotated.
Conclusions: By extending these re-annotations to the other thirty-six Roseobacter isolates sequenced to date (twenty different genera), we propose the correction of the assigned start codons of 1082 homologous genes in the clade. In addition, we also report the presence of novel genes within operons encoding determinants of the important tricarboxylic acid cycle, a feature that seems to be characteristic of some Roseobacter genomes. The detection of their corresponding products in large amounts raises the question of their function. Their discoveries point to a possible theory for protein evolution that will rely on high expression of orphans in bacteria: their putative poor efficiency could be counterbalanced by a higher level of expression. Our proteogenomic analysis will increase the reliability of the future annotation of marine bacterial genomes.
44
Venter E, Smith RD, Payne SH. Proteogenomic analysis of bacteria and archaea: a 46 organism case study. PLoS One 2011; 6:e27587. [PMID: 22114679 PMCID: PMC3219674 DOI: 10.1371/journal.pone.0027587] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 06/08/2011] [Accepted: 10/20/2011] [Indexed: 11/19/2022]
Abstract
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.
Affiliation(s)
- Eli Venter
  - Department of Informatics, J. Craig Venter Institute, Rockville, Maryland, United States of America
- Richard D. Smith
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
- Samuel H. Payne
  - Department of Informatics, J. Craig Venter Institute, Rockville, Maryland, United States of America
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America
45
Liu X, Sirotkin Y, Shen Y, Anderson G, Tsai YS, Ting YS, Goodlett DR, Smith RD, Bafna V, Pevzner PA. Protein identification using top-down spectra. Mol Cell Proteomics 2011; 11:M111.008524. [PMID: 22027200 DOI: 10.1074/mcp.m111.008524] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 01/09/2023]
Abstract
In the last two years, because of advances in protein separation and mass spectrometry, top-down mass spectrometry moved from analyzing single proteins to analyzing complex samples and identifying hundreds and even thousands of proteins. However, computational tools for database search of top-down spectra against protein databases are still in their infancy. We describe MS-Align+, a fast algorithm for top-down protein identification based on spectral alignment that enables searches for unexpected post-translational modifications. We also propose a method for evaluating statistical significance of top-down protein identifications and further benchmark various software tools on two top-down data sets from Saccharomyces cerevisiae and Salmonella typhimurium. We demonstrate that MS-Align+ significantly increases the number of identified spectra as compared with MASCOT and OMSSA on both data sets. Although MS-Align+ and ProSightPC have similar performance on the Salmonella typhimurium data set, MS-Align+ outperforms ProSightPC on the (more complex) Saccharomyces cerevisiae data set.
Affiliation(s)
- Xiaowen Liu
  - Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, San Diego, California 92093, USA
46
Ansong C, Tolić N, Purvine SO, Porwollik S, Jones M, Yoon H, Payne SH, Martin JL, Burnet MC, Monroe ME, Venepally P, Smith RD, Peterson SN, Heffron F, McClelland M, Adkins JN. Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium. BMC Genomics 2011; 12:433. [PMID: 21867535 PMCID: PMC3174948 DOI: 10.1186/1471-2164-12-433] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 03/07/2011] [Accepted: 08/25/2011] [Indexed: 12/22/2022]
Abstract
Background: Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, determining protein-coding genes for most new genomes is almost completely performed by inference using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function.
Results: We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function.
Conclusion: This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.
Affiliation(s)
- Charles Ansong
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
47
Chaerkady R, Kelkar DS, Muthusamy B, Kandasamy K, Dwivedi SB, Sahasrabuddhe NA, Kim MS, Renuse S, Pinto SM, Sharma R, Pawar H, Sekhar NR, Mohanty AK, Getnet D, Yang Y, Zhong J, Dash AP, MacCallum RM, Delanghe B, Mlambo G, Kumar A, Keshava Prasad TS, Okulate M, Kumar N, Pandey A. A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry. Genome Res 2011; 21:1872-81. [PMID: 21795387 DOI: 10.1101/gr.127951.111] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 01/10/2023]
Abstract
Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search-specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.
Affiliation(s)
- Raghothama Chaerkady
  - McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry, Johns Hopkins University, Baltimore, Maryland 21205, USA
48
Christie-Oleza JA, Fernandez B, Nogales B, Bosch R, Armengaud J. Proteomic insights into the lifestyle of an environmentally relevant marine bacterium. ISME J 2011; 6:124-35. [PMID: 21776030 DOI: 10.1038/ismej.2011.86] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Indexed: 11/09/2022]
Abstract
In terms of lifestyle, free-living bacteria are classified as either oligotrophic/specialist or opportunist/generalist. Heterogeneous marine environments such as coastal waters favour the establishment of marine generalist bacteria, which code for a large pool of functions. This is basically foreseen to cope with the heterogeneity of organic matter supplied to these systems. Nevertheless, it is not known what fraction of a generalist proteome is needed for house-keeping functions or what fraction is modified to cope with environmental changes. Here, we used high-throughput proteomics to define the proteome of Ruegeria pomeroyi DSS-3, a model marine generalist bacterium of the Roseobacter clade. We evaluated its genome expression under several natural environmental conditions, revealing the versatility of the bacterium to adapt to anthropogenic influence, poor nutrient concentrations or the presence of the natural microbial community. We also assayed 30 different laboratory incubations to increase proteome coverage and to dig further into the functional genomics of the bacterium. We established its core proteome and the proteome devoted to adaptation to general cellular physiological variations (almost 50%). We suggest that the other half of its theoretical proteome is the opportunist genetic pool devoted exclusively to very specific environmental conditions.
49
Fisunov GY, Alexeev DG, Bazaleev NA, Ladygina VG, Galyamina MA, Kondratov IG, Zhukova NA, Serebryakova MV, Demina IA, Govorun VM. Core proteome of the minimal cell: comparative proteomics of three mollicute species. PLoS One 2011; 6:e21964. [PMID: 21818284 PMCID: PMC3139596 DOI: 10.1371/journal.pone.0021964] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 01/26/2011] [Accepted: 06/14/2011] [Indexed: 11/19/2022]
Abstract
Mollicutes (mycoplasmas) have been recognized as highly evolved prokaryotes with an extremely small genome size and very limited coding capacity. Thus, they may serve as a model of a 'minimal cell': a cell with the lowest possible number of genes yet capable of autonomous self-replication. We present the results of a comparative analysis of proteomes of three mycoplasma species: A. laidlawii, M. gallisepticum, and M. mobile. The core proteome components found in the three mycoplasma species are involved in fundamental cellular processes which are necessary for the free living of cells. They include replication, transcription, translation, and minimal metabolism. The members of the proteome core seem to be tightly interconnected with a number of interactions forming a core interactome, whether or not additional species-specific proteins are located on the periphery. We also obtained a genome core of the respective organisms and compared it with the proteome core. It was found that the genome core encodes 73 more proteins than the proteome core. Apart from proteins which may not be identified due to technical limitations, there are 24 proteins that seem not to be expressed under the optimal conditions.
Affiliation(s)
- Gleb Y Fisunov
  - Scientific Research Institute of Physical-Chemical Medicine, Federal Bio-Medical Agency of Russia, Moscow, Russia.
50
Jeong K, Kim S, Bandeira N, Pevzner PA. Gapped spectral dictionaries and their applications for database searches of tandem mass spectra. Mol Cell Proteomics 2011; 10:M110.002220. [PMID: 21444829 DOI: 10.1074/mcp.m110.002220] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening a possibility of using them to speed up MS/MS searches. Our MS-Gapped-Dictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time-consuming with existing approaches. MS-Gapped-Dictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full-length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.
Affiliation(s)
- Kyowon Jeong
  - Department of Electrical and Computer Engineering, University of California, San Diego, CA, USA