1
García-Estrada C, Martín JF, Cueto L, Barreiro C. Omics Approaches Applied to Penicillium chrysogenum and Penicillin Production: Revealing the Secrets of Improved Productivity. Genes (Basel) 2020; 11:genes11060712. [PMID: 32604893 PMCID: PMC7348727 DOI: 10.3390/genes11060712] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/22/2020] [Revised: 06/07/2020] [Accepted: 06/24/2020] [Indexed: 12/20/2022] [Open access]
Abstract
Penicillin biosynthesis by Penicillium chrysogenum is one of the best-characterized biological processes from the genetic, molecular, biochemical, and subcellular points of view. Several omics studies have been carried out in this filamentous fungus during the last decade, which have contributed to gathering a deep knowledge about the molecular mechanisms underlying improved productivity in industrial strains. The information provided by these studies is extremely useful for enhancing the production of penicillin or other bioactive secondary metabolites by means of Biotechnology or Synthetic Biology.
Affiliation(s)
- Carlos García-Estrada
- INBIOTEC (Instituto de Biotecnología de León). Avda. Real 1, Parque Científico de León, 24006 León, Spain
- Departamento de Ciencias Biomédicas, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain
- Correspondence: Tel.: +34-987210308
- Juan F. Martín
- Área de Microbiología, Departamento de Biología Molecular, Facultad de Ciencias Biológicas y Ambientales, Universidad de León, 24071 León, Spain
- Laura Cueto
- INBIOTEC (Instituto de Biotecnología de León). Avda. Real 1, Parque Científico de León, 24006 León, Spain
- Carlos Barreiro
- INBIOTEC (Instituto de Biotecnología de León). Avda. Real 1, Parque Científico de León, 24006 León, Spain
- Departamento de Biología Molecular, Universidad de León, Campus de Ponferrada, Avda. Astorga s/n, 24401 Ponferrada, Spain
2
Korandla DR, Wozniak JM, Campeau A, Gonzalez DJ, Wright ES. AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions. Bioinformatics 2019; 36:1022-1029. [PMID: 31532487 PMCID: PMC7998711 DOI: 10.1093/bioinformatics/btz714] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/18/2019] [Revised: 09/05/2019] [Accepted: 09/13/2019] [Indexed: 01/31/2023] [Open access]
Abstract
MOTIVATION A core task of genomics is to identify the boundaries of protein coding genes, which may cover over 90% of a prokaryote's genome. Several programs are available for gene finding, yet it is currently unclear how well these programs perform and whether any offers superior accuracy. This is in part because there is no universal benchmark for gene finding and, therefore, most developers select their own benchmarking strategy. RESULTS Here, we introduce AssessORF, a new approach for benchmarking prokaryotic gene predictions based on evidence from proteomics data and the evolutionary conservation of start and stop codons. We applied AssessORF to compare gene predictions offered by GenBank, GeneMarkS-2, Glimmer and Prodigal on genomes spanning the prokaryotic tree of life. Gene predictions were 88-95% in agreement with the available evidence, with Glimmer performing the worst but no clear winner. All programs were biased towards selecting start codons that were upstream of the actual start. Given these findings, there remains considerable room for improvement, especially in the detection of correct start sites. AVAILABILITY AND IMPLEMENTATION AssessORF is available as an R package via the Bioconductor package repository. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
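The start-site bias reported above can be illustrated with a small classifier that compares a predicted start codon with an evidence-supported one. This is a minimal Python sketch under stated assumptions, not AssessORF's R/Bioconductor interface: the `Gene` type and the `compare_start` and `upstream_fraction` functions are hypothetical, coordinates are assumed to be 1-based, and both genes are assumed to share a strand and stop codon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with 1-based genomic coordinates."""
    start: int   # first base of the start codon, in the direction of transcription
    stop: int    # last base of the stop codon, in the direction of transcription
    strand: str  # "+" or "-"


def compare_start(predicted: Gene, evidence: Gene) -> str:
    """Classify a predicted start as 'correct', 'upstream' or 'downstream'
    of the evidence-supported start (assumes a shared strand and stop codon)."""
    if predicted.strand != evidence.strand or predicted.stop != evidence.stop:
        raise ValueError("genes must share strand and stop codon")
    if predicted.start == evidence.start:
        return "correct"
    # On the + strand, upstream means a smaller coordinate; on the - strand the
    # gene runs toward smaller coordinates, so upstream means a larger one.
    if predicted.strand == "+":
        is_upstream = predicted.start < evidence.start
    else:
        is_upstream = predicted.start > evidence.start
    return "upstream" if is_upstream else "downstream"


def upstream_fraction(pairs) -> float:
    """Fraction of incorrect starts that lie upstream of the evidence."""
    calls = [compare_start(p, e) for p, e in pairs]
    wrong = [c for c in calls if c != "correct"]
    return sum(c == "upstream" for c in wrong) / len(wrong) if wrong else 0.0
```

Across a whole genome, a bias like the one described would appear as an `upstream_fraction` well above 0.5.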
Affiliation(s)
- Deepank R Korandla
- Department of Biological Sciences, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15219, USA
- Jacob M Wozniak
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Anaamika Campeau
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- David J Gonzalez
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
3
De León-Medina PM, Elizondo-González R, Damas-Buenrostro LC, Geertman JM, Van den Broek M, Galán-Wong LJ, Ortiz-López R, Pereyra-Alférez B. Genome annotation of a Saccharomyces sp. lager brewer's yeast. Genom Data 2016; 9:25-9. [PMID: 27330999 PMCID: PMC4909825 DOI: 10.1016/j.gdata.2016.05.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/25/2015] [Revised: 02/13/2016] [Accepted: 05/19/2016] [Indexed: 11/25/2022]
Abstract
The genome of lager brewer's yeast is a hybrid, with Saccharomyces eubayanus and Saccharomyces cerevisiae as sub-genomes. Because these strains are used specifically in the beer industry, relatively little information about them is available. In this study, the genome of a brewing yeast was sequenced and annotated. The assembled genome was 22.7 Mbp in 133 scaffolds, 65 of which were larger than 10 kbp. Annotation yielded 9939 genes; when these were submitted to local alignment, 53.93% corresponded to S. cerevisiae and another 42.86% originated from S. eubayanus. Our results confirm that our strain is a hybrid of at least two different genomes.
Affiliation(s)
- Patricia Marcela De León-Medina
- Instituto de Biotecnología, Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León. Pedro de Alba y Manuel L. Barragán S/N, Ciudad Universitaria, San Nicolás de los Garza, Nuevo León 66450, Mexico
- Centro de Investigación y Desarrollo en Ciencias de la Salud, Universidad Autónoma de Nuevo León, Avenida Carlos Canseco s/n esquina con Av. Gonzalitos, Mutualismo, Mitras Centro, 64460 Monterrey, Nuevo León, Mexico
- Laboratorio de Investigación y Desarrollo, Cervecería Cuauhtémoc Moctezuma S.A. de C.V., Alfonso Reyes Norte Col, Bella Vista, 2202 Monterrey, Nuevo León, Mexico
- Ramiro Elizondo-González
- Instituto de Biotecnología, Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León. Pedro de Alba y Manuel L. Barragán S/N, Ciudad Universitaria, San Nicolás de los Garza, Nuevo León 66450, Mexico
- Centro de Investigación y Desarrollo en Ciencias de la Salud, Universidad Autónoma de Nuevo León, Avenida Carlos Canseco s/n esquina con Av. Gonzalitos, Mutualismo, Mitras Centro, 64460 Monterrey, Nuevo León, Mexico
- Laboratorio de Investigación y Desarrollo, Cervecería Cuauhtémoc Moctezuma S.A. de C.V., Alfonso Reyes Norte Col, Bella Vista, 2202 Monterrey, Nuevo León, Mexico
- Luis Cástulo Damas-Buenrostro
- Laboratorio de Investigación y Desarrollo, Cervecería Cuauhtémoc Moctezuma S.A. de C.V., Alfonso Reyes Norte Col, Bella Vista, 2202 Monterrey, Nuevo León, Mexico
- Jan-Maarten Geertman
- Heineken Supply Chain, Global Research & Development, 2382 PH Zoeterwoude, The Netherlands
- Marcel Van den Broek
- Department of Biotechnology, Delft University of Technology, Julianalaan 67, 2628 BC Delft, The Netherlands
- Luis Jesús Galán-Wong
- Instituto de Biotecnología, Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León. Pedro de Alba y Manuel L. Barragán S/N, Ciudad Universitaria, San Nicolás de los Garza, Nuevo León 66450, Mexico
- Rocío Ortiz-López
- Centro de Investigación y Desarrollo en Ciencias de la Salud, Universidad Autónoma de Nuevo León, Avenida Carlos Canseco s/n esquina con Av. Gonzalitos, Mutualismo, Mitras Centro, 64460 Monterrey, Nuevo León, Mexico
- Benito Pereyra-Alférez
- Instituto de Biotecnología, Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León. Pedro de Alba y Manuel L. Barragán S/N, Ciudad Universitaria, San Nicolás de los Garza, Nuevo León 66450, Mexico
- Corresponding author at: Pedro de Alba y Manuel L. Barragán S/N, Ciudad Universitaria, San Nicolás de los Garza, Nuevo León 66450, Mexico.
4
Zhu X, Xie S, Armengaud J, Xie W, Guo Z, Kang S, Wu Q, Wang S, Xia J, He R, Zhang Y. Tissue-specific Proteogenomic Analysis of Plutella xylostella Larval Midgut Using a Multialgorithm Pipeline. Mol Cell Proteomics 2016; 15:1791-807. [PMID: 26902207 PMCID: PMC5083088 DOI: 10.1074/mcp.m115.050989] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/25/2015] [Revised: 02/04/2016] [Indexed: 11/06/2022] [Open access]
Abstract
The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an independent experimental approach for confirming or refuting protein predictions, a concept termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement the genome annotation of the P. xylostella larval midgut, based on shotgun HPLC-ESI-MS/MS data and a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set of 2694 novel genome-search-specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To obtain the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes (235 of 439) were further validated by RT-PCR amplification and sequencing of the corresponding transcripts. We also validated 53 novel alternative splicing events. In total, 6764 proteins were identified, making this one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomic analysis of P. xylostella, this study provides a foundation for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest.
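A whole-genome six-frame translation database, like the one searched above, can be sketched in a few lines. This minimal Python example assumes the standard genetic code (NCBI translation table 1), translates any codon containing an ambiguous base as `X`, and uses a hypothetical `min_length` cutoff to drop stop-to-stop fragments too short to yield useful peptides; it is an illustration, not the authors' pipeline.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Translate the three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


def orf_fragments(protein: str, min_length: int = 7) -> list:
    """Split a frame at stop codons ('*') and keep sufficiently long stretches."""
    return [fragment for fragment in protein.split("*") if len(fragment) >= min_length]
```

Each fragment returned by `orf_fragments` would become one entry in the search database, so peptides can be matched even where no gene model exists.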
Affiliation(s)
- Xun Zhu
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jean Armengaud
- CEA-Marcoule, DSV/IBITEC-S/SPI/Li2D, Laboratory, BP 17171, F-30200, Bagnols-sur-Cèze, F-30207, France
- Wen Xie
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Zhaojiang Guo
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Shi Kang
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Qingjun Wu
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Shaoli Wang
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jixing Xia
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Rongjun He
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Youjun Zhang
- Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
5
6
Kucharova V, Wiker HG. Proteogenomics in microbiology: taking the right turn at the junction of genomics and proteomics. Proteomics 2014; 14:2660-75. [PMID: 25263021 DOI: 10.1002/pmic.201400168] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/28/2014] [Revised: 08/18/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022]
Abstract
High-accuracy and high-throughput proteomic methods have completely changed the way we can identify and characterize proteins. MS-based proteomics can now provide a unique supplement to genomic data and add a new level of information to the interpretation of genomic sequences. Proteomics-driven genome annotation has become especially relevant in microbiology, where genomes are sequenced on a daily basis and the limitations of an in silico driven annotation process are well recognized. In this review paper, we outline different strategies for designing a proteogenomic experiment, for example on genome-sequenced (synonymous proteogenomics) versus unsequenced organisms (ortho-proteogenomics), or with the aid of other "omic" data such as RNA-seq. We touch upon many challenges encountered during a typical proteogenomic study, mostly concerning bioinformatics methods and downstream data analysis, but also related to the creation and use of sequence databases. An extensive list of proteogenomic case studies of different microorganisms is provided to illustrate the mapping of MS/MS-derived peptide spectra to genomic DNA sequences. These investigations have led to accurate determination of translational initiation sites, pointed out possible read-throughs or programmed frameshifts, detected signal peptide processing or other protein maturation events, removed questionable annotation assignments, and provided evidence for predicted hypothetical proteins.
Affiliation(s)
- Veronika Kucharova
- Department of Clinical Science, The Gade Research Group for Infection and Immunity, University of Bergen, Norway
7
Nagarajha Selvan LD, Kaviyil JE, Nirujogi RS, Muthusamy B, Puttamallesh VN, Subbannayya T, Syed N, Radhakrishnan A, Kelkar DS, Ahmad S, Pinto SM, Kumar P, Madugundu AK, Nair B, Chatterjee A, Pandey A, Ravikumar R, Gowda H, Prasad TSK. Proteogenomic analysis of pathogenic yeast Cryptococcus neoformans using high resolution mass spectrometry. Clin Proteomics 2014; 11:5. [PMID: 24484775 PMCID: PMC3915034 DOI: 10.1186/1559-0275-11-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/30/2013] [Accepted: 12/17/2013] [Indexed: 12/30/2022] [Open access]
Abstract
Background Cryptococcus neoformans, a basidiomycetous fungus of universal occurrence, is a significant opportunistic human pathogen causing meningitis. Owing to an increase in the number of immunosuppressed individuals along with the emergence of drug-resistant strains, C. neoformans is gaining importance as a pathogen. Although whole-genome sequencing of three varieties of C. neoformans has been completed recently, no global proteomic studies have yet been reported. Results We performed a comprehensive proteomic analysis of C. neoformans var. grubii (Serotype A), the most virulent variety, to provide protein-level evidence for computationally predicted gene models and to refine the existing annotations. We confirmed the protein-coding potential of 3,674 of the 6,980 predicted protein-coding genes. We also identified 4 novel genes and corrected 104 predicted gene models. In addition, our studies led to corrections of translational start sites, splice junctions, and reading frames in a number of proteins. Finally, we validated a subset of our novel findings by RT-PCR and sequencing. Conclusions The proteogenomic investigation described here facilitated the validation and refinement of computationally derived gene models in the intron-rich genome of C. neoformans, an important fungal pathogen in humans.
Affiliation(s)
- Harsha Gowda
- Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India.
8
Ansong C, Deatherage BL, Hyduke D, Schmidt B, McDermott JE, Jones MB, Chauhan S, Charusanti P, Kim YM, Nakayasu ES, Li J, Kidwai A, Niemann G, Brown RN, Metz TO, McAteer K, Heffron F, Peterson SN, Motin V, Palsson BO, Smith RD, Adkins JN. Studying Salmonellae and Yersiniae host-pathogen interactions using integrated 'omics and modeling. Curr Top Microbiol Immunol 2013; 363:21-41. [PMID: 22886542 DOI: 10.1007/82_2012_247] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
Abstract
Salmonella and Yersinia are two distantly related genera containing species with wide host-range specificity and pathogenic capacity. The metabolic complexity of these organisms facilitates robust lifestyles both outside of and within animal hosts. Using a pathogen-centric systems biology approach, we are combining a multi-omics (transcriptomics, proteomics, metabolomics) strategy to define properties of these pathogens under a variety of conditions including those that mimic the environments encountered during pathogenesis. These high-dimensional omics datasets are being integrated in selected ways to improve genome annotations, discover novel virulence-related factors, and model growth under infectious states. We will review the evolving technological approaches toward understanding complex microbial life through multi-omic measurements and integration, while highlighting some of our most recent successes in this area.
Affiliation(s)
- Charles Ansong
- Biological Separations and Mass Spectroscopy Group, Pacific Northwest National Laboratory, PO Box 999, MSIN: K8-98, Richland, WA, 99352, USA
9
Volkening JD, Bailey DJ, Rose CM, Grimsrud PA, Howes-Podoll M, Venkateshwaran M, Westphall MS, Ané JM, Coon JJ, Sussman MR. A proteogenomic survey of the Medicago truncatula genome. Mol Cell Proteomics 2012; 11:933-44. [PMID: 22774004 DOI: 10.1074/mcp.m112.019471] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022] [Open access]
Abstract
Peptide sequencing by computational assignment of tandem mass spectra to a database of putative protein sequences provides an independent approach to confirming or refuting protein predictions based on large-scale DNA and RNA sequencing efforts. This use of mass spectrometrically-derived sequence data for testing and refining predicted gene models has been termed proteogenomics. We report herein the application of proteogenomic methodology to a database of 10.9 million tandem mass spectra collected over a period of two years from proteolytically generated peptides isolated from the model legume Medicago truncatula. These spectra were searched against a database of predicted M. truncatula protein sequences generated from public databases, in silico gene model predictions, and a whole-genome six-frame translation. This search identified 78,647 distinct peptide sequences, and a comparison with the publicly available proteome from the recently published M. truncatula genome supported translation of 9,843 existing gene models and identified 1,568 novel peptides suggesting corrections or additions to the current annotations. Each supporting and novel peptide was independently validated using mRNA-derived deep sequencing coverage and an overall correlation of 93% between the two data types was observed. We have additionally highlighted examples of several aspects of structural annotation for which tandem MS provides unique evidence not easily obtainable through typical DNA or RNA sequencing. Proteogenomic analysis is a valuable and unique source of information for the structural annotation of genomes and should be included in such efforts to ensure that the genome models used by biologists mirror as accurately as possible what is present in the cell.
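The distinction drawn above between peptides that support existing gene models and novel peptides that suggest corrections can be sketched as a lookup against two databases. In this hypothetical Python sketch, `classify_peptides` and its inputs are illustrative, exact substring matching stands in for the mapping step of a real search pipeline, and leucine and isoleucine are merged because they have identical masses and cannot be distinguished by standard MS/MS.

```python
from typing import Dict, Iterable


def _normalize(sequence: str) -> str:
    # Leucine and isoleucine are isobaric, so treat I and L as the same residue
    return sequence.replace("I", "L")


def classify_peptides(
    peptides: Iterable[str],
    annotated_proteins: Iterable[str],
    genome_translations: Iterable[str],
) -> Dict[str, str]:
    """Label each peptide 'supporting', 'novel' or 'unmatched'."""
    # Join with a newline so a match can never span two separate sequences
    annotated = "\n".join(_normalize(p) for p in annotated_proteins)
    genome = "\n".join(_normalize(p) for p in genome_translations)
    labels = {}
    for peptide in peptides:
        key = _normalize(peptide)
        if key in annotated:
            labels[peptide] = "supporting"  # confirms an existing gene model
        elif key in genome:
            labels[peptide] = "novel"       # only in the six-frame translation
        else:
            labels[peptide] = "unmatched"
    return labels
```

In a real study, each "novel" peptide would then be mapped back to genomic coordinates and checked against independent evidence such as the RNA sequencing coverage mentioned above.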
Affiliation(s)
- Jeremy D Volkening
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
10
Helmy M, Sugiyama N, Tomita M, Ishihama Y. Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics. Genes Cells 2012; 17:633-44. [PMID: 22686349 DOI: 10.1111/j.1365-2443.2012.01615.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 03/19/2012] [Accepted: 04/14/2012] [Indexed: 01/18/2023]
Abstract
We have developed a novel bioinformatics method called mass spectrum sequential subtraction (MSSS) to search large peptide spectra datasets produced by liquid chromatography/tandem mass spectrometry (LC-MS/MS) against protein and large nucleotide sequence databases. The main principle of MSSS is to search the peptide spectra set against the protein database first, then remove the spectra corresponding to the identified peptides, leaving a smaller set of spectra to be searched against the nucleotide sequence database. This reduces the number of spectra that must be searched against the large database and thereby limits the peptide search space. A comparison of MSSS with the conventional search approach on a dataset of 27 LC-MS/MS runs of rice culture cells indicated that MSSS reduced the search queries to 50% and the search time to 75% on average. In addition, MSSS had no effect on the identification false-positive rate (FPR) or on the ability to identify novel peptide sequences. We used MSSS to analyze another dataset of 34 LC-MS/MS runs, identifying 74 additional novel peptides. Proteogenomic analysis with these additional peptides yielded 47 new genomic features in 24 rice genes plus 24 intergenic peptides. These results demonstrate the utility of MSSS for searching large databases with large MS/MS datasets in proteogenomics.
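The two-pass idea behind MSSS can be sketched as follows. This is a hypothetical Python illustration, not the published implementation: `search` stands in for a real database search engine, and in `toy_search` a "spectrum" is represented simply by the peptide it would match, with a hit being an exact substring match in the database.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical search-engine signature: (spectra, database) -> {spectrum: peptide}
SearchFn = Callable[[List[str], str], Dict[str, str]]


def toy_search(spectra: List[str], database: str) -> Dict[str, str]:
    """Stand-in search: a 'spectrum' is the peptide it would match."""
    return {s: s for s in spectra if s in database}


def sequential_subtraction(
    spectra: List[str],
    protein_db: str,
    genome_db: str,
    search: SearchFn = toy_search,
) -> Tuple[Dict[str, str], Dict[str, str], int]:
    # Pass 1: search every spectrum against the smaller protein database.
    protein_hits = search(spectra, protein_db)
    # Subtract the spectra already explained by annotated proteins.
    remaining = [s for s in spectra if s not in protein_hits]
    # Pass 2: only the unexplained spectra go to the large translated-genome database.
    genome_hits = search(remaining, genome_db)
    return protein_hits, genome_hits, len(remaining)
```

Because the second, more expensive search only sees `remaining`, its cost scales with the unexplained spectra rather than with the full dataset, which is consistent with the savings reported above.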
Affiliation(s)
- Mohamed Helmy
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
11
Lim S, Chisholm K, Coffin RH, Peters RD, Al-Mughrabi KI, Wang-Pruski G, Pinto DM. Protein Profiling in Potato (Solanum tuberosum L.) Leaf Tissues by Differential Centrifugation. J Proteome Res 2012; 11:2594-601. [DOI: 10.1021/pr201004k] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023]
Affiliation(s)
- Sanghyun Lim
- Department of Plant and Animal Sciences, Nova Scotia Agricultural College, Truro, Nova Scotia, Canada
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Kenneth Chisholm
- National Research Council Institute for Marine Biosciences, Halifax, Nova Scotia, Canada
- Gefu Wang-Pruski
- Department of Plant and Animal Sciences, Nova Scotia Agricultural College, Truro, Nova Scotia, Canada
- Devanand M. Pinto
- National Research Council Institute for Marine Biosciences, Halifax, Nova Scotia, Canada
12
Weisbrod CR, Eng JK, Hoopmann MR, Baker T, Bruce JE. Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J Proteome Res 2012; 11:1621-32. [PMID: 22288382 DOI: 10.1021/pr2008175] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Indexed: 11/28/2022]
Abstract
Fourier transform-all reaction monitoring (FT-ARM) is a novel approach for the identification and quantification of peptides that relies upon the selectivity of high mass accuracy data and the specificity of peptide fragmentation patterns. An FT-ARM experiment involves continuous, data-independent, high mass accuracy MS/MS acquisition spanning a defined m/z range. Custom software was developed to search peptides against the multiplexed fragmentation spectra by comparing theoretical or empirical fragment ions against every fragmentation spectrum across the entire acquisition. A dot product score is calculated against each spectrum to generate a score chromatogram used for both identification and quantification. Chromatographic elution profile characteristics are not used to cluster precursor peptide signals to their respective fragment ions. FT-ARM identifications are shown to be complementary to conventional data-dependent shotgun analysis, especially in cases where the data-dependent method fails because multiple overlapping precursors are fragmented together. The sensitivity, robustness, and specificity of FT-ARM quantification are shown to be comparable to those of selected reaction monitoring-based peptide quantification, with the added benefit of minimal assay development. Thus, FT-ARM is a novel and complementary data acquisition, identification, and quantification method for the large-scale analysis of peptides.
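The score chromatogram described above can be approximated with a normalized dot product (cosine similarity) between a peptide's fragment ions and each spectrum in the acquisition. This Python sketch is a simplification under stated assumptions, not the authors' custom software: theoretical fragments are given unit intensity, peaks are binned on a hypothetical 0.02 m/z grid, and no further scoring refinements are applied.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_peaks(peaks: List[Peak], bin_width: float = 0.02) -> Dict[int, float]:
    """Sum peak intensities into integer m/z bins of width bin_width."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def normalized_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    numerator = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return numerator / norm if norm else 0.0


def score_chromatogram(fragment_mzs: List[float], scans: List[List[Peak]],
                       bin_width: float = 0.02) -> List[float]:
    """Score one peptide's fragment ions against every scan in the acquisition."""
    template = bin_peaks([(mz, 1.0) for mz in fragment_mzs], bin_width)
    return [normalized_dot(template, bin_peaks(scan, bin_width)) for scan in scans]
```

The scan index where the chromatogram peaks marks a candidate elution apex; with real data a score threshold would also be needed, which this sketch omits.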
Affiliation(s)
- Chad R Weisbrod
- Department of Genome Sciences, University of Washington, 815 Mercer Street, Seattle, Washington 98109, United States
13
Ansong C, Tolić N, Purvine SO, Porwollik S, Jones M, Yoon H, Payne SH, Martin JL, Burnet MC, Monroe ME, Venepally P, Smith RD, Peterson SN, Heffron F, McClelland M, Adkins JN. Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium. BMC Genomics 2011; 12:433. [PMID: 21867535 PMCID: PMC3174948 DOI: 10.1186/1471-2164-12-433] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 03/07/2011] [Accepted: 08/25/2011] [Indexed: 12/22/2022] [Open access]
Abstract
Background Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, protein-coding genes in most new genomes are identified almost entirely by computational prediction, which has significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. Results We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought, including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function. Conclusion This work highlights several ways in which proteomics data can improve the quality of genome annotations and facilitate novel biological insights, and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.
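Initiator methionine removal, one of the N-terminal processing events counted above, is often predicted with a simple rule of thumb: methionine aminopeptidase generally cleaves when the second residue is small (Gly, Ala, Ser, Cys, Pro, Thr or Val). The Python sketch below applies that heuristic and checks it against an observed N-terminal peptide. It illustrates the general rule only, not the method used in the study, and both function names are hypothetical.

```python
# Residues at position 2 that generally permit removal of the initiator Met
SMALL_PENULTIMATE = frozenset("GASCPTV")


def predicted_mature_n_terminus(protein: str) -> str:
    """Return the sequence with the initiator Met removed if the heuristic predicts cleavage."""
    if len(protein) > 1 and protein[0] == "M" and protein[1] in SMALL_PENULTIMATE:
        return protein[1:]
    return protein


def observed_met_removed(protein: str, n_terminal_peptide: str) -> bool:
    """True if an observed protein N-terminal peptide begins at residue 2."""
    return protein.startswith("M") and protein[1:].startswith(n_terminal_peptide)
```

Running both functions over many proteins gives a quick consistency check between the heuristic and experimentally observed N-termini.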
Affiliation(s)
- Charles Ansong
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
14
Kniemeyer O. Proteomics of eukaryotic microorganisms: The medically and biotechnologically important fungal genus Aspergillus. Proteomics 2011; 11:3232-43. [DOI: 10.1002/pmic.201100087] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 02/14/2011] [Revised: 03/26/2011] [Accepted: 04/05/2011] [Indexed: 11/09/2022]
15
Helmy M, Tomita M, Ishihama Y. OryzaPG-DB: rice proteome database based on shotgun proteogenomics. BMC Plant Biol 2011; 11:63. [PMID: 21486466 PMCID: PMC3094275 DOI: 10.1186/1471-2229-11-63] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 04/18/2010] [Accepted: 04/12/2011] [Indexed: 05/21/2023]
Abstract
BACKGROUND Proteogenomics aims to use experimental proteome information to refine genome annotation. Since mass spectrometry-based shotgun proteomics approaches provide large-scale peptide sequencing data with high throughput, a data repository for shotgun proteogenomics would represent a valuable source of gene expression evidence at the translational level for genome re-annotation. DESCRIPTION Here, we present OryzaPG-DB, a rice proteome database based on shotgun proteogenomics, which incorporates the genomic features of experimental shotgun proteomics data. This version of the database was created from the results of 27 nanoLC-MS/MS runs of tryptic digests from undifferentiated cultured rice cells, acquired on a hybrid ion trap-orbitrap mass spectrometer, which offers high mass accuracy. Peptides were identified by searching the product ion spectra against the protein, cDNA, transcript and genome databases from Michigan State University, and were mapped to the rice genome. Approximately 3200 genes were covered by these peptides, and 40 of them contained novel genomic features. Users can search, download or browse the database by chromosome, gene, protein, cDNA or transcript, and download the updated annotations in standard GFF3 format, with visualization in PNG format. In addition, the OryzaPG database schema was designed to be generic and can be reused to host similar proteogenomic information for other species. OryzaPG is the first proteogenomics-based database of the rice proteome, providing peptide-based expression profiles together with the corresponding genomic origin, including the annotation of novelty for each peptide. CONCLUSIONS The OryzaPG database was constructed and is freely available at http://oryzapg.iab.keio.ac.jp/.
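Exporting peptide-level annotations in GFF3, as the database above does, amounts to writing nine tab-separated columns per feature (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based, inclusive coordinates. This is a hypothetical Python sketch, not OryzaPG-DB code: the `source` and `feature_type` defaults are assumptions, and attribute values are percent-encoded so that reserved characters such as `;` and `=` stay unambiguous.

```python
from urllib.parse import quote


def gff3_line(seqid, start, end, strand, attributes,
              source="proteogenomics", feature_type="polypeptide"):
    """Format one GFF3 feature line (1-based, inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError("GFF3 requires 1 <= start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError("invalid strand")
    # Percent-encode values so ';', '=', ',', '&' and tabs cannot break column 9
    column9 = ";".join(f"{key}={quote(str(value), safe=' ')}" for key, value in attributes.items())
    # Score and phase are '.' (undefined) for a non-CDS feature
    return "\t".join([seqid, source, feature_type, str(start), str(end), ".", strand, ".", column9])


def write_gff3(features, path):
    """Write a GFF3 file; each feature is a dict of gff3_line keyword arguments."""
    with open(path, "w") as handle:
        handle.write("##gff-version 3\n")
        for feature in features:
            handle.write(gff3_line(**feature) + "\n")
```

For example, `gff3_line("Chr1", 1001, 1036, "+", {"ID": "pep1", "Note": "novel; intergenic"})` uses a hypothetical sequence ID and encodes the semicolon in the note as `%3B`.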
Affiliation(s)
- Mohamed Helmy
- Institute for Advanced Biosciences, Keio University, 403-1 Daihoji, Tsuruoka, Yamagata 997-0017, Japan
- Systems Biology Program, Graduate School of Media and Governance, Keio University, 5322 Endo, Fujisawa, Kanagawa 252-0882, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, 403-1 Daihoji, Tsuruoka, Yamagata 997-0017, Japan
- Yasushi Ishihama
- Institute for Advanced Biosciences, Keio University, 403-1 Daihoji, Tsuruoka, Yamagata 997-0017, Japan
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
16
Braaksma M, Martens-Uzunova ES, Punt PJ, Schaap PJ. An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data. BMC Genomics 2010; 11:584. [PMID: 20959013 PMCID: PMC3091731 DOI: 10.1186/1471-2164-11-584] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Received: 05/10/2010] [Accepted: 10/19/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory depend to a large degree on its secretome. Protein secretion usually requires the presence of an N-terminal signal peptide (SP), and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of an SP does not guarantee that the protein is actually secreted, and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. RESULTS A majority-rule-based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal-peptide-containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified as secreted proteins. Concordant changes in the secretome state were observed in response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. CONCLUSIONS We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli, thereby avoiding prediction conflicts associated with inaccurate gene models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed the in silico predictions.
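The majority-rule idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the voting rule, so everything below is an assumption: a protein contributes its own signal-peptide call plus one call per best homolog (up to three), the consensus is the majority, and a tie is broken in favour of the query's own call. The function name `consensus_signal_peptide` is hypothetical.

```python
def consensus_signal_peptide(query_call: bool, homolog_calls: list[bool]) -> bool:
    """Hypothetical majority vote over signal-peptide (SP) predictions.

    query_call: SP prediction for the A. niger protein itself.
    homolog_calls: SP predictions for its best homologs in neighbouring species.
    """
    votes = [query_call, *homolog_calls]
    yes = sum(votes)          # each True counts as 1
    no = len(votes) - yes
    if yes != no:
        return yes > no
    # Assumed tie-break (e.g. 2 vs 2): keep the query's own prediction.
    return query_call


# A query whose own gene model misses the SP, while all three homologs predict one:
print(consensus_signal_peptide(False, [True, True, True]))  # True
```

The intuition is that a mis-started or truncated gene model in one genome can lose its predicted signal peptide while the homologous models keep it, so a vote can outweigh a single annotation error. Proteins without homologs fall back to their own prediction here, which is the case for which the abstract quotes an expected accuracy of 85%.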
17
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from the peptide to the protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
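Among the global error-rate procedures surveyed in this review, the target-decoy strategy is one of the most widely used. The Python sketch below illustrates it under stated assumptions and is not presented as the review's own method: spectra were searched against a combined target plus decoy (for example, reversed) database, each spectrum keeps only its best-scoring peptide-spectrum match (PSM), a higher score means a better match, and the FDR at a score threshold is estimated as decoys/targets above that threshold. The q-value of a PSM is the lowest estimated FDR at which it would still be accepted. `PSM` and `target_decoy_qvalues` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    spectrum_id: str   # assumed unique: one best PSM per spectrum
    peptide: str
    score: float       # higher = better match
    is_decoy: bool     # True if the peptide comes from the decoy database


def target_decoy_qvalues(psms: list[PSM]) -> dict[str, float]:
    """Estimate q-values for target PSMs with the decoys/targets FDR estimate."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:                 # walk down the ranked list
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: minimum FDR over this threshold and every lower (more permissive) one
    qvalues = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return {p.spectrum_id: q for p, q in zip(ranked, qvalues) if not p.is_decoy}


psms = [
    PSM("s1", "AAK", 10.0, False),
    PSM("s2", "CCK", 9.0, False),
    PSM("s3", "KAA", 8.0, True),
    PSM("s4", "DDR", 7.0, False),
    PSM("s5", "EER", 6.0, False),
]
q = target_decoy_qvalues(psms)
print(sorted(sid for sid, qv in q.items() if qv <= 0.01))  # ['s1', 's2']
```

Score ties and the choice of estimator (decoys/targets versus, for instance, 2 x decoys/(targets + decoys) for concatenated searches) are simplified here; the review discusses such choices together with the amplification of error rates when moving from the peptide to the protein level.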
18
Blakeley P, Siepen JA, Lawless C, Hubbard SJ. Investigating protein isoforms via proteomics: a feasibility study. Proteomics 2010; 10:1127-40. [PMID: 20077415 DOI: 10.1002/pmic.200900445] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022]
Abstract
Alternative splicing (AS) and processing of pre-messenger RNAs explain the discrepancy between the number of genes and proteome complexity in multicellular eukaryotic organisms. However, relatively few alternative protein isoforms have been experimentally identified, particularly at the protein level. In this study, we assess the ability of proteomics to inform on differentially spliced protein isoforms in human and four other model eukaryotes. The number of Ensembl-annotated genes for which proteomic data exist that inform on AS exceeds 33% of the alternatively spliced genes in the human and worm genomes. Examining AS in chicken via proteomics for the first time, we find support for over 600 AS genes. However, although peptide identifications support only a small fraction of the alternative protein isoforms annotated in Ensembl, many more variants are amenable to proteomic identification. There remains a sizeable gap between these existing identifications (10-52% of AS genes) and those that are theoretically feasible (90-99%). We also compare annotations between Swiss-Prot and Ensembl, recommending the use of both to maximize coverage of AS. We propose that targeted proteomic experiments using selected reactions and standards are essential to uncover further alternative isoforms, and we discuss the issues surrounding these strategies.
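Whether proteomics can inform on a given isoform depends on whether that isoform yields peptides shared with no other isoform of the gene. A minimal Python sketch of that check follows, assuming fully tryptic cleavage (after K or R, not before P), no missed cleavages and a minimum peptide length of 6; these settings and the function names are illustrative assumptions, not the protocol of the study.

```python
from collections import defaultdict


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K/R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def isoform_specific_peptides(isoforms: dict[str, str], min_len: int = 6) -> dict[str, set[str]]:
    """Map each isoform name to the tryptic peptides found in no other isoform."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, sequence in isoforms.items():
        for peptide in tryptic_digest(sequence):
            if len(peptide) >= min_len:
                owners[peptide].add(name)
    specific: dict[str, set[str]] = {name: set() for name in isoforms}
    for peptide, names in owners.items():
        if len(names) == 1:                      # peptide seen in exactly one isoform
            specific[next(iter(names))].add(peptide)
    return specific


# Two toy isoforms that differ only in their middle (alternatively spliced) segment
isoforms = {
    "iso1": "MAAAAAKGGGGGGRLLLLLLK",
    "iso2": "MAAAAAKWWWWWWRLLLLLLK",
}
print(isoform_specific_peptides(isoforms))
# {'iso1': {'GGGGGGR'}, 'iso2': {'WWWWWWR'}}
```

Because leucine and isoleucine have identical masses, a stricter version would treat I and L as equivalent before testing whether a peptide is unique.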
Collapse
Affiliation(s)
- Paul Blakeley
- Faculty of Life Sciences, Michael Smith Building, University of Manchester, Manchester, UK
19
Expression and export: recombinant protein production systems for Aspergillus. Appl Microbiol Biotechnol 2010; 87:1255-70. [DOI: 10.1007/s00253-010-2672-6] [Citation(s) in RCA: 118] [Impact Index Per Article: 8.4] [Received: 03/22/2010] [Revised: 05/07/2010] [Accepted: 05/08/2010] [Indexed: 11/26/2022]
20
Armengaud J. Proteogenomics and systems biology: quest for the ultimate missing parts. Expert Rev Proteomics 2010; 7:65-77. [DOI: 10.1586/epr.09.104] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 08/30/2023]
21
Abstract
The peptide identification problem lies at the heart of modern proteomic methodology, since it is from identified peptides that the presence of a particular protein or proteins in a sample may be inferred. The challenge is to find the most likely amino acid sequence corresponding to each tandem mass spectrum that has been collected, and to produce a score and an associated statistical measure of confidence that the putative identification is correct. This approach assumes that the peptide (and parent protein) sequence in question is known and is present in the database to be searched, as opposed to de novo methods, which seek to identify the peptide ab initio. This chapter will provide an overview of the methods that common, popular software tools employ to search protein sequence databases, giving the non-expert reader sufficient background to appreciate the choices they can make. This will cover the approaches used to compare experimental and theoretical spectra and some of the methods used to validate the assignments and increase confidence in them.
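The database-search workflow described in this abstract (digest the protein database in silico, keep candidates whose precursor mass matches the observed one, then compare theoretical and experimental fragment spectra) can be sketched in Python. This is a toy under explicit assumptions: fully tryptic peptides with no missed cleavages or modifications, singly charged precursor and b/y fragment ions, monoisotopic residue masses, and a deliberately simple shared-peak count as the score. It does not reproduce the scoring of any particular search engine, and all function names are hypothetical.

```python
from __future__ import annotations

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K/R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def precursor_mh(peptide: str) -> float:
    """Singly protonated precursor mass [M+H]+."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def fragment_ions(peptide: str) -> list[float]:
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(RESIDUE_MASS[r] for r in peptide[:i]) + PROTON)          # b ion
        ions.append(sum(RESIDUE_MASS[r] for r in peptide[i:]) + WATER + PROTON)  # y ion
    return ions


def shared_peak_count(peptide: str, peaks: list[float], tol: float = 0.5) -> int:
    """Number of observed peaks within `tol` Da of any theoretical fragment."""
    theoretical = fragment_ions(peptide)
    return sum(1 for mz in peaks if any(abs(mz - t) <= tol for t in theoretical))


def search(observed_mh: float, peaks: list[float], proteins: list[str],
           ppm: float = 10.0) -> tuple[str, int] | None:
    """Return the best (peptide, score) whose precursor is within `ppm`, else None."""
    best = None
    for protein in proteins:
        for peptide in tryptic_digest(protein):
            if any(r not in RESIDUE_MASS for r in peptide):
                continue  # skip peptides with ambiguous residues such as X
            mh = precursor_mh(peptide)
            if abs(observed_mh - mh) / mh * 1e6 > ppm:
                continue  # precursor mass filter
            score = shared_peak_count(peptide, peaks)
            if best is None or score > best[1]:
                best = (peptide, score)
    return best


proteins = ["AKPGRDEKLM"]              # toy protein database
peaks = fragment_ions("DEK")           # pretend the spectrum matches DEK perfectly
print(search(391.18233, peaks, proteins))  # ('DEK', 4)
```

The precursor filter narrows the candidate list cheaply before the more expensive spectrum comparison, which mirrors the two-stage logic the chapter describes; the statistical validation step it also mentions would sit on top of these raw scores.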
Affiliation(s)
- Simon J Hubbard
- Faculty of Life Sciences, University of Manchester, Michael Smith Building, Manchester, UK.
22
Krull R, Cordes C, Horn H, Kampen I, Kwade A, Neu TR, Nörtemann B. Morphology of filamentous fungi: linking cellular biology to process engineering using Aspergillus niger. Adv Biochem Eng Biotechnol 2010; 121:1-21. [PMID: 20490972 DOI: 10.1007/10_2009_60] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 01/18/2023]
Abstract
In various biotechnological processes, filamentous fungi such as Aspergillus niger are widely applied for the production of high value-added products owing to their secretion efficiency. There is, however, a tangled relationship between the morphology of these microorganisms, the transport phenomena and the related productivity. The morphological characteristics vary between freely dispersed mycelia and distinct pellets of aggregated biomass. Hence, the advantages and disadvantages of mycelial or pellet cultivation have to be balanced carefully. Owing to the inadequate understanding of the morphogenesis of filamentous microorganisms, fungal morphology, along with the reproducibility of inocula of the same quality, is often a bottleneck of productivity in industrial production. To optimise the production process, it is of great importance to gain a better understanding of the molecular and cell biology of these microorganisms, as well as of the approaches in biochemical engineering and particle technology, in particular to characterise the interactions between growth conditions, cell morphology, spore-hyphae interactions and product formation. Advances in particle and image analysis techniques, as well as in micromechanical devices and their application to fungal cultivations, have made quantitative morphological data on filamentous cells available. This chapter outlines the ambitious aspects of this line of research, focussing on the control and characterisation of morphology, the transport gradients and the approaches to understanding the metabolism of filamentous fungi. Based on these data, bottlenecks in the morphogenesis of A. niger within the complex production pathways from gene to product should be identified, which may improve the production yield.
Affiliation(s)
- Rainer Krull
- Institute of Biochemical Engineering, Technische Universität Braunschweig, Gaussstrasse 17, 38106 Braunschweig, Germany
23
Baudet M, Ortet P, Gaillard JC, Fernandez B, Guérin P, Enjalbal C, Subra G, de Groot A, Barakat M, Dedieu A, Armengaud J. Proteomics-based refinement of Deinococcus deserti genome annotation reveals an unwonted use of non-canonical translation initiation codons. Mol Cell Proteomics 2009; 9:415-26. [PMID: 19875382 PMCID: PMC2830850 DOI: 10.1074/mcp.m900359-mcp200] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Indexed: 01/28/2023]
Abstract
Deinococcaceae are a family of extremely radiation-tolerant bacteria that are currently the subject of numerous studies aimed at understanding the molecular mechanisms of such radiotolerance. To achieve a comprehensive and accurate annotation of the Deinococcus deserti genome, we performed an N terminus-oriented characterization of its proteome. For this, we used a labeling reagent, N-tris(2,4,6-trimethoxyphenyl)phosphonium acetyl succinimide, to selectively derivatize protein N termini. The large-scale identification of N-tris(2,4,6-trimethoxyphenyl)phosphonium acetyl succinimide-modified N-terminal-most peptides by shotgun liquid chromatography-tandem mass spectrometry analysis led to the validation of 278 and the correction of 73 translation initiation codons in the D. deserti genome. In addition, four new genes were detected, three located on the main chromosome and one on plasmid P3. We also analyzed signal peptide cleavages on a genome-wide scale. Based on comparative proteogenomics analysis, we propose a set of 137 corrections to improve Deinococcus radiodurans and Deinococcus geothermalis gene annotations. Some of these corrections affect important genes involved in DNA repair mechanisms such as polA, ligA, and ddrB. Surprisingly, experimental evidence was obtained indicating that DnaA (the protein involved in the DNA replication initiation process) and RpsL (the S12 ribosomal conserved protein) translation is initiated in Deinococcaceae from non-canonical codons (ATC and CTG, respectively). Such use may be the basis of specific regulation mechanisms affecting replication and translation. We also report the use of non-conventional translation initiation codons for two other genes: Deide_03051 and infC. Whether such use of non-canonical translation initiation codons is much more frequent than in other previously reported bacterial phyla or restricted to Deinococcaceae remains to be investigated.
Our results demonstrate that predicting translation initiation codons is still difficult for some bacteria and that proteomics-based refinement of genome annotations may be helpful in such cases.
Affiliation(s)
- Mathieu Baudet
- Laboratoire de Biochimie des Systèmes Perturbés, Service de Biochimie et Toxicologie Nucléaire, Institut de Biologie Environnementale et Biotechnologie (iBEB), Direction des Sciences du Vivant (DSV), Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), F-30207 Bagnols-sur-Cèze, France
24
Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum. BMC Bioinformatics 2009; 10:301. [PMID: 19772613 PMCID: PMC2753851 DOI: 10.1186/1471-2105-10-301] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 07/03/2009] [Accepted: 09/22/2009] [Indexed: 12/23/2022]
Abstract
Background Stagonospora nodorum, a fungal ascomycete in the class Dothideomycetes, is a damaging pathogen of wheat. It is a model for necrotrophic fungi that cause necrotic symptoms via the interaction of multiple effector proteins with cultivar-specific receptors. A draft genome sequence and annotation were published in 2007. A second-pass gene prediction using a training set of 795 fully EST-supported genes predicted a total of 10762 version 2 nuclear-encoded genes, with an additional 5354 less reliable version 1 genes also retained. Results In this study, we subjected soluble mycelial proteins to proteolysis followed by 2D LC MALDI-MS/MS. Comparison of the detected peptides with the gene models validated 2134 genes. Of these, 1324 (62%) were not supported by prior EST evidence. Of the 2134 validated genes, all but 188 were version 2 annotations. Statistical analysis of the validated gene models revealed a preponderance of cytoplasmic and nuclear localised proteins, and of proteins with intracellular-associated GO terms. These statistical associations are consistent with the source of the peptides used in the study. Comparison with a 6-frame translation of the S. nodorum genome assembly confirmed 905 existing gene annotations (including 119 not previously confirmed) and provided evidence supporting 144 genes with coding exon frameshift modifications, 604 genes with extensions of coding exons into annotated introns or untranslated regions (UTRs), 3 new gene annotations supported by tblastn against NR, and 44 potential new genes residing within unassembled regions of the genome. Conclusion We conclude that 2D LC MALDI-MS/MS is a powerful, rapid and economical tool to aid in the annotation of fungal genomic assemblies.
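The 6-frame comparison in this abstract amounts to translating the genome assembly in all three reading frames of both strands and searching those translations for each detected peptide, independently of existing gene models. A minimal Python sketch follows, assuming the standard genetic code, `*` for stop codons and `X` for codons containing ambiguous bases; the function names are hypothetical, and frame offsets on the minus strand are counted on the reverse complement.

```python
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... over T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translate(dna: str) -> dict[tuple[str, int], str]:
    """Map (strand, offset) to the translated sequence for all six frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[("+", offset)] = translate(dna[offset:])
        frames[("-", offset)] = translate(rc[offset:])
    return frames


def locate_peptide(peptide: str, dna: str) -> list[tuple[str, int, int]]:
    """Return (strand, frame offset, amino-acid position) for every exact match."""
    hits = []
    for (strand, offset), protein in six_frame_translate(dna).items():
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((strand, offset, pos))
            pos = protein.find(peptide, pos + 1)
    return hits


print(six_frame_translate("ATGGCCAAATAA")[("+", 0)])  # MAK*
print(locate_peptide("MAK", "ATGGCCAAATAA"))          # [('+', 0, 0)]
```

A peptide that maps to a frame or region outside the current gene model is the kind of evidence the abstract counts as frameshifts, exon extensions or new genes; a real pipeline would also convert amino-acid positions back to genomic coordinates and handle peptides spanning introns, which this sketch ignores.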
25
Tan KC, Ipcho SVS, Trengove RD, Oliver RP, Solomon PS. Assessing the impact of transcriptomics, proteomics and metabolomics on fungal phytopathology. Mol Plant Pathol 2009; 10:703-15. [PMID: 19694958 PMCID: PMC6640398 DOI: 10.1111/j.1364-3703.2009.00565.x] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Indexed: 05/18/2023]
Abstract
SUMMARY Peer-reviewed literature is today littered with exciting new tools and techniques that are being used in all areas of biology and medicine. Transcriptomics, proteomics and, more recently, metabolomics are three of these techniques that have impacted on fungal plant pathology. Used individually, each of these techniques can generate a plethora of data that could occupy a laboratory for years. When used in combination, they have the potential to comprehensively dissect a system at the transcriptional and translational level. Transcriptomics, or quantitative gene expression profiling, is arguably the most familiar to researchers in the field of fungal plant pathology. Microarrays have been the primary technique for the last decade, but others are now emerging. Proteomics has also been exploited by the fungal phytopathogen community, but perhaps not to its potential. A lack of genome sequence information has frustrated proteomics researchers and has largely contributed to this technique not fulfilling its potential. The coming of the genome sequencing era has partially alleviated this problem. Metabolomics is the most recent of these techniques to emerge and is concerned with the non-targeted profiling of all metabolites in a given system. Metabolomics studies on fungal plant pathogens are only just beginning to appear, although its potential to dissect many facets of the pathogen and disease will see its popularity increase quickly. This review assesses the impact of transcriptomics, proteomics and metabolomics on fungal plant pathology over the last decade and discusses their futures. Each of the techniques is described briefly with further reading recommended. Key examples highlighting the application of these technologies to fungal plant pathogens are also reviewed.
Affiliation(s)
- Kar-Chun Tan
- Australian Centre for Necrotrophic Fungal Pathogens, SABC, Faculty of Health Sciences, Murdoch University, Murdoch 6150, Australia