1
Pevzner P, Vingron M, Reidys C, Sun F, Istrail S. Michael Waterman's Contributions to Computational Biology and Bioinformatics. J Comput Biol 2022; 29:601-615. [PMID: 35727100 DOI: 10.1089/cmb.2022.29066.pp]
Abstract
On the occasion of Dr. Michael Waterman's 80th birthday, we review his major contributions to the field of computational biology and bioinformatics, including the famous Smith-Waterman algorithm for sequence alignment, the probability and statistics theory related to sequence alignment, algorithms for sequence assembly, the Lander-Waterman model for genome physical mapping, combinatorics and predictions of ribonucleic acid structures, word counting statistics in molecular sequences, alignment-free sequence comparison, and algorithms for haplotype block partition and tagSNP selection related to the International HapMap Project. His books Introduction to Computational Biology: Maps, Sequences and Genomes, for graduate students, and Computational Genome Analysis: An Introduction, geared toward undergraduate students, played key roles in computational biology and bioinformatics education. We also highlight his efforts to build the computational biology and bioinformatics community as the founding editor of the Journal of Computational Biology and a founding member of the International Conference on Research in Computational Molecular Biology (RECOMB).
Affiliation(s)
- Pavel Pevzner
  - Department of Computer Science and Engineering, University of California San Diego, San Diego, California, USA
- Martin Vingron
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Christian Reidys
  - Department of Mathematics, Biocomplexity Institute & Initiative, University of Virginia, Charlottesville, Virginia, USA
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Sorin Istrail
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
2
Istrail S, Pevzner P, Sun F, Vingron M. Special Issue: Professor Michael Waterman's 80th Birthday, Part 1. J Comput Biol 2022. [PMID: 35704861 DOI: 10.1089/cmb.2022.29065.si]
3
Abstract
The V(D)J recombination process rearranges the variable (V), diversity (D), and joining (J) genes in the immunoglobulin loci to generate antibody repertoires. Annotating these loci across various species and predicting the V, D, and J genes (IG genes) is critical for studies of the adaptive immune system. However, since standard gene-finding algorithms are not suitable for predicting IG genes, these genes have been semi-manually annotated in very few species. We developed the IGDetective algorithm for predicting IG genes and applied it to species with assembled IG loci. IGDetective generated the first large collection of IG genes across many species and enabled their evolutionary analysis, including the analysis of the "bat IG diversity" hypothesis. This analysis revealed extremely conserved V genes in evolutionarily distant species, indicating that these genes may be subjected to the same selective pressure, e.g., pressure driven by common pathogens. IGDetective also revealed extremely diverged V genes and a new family of evolutionarily conserved V genes in bats with unusual noncanonical cysteines. Moreover, unlike in all other previously reported antibodies, these cysteines are located within complementarity-determining regions. Since cysteines form disulfide bonds, we hypothesize that these cysteine-rich V genes might generate antibodies with noncanonical conformations and could potentially form a unique part of the immune repertoire in bats. We also analyzed the diversity landscape of the recombination signal sequences and revealed the features that trigger high or low usage of the IG genes.
4
Chen Z, Pham L, Wu TC, Mo G, Xia Y, Chang PL, Porter D, Phan T, Che H, Tran H, Bansal V, Shaffer J, Belda-Ferre P, Humphrey G, Knight R, Pevzner P, Pham S, Wang Y, Lei M. Erratum: Ultralow-input single-tube linked-read library method enables short-read second-generation sequencing systems to routinely generate highly accurate and economical long-range sequencing information. Genome Res 2021; 31:934. [PMID: 33941606 DOI: 10.1101/gr.275614.121]
5
Benler S, Yutin N, Antipov D, Rayko M, Shmakov S, Gussow AB, Pevzner P, Koonin EV. Thousands of previously unknown phages discovered in whole-community human gut metagenomes. Microbiome 2021; 9:78. [PMID: 33781338 PMCID: PMC8008677 DOI: 10.1186/s40168-021-01017-w] [Received: 10/06/2020] [Accepted: 02/02/2021]
Abstract
BACKGROUND: Double-stranded DNA bacteriophages (dsDNA phages) play pivotal roles in structuring human gut microbiomes; yet, the gut virome is far from being fully characterized, and additional groups of phages, including highly abundant ones, continue to be discovered by metagenome mining. A multilevel framework for taxonomic classification of viruses was recently adopted, facilitating the classification of phages into evolutionarily informative taxonomic units based on hallmark genes. Together with advanced approaches for sequence assembly and powerful methods of sequence analysis, this revised framework offers the opportunity to discover and classify unknown phage taxa in the human gut.
RESULTS: A search of human gut metagenomes for circular contigs encoding phage hallmark genes resulted in the identification of 3738 apparently complete phage genomes that represent 451 putative genera. Several of these phage genera are only distantly related to previously identified phages and are likely to found new families. Two of the candidate families, "Flandersviridae" and "Quimbyviridae", include some of the most common and abundant members of the human gut virome that infect Bacteroides, Parabacteroides, and Prevotella. The third proposed family, "Gratiaviridae", consists of less abundant phages that are distantly related to the families Autographiviridae, Drexlerviridae, and Chaseviridae. Analysis of CRISPR spacers indicates that phages of all three putative families infect bacteria of the phylum Bacteroidetes. Comparative genomic analysis of the three candidate phage families revealed features without precedent in phage genomes. Some "Quimbyviridae" phages possess Diversity-Generating Retroelements (DGRs) that generate hypervariable target genes nested within defense-related genes, whereas the previously known targets of phage-encoded DGRs are structural genes. Several "Flandersviridae" phages encode enzymes of the isoprenoid pathway, a lipid biosynthesis pathway that so far has not been known to be manipulated by phages. The "Gratiaviridae" phages encode a HipA-family protein kinase and glycosyltransferase, suggesting these phages modify the host cell wall, preventing superinfection by other phages. Hundreds of phages in these three and other families are shown to encode catalases and iron-sequestering enzymes that can be predicted to enhance cellular tolerance to reactive oxygen species.
CONCLUSIONS: Analysis of phage genomes identified in whole-community human gut metagenomes resulted in the delineation of at least three new candidate families of Caudovirales and revealed diverse putative mechanisms underlying phage-host interactions in the human gut. Addition of these phylogenetically classified, diverse, and distinct phages to public databases will facilitate taxonomic decomposition and functional characterization of human gut viromes.
Affiliation(s)
- Sean Benler
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894 USA
- Natalya Yutin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894 USA
- Dmitry Antipov
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004 Russia
- Mikhail Rayko
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004 Russia
- Sergey Shmakov
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894 USA
- Ayal B. Gussow
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894 USA
- Pavel Pevzner
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004 Russia
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093 USA
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894 USA
6
Chen Z, Pham L, Wu TC, Mo G, Xia Y, Chang PL, Porter D, Phan T, Che H, Tran H, Bansal V, Shaffer J, Belda-Ferre P, Humphrey G, Knight R, Pevzner P, Pham S, Wang Y, Lei M. Ultralow-input single-tube linked-read library method enables short-read second-generation sequencing systems to routinely generate highly accurate and economical long-range sequencing information. Genome Res 2020; 30:898-909. [PMID: 32540955 PMCID: PMC7370886 DOI: 10.1101/gr.260380.119] [Received: 12/19/2019] [Accepted: 06/10/2020]
Abstract
Long-range sequencing information is required for haplotype phasing, de novo assembly, and structural variation detection. Current long-read sequencing technologies can provide valuable long-range information but at a high cost with low accuracy and high DNA input requirements. We have developed a single-tube Transposase Enzyme Linked Long-read Sequencing (TELL-seq) technology, which enables a low-cost, high-accuracy, and high-throughput short-read second-generation sequencer to generate over 100 kb of long-range sequencing information with as little as 0.1 ng input material. In a PCR tube, millions of clonally barcoded beads are used to uniquely barcode long DNA molecules in an open bulk reaction without dilution and compartmentation. The barcoded linked-reads are used to successfully assemble genomes ranging from microbes to human. These linked-reads also generate megabase-long phased blocks and provide a cost-effective tool for detecting structural variants in a genome, which are important to identify compound heterozygosity in recessive Mendelian diseases and discover genetic drivers and diagnostic biomarkers in cancers.
Affiliation(s)
- Zhoutao Chen
  - Universal Sequencing Technology Corporation, Carlsbad, California 92011, USA
- Long Pham
  - Universal Sequencing Technology Corporation, Carlsbad, California 92011, USA
- Tsai-Chin Wu
  - Universal Sequencing Technology Corporation, Carlsbad, California 92011, USA
- Guoya Mo
  - Universal Sequencing Technology Corporation, Carlsbad, California 92011, USA
- Yu Xia
  - Universal Sequencing Technology Corporation, Carlsbad, California 92011, USA
- Peter L Chang
  - Universal Sequencing Technology Corporation, Carlsbad, California 92011, USA
- Devin Porter
  - Universal Sequencing Technology Corporation, Carlsbad, California 92011, USA
- Tan Phan
  - Bioturing Incorporated, San Diego, California 92121, USA
- Huu Che
  - Bioturing Incorporated, San Diego, California 92121, USA
- Hao Tran
  - Bioturing Incorporated, San Diego, California 92121, USA; Faculty of Information Technology, University of Science, Vietnam National University, Ho Chi Minh City, 700 000 Vietnam
- Vikas Bansal
  - Department of Pediatrics, University of California San Diego, La Jolla, California 92161, USA
- Justin Shaffer
  - Center for Microbiome Innovation and Departments of Pediatrics, Bioengineering, and Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Pedro Belda-Ferre
  - Center for Microbiome Innovation and Departments of Pediatrics, Bioengineering, and Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Greg Humphrey
  - Center for Microbiome Innovation and Departments of Pediatrics, Bioengineering, and Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Rob Knight
  - Center for Microbiome Innovation and Departments of Pediatrics, Bioengineering, and Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Pavel Pevzner
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA
- Son Pham
  - Bioturing Incorporated, San Diego, California 92121, USA
- Yong Wang
  - Universal Sequencing Technology Corporation, Canton, Massachusetts 02021, USA
- Ming Lei
  - Universal Sequencing Technology Corporation, Canton, Massachusetts 02021, USA
7
Bhardwaj V, Safonova Y, Franceschetti M, Rao R, Pevzner P. Comparative analysis of immunoglobulin IGHD genes in vertebrates. The Journal of Immunology 2019. [DOI: 10.4049/jimmunol.202.supp.131.38]
Abstract
Finding germline immunoglobulin (Ig) V, D, and J genes is a preliminary step in many antibody studies, including development of monoclonal antibody drugs. Recent studies show that the IMGT database, the state-of-the-art database of germline Ig genes, is incomplete for many species. Moreover, modern pharmaceutical studies often involve species not present in the IMGT database (e.g., hamsters, llamas). While V and J genes are conserved and can be inferred by homology, D genes are highly diverse among species. As modern sequencing technologies (Rep-Seq) have enabled scanning millions of antibody RNAs from a single organism, we developed a tool, MINING-D, for de novo inference of functional D genes using Rep-Seq data.
We applied MINING-D to 261 publicly available Rep-Seq datasets: 160 human and 101 non-human datasets, including mouse, rat, rabbit, camel, and rhesus macaque species. Most known D genes, their known and novel variations, and some D genes not present in the IMGT database were inferred. We further validated a significant number of the novel genes and variations using genomic data.
Thus, our tool circumvents the need for an existing database and infers the functional D genes directly from expressed Ig RNAs. It can potentially speed up the process of antibody sequencing and design of antibody drugs for species with unknown genes. Results from multiple datasets enabled comparative analysis of D genes. For example, while the same germline D genes contribute to the diversity of antibody repertoires in healthy humans, only a few of them are used in specific repertoires (e.g., hepatitis) and thus can be critical for recognition of their antigens. We suggest that the obtained results will help in disease monitoring and vaccine design.
8
Rozanov DV, Rozanov ND, Chiotti KE, Reddy A, Wilmarth PA, David LL, Cha SW, Woo S, Pevzner P, Bafna V, Burrows GG, Rantala JK, Levin T, Anur P, Johnson-Camacho K, Tabatabaei S, Munson DJ, Bruno TC, Slansky JE, Kappler JW, Hirano N, Boegel S, Fox BA, Egelston C, Simons DL, Jimenez G, Lee PP, Gray JW, Spellman PT. MHC class I loaded ligands from breast cancer cell lines: A potential HLA-I-typed antigen collection. J Proteomics 2018; 176:13-23. [PMID: 29331515 DOI: 10.1016/j.jprot.2018.01.004] [Received: 09/08/2017] [Revised: 12/01/2017] [Accepted: 01/04/2018]
Abstract
To build a catalog of peptides presented by breast cancer cells, we undertook systematic MHC class I immunoprecipitation followed by elution of MHC class I-loaded peptides in breast cancer cells. We determined the sequence of 3196 MHC class I ligands representing 1921 proteins from a panel of 20 breast cancer cell lines. After removing duplicate peptides, i.e., the same peptide eluted from more than one cell line, the total number of unique peptides was 2740. Of the unique peptides eluted, more than 1750 had been previously identified, and of these, sixteen have been shown to be immunogenic. Importantly, half of these immunogenic peptides were shared between different breast cancer cell lines. MHC class I binding probability was used to plot the distribution of the eluted peptides in accordance with the binding score for each breast cancer cell line. We also determined that the tested breast cancer cells presented 89 mutation-containing peptides and peptides derived from aberrantly translated genes, 7 of which were shared between two or four different cell lines. Overall, the high-throughput identification of MHC class I-loaded peptides is an effective strategy for systematic characterization of cancer peptides and could be employed for the design of multi-peptide anticancer vaccines.
SIGNIFICANCE: By employing proteomic analyses of eluted peptides from breast cancer cells, the current study has built an initial HLA-I-typed antigen collection for breast cancer research. It was also determined that immunogenic epitopes can be identified using established cell lines and that shared immunogenic peptides can be found in different cancer types such as breast cancer and leukemia. Importantly, out of 3196 eluted peptides (including duplicate peptides found in different cells), 89 peptides either contained a mutation in their sequence or were derived from aberrant translation, suggesting that mutation-containing epitopes are on the order of 2-3% in breast cancer cells. Finally, our results suggest that interfering with MHC class I function is one of the mechanisms by which tumor cells escape immune system attack.
Affiliation(s)
- Dmitri V Rozanov
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, United States
- Kami E Chiotti
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, United States
- Ashok Reddy
  - Proteomics Shared Resource, Oregon Health and Science University, Portland, OR, United States
- Phillip A Wilmarth
  - Proteomics Shared Resource, Oregon Health and Science University, Portland, OR, United States
- Larry L David
  - Proteomics Shared Resource, Oregon Health and Science University, Portland, OR, United States
- Seung W Cha
  - Electrical and Computer Engineering, University of California, San Diego, CA, United States
- Sunghee Woo
  - School of Medicine, Johns Hopkins University, Baltimore, MD, United States
- Pavel Pevzner
  - The NIH Center for Computational Mass Spectrometry, University of California, San Diego, San Diego, CA, United States
- Vineet Bafna
  - Computer Science & Engineering, University of California, San Diego, CA, United States
- Gregory G Burrows
  - Neurology and Biochemistry & Molecular Biology, Oregon Health and Science University, Portland, OR, United States
- Trevor Levin
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, United States
- Pavana Anur
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, United States
- Katie Johnson-Camacho
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, United States
- Shaadi Tabatabaei
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, United States
- Daniel J Munson
  - Department of Immunology & Microbiology, University of Colorado, Denver, CO, United States
- Tullia C Bruno
  - Department of Immunology, University of Pittsburgh, Pittsburgh, PA, United States
- Jill E Slansky
  - Department of Immunology & Microbiology, University of Colorado, Denver, CO, United States
- John W Kappler
  - National Jewish Medical and Research Center, Denver, CO, United States
- Naoto Hirano
  - Margaret Cancer Centre, Toronto, Ontario, Canada
- Sebastian Boegel
  - University Medical Center, Johannes Gutenberg-University, Mainz, Germany
- Bernard A Fox
  - Laboratory of Molecular and Tumor Immunology, Chiles Research Institute Providence PDX Medical Center, Portland, OR, United States
- Colt Egelston
  - City of Hope National Medical Center, Duarte, CA, United States
- Diana L Simons
  - City of Hope National Medical Center, Duarte, CA, United States
- Grecia Jimenez
  - City of Hope National Medical Center, Duarte, CA, United States
- Peter P Lee
  - City of Hope National Medical Center, Duarte, CA, United States
- Joe W Gray
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, United States; Center for Health & Healing, Oregon Health and Science University, Portland, OR, United States
- Paul T Spellman
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, United States
9
Wang M, Carver JJ, Phelan VV, Sanchez LM, Garg N, Peng Y, Nguyen DD, Watrous J, Kapono CA, Luzzatto-Knaan T, Porto C, Bouslimani A, Melnik AV, Meehan MJ, Liu WT, Crüsemann M, Boudreau PD, Esquenazi E, Sandoval-Calderón M, Kersten RD, Pace LA, Quinn RA, Duncan KR, Hsu CC, Floros DJ, Gavilan RG, Kleigrewe K, Northen T, Dutton RJ, Parrot D, Carlson EE, Aigle B, Michelsen CF, Jelsbak L, Sohlenkamp C, Pevzner P, Edlund A, McLean J, Piel J, Murphy BT, Gerwick L, Liaw CC, Yang YL, Humpf HU, Maansson M, Keyzers RA, Sims AC, Johnson AR, Sidebottom AM, Sedio BE, Klitgaard A, Larson CB, P CAB, Torres-Mendoza D, Gonzalez DJ, Silva DB, Marques LM, Demarque DP, Pociute E, O'Neill EC, Briand E, Helfrich EJN, Granatosky EA, Glukhov E, Ryffel F, Houson H, Mohimani H, Kharbush JJ, Zeng Y, Vorholt JA, Kurita KL, Charusanti P, McPhail KL, Nielsen KF, Vuong L, Elfeki M, Traxler MF, Engene N, Koyama N, Vining OB, Baric R, Silva RR, Mascuch SJ, Tomasi S, Jenkins S, Macherla V, Hoffman T, Agarwal V, Williams PG, Dai J, Neupane R, Gurr J, Rodríguez AMC, Lamsa A, Zhang C, Dorrestein K, Duggan BM, Almaliti J, Allard PM, Phapale P, Nothias LF, Alexandrov T, Litaudon M, Wolfender JL, Kyle JE, Metz TO, Peryea T, Nguyen DT, VanLeer D, Shinn P, Jadhav A, Müller R, Waters KM, Shi W, Liu X, Zhang L, Knight R, Jensen PR, Palsson BO, Pogliano K, Linington RG, Gutiérrez M, Lopes NP, Gerwick WH, Moore BS, Dorrestein PC, Bandeira N. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol 2016; 34:828-837. [PMID: 27504778 DOI: 10.1038/nbt.3597] [Received: 08/11/2015] [Accepted: 05/10/2016]
Abstract
The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.
Affiliation(s)
- Mingxun Wang
  - Computer Science and Engineering, UC San Diego, La Jolla, United States; Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
- Jeremy J Carver
  - Computer Science and Engineering, UC San Diego, La Jolla, United States; Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
- Vanessa V Phelan
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Laura M Sanchez
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Neha Garg
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Yao Peng
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
- Don Duy Nguyen
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
- Jeramie Watrous
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Clifford A Kapono
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
- Tal Luzzatto-Knaan
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Carla Porto
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Amina Bouslimani
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Alexey V Melnik
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Michael J Meehan
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Wei-Ting Liu
  - Department of Microbiology and Immunology, Stanford University, Palo Alto, United States
- Max Crüsemann
  - Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Paul D Boudreau
  - Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Laura A Pace
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Robert A Quinn
  - Biology Department, San Diego State University, San Diego, United States
- Katherine R Duncan
  - Scottish Association for Marine Science, Scottish Marine Institute, Oban, United Kingdom; Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Cheng-Chih Hsu
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
- Dimitrios J Floros
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
- Ronnie G Gavilan
  - Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
- Karin Kleigrewe
  - Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Trent Northen
  - Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, United States
- Rachel J Dutton
  - FAS Center for Systems Biology, Harvard, Cambridge, United States
- Delphine Parrot
  - Produits naturels - Synthèses - Chimie Médicinale, University of Rennes 1, Rennes Cedex, France
- Erin E Carlson
  - Chemistry, University of Minnesota, Minneapolis, United States
- Bertrand Aigle
  - Dynamique des Génomes et Adaptation Microbienne, University of Lorraine, Vandœuvre-lès-Nancy, France
- Lars Jelsbak
  - Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Christian Sohlenkamp
  - Centro de Ciencias Genómicas, Universidad Nacional Autonoma de Mexico, Cuernavaca, Mexico
- Pavel Pevzner
  - Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States; Computer Science and Engineering, UC San Diego, La Jolla, United States
- Anna Edlund
  - Microbial and Environmental Genomics, J. Craig Venter Institute, La Jolla, United States; School of Dentistry, UC Los Angeles, Los Angeles, United States
- Jeffrey McLean
  - Department of Periodontics, University of Washington, Seattle, United States; School of Dentistry, UC Los Angeles, Los Angeles, United States
- Jörn Piel
  - Institute of Microbiology, ETH Zurich, Zurich, Switzerland
- Brian T Murphy
  - Department of Medicinal Chemistry and Pharmacognosy, University of Illinois Chicago, Chicago, United States
- Lena Gerwick
  - Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Chih-Chuang Liaw
  - Department of Marine Biotechnology and Resources, National Sun Yat-sen University, Kaohsiung, Taiwan
- Yu-Liang Yang
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- Hans-Ulrich Humpf
  - Institute of Food Chemistry, University of Münster, Münster, Germany
- Maria Maansson
  - Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Robert A Keyzers
  - School of Chemical & Physical Sciences, and Centre for Biodiscovery, Victoria University of Wellington, Wellington, New Zealand
- Amy C Sims
  - Gillings School of Global Public Health, Department of Epidemiology, UNC Chapel Hill, Chapel Hill, United States
- Andrew R Johnson
  - Department of Chemistry, Indiana University, Bloomington, United States
| | | | - Brian E Sedio
- Smithsonian Tropical Research Institute, Ancón, Panama.,Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
| | - Andreas Klitgaard
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Charles B Larson
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Cristopher A Boya P
- Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
| | | | - David J Gonzalez
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Denise B Silva
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
- Centro de Ciencias Biologicas e da Saude, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
| | - Lucas M Marques
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Daniel P Demarque
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Egle Pociute
- Sirenas Marine Discovery, San Diego, United States
| | - Ellis C O'Neill
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Enora Briand
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- UMR CNRS 6553 ECOBIO, University of Rennes 1, Rennes Cedex, France
| | | | - Eve A Granatosky
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, United States
| | - Evgenia Glukhov
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Florian Ryffel
- Institute of Microbiology, ETH Zurich, Zurich, Switzerland
| | | | - Hosein Mohimani
- Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
| | - Jenan J Kharbush
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Yi Zeng
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
| | | | - Kenji L Kurita
- PBSci-Chemistry & Biochemistry Department, UC Santa Cruz, Santa Cruz, United States
| | - Pep Charusanti
- Department of Bioengineering, UC San Diego, La Jolla, United States
| | - Kerry L McPhail
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, United States
| | | | - Lisa Vuong
- Sirenas Marine Discovery, San Diego, United States
| | - Maryam Elfeki
- Department of Medicinal Chemistry and Pharmacognosy, University of Illinois Chicago, Chicago, United States
| | - Matthew F Traxler
- Department of Plant and Microbial Biology, UC Berkeley, Berkeley, United States
| | - Niclas Engene
- Department of Biological Sciences, Florida International University, Miami, United States
| | - Nobuhiro Koyama
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Oliver B Vining
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, United States
| | - Ralph Baric
- Gillings School of Global Public Health, Department of Epidemiology, UNC Chapel Hill, Chapel Hill, United States
| | - Ricardo R Silva
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Samantha J Mascuch
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Sophie Tomasi
- Produits naturels - Synthèses - Chimie Médicinale, University of Rennes 1, Rennes Cedex, France
| | - Stefan Jenkins
- Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, United States
| | | | - Thomas Hoffman
- Department of Pharmaceutical Biotechnology, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany
| | - Vinayak Agarwal
- Center for Oceans and Human Health, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Philip G Williams
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Jingqui Dai
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Ram Neupane
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Joshua Gurr
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Andrés M C Rodríguez
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Anne Lamsa
- Division of Biological Sciences, UC San Diego, La Jolla, United States
| | - Chen Zhang
- Department of Nanoengineering, UC San Diego, La Jolla, United States
| | - Kathleen Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Brendan M Duggan
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Jehad Almaliti
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Pierre-Marie Allard
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
| | - Prasad Phapale
- Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Louis-Felix Nothias
- Institut de Chimie des Substances Naturelles, CNRS-ICSN, UPR 2301, Labex CEBA, University of Paris-Saclay, Gif-sur-Yvette, France
| | - Theodore Alexandrov
- Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Marc Litaudon
- Institut de Chimie des Substances Naturelles, CNRS-ICSN, UPR 2301, Labex CEBA, University of Paris-Saclay, Gif-sur-Yvette, France
| | - Jean-Luc Wolfender
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
| | - Jennifer E Kyle
- Biological Sciences, Pacific Northwest National Laboratory, Richland, United States
| | - Thomas O Metz
- Biological Sciences, Pacific Northwest National Laboratory, Richland, United States
| | - Tyler Peryea
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Danielle VanLeer
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Paul Shinn
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Ajit Jadhav
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Rolf Müller
- Department of Pharmaceutical Biotechnology, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany
| | - Katrina M Waters
- Biological Sciences, Pacific Northwest National Laboratory, Richland, United States
| | - Wenyuan Shi
- School of Dentistry, UC Los Angeles, Los Angeles, United States
| | - Xueting Liu
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Lixin Zhang
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Rob Knight
- Department of Pediatrics, UC San Diego, La Jolla, United States
| | - Paul R Jensen
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | | | - Kit Pogliano
- Division of Biological Sciences, UC San Diego, La Jolla, United States
| | - Roger G Linington
- PBSci-Chemistry & Biochemistry Department, UC Santa Cruz, Santa Cruz, United States
| | - Marcelino Gutiérrez
- Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
| | - Norberto P Lopes
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - William H Gerwick
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Bradley S Moore
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Center for Oceans and Human Health, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| |
|
10
|
Kennedy E, Kolmogorov M, Dong Z, Pevzner P, Timp G. Single Molecule Identification Against Proteomes using Sub-Nanometer Pores. Biophys J 2017. [DOI: 10.1016/j.bpj.2016.11.2629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
11
|
Wang M, Carver JJ, Phelan VV, Sanchez LM, Garg N, Peng Y, Nguyen DD, Watrous J, Kapono CA, Luzzatto-Knaan T, Porto C, Bouslimani A, Melnik AV, Meehan MJ, Liu WT, Crüsemann M, Boudreau PD, Esquenazi E, Sandoval-Calderón M, Kersten RD, Pace LA, Quinn RA, Duncan KR, Hsu CC, Floros DJ, Gavilan RG, Kleigrewe K, Northen T, Dutton RJ, Parrot D, Carlson EE, Aigle B, Michelsen CF, Jelsbak L, Sohlenkamp C, Pevzner P, Edlund A, McLean J, Piel J, Murphy BT, Gerwick L, Liaw CC, Yang YL, Humpf HU, Maansson M, Keyzers RA, Sims AC, Johnson AR, Sidebottom AM, Sedio BE, Klitgaard A, Larson CB, P CAB, Torres-Mendoza D, Gonzalez DJ, Silva DB, Marques LM, Demarque DP, Pociute E, O'Neill EC, Briand E, Helfrich EJN, Granatosky EA, Glukhov E, Ryffel F, Houson H, Mohimani H, Kharbush JJ, Zeng Y, Vorholt JA, Kurita KL, Charusanti P, McPhail KL, Nielsen KF, Vuong L, Elfeki M, Traxler MF, Engene N, Koyama N, Vining OB, Baric R, Silva RR, Mascuch SJ, Tomasi S, Jenkins S, Macherla V, Hoffman T, Agarwal V, Williams PG, Dai J, Neupane R, Gurr J, Rodríguez AMC, Lamsa A, Zhang C, Dorrestein K, Duggan BM, Almaliti J, Allard PM, Phapale P, Nothias LF, Alexandrov T, Litaudon M, Wolfender JL, Kyle JE, Metz TO, Peryea T, Nguyen DT, VanLeer D, Shinn P, Jadhav A, Müller R, Waters KM, Shi W, Liu X, Zhang L, Knight R, Jensen PR, Palsson BO, Pogliano K, Linington RG, Gutiérrez M, Lopes NP, Gerwick WH, Moore BS, Dorrestein PC, Bandeira N. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol 2016. [PMID: 27504778 DOI: 10.1038/nbt.3597.sharing] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 03/16/2023]
Abstract
The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.
Affiliation(s)
- Mingxun Wang
- Computer Science and Engineering, UC San Diego, La Jolla, United States
- Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
| | - Jeremy J Carver
- Computer Science and Engineering, UC San Diego, La Jolla, United States
- Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
| | - Vanessa V Phelan
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Laura M Sanchez
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Neha Garg
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Yao Peng
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
| | - Don Duy Nguyen
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
| | - Jeramie Watrous
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Clifford A Kapono
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
| | - Tal Luzzatto-Knaan
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Carla Porto
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Amina Bouslimani
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Alexey V Melnik
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Michael J Meehan
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Wei-Ting Liu
- Department of Microbiology and Immunology, Stanford University, Palo Alto, United States
| | - Max Crüsemann
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Paul D Boudreau
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | | | | | | | - Laura A Pace
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Robert A Quinn
- Biology Department, San Diego State University, San Diego, United States
| | - Katherine R Duncan
- Scottish Association for Marine Science, Scottish Marine Institute, Oban, United Kingdom
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Cheng-Chih Hsu
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
| | - Dimitrios J Floros
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
| | - Ronnie G Gavilan
- Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
| | - Karin Kleigrewe
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Trent Northen
- Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, United States
| | - Rachel J Dutton
- FAS Center for Systems Biology, Harvard, Cambridge, United States
| | - Delphine Parrot
- Produits naturels - Synthèses - Chimie Médicinale, University of Rennes 1, Rennes Cedex, France
| | - Erin E Carlson
- Chemistry, University of Minnesota, Minneapolis, United States
| | - Bertrand Aigle
- Dynamique des Génomes et Adaptation Microbienne, University of Lorraine, Vandœuvre-lès-Nancy, France
| | | | - Lars Jelsbak
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Christian Sohlenkamp
- Centro de Ciencias Genómicas, Universidad Nacional Autonoma de Mexico, Cuernavaca, Mexico
| | - Pavel Pevzner
- Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
- Computer Science and Engineering, UC San Diego, La Jolla, United States
| | - Anna Edlund
- Microbial and Environmental Genomics, J. Craig Venter Institute, La Jolla, United States
- School of Dentistry, UC Los Angeles, Los Angeles, United States
| | - Jeffrey McLean
- Department of Periodontics, University of Washington, Seattle, United States
- School of Dentistry, UC Los Angeles, Los Angeles, United States
| | - Jörn Piel
- Institute of Microbiology, ETH Zurich, Zurich, Switzerland
| | - Brian T Murphy
- Department of Medicinal Chemistry and Pharmacognosy, University of Illinois Chicago, Chicago, United States
| | - Lena Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Chih-Chuang Liaw
- Department of Marine Biotechnology and Resources, National Sun Yat-sen University, Kaohsiung, Taiwan
| | - Yu-Liang Yang
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
| | - Hans-Ulrich Humpf
- Institute of Food Chemistry, University of Münster, Münster, Germany
| | - Maria Maansson
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Robert A Keyzers
- School of Chemical & Physical Sciences, and Centre for Biodiscovery, Victoria University of Wellington, Wellington, New Zealand
| | - Amy C Sims
- Gillings School of Global Public Health, Department of Epidemiology, UNC Chapel Hill, Chapel Hill, United States
| | - Andrew R Johnson
- Department of Chemistry, Indiana University, Bloomington, United States
| | | | - Brian E Sedio
- Smithsonian Tropical Research Institute, Ancón, Panama
- Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
| | - Andreas Klitgaard
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Charles B Larson
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Cristopher A Boya P
- Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
| | | | - David J Gonzalez
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Denise B Silva
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
- Centro de Ciencias Biologicas e da Saude, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
| | - Lucas M Marques
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Daniel P Demarque
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Egle Pociute
- Sirenas Marine Discovery, San Diego, United States
| | - Ellis C O'Neill
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Enora Briand
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- UMR CNRS 6553 ECOBIO, University of Rennes 1, Rennes Cedex, France
| | | | - Eve A Granatosky
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, United States
| | - Evgenia Glukhov
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Florian Ryffel
- Institute of Microbiology, ETH Zurich, Zurich, Switzerland
| | | | - Hosein Mohimani
- Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
| | - Jenan J Kharbush
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Yi Zeng
- Department of Chemistry and Biochemistry, UC San Diego, La Jolla, United States
| | | | - Kenji L Kurita
- PBSci-Chemistry & Biochemistry Department, UC Santa Cruz, Santa Cruz, United States
| | - Pep Charusanti
- Department of Bioengineering, UC San Diego, La Jolla, United States
| | - Kerry L McPhail
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, United States
| | | | - Lisa Vuong
- Sirenas Marine Discovery, San Diego, United States
| | - Maryam Elfeki
- Department of Medicinal Chemistry and Pharmacognosy, University of Illinois Chicago, Chicago, United States
| | - Matthew F Traxler
- Department of Plant and Microbial Biology, UC Berkeley, Berkeley, United States
| | - Niclas Engene
- Department of Biological Sciences, Florida International University, Miami, United States
| | - Nobuhiro Koyama
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Oliver B Vining
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, United States
| | - Ralph Baric
- Gillings School of Global Public Health, Department of Epidemiology, UNC Chapel Hill, Chapel Hill, United States
| | - Ricardo R Silva
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Samantha J Mascuch
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Sophie Tomasi
- Produits naturels - Synthèses - Chimie Médicinale, University of Rennes 1, Rennes Cedex, France
| | - Stefan Jenkins
- Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, United States
| | | | - Thomas Hoffman
- Department of Pharmaceutical Biotechnology, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany
| | - Vinayak Agarwal
- Center for Oceans and Human Health, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Philip G Williams
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Jingqui Dai
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Ram Neupane
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Joshua Gurr
- Department of Chemistry, University of Hawaii at Manoa, Honolulu, United States
| | - Andrés M C Rodríguez
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - Anne Lamsa
- Division of Biological Sciences, UC San Diego, La Jolla, United States
| | - Chen Zhang
- Department of Nanoengineering, UC San Diego, La Jolla, United States
| | - Kathleen Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Brendan M Duggan
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Jehad Almaliti
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Pierre-Marie Allard
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
| | - Prasad Phapale
- Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Louis-Felix Nothias
- Institut de Chimie des Substances Naturelles, CNRS-ICSN, UPR 2301, Labex CEBA, University of Paris-Saclay, Gif-sur-Yvette, France
| | - Theodore Alexandrov
- Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Marc Litaudon
- Institut de Chimie des Substances Naturelles, CNRS-ICSN, UPR 2301, Labex CEBA, University of Paris-Saclay, Gif-sur-Yvette, France
| | - Jean-Luc Wolfender
- School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
| | - Jennifer E Kyle
- Biological Sciences, Pacific Northwest National Laboratory, Richland, United States
| | - Thomas O Metz
- Biological Sciences, Pacific Northwest National Laboratory, Richland, United States
| | - Tyler Peryea
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Danielle VanLeer
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Paul Shinn
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Ajit Jadhav
- National Center for Advancing Translational Sciences, National Institute of Health, Rockville, United States
| | - Rolf Müller
- Department of Pharmaceutical Biotechnology, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany
| | - Katrina M Waters
- Biological Sciences, Pacific Northwest National Laboratory, Richland, United States
| | - Wenyuan Shi
- School of Dentistry, UC Los Angeles, Los Angeles, United States
| | - Xueting Liu
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Lixin Zhang
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Rob Knight
- Department of Pediatrics, UC San Diego, La Jolla, United States
| | - Paul R Jensen
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | | | - Kit Pogliano
- Division of Biological Sciences, UC San Diego, La Jolla, United States
| | - Roger G Linington
- PBSci-Chemistry & Biochemistry Department, UC Santa Cruz, Santa Cruz, United States
| | - Marcelino Gutiérrez
- Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama
| | - Norberto P Lopes
- School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil
| | - William H Gerwick
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Bradley S Moore
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Center for Oceans and Human Health, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Center for Marine Biotechnology and Biomedicine, Scripps Institute of Oceanography, UC San Diego, La Jolla, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, UC San Diego, La Jolla, United States
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, United States
| |
|
12
|
Wu S, Brown JN, Tolić N, Meng D, Liu X, Zhang H, Zhao R, Moore RJ, Pevzner P, Smith RD, Paša-Tolić L. Quantitative analysis of human salivary gland-derived intact proteome using top-down mass spectrometry. Proteomics 2014; 14:1211-22. [DOI: 10.1002/pmic.201300378] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 08/27/2013] [Revised: 01/10/2014] [Accepted: 02/25/2014] [Indexed: 01/08/2023]
Affiliation(s)
- Si Wu
- Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratories; Richland WA USA
| | - Joseph N. Brown
- Biological Sciences Division; Pacific Northwest National Laboratories; Richland WA USA
| | - Nikola Tolić
- Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratories; Richland WA USA
| | - Da Meng
- Computational Mathematics Division; Pacific Northwest National Laboratories; Richland WA USA
| | - Xiaowen Liu
- Department of BioHealth Informatics; Indiana University-Purdue University Indianapolis; Indianapolis IN USA
| | - Haizhen Zhang
- Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratories; Richland WA USA
| | - Rui Zhao
- Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratories; Richland WA USA
| | - Ronald J. Moore
- Biological Sciences Division; Pacific Northwest National Laboratories; Richland WA USA
| | - Pavel Pevzner
- Department of Computer Science and Engineering; University of California, San Diego; La Jolla CA USA
| | - Richard D. Smith
- Biological Sciences Division; Pacific Northwest National Laboratories; Richland WA USA
| | - Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratories; Richland WA USA
| |
|
13
|
Coates RC, Podell S, Korobeynikov A, Lapidus A, Pevzner P, Sherman DH, Allen EE, Gerwick L, Gerwick WH. Characterization of cyanobacterial hydrocarbon composition and distribution of biosynthetic pathways. PLoS One 2014; 9:e85140. [PMID: 24475038 PMCID: PMC3903477 DOI: 10.1371/journal.pone.0085140] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Received: 07/12/2013] [Accepted: 11/22/2013] [Indexed: 12/20/2022] Open
Abstract
Cyanobacteria possess the unique capacity to naturally produce hydrocarbons from fatty acids. Hydrocarbon compositions of thirty-two strains of cyanobacteria were characterized to reveal novel structural features and insights into hydrocarbon biosynthesis in cyanobacteria. This investigation revealed new double bond (2- and 3-heptadecene) and methyl group positions (3-, 4- and 5-methylheptadecane) for a variety of strains. Additionally, results from this study and literature reports indicate that hydrocarbon production is a universal phenomenon in cyanobacteria. All cyanobacteria possess the capacity to produce hydrocarbons from fatty acids yet not all accomplish this through the same metabolic pathway. One pathway comprises a two-step conversion of fatty acids first to fatty aldehydes and then alkanes that involves a fatty acyl ACP reductase (FAAR) and aldehyde deformylating oxygenase (ADO). The second involves a polyketide synthase (PKS) pathway that first elongates the acyl chain followed by decarboxylation to produce a terminal alkene (olefin synthase, OLS). Sixty-one strains possessing the FAAR/ADO pathway and twelve strains possessing the OLS pathway were newly identified through bioinformatic analyses. Strains possessing the OLS pathway formed a cohesive phylogenetic clade with the exception of three Moorea strains and Leptolyngbya sp. PCC 6406 which may have acquired the OLS pathway via horizontal gene transfer. Hydrocarbon pathways were identified in one-hundred-forty-two strains of cyanobacteria over a broad phylogenetic range and there were no instances where both the FAAR/ADO and the OLS pathways were found together in the same genome, suggesting an unknown selective pressure maintains one or the other pathway, but not both.
Affiliation(s)
- R. Cameron Coates
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
| | - Sheila Podell
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
| | - Anton Korobeynikov
- Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia
- Department of Mathematics and Mechanics, St. Petersburg State University, St. Petersburg, Russia
| | - Alla Lapidus
- Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
| | - Pavel Pevzner
- Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America
| | - David H. Sherman
- Life Sciences Institute and Department of Medical Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Eric E. Allen
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
| | - Lena Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
| | - William H. Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
|
14
|
Terterov I, Vyatkina K, Kononikhin AS, Boitsov V, Vyazmin S, Popov IA, Nikolaev EN, Pevzner P, Dubina M. Application of de novo sequencing tools to study abiogenic peptide formations by tandem mass spectrometry. The case of homo-peptides from glutamic acid complicated by substitutions of hydrogen by sodium or potassium atoms. Rapid Commun Mass Spectrom 2014; 28:33-41. [PMID: 24285388 DOI: 10.1002/rcm.6757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2013] [Revised: 09/24/2013] [Accepted: 10/04/2013] [Indexed: 06/02/2023]
Abstract
RATIONALE Peptides and proteins are among the most important components of living systems. Different attempts have been made to experimentally model the formation of peptides from amino acid monomers in investigations of the origin of life. Detailed characterization of peptides formed under various conditions in such reactions is very important for understanding processes of abiogenic peptide formation. METHODS We used liquid chromatography coupled with tandem mass spectrometry (MS/MS) for an accurate study of homo-peptides formed in a model reaction: glutamic acid oligomerization catalyzed by 1,1'-carbonyldiimidazole in aqueous solution with 1 M of sodium or potassium chloride and without any salts. We used de novo sequencing software for peptide identification. In addition, we propose an approach that uses more spectral information for de novo sequencing than standard methods. RESULTS Peptides up to 9 amino acids long were found in the experiments with KCl, while in experiments with NaCl and without salts only peptides of up to 7 amino acids were detected. Due to high salt concentrations in samples, a high number of singly charged peptide ions with up to 4 substitutions of hydrogen atoms by sodium or potassium atoms were observed. De novo sequencing software provided correct identifications even for peptide ions with substitutions. CONCLUSIONS Multiple substitutions of hydrogen by alkali metal atoms in peptide ions strongly change their fragmentation patterns. The proposed approach for de novo sequencing was found to be very effective, even for ions with substitutions. It may therefore be useful in more complicated cases, such as sequencing abiogenic peptides consisting of different amino acids.
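To illustrate why hydrogen-to-alkali-metal substitutions complicate spectrum interpretation, the precursor m/z of a singly charged glutamic acid homo-peptide can be computed for each number of substitutions. This is a minimal sketch using standard monoisotopic masses. The function name `poly_glu_mz` and the assumption that the ion carries one charging proton in addition to the substitutions are illustrative and are not taken from the article.

```python
# Standard monoisotopic masses in Da.
H, NA, K = 1.007825, 22.989770, 38.963707
PROTON = 1.007276
WATER = 18.010565
GLU_RESIDUE = 129.042593  # glutamic acid residue (amino acid minus H2O)

def poly_glu_mz(length, substitutions=0, metal=NA):
    """m/z of a singly charged poly-Glu ion with H -> metal substitutions.

    Hypothetical helper: assumes one charging proton plus `substitutions`
    hydrogens replaced by the given metal atom.
    """
    neutral = length * GLU_RESIDUE + WATER
    return neutral + substitutions * (metal - H) + PROTON

# Each substitution shifts the peak by (metal - H), so one peptide
# spreads over several precursor masses.
for n in range(5):
    print(n, round(poly_glu_mz(7, n, NA), 4), round(poly_glu_mz(7, n, K), 4))
```

Each sodium substitution adds about 21.98 Da and each potassium substitution about 37.96 Da, which is why a single peptide can appear at several distinct precursor masses.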
Affiliation(s)
- Ivan Terterov
- St. Petersburg Academic University Nanotechnology Research and Education Center RAS, 8/3 Khlopina st., St. Petersburg, 194021, Russia
|
15
|
Abstract
MOTIVATION Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. RESULTS SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels, and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly. AVAILABILITY SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.
Affiliation(s)
- Roy Ronen
- Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
|
16
|
Chong KF, Ning K, Leong HW, Pevzner P. Modeling and characterization of multi-charge mass spectra for peptide sequencing. J Bioinform Comput Biol 2006; 4:1329-52. [PMID: 17245817 DOI: 10.1142/s021972000600248x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/31/2006] [Revised: 09/19/2006] [Accepted: 09/20/2006] [Indexed: 11/18/2022]
Abstract
Peptide sequencing using tandem mass spectrometry data is an important and challenging problem in proteomics. We address the problem of peptide sequencing for multi-charge spectra. Most peptide sequencing algorithms currently consider only charge one or two ions even for higher-charge spectra. We give a characterization of multi-charge spectra by generalizing existing models. Using our models, we analyzed spectra from Global Proteome Machine (GPM) [Craig R, Cortens JP, Beavis RC, J Proteome Res 3:1234–1242, 2004] (with charges 1–5), Institute for Systems Biology (ISB) [Keller A, Purvine S, Nesvizhskii AI, Stolyar S, Goodlett DR, Kolker E, OMICS 6:207–212, 2002] and Orbitrap (both with charges 1–3). Our analysis for the GPM dataset shows that higher charge peaks contribute significantly to prediction of the complete peptide. They also help to explain why existing algorithms do not perform well on multi-charge spectra. Based on these analyses, we claim that peptide sequencing algorithms can achieve higher sensitivity if they also consider higher charge ions. We verify this claim by proposing a de novo sequencing algorithm called the greedy best strong tag (GBST) algorithm that is simple but considers higher charge ions based on our new model. Evaluation on multi-charge spectra shows that our simple GBST algorithm outperforms Lutefisk and PepNovo, especially for the GPM spectra of charge three or more.
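The point about higher-charge ions can be made concrete by enumerating the theoretical b- and y-ion m/z values of a peptide at every charge state, not only charges one and two. The sketch below uses the standard relation m/z = (M + z·m_proton)/z and is not the model from the paper. The function name `fragment_mz` and the small residue table are illustrative assumptions.

```python
PROTON = 1.007276   # proton mass in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "E": 129.04259,
}

def fragment_mz(peptide, max_charge):
    """Return {(ion_type, index, charge): m/z} for all b- and y-ions."""
    ions = {}
    n = len(peptide)
    for i in range(1, n):
        # Neutral masses of the N-terminal (b) and C-terminal (y) fragments.
        b_neutral = sum(RESIDUE[a] for a in peptide[:i])
        y_neutral = sum(RESIDUE[a] for a in peptide[i:]) + WATER
        for z in range(1, max_charge + 1):
            ions[("b", i, z)] = (b_neutral + z * PROTON) / z
            ions[("y", n - i, z)] = (y_neutral + z * PROTON) / z
    return ions

ions = fragment_mz("GAVLK", max_charge=3)
for key in sorted(ions):
    print(key, round(ions[key], 4))
```

An algorithm that only looks for charge-one and charge-two peaks ignores a third of the entries in this table for a charge-three spectrum.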
Affiliation(s)
- Ket Fah Chong
- Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore.
|
17
|
Anderson WA, Amasino RM, Ares M, Banerjee U, Bartel B, Corces VG, Drennan CL, Elgin SCR, Epstein IR, Fanning E, Guillette LJ, Handelsman J, Hatfull GF, Hoy RR, Kelley D, Leinwand LA, Losick R, Lu Y, Lynn DG, Neuhauser C, O'Dowd DK, Olivera T, Pevzner P, Richards-Kortum RR, Rine J, Sah RL, Strobel SA, Walker GC, Walt DR, Warner IM, Wessler S, Willard HF, Zare RN. Competencies: a cure for pre-med curriculum. Science 2011; 334:760-1. [PMID: 22076362 DOI: 10.1126/science.334.6057.760-b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/02/2022]
|
18
|
Abstract
MOTIVATION The continuing improvements to high-throughput sequencing (HTS) platforms have begun to unfold a myriad of new applications. As a result, error correction of sequencing reads remains an important problem. Though several tools do an excellent job of correcting datasets where the reads are sampled close to uniformly, the problem of correcting reads coming from drastically non-uniform datasets, such as those from single-cell sequencing, remains open. RESULTS In this article, we develop the method Hammer for error correction without any uniformity assumptions. Hammer is based on a combination of a Hamming graph and a simple probabilistic model for sequencing errors. It is a simple and adaptable algorithm that improves on other tools on non-uniform single-cell data, while achieving comparable results on normal multi-cell data. AVAILABILITY http://www.cs.toronto.edu/~pashadag. CONTACT pmedvedev@cs.ucsd.edu.
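The Hamming graph idea can be sketched as follows: k-mers are nodes, two k-mers are joined when they differ in at most `tau` positions, and each connected component is treated as a group of k-mers that may derive from the same true sequence. This is a simplified illustration, not Hammer itself. The pairwise comparison is quadratic, and picking the most frequent k-mer as the consensus is an assumption that stands in for the probabilistic model mentioned in the abstract.

```python
from collections import Counter
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def hamming_clusters(kmers, tau=1):
    """Connected components of the graph joining k-mers within distance tau."""
    nodes = sorted(set(kmers))
    parent = {k: k for k in nodes}

    def find(k):
        # Union-find lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(nodes, 2):
        if hamming(a, b) <= tau:
            parent[find(a)] = find(b)

    clusters = {}
    for k in nodes:
        clusters.setdefault(find(k), []).append(k)
    return list(clusters.values())

def consensus_by_count(cluster, counts):
    """Illustrative stand-in for a probabilistic model: most frequent k-mer."""
    return max(cluster, key=lambda k: counts[k])

kmers = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 3
counts = Counter(kmers)
for cluster in hamming_clusters(kmers):
    print(cluster, "->", consensus_by_count(cluster, counts))
```

Because clustering depends only on sequence similarity and not on an expected coverage level, the approach makes no uniformity assumption, which is the property the abstract emphasizes.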
Affiliation(s)
- Paul Medvedev
- Department of Computer Science and Engineering, University of California, San Diego, CA, USA.
|
19
|
Medvedev P, Pham S, Chaisson M, Tesler G, Pevzner P. Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers. J Comput Biol 2011; 18:1625-34. [PMID: 21999285 DOI: 10.1089/cmb.2011.0151] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022]
Abstract
The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated perfect data, we argue that this can effectively improve the contig sizes in assembly.
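A minimal sketch of the construction on perfect data: every edge is a pair of k-mers read at a fixed offset, and every node is the corresponding pair of (k-1)-mers. The function names and the convention that `d` is the exact distance between the start positions of the two k-mers are assumptions made for this illustration. Real mate pairs have variable insert sizes, which this sketch does not model.

```python
from collections import defaultdict

def paired_kmers(genome, k, d):
    """All pairs (a, b) of k-mers whose start positions are exactly d apart."""
    return [
        (genome[i:i + k], genome[i + d:i + d + k])
        for i in range(len(genome) - d - k + 1)
    ]

def paired_de_bruijn(pairs):
    """Adjacency list: node = pair of (k-1)-mers, edge = one paired k-mer."""
    graph = defaultdict(list)
    for a, b in pairs:
        graph[(a[:-1], b[:-1])].append((a[1:], b[1:]))
    return dict(graph)

genome = "TAATGCCATGGGATGTT"
graph = paired_de_bruijn(paired_kmers(genome, k=3, d=4))
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

In the ordinary de Bruijn graph of this string with k = 3, the repeated k-mer ATG makes the node AT ambiguous. In the paired graph the occurrences of AT land in different nodes, such as (AT, CA) and (AT, GA), because their mates differ.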
Affiliation(s)
- Paul Medvedev
- Department of Computer Science and Engineering, University of California, San Diego, California, USA.
|
20
|
Grindberg RV, Ishoey T, Brinza D, Esquenazi E, Coates RC, Liu WT, Gerwick L, Dorrestein PC, Pevzner P, Lasken R, Gerwick WH. Single cell genome amplification accelerates identification of the apratoxin biosynthetic pathway from a complex microbial assemblage. PLoS One 2011; 6:e18565. [PMID: 21533272 PMCID: PMC3075265 DOI: 10.1371/journal.pone.0018565] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Received: 11/18/2010] [Accepted: 03/08/2011] [Indexed: 01/11/2023]
Abstract
Filamentous marine cyanobacteria are extraordinarily rich sources of structurally novel, biomedically relevant natural products. To understand their biosynthetic origins as well as produce increased supplies and analog molecules, access to the clustered biosynthetic genes that encode for the assembly enzymes is necessary. Complicating these efforts is the universal presence of heterotrophic bacteria in the cell wall and sheath material of cyanobacteria obtained from the environment and those grown in uni-cyanobacterial culture. Moreover, the high similarity in genetic elements across disparate secondary metabolite biosynthetic pathways renders current gene cluster targeting strategies imprecise and contributes sequence complexity resulting in partial genome coverage. Thus, it was necessary to use a dual-method approach of single-cell genomic sequencing based on multiple displacement amplification (MDA) and metagenomic library screening. Here, we report the identification of the putative biosynthetic gene cluster for apratoxin A, a potent cancer cell cytotoxin with promise for medicinal applications. The roughly 58 kb biosynthetic gene cluster is composed of 12 open reading frames and has a type I modular mixed polyketide synthase/nonribosomal peptide synthetase (PKS/NRPS) organization and features loading and off-loading domain architecture never previously described. Moreover, this work represents the first successful isolation of a complete biosynthetic gene cluster from Lyngbya bouillonii, a tropical marine cyanobacterium renowned for its production of diverse bioactive secondary metabolites.
Affiliation(s)
- Rashel V. Grindberg
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
| | - Thomas Ishoey
- J. Craig Venter Institute, San Diego, California, United States of America
| | - Dumitru Brinza
- Department of Computer Science and Engineering, Center for Algorithmic and Systems Biology, University of California San Diego, La Jolla, California, United States of America
| | - Eduardo Esquenazi
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
| | - R. Cameron Coates
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
| | - Wei-ting Liu
- Departments of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
| | - Lena Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
| | - Pieter C. Dorrestein
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
- Departments of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
| | - Pavel Pevzner
- Department of Computer Science and Engineering, Center for Algorithmic and Systems Biology, University of California San Diego, La Jolla, California, United States of America
| | - Roger Lasken
- J. Craig Venter Institute, San Diego, California, United States of America
| | - William H. Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, United States of America
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
|
21
|
Medvedev P, Pham S, Chaisson M, Tesler G, Pevzner P. Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers. Lecture Notes in Computer Science 2011. [DOI: 10.1007/978-3-642-20036-6_22] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
|
22
|
Gupta N, Bark SJ, Lu WD, Taupenot L, O'Connor DT, Pevzner P, Hook V. Mass spectrometry-based neuropeptidomics of secretory vesicles from human adrenal medullary pheochromocytoma reveals novel peptide products of prohormone processing. J Proteome Res 2010; 9:5065-75. [PMID: 20704348 DOI: 10.1021/pr100358b] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
Neuropeptides are required for cell-cell communication in the regulation of physiological and pathological processes. While selected neuropeptides of known biological activities have been studied, global analyses of the endogenous profile of human peptide products derived from prohormones by proteolytic processing in vivo are largely unknown. Therefore, this study utilized the global, unbiased approach of mass spectrometry-based neuropeptidomics to define peptide profiles in secretory vesicles, isolated from human adrenal medullary pheochromocytoma of the sympathetic nervous system. The low molecular weight pool of secretory vesicle peptides was subjected to nano-LC-MS/MS with ion trap and QTOF mass spectrometry analyzed by different database search tools (InsPecT and Spectrum Mill). Peptides were generated by processing of prohormones at dibasic cleavage sites as well as at nonbasic residues. Significantly, peptide profiling provided novel insight into newly identified peptide products derived from proenkephalin, pro-NPY, proSAAS, CgA, CgB, and SCG2 prohormones. Previously unidentified intervening peptide domains of prohormones were observed, thus providing new knowledge of human neuropeptidomes generated from precursors. The global peptidomic approach of this study demonstrates the complexity of diverse neuropeptides present in human secretory vesicles for cell-cell communication.
Affiliation(s)
- Nitin Gupta
- Bioinformatics Graduate Program, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
|
23
|
Hook V, Bark S, Gupta N, Lortie M, Lu WD, Bandeira N, Funkelstein L, Wegrzyn J, O'Connor DT, Pevzner P. Neuropeptidomic components generated by proteomic functions in secretory vesicles for cell-cell communication. AAPS J 2010; 12:635-45. [PMID: 20734175 PMCID: PMC2976990 DOI: 10.1208/s12248-010-9223-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 04/23/2010] [Accepted: 07/12/2010] [Indexed: 01/13/2023]
Abstract
Diverse neuropeptides participate in cell-cell communication to coordinate neuronal and endocrine regulation of physiological processes in health and disease. Neuropeptides are short peptides ranging in length from ~3 to 40 amino acid residues that are involved in biological functions of pain, stress, obesity, hypertension, mental disorders, cancer, and numerous health conditions. The unique neuropeptide sequences define their specific biological actions. Significantly, this review article discusses how the neuropeptide field is at the crest of expanding knowledge gained from mass-spectrometry-based neuropeptidomic studies, combined with proteomic analyses for understanding the biosynthesis of neuropeptidomes. The ongoing expansion in neuropeptide diversity lies in the unbiased and global mass-spectrometry-based approaches for identification and quantitation of peptides. Current mass spectrometry technology allows definition of neuropeptide amino acid sequence structures, profiling of multiple neuropeptides in normal and disease conditions, and quantitative peptide measures in biomarker applications to monitor therapeutic drug efficacies. Complementary proteomic studies of neuropeptide secretory vesicles provide valuable insight into the protein processes utilized for neuropeptide production, storage, and secretion. Furthermore, ongoing research in developing new computational tools will facilitate advancements in mass-spectrometry-based identification of small peptides. Knowledge of the entire repertoire of neuropeptides that regulate physiological systems will provide novel insight into regulatory mechanisms in health, disease, and therapeutics.
Affiliation(s)
- Vivian Hook
- University of California, San Diego, La Jolla, 92093-0744, USA.
|
24
|
Affiliation(s)
- Pavel Pevzner
- Department of Computer Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
|
25
|
Waridel P, Frank A, Thomas H, Surendranath V, Sunyaev S, Pevzner P, Shevchenko A. Sequence similarity-driven proteomics in organisms with unknown genomes by LC-MS/MS and automated de novo sequencing. Proteomics 2007; 7:2318-29. [PMID: 17623296 DOI: 10.1002/pmic.200700003] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 11/08/2022]
Abstract
LC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, combined with data processing, stringent, and sequence-similarity database searching tools, was employed in a layered manner to identify proteins in organisms with unsequenced genomes. Highly specific stringent searches (MASCOT) were applied as a first layer screen to identify either known (i.e. present in a database) proteins, or unknown proteins sharing identical peptides with related database sequences. Once the confidently matched spectra were removed, the remainder was filtered against a nonannotated library of background spectra that cleaned up the dataset from spectra of common protein and chemical contaminants. The rectified spectral dataset was further subjected to rapid batch de novo interpretation by PepNovo software, followed by the MS BLAST sequence-similarity search that used multiple redundant and partially accurate candidate peptide sequences. Importantly, a single dataset was acquired at the uncompromised sensitivity with no need of manual selection of MS/MS spectra for subsequent de novo interpretation. This approach enabled a completely automated identification of novel proteins that were, otherwise, missed by conventional database searches.
Affiliation(s)
- Patrice Waridel
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
|
26
|
Sundquist A, Ronaghi M, Tang H, Pevzner P, Batzoglou S. Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS One 2007; 2:e484. [PMID: 17534434 PMCID: PMC1871613 DOI: 10.1371/journal.pone.0000484] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Received: 02/26/2007] [Accepted: 05/05/2007] [Indexed: 11/18/2022]
Abstract
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.
Affiliation(s)
- Andreas Sundquist
- Department of Computer Science, Stanford University, Stanford, California, United States of America.
|
27
|
Abstract
We present a novel scoring method for de novo interpretation of peptides from tandem mass spectrometry data. Our scoring method uses a probabilistic network whose structure reflects the chemical and physical rules that govern the peptide fragmentation. We use a likelihood ratio hypothesis test to determine whether the peaks observed in the mass spectrum are more likely to have been produced under our fragmentation model than under a model that treats peaks as random events. We tested our de novo algorithm PepNovo on ion trap data and achieved results that are superior to popular de novo peptide sequencing algorithms. PepNovo can be accessed via the URL http://www-cse.ucsd.edu/groups/bioinformatics/software.html.
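The likelihood ratio test at the heart of this kind of scoring can be sketched in a few lines. The version below treats each expected fragment peak as an independent event with a fixed detection probability. That is a deliberate simplification of the probabilistic network described in the abstract, and the function name and the probabilities are made up for illustration.

```python
import math

def log_likelihood_ratio(observed, p_fragment, p_random):
    """Sum of per-peak log odds: fragmentation model vs. random-peak model.

    observed   -- iterable of booleans, one per expected fragment peak
    p_fragment -- assumed probability of seeing a peak under the fragmentation model
    p_random   -- assumed probability of seeing a peak at that mass by chance
    """
    score = 0.0
    for seen in observed:
        if seen:
            score += math.log(p_fragment / p_random)
        else:
            score += math.log((1 - p_fragment) / (1 - p_random))
    return score

# A candidate peptide that explains most expected peaks scores above zero;
# one that explains few scores below zero.
print(log_likelihood_ratio([True, True, True, False], 0.8, 0.1))
print(log_likelihood_ratio([False, False, True, False], 0.8, 0.1))
```

A positive score means the observed peaks are better explained by fragmentation of the candidate peptide than by chance, so candidates can be ranked by this value.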
Affiliation(s)
- Ari Frank
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, California 92093-0114, USA.
|
28
|
Zhi D, Keich U, Pevzner P, Heber S, Tang H. Correcting base-assignment errors in repeat regions of shotgun assembly. IEEE/ACM Trans Comput Biol Bioinform 2007; 4:54-64. [PMID: 17277413 DOI: 10.1109/tcbb.2007.1005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/13/2023]
Abstract
Accurate base-assignment in repeat regions of a whole genome shotgun assembly is an unsolved problem. Since reads in repeat regions cannot be easily attributed to a unique location in the genome, current assemblers may place these reads arbitrarily. As a result, the base-assignment error rate in repeats is likely to be much higher than that in the rest of the genome. We developed an iterative algorithm, EULER-AIR, that is able to correct base-assignment errors in finished genome sequences in public databases. The Wolbachia genome is among the best finished genomes. Using this genome project as an example, we demonstrated that EULER-AIR can 1) discover and correct base-assignment errors, 2) provide accurate read assignments, 3) utilize finishing reads for accurate base-assignment, and 4) provide guidance for designing finishing experiments. In the genome of Wolbachia, EULER-AIR found 16 positions with ambiguous base-assignment and two positions with erroneous bases. Besides Wolbachia, many other genome sequencing projects have significantly fewer finishing reads and, hence, are likely to contain more base-assignment errors in repeats. We demonstrate that EULER-AIR is a software tool that can be used to find and correct base-assignment errors in a genome assembly project.
Affiliation(s)
- Degui Zhi
- Bioinformatics Program, University of California, San Diego, La Jolla 92093, USA.
|
29
|
Zhi D, Krishna SS, Cao H, Pevzner P, Godzik A. Representing and comparing protein structures as paths in three-dimensional space. BMC Bioinformatics 2006; 7:460. [PMID: 17052359 PMCID: PMC1626488 DOI: 10.1186/1471-2105-7-460] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 05/18/2006] [Accepted: 10/20/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Most existing formulations of protein structure comparison are based on detailed atomic level descriptions of protein structures and bypass potential insights that arise from a higher-level abstraction. RESULTS We propose a structure comparison approach based on a simplified representation of a protein that describes its three-dimensional path by local curvature along the generalized backbone of the polypeptide. We have implemented a dynamic programming procedure that aligns curvatures of proteins by optimizing a defined sum turning angle deviation measure. CONCLUSION Although our procedure does not directly optimize global structural similarity as measured by RMSD, our benchmarking results indicate that it recovers surprisingly well the structural similarity defined by structure classification databases and traditional structure alignment programs. In addition, our program can recognize similarities between structures with extensive conformation changes that are beyond the ability of traditional structure alignment programs. We demonstrate applications of our procedure in several contexts of structure comparison. An implementation of our procedure, CURVE, is available as a public webserver.
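The general idea of aligning paths by local curvature can be sketched as follows: compute the turning angle at each interior point of a 3D path, then align two angle sequences with a global dynamic program that minimizes the summed angle deviation. This is an illustrative sketch and not the CURVE implementation. The absolute-difference cost, the linear gap penalty, and the function names are assumptions, and consecutive points are assumed to be distinct.

```python
import math

def turning_angles(points):
    """Turning angle (radians) at each interior point of a 3D path."""
    angles = []
    for p, q, r in zip(points, points[1:], points[2:]):
        u = [b - a for a, b in zip(p, q)]  # incoming segment
        v = [b - a for a, b in zip(q, r)]  # outgoing segment
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        # Clamp to guard against rounding just outside [-1, 1].
        angles.append(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angles

def align_cost(a, b, gap=1.0):
    """Global alignment cost: summed |angle difference| plus gap penalties."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    return dp[n][m]

path = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 1, 1)]
# The same path rotated by 90 degrees about the z-axis: (x, y, z) -> (-y, x, z).
rotated = [(-y, x, z) for x, y, z in path]
print(align_cost(turning_angles(path), turning_angles(rotated)))
```

Turning angles do not change under rotation or translation, so the comparison needs no superposition step. That is one reason a curvature-based representation can tolerate large conformational changes that defeat rigid-body RMSD.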
Affiliation(s)
- Degui Zhi
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720-3102, USA
- S Sri Krishna
- Joint Center for Structural Genomics, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Haibo Cao
- Bioinformatics Program, Infectious and Inflammation Disease Center, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Pavel Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0114, USA
- Adam Godzik
- Joint Center for Structural Genomics, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Bioinformatics Program, Infectious and Inflammation Disease Center, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
|
30
|
Wielsch N, Thomas H, Surendranath V, Waridel P, Frank A, Pevzner P, Shevchenko A. Rapid Validation of Protein Identifications with the Borderline Statistical Confidence via De Novo Sequencing and MS BLAST Searches. J Proteome Res 2006; 5:2448-56. [PMID: 16944958 DOI: 10.1021/pr060200v] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Protein identifications with borderline statistical confidence are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences and represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for often-biased manual inspection of raw MS/MS spectra. The approach takes advantage of the independent interpretation of the corresponding MS/MS spectra by the PepNovo de novo sequencing software, followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that utilize all partially inaccurate, degenerate and redundant candidate peptide sequences. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.
Affiliation(s)
- Natalie Wielsch
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
|
31
|
Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH, Bajsarowicz K, Paris PL, Tao Q, Kowbel D, Lapuk A, Shagin DA, Shagina IA, Gray JW, Cheng JF, de Jong PJ, Pevzner P, Collins C. Decoding the fine-scale structure of a breast cancer genome and transcriptome. Genome Res 2006; 16:394-404. [PMID: 16461635 PMCID: PMC1415204 DOI: 10.1101/gr.4247306] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/24/2022]
Abstract
A comprehensive understanding of cancer is predicated upon knowledge of the structure of malignant genomes underlying its many variant forms and the molecular mechanisms giving rise to them. It is well established that solid tumor genomes accumulate a large number of genome rearrangements during tumorigenesis. End Sequence Profiling (ESP) maps and clones genome breakpoints associated with all types of genome rearrangements elucidating the structural organization of tumor genomes. Here we extend the ESP methodology in several directions using the breast cancer cell line MCF-7. First, targeted ESP is applied to multiple amplified loci, revealing a complex process of rearrangement and co-amplification in these regions reminiscent of breakage/fusion/bridge cycles. Second, genome breakpoints identified by ESP are confirmed using a combination of DNA sequencing and PCR. Third, in vitro functional studies assign biological function to a rearranged tumor BAC clone, demonstrating that it encodes anti-apoptotic activity. Finally, ESP is extended to the transcriptome identifying four novel fusion transcripts and providing evidence that expression of fusion genes may be common in tumors. These results demonstrate the distinct advantages of ESP including: (1) the ability to detect all types of rearrangements and copy number changes; (2) straightforward integration of ESP data with the annotated genome sequence; (3) immortalization of the genome; (4) ability to generate tumor-specific reagents for in vitro and in vivo functional studies. Given these properties, ESP could play an important role in a tumor genome project.
Affiliation(s)
- Stanislav Volik
- Department of Urology, and Cancer Research Institute, University of California San Francisco Comprehensive Cancer Center, San Francisco, California 94115, USA
|
32
|
|
33
|
Abstract
Filtration techniques in the form of rapid elimination of candidate sequences while retaining the true one are key ingredients of database searches in genomics. Although SEQUEST and Mascot perform a conceptually similar task to the tool BLAST, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result MS/MS protein identification tools are becoming too time-consuming for many applications including search for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow in the same way that "genome vs genome" comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm. Our tag generating algorithm along with our de novo sequencing algorithm PepNovo can be accessed via the URL http://peptide.ucsd.edu/.
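The filtration idea can be illustrated with a deliberately simplified sketch: keep only the database peptides that contain at least one candidate sequence tag, and pass just those to the expensive scoring step. This ignores everything that makes real tag-based filters work on spectra (flanking masses, tag probabilities, modifications); the function name `filter_by_tags` and the toy peptides are hypothetical.

```python
def filter_by_tags(peptides, tags):
    """Return the peptides that contain at least one tag as a substring.

    Toy stand-in for a tag-based filter: a real MS/MS filter would also
    check the prefix and suffix masses around the tag.
    """
    return [p for p in peptides if any(t in p for t in tags)]


database = ["PEPTIDEK", "ACDEFGHK", "TIDALK"]
candidates = filter_by_tags(database, ["TID"])
```

Only the surviving candidates would then be scored against the spectrum, which is where the running-time savings described in the abstract would come from.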
Affiliation(s)
- Ari Frank
- Department of Computer Science & Engineering, University of California-San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0114, USA.
|
34
|
Abstract
We describe ABA (A-Bruijn alignment), a new method for multiple alignment of biological sequences. The major difference between ABA and existing multiple alignment methods is that ABA represents an alignment as a directed graph, possibly containing cycles. This representation provides more flexibility than does a traditional alignment matrix or the recently introduced partial order alignment (POA) graph by allowing a larger class of evolutionary relationships between the aligned sequences. Our graph representation is particularly well-suited to the alignment of protein sequences with shuffled and/or repeated domain structure, and allows one to construct multiple alignments of proteins containing (1) domains that are not present in all proteins, (2) domains that are present in different orders in different proteins, and (3) domains that are present in multiple copies in some proteins. In addition, ABA is useful in the alignment of genomic sequences that contain duplications and inversions. We provide several examples illustrating the applications of ABA.
Affiliation(s)
- Benjamin Raphael
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0114, USA.
|
35
|
Abstract
The analysis of mass spectrometry data is still largely based on identification of single MS/MS spectra and does not attempt to make use of the extra information available in multiple MS/MS spectra from partially or completely overlapping peptides. Analysis of MS/MS spectra from multiple overlapping peptides opens up the possibility of assembling MS/MS spectra into entire proteins, similarly to the assembly of overlapping DNA reads into entire genomes. In this paper, we present for the first time a way to detect, score, and interpret overlaps between uninterpreted MS/MS spectra in an attempt to sequence entire proteins rather than individual peptides. We show that this approach not only extends the length of reconstructed amino acid sequences but also dramatically improves the quality of de novo peptide sequencing, even for low mass accuracy MS/MS data.
Affiliation(s)
- Nuno Bandeira
- Computer Science and Engineering Department, University of California, San Diego, Department 0114, 9500 Gilman Drive, La Jolla, California 92093-0114, USA.
|
36
|
Abstract
Alternative splicing essentially increases the diversity of the transcriptome and has important implications for physiology, development and the genesis of diseases. Conventionally, alternative splicing is investigated in a case-by-case fashion, but this becomes cumbersome and error prone if genes show a huge abundance of different splice variants. We use a different approach and integrate all transcripts derived from a gene into a single splicing graph. Each transcript corresponds to a path in the graph, and alternative splicing is displayed by bifurcations. This representation preserves the relationships between different splicing variants and allows us to investigate systematically all possible putative transcripts. We built a database of splicing graphs for human genes, using transcript information from various major sources (Ensembl, RefSeq, STACK, TIGR and UniGene). A Web interface allows users to display the splicing graphs, to interactively assemble transcripts and to access their sequences as well as neighboring genomic regions. We also provide for each gene an exhaustive pre-computed catalog of putative transcripts, in total more than 1.2 million sequences. We found that approximately 65% of the investigated genes show evidence for alternative splicing, and in 5% of the cases, a single gene might produce over 100 transcripts.
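A small sketch can show how a splicing graph turns observed transcripts into a catalog of putative ones. It assumes transcripts are given as ordered lists of exon labels and that the resulting graph is acyclic; the function names and the toy exon labels are hypothetical and not taken from the database described above.

```python
from collections import defaultdict


def build_splicing_graph(transcripts):
    """Merge transcripts (ordered exon lists) into one directed graph."""
    graph = defaultdict(set)
    for exons in transcripts:
        for u, v in zip(exons, exons[1:]):
            graph[u].add(v)  # consecutive exons become a directed edge
    return graph


def putative_transcripts(transcripts):
    """Enumerate every path from an observed first exon to an observed last exon."""
    graph = build_splicing_graph(transcripts)
    sources = {t[0] for t in transcripts}
    sinks = {t[-1] for t in transcripts}
    paths = []

    def walk(node, path):
        if node in sinks:
            paths.append(path)
        for nxt in sorted(graph.get(node, ())):
            walk(nxt, path + [nxt])

    for s in sorted(sources):
        walk(s, [s])
    return paths


observed = [["e1", "e2", "e4", "e5"], ["e1", "e3", "e4", "e6"]]
paths = putative_transcripts(observed)
```

Here two observed transcripts share exons `e1` and `e4`, so the graph has two bifurcations and yields four paths, two of which were never observed directly. The same multiplication of choices is why a single gene can give rise to a very large number of putative transcripts.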
Affiliation(s)
- Jeremy Leipzig
- Department of Computer Science, College of Engineering, North Carolina State University, Raleigh, NC 27695-7566, USA
|
37
|
Andelfinger G, Hitte C, Etter L, Guyon R, Bourque G, Tesler G, Pevzner P, Kirkness E, Galibert F, Benson DW. Detailed four-way comparative mapping and gene order analysis of the canine ctvm locus reveals evolutionary chromosome rearrangements. Genomics 2004; 83:1053-62. [PMID: 15177558 DOI: 10.1016/j.ygeno.2003.12.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 08/12/2003] [Accepted: 12/17/2003] [Indexed: 11/26/2022]
Abstract
Canine tricuspid valve malformation (CTVM) maps to canine chromosome 9 (CFA9), in a region syntenic with gene-dense human chromosome 17q. To define synteny blocks, we analyzed 148 markers on CFA9 using radiation hybrid mapping and established a four-way comparative map for human, mouse, rat, and dog. We identified a large number of rearrangements, allowing us to reconstruct the evolutionary history of individual synteny blocks and large chromosomal segments. A most parsimonious rearrangement scenario for all four species reveals that human chromosome 17q differs from CFA9 and the syntenic rodent chromosomes through two macroreversals of 9.2 and 23 Mb. Compared to a recovered ancestral gene order, CFA9 has undergone 11 reversals of <3 Mb and 2 reversals of >3 Mb. Interspecies reuse of breakpoints for micro- and macrorearrangements was observed. Gene order and content of the ctvm interval are best extrapolated from murine data, showing that multispecies genome rearrangement scenarios contribute to identifying gene content in canine mapping studies.
Affiliation(s)
- G Andelfinger
- Cardiovascular Genetics, Division of Cardiology, ML 7042, Cincinnati Children's Hospital, 3333 Burnet Avenue, Cincinnati, OH 45229, USA
|
38
|
Abstract
MOTIVATION: Current DNA sequencing technology produces reads of about 500-750 bp, with typical coverage under 10x. New sequencing technologies are emerging that produce shorter reads (length 80-200 bp) but allow one to generate significantly higher coverage (30x and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology but were not designed for assembly of short reads. RESULTS: We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts. AVAILABILITY: Available from the web at http://www.cse.ucsd.edu/groups/bioinformatics/software.html
Affiliation(s)
- Mark Chaisson
- Bioinformatics Program, University of California San Diego, La Jolla, CA 92093, USA.
|
39
|
Murphy WJ, Bourque G, Tesler G, Pevzner P, O'Brien SJ. Reconstructing the genomic architecture of mammalian ancestors using multispecies comparative maps. Hum Genomics 2003; 1:30-40. [PMID: 15601531 PMCID: PMC3525001 DOI: 10.1186/1479-7364-1-1-30] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Received: 08/19/2003] [Accepted: 08/19/2003] [Indexed: 11/10/2022] Open
Abstract
Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.
Affiliation(s)
- William J Murphy
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, MD 21702, USA.
|
40
|
Pevzner P, Tesler G. Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proc Natl Acad Sci U S A 2003; 100:7672-7. [PMID: 12810957 PMCID: PMC164646 DOI: 10.1073/pnas.1330369100] [Citation(s) in RCA: 233] [Impact Index Per Article: 11.1] [Received: 01/21/2003] [Accepted: 05/05/2003] [Indexed: 11/18/2022] Open
Abstract
The human and mouse genomic sequences provide evidence for a larger number of rearrangements than previously thought and reveal extensive reuse of breakpoints from the same short fragile regions. Breakpoint clustering in regions implicated in cancer and infertility has been reported in previous studies; we report here on breakpoint clustering in chromosome evolution. This clustering reveals limitations of the widely accepted random breakage theory that has remained unchallenged since the mid-1980s. The genome rearrangement analysis of the human and mouse genomes implies the existence of a large number of very short "hidden" synteny blocks that were invisible in the comparative mapping data and ignored in the random breakage model. These blocks are defined by closely located breakpoints and are often hard to detect. Our results suggest a model of chromosome evolution that postulates that mammalian genomes are mosaics of fragile regions with high propensity for rearrangements and solid regions with low propensity for rearrangements.
Affiliation(s)
- Pavel Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0114, USA
|
41
|
Abstract
Although analysis of genome rearrangements was pioneered by Dobzhansky and Sturtevant 65 years ago, we still know very little about the rearrangement events that produced the existing varieties of genomic architectures. The genomic sequences of human and mouse provide evidence for a larger number of rearrangements than previously thought and shed some light on previously unknown features of mammalian evolution. In particular, they reveal that a large number of microrearrangements is required to explain the differences in draft human and mouse sequences. Here we describe a new algorithm for constructing synteny blocks, study arrangements of synteny blocks in human and mouse, derive a most parsimonious human-mouse rearrangement scenario, and provide evidence that intrachromosomal rearrangements are more frequent than interchromosomal rearrangements. Our analysis is based on the human-mouse breakpoint graph, which reveals related breakpoints and allows one to find a most parsimonious scenario. Because these graphs provide important insights into rearrangement scenarios, we introduce a new visualization tool that allows one to view breakpoint graphs superimposed with genomic dot-plots.
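As a small illustration of the quantities behind such analyses, the sketch below counts breakpoints in a signed permutation of synteny blocks relative to the identity order. This is a much simpler statistic than the breakpoint graph the abstract refers to and is not the authors' algorithm; the function name and the example permutation are hypothetical.

```python
def count_breakpoints(signed_perm):
    """Count breakpoints of a signed permutation relative to the identity.

    The permutation is framed by 0 and n + 1; a consecutive pair (a, b) is
    an adjacency when b - a == 1 and a breakpoint otherwise.
    """
    n = len(signed_perm)
    extended = [0] + list(signed_perm) + [n + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


identity_bp = count_breakpoints([1, 2, 3, 4])
reversed_bp = count_breakpoints([1, -3, -2, 4])
```

The second permutation is one reversal away from the identity (flipping blocks 2 and 3), and it has two breakpoints. Since a single reversal can remove at most two breakpoints, half the breakpoint count gives a lower bound on the number of reversals in any scenario.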
Affiliation(s)
- Pavel Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0114, USA.
|
42
|
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau 
A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES. Initial sequencing and comparative analysis of the mouse genome. Nature 2002; 420:520-62. [PMID: 12466850 DOI: 10.1038/nature01262] [Citation(s) in RCA: 4791] [Impact Index Per Article: 217.8] [Received: 09/18/2002] [Accepted: 10/31/2002] [Indexed: 12/18/2022]
Abstract
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
MESH Headings
- Animals
- Base Composition
- Chromosomes, Mammalian/genetics
- Conserved Sequence/genetics
- CpG Islands/genetics
- Evolution, Molecular
- Gene Expression Regulation
- Genes/genetics
- Genetic Variation/genetics
- Genome
- Genome, Human
- Genomics
- Humans
- Mice/classification
- Mice/genetics
- Mice, Knockout
- Mice, Transgenic
- Models, Animal
- Multigene Family/genetics
- Mutagenesis
- Neoplasms/genetics
- Physical Chromosome Mapping
- Proteome/genetics
- Pseudogenes/genetics
- Quantitative Trait Loci/genetics
- RNA, Untranslated/genetics
- Repetitive Sequences, Nucleic Acid/genetics
- Selection, Genetic
- Sequence Analysis, DNA
- Sex Chromosomes/genetics
- Species Specificity
- Synteny
|
43
|
Abstract
In light-directed synthesis of high-density oligonucleotide arrays for sequencing by hybridization, synthesis errors result from the unintended illumination of chip regions that should remain dark. Most synthesis errors occur at the borders of illuminated regions, where light diffraction, internal reflection, and scattering produce the most unintended illumination. A combinatorial synthesis strategy based on two-dimensional Gray codes was devised to reduce the overall lengths of these borders in masks for photolithographic chip design. This article describes an application of two-dimensional Gray codes and demonstrates that masks based on this approach are optimal for minimizing the border length in VLSIPS (very large scale immobilized polymer synthesis).
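The property that makes Gray codes attractive here is that neighboring codewords differ in exactly one position, so neighboring regions differ as little as possible. The sketch below shows only the classic one-dimensional binary reflected Gray code, as an assumption-laden illustration; it does not reproduce the two-dimensional construction or the mask layout described in the abstract.

```python
def gray_code(n_bits):
    """Binary reflected Gray code on n_bits bits, as a list of integers."""
    return [i ^ (i >> 1) for i in range(1 << n_bits)]


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


codes = gray_code(3)
steps = [hamming(a, b) for a, b in zip(codes, codes[1:])]
```

Every entry of `steps` is 1, meaning each codeword differs from its successor in a single bit. A two-dimensional analogue would arrange probes on the chip so that horizontally and vertically adjacent cells differ as little as possible, which is what shortens the borders between illuminated and dark regions.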
Affiliation(s)
- W Feldman
- Department of Computer Science and Engineering, Pennsylvania State University, University Park 16802
|
44
|
Pevzner P. No immunity for Moscow. Nature 1989. [DOI: 10.1038/338009a0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|