1
Pan X, Ren L, Yang Y, Xu Y, Ning L, Zhang Y, Luo H, Zou Q, Zhang Y. MCSdb, a database of proteins residing in membrane contact sites. Sci Data 2024; 11:281. [PMID: 38459036 PMCID: PMC10923927 DOI: 10.1038/s41597-024-03104-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 02/29/2024] [Indexed: 03/10/2024] Open
Abstract
Organelles do not act as autonomous discrete units but rather as interconnected hubs that engage in extensive communication by forming close contacts called "membrane contact sites (MCSs)". Many proteins have been identified as residing in MCSs and playing important roles in maintaining and fulfilling specific functions within these microdomains. However, a comprehensive compilation of these MCS proteins is still lacking. Therefore, we developed MCSdb, a manually curated resource of MCS proteins and complexes from publications. MCSdb documents 7010 MCS protein entries and 263 complexes, involving 24 organelles and 44 MCSs across 11 species. Additionally, MCSdb organizes all data into different categories with extensive information for presenting MCS proteins. In summary, MCSdb provides a valuable resource for accelerating the functional interpretation of MCSs and the deciphering of interorganelle communication.
Affiliation(s)
- Xianrun Pan
  - College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Liping Ren
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Yu Yang
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Yi Xu
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Lin Ning
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Yibing Zhang
  - Glasgow College, University of Electronic Science and Technology of China, Chengdu, China
- Huaichao Luo
  - Department of Clinical Laboratory, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
2
Lukauskas S, Tvardovskiy A, Nguyen NV, Stadler M, Faull P, Ravnsborg T, Özdemir Aygenli B, Dornauer S, Flynn H, Lindeboom RGH, Barth TK, Brockers K, Hauck SM, Vermeulen M, Snijders AP, Müller CL, DiMaggio PA, Jensen ON, Schneider R, Bartke T. Decoding chromatin states by proteomic profiling of nucleosome readers. Nature 2024; 627:671-679. [PMID: 38448585 PMCID: PMC10954555 DOI: 10.1038/s41586-024-07141-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Accepted: 01/31/2024] [Indexed: 03/08/2024]
Abstract
DNA and histone modifications combine into characteristic patterns that demarcate functional regions of the genome [1,2]. While many 'readers' of individual modifications have been described [3-5], how chromatin states comprising composite modification signatures, histone variants and internucleosomal linker DNA are interpreted is a major open question. Here we use a multidimensional proteomics strategy to systematically examine the interaction of around 2,000 nuclear proteins with over 80 modified dinucleosomes representing promoter, enhancer and heterochromatin states. By deconvoluting complex nucleosome-binding profiles into networks of co-regulated proteins and distinct nucleosomal features driving protein recruitment or exclusion, we show comprehensively how chromatin states are decoded by chromatin readers. We find highly distinctive binding responses to different features, many factors that recognize multiple features, and that nucleosomal modifications and linker DNA operate largely independently in regulating protein binding to chromatin. Our online resource, the Modification Atlas of Regulation by Chromatin States (MARCS), provides in-depth analysis tools to engage with our results and advance the discovery of fundamental principles of genome regulation by chromatin states.
Affiliation(s)
- Saulius Lukauskas
  - Institute of Functional Epigenetics, Helmholtz Zentrum München, Neuherberg, Germany
  - MRC Laboratory of Medical Sciences (LMS), London, UK
  - Department of Chemical Engineering, Imperial College London, London, UK
- Andrey Tvardovskiy
  - Institute of Functional Epigenetics, Helmholtz Zentrum München, Neuherberg, Germany
- Nhuong V Nguyen
  - MRC Laboratory of Medical Sciences (LMS), London, UK
  - Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, London, UK
- Mara Stadler
  - Institute of Functional Epigenetics, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Statistics, Ludwig Maximilian University Munich, Munich, Germany
- Peter Faull
  - MRC Laboratory of Medical Sciences (LMS), London, UK
  - Proteomic Sciences Technology Platform, The Francis Crick Institute, London, UK
  - Northwestern Proteomics Core Facility, Northwestern University, Chicago, IL, USA
- Tina Ravnsborg
  - VILLUM Center for Bioanalytical Sciences and Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Scarlett Dornauer
  - Institute of Functional Epigenetics, Helmholtz Zentrum München, Neuherberg, Germany
- Helen Flynn
  - Proteomic Sciences Technology Platform, The Francis Crick Institute, London, UK
- Rik G H Lindeboom
  - Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, The Netherlands
  - The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Teresa K Barth
  - Metabolomics and Proteomics Core, Helmholtz Zentrum München, Munich, Germany
  - Clinical Protein Analysis Unit (ClinZfP), Biomedical Center (BMC), Faculty of Medicine, Ludwig Maximilian University Munich, Martinsried, Germany
- Kevin Brockers
  - Institute of Functional Epigenetics, Helmholtz Zentrum München, Neuherberg, Germany
- Stefanie M Hauck
  - Metabolomics and Proteomics Core, Helmholtz Zentrum München, Munich, Germany
- Michiel Vermeulen
  - Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Oncode Institute, Radboud University Nijmegen, Nijmegen, The Netherlands
  - The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Christian L Müller
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Statistics, Ludwig Maximilian University Munich, Munich, Germany
  - Center for Computational Mathematics, Flatiron Institute, New York, NY, USA
- Peter A DiMaggio
  - Department of Chemical Engineering, Imperial College London, London, UK
- Ole N Jensen
  - VILLUM Center for Bioanalytical Sciences and Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Robert Schneider
  - Institute of Functional Epigenetics, Helmholtz Zentrum München, Neuherberg, Germany
  - Faculty of Biology, Ludwig Maximilian University Munich, Martinsried, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Till Bartke
  - Institute of Functional Epigenetics, Helmholtz Zentrum München, Neuherberg, Germany
  - MRC Laboratory of Medical Sciences (LMS), London, UK
  - Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, London, UK
3
Wang H, Lim KP, Kong W, Gao H, Wong BJH, Phua SX, Guo T, Goh WWB. MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects. Sci Data 2023; 10:858. [PMID: 38042886 PMCID: PMC10693559 DOI: 10.1038/s41597-023-02779-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 11/23/2023] [Indexed: 12/04/2023] Open
Abstract
Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are not well studied. Although proteomic technologies have improved significantly in recent years, this alone cannot resolve these issues. What is needed are better algorithms and data processing knowledge. But to obtain these, we need appropriate proteomics datasets for exploration, investigation, and benchmarking. To meet this need, we developed MultiPro (Multi-purpose Proteome Resource), a resource comprising four comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent Acquisition (DDA) and Data Independent Acquisition (DIA) modes. Each dataset contains a balanced two-class design based on well-characterized and widely studied cell lines (A549 vs K562 or HCC1806 vs HS578T) with 48 or 36 biological and technical replicates altogether, allowing for investigation of a multitude of technical issues. These datasets allow for investigation of inter-connections between class and batch factors, or to develop approaches to compare and integrate data from DDA and DIA platforms.
Affiliation(s)
- He Wang
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Kai Peng Lim
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Weijia Kong
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Huanhuan Gao
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, 310030, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, 310030, China
  - Research Center for Industries of the Future, Westlake University, 600 Dunyu Road, Hangzhou, Zhejiang, 310030, China
- Bertrand Jern Han Wong
  - School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Ser Xian Phua
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
- Tiannan Guo
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, 310030, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, 310030, China
  - Research Center for Industries of the Future, Westlake University, 600 Dunyu Road, Hangzhou, Zhejiang, 310030, China
- Wilson Wen Bin Goh
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore
  - Center for Biomedical Informatics, Nanyang Technological University, Singapore, 636921, Singapore
4
Claeys T, Van Den Bossche T, Perez-Riverol Y, Gevaert K, Vizcaíno JA, Martens L. lesSDRF is more: maximizing the value of proteomics data through streamlined metadata annotation. Nat Commun 2023; 14:6743. [PMID: 37875519 PMCID: PMC10598006 DOI: 10.1038/s41467-023-42543-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 10/13/2023] [Indexed: 10/26/2023] Open
Abstract
Public proteomics data often lack essential metadata, limiting its potential. To address this, we present lesSDRF, a tool to simplify the process of metadata annotation, thereby ensuring that data leave a lasting, impactful legacy well beyond its initial publication.
Affiliation(s)
- Tine Claeys
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
- Kris Gevaert
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, CB10 1SD, UK
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
5
Mondal RK, Sen D, Arya A, Samanta SK. Developing anti-microbial peptide database version 1 to provide comprehensive and exhaustive resource of manually curated AMPs. Sci Rep 2023; 13:17843. [PMID: 37857659 PMCID: PMC10587344 DOI: 10.1038/s41598-023-45016-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/18/2023] [Accepted: 10/14/2023] [Indexed: 10/21/2023] Open
Abstract
Anti-Microbial Peptide Database version 1 (AMPDB v1) is a meticulously curated resource that aims to address the limitations of existing databases in the field of antimicrobial research. We have utilized the latest technology and put our best efforts into adding all relevant tools to cater to the needs of our users. AMPDB v1 is a derived database, built upon information gathered from the available resources and boasts a significant size of 59,122 entries which are classified into 88 classes. All the information in this resource was curated manually. Sequence alignment and protein feature calculation tools were integrated into the database in the form of web applications, to make them easy to use, quick, and responsive in real-time. We have included multiple types of browsing and searching options to enhance the user experience, from simple text search to a completely customizable advanced search page with intuitive options that let the user combine multiple options together to make a powerful search query. The database is accessible by a web browser at https://bblserver.org.in/ampdb/.
Affiliation(s)
- Rajat Kumar Mondal
  - Biochemistry and Bioinformatics Laboratory, Department of Applied Sciences, Indian Institute of Information Technology Allahabad (IIIT-A), Uttar Pradesh, Devghat, Jhalwa, Prayagraj, 211012, India
- Debarup Sen
  - Persistent Systems Ltd., Pune, Maharashtra, India
- Ankish Arya
  - Biochemistry and Bioinformatics Laboratory, Department of Applied Sciences, Indian Institute of Information Technology Allahabad (IIIT-A), Uttar Pradesh, Devghat, Jhalwa, Prayagraj, 211012, India
- Sintu Kumar Samanta
  - Biochemistry and Bioinformatics Laboratory, Department of Applied Sciences, Indian Institute of Information Technology Allahabad (IIIT-A), Uttar Pradesh, Devghat, Jhalwa, Prayagraj, 211012, India
  - Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad, 211012, India
6
Jin H, Zhang C, Zwahlen M, von Feilitzen K, Karlsson M, Shi M, Yuan M, Song X, Li X, Yang H, Turkez H, Fagerberg L, Uhlén M, Mardinoglu A. Systematic transcriptional analysis of human cell lines for gene expression landscape and tumor representation. Nat Commun 2023; 14:5417. [PMID: 37669926 PMCID: PMC10480497 DOI: 10.1038/s41467-023-41132-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/23/2022] [Accepted: 08/24/2023] [Indexed: 09/07/2023] Open
Abstract
Cell lines are valuable resources as models for human biology and translational medicine. It is thus important to explore the concordance between the expression in various cell lines vis-à-vis human native and disease tissues. In this study, we investigate the expression of all human protein-coding genes in more than 1,000 human cell lines representing 27 cancer types by a genome-wide transcriptomics analysis. The cell line gene expression is compared with the corresponding profiles in various tissues, organs, single-cell types and cancers. Here, we present the expression for each cell line and give guidance for the most appropriate cell line for a given experimental study. In addition, we explore the cancer-related pathway and cytokine activity of the cell lines to aid human biology studies and drug development projects. All data are presented in an open access cell line section of the Human Protein Atlas to facilitate the exploration of all human protein-coding genes across these cell lines.
Affiliation(s)
- Han Jin
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Cheng Zhang
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Martin Zwahlen
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Kalle von Feilitzen
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Max Karlsson
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Mengnan Shi
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Meng Yuan
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Xiya Song
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Xiangyu Li
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Hong Yang
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Hasan Turkez
  - Department of Medical Biology, Faculty of Medicine, Atatürk University, Erzurum, Turkey
- Linn Fagerberg
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Mathias Uhlén
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Neuroscience, Karolinska Institute, Stockholm, Sweden
- Adil Mardinoglu
  - Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
  - Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King's College London, London, UK
7
Lee EM, Srinivasan S, Purvine SO, Fiedler TL, Leiser OP, Proll SC, Minot SS, Deatherage Kaiser BL, Fredricks DN. Optimizing metaproteomics database construction: lessons from a study of the vaginal microbiome. mSystems 2023; 8:e0067822. [PMID: 37350639 PMCID: PMC10469846 DOI: 10.1128/msystems.00678-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 04/06/2023] [Indexed: 06/24/2023] Open
Abstract
Metaproteomics, a method for untargeted, high-throughput identification of proteins in complex samples, provides functional information about microbial communities and can tie functions to specific taxa. Metaproteomics often generates less data than other omics techniques, but analytical workflows can be improved to increase usable data in metaproteomic outputs. Identification of peptides in the metaproteomic analysis is performed by comparing mass spectra of sample peptides to a reference database of protein sequences. Although these protein databases are an integral part of the metaproteomic analysis, few studies have explored how database composition impacts peptide identification. Here, we used cervicovaginal lavage (CVL) samples from a study of bacterial vaginosis (BV) to compare the performance of databases built using six different strategies. We evaluated broad versus sample-matched databases, as well as databases populated with proteins translated from metagenomic sequencing of the same samples versus sequences from public repositories. Smaller sample-matched databases performed significantly better, driven by the statistical constraints on large databases. Additionally, large databases attributed up to 34% of significant bacterial hits to taxa absent from the sample, as determined orthogonally by 16S rRNA gene sequencing. We also tested a set of hybrid databases which included bacterial proteins from NCBI RefSeq and translated bacterial genes from the samples. These hybrid databases had the best overall performance, identifying 1,068 unique human and 1,418 unique bacterial proteins, ~30% more than a database populated with proteins from typical vaginal bacteria and fungi. Our findings can help guide the optimal identification of proteins while maintaining statistical power for reaching biological conclusions. 
IMPORTANCE: Metaproteomic analysis can provide valuable insights into the functions of microbial and cellular communities by identifying a broad, untargeted set of proteins. The databases used in the analysis of metaproteomic data influence results by defining what proteins can be identified. Moreover, the size of the database impacts the number of identifications after accounting for false discovery rates (FDRs). Few studies have tested the performance of different strategies for building a protein database to identify proteins from metaproteomic data, and those that have done so largely focused on highly diverse microbial communities. We tested a range of databases on CVL samples and found that a hybrid sample-matched approach, using publicly available proteins from organisms present in the samples, as well as proteins translated from metagenomic sequencing of the samples, had the best performance. However, our results also suggest that public sequence databases will continue to improve as more bacterial genomes are published.
Affiliation(s)
- Elliot M. Lee
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - University of Washington, Seattle, Washington, USA
- Samuel O. Purvine
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Tina L. Fiedler
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Owen P. Leiser
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Sean C. Proll
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Samuel S. Minot
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- David N. Fredricks
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - University of Washington, Seattle, Washington, USA
8
Miles AJ, Drew ED, Wallace BA. DichroIDP: a method for analyses of intrinsically disordered proteins using circular dichroism spectroscopy. Commun Biol 2023; 6:823. [PMID: 37553525 PMCID: PMC10409736 DOI: 10.1038/s42003-023-05178-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 07/25/2023] [Indexed: 08/10/2023] Open
Abstract
Intrinsically disordered proteins (IDPs) are comprised of significant numbers of residues that form neither helix, sheet, nor any other canonical type of secondary structure. They play important roles in a broad range of biological processes, such as molecular recognition and signalling, largely due to their chameleon-like ability to change structure from unordered when free in solution to ordered when bound to partner molecules. Circular dichroism (CD) spectroscopy is a widely-used method for characterising protein secondary structures, but analyses of IDPs using CD spectroscopy have suffered because the methods and reference datasets used for the empirical determination of secondary structures do not contain adequate representations of unordered structures. This work describes the creation, validation and testing of a standalone Windows-based application, DichroIDP, and a new reference dataset, IDP175, which is suitable for analyses of proteins containing significant amounts of disordered structure. DichroIDP enables secondary structure determinations of IDPs and proteins containing intrinsically disordered regions.
Affiliation(s)
- Andrew J Miles
  - Institute of Structural and Molecular Biology, Birkbeck University of London, London, WC1E 7HX, UK
- Elliot D Drew
  - School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London, E1 4NS, UK
  - Zappi, London, NW1 7JN, UK
- B A Wallace
  - Institute of Structural and Molecular Biology, Birkbeck University of London, London, WC1E 7HX, UK
9
Tsuboyama K, Dauparas J, Chen J, Laine E, Mohseni Behbahani Y, Weinstein JJ, Mangan NM, Ovchinnikov S, Rocklin GJ. Mega-scale experimental analysis of protein folding stability in biology and design. Nature 2023; 620:434-444. [PMID: 37468638 PMCID: PMC10412457 DOI: 10.1038/s41586-023-06328-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 01/05/2023] [Accepted: 06/14/2023] [Indexed: 07/21/2023]
Abstract
Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale [1]. However, the energetics driving folding are invisible in these structures and remain largely unknown [2]. The hidden thermodynamics of folding can drive disease [3,4], shape protein evolution [5-7] and guide protein engineering [8-10], and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40-72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.
Affiliation(s)
- Kotaro Tsuboyama
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
  - PRESTO, Japan Science and Technology Agency, Tokyo, Japan
  - Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
- Justas Dauparas
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jonathan Chen
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
  - McCormick School of Engineering, Northwestern University, Evanston, IL, USA
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Yasser Mohseni Behbahani
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Jonathan J Weinstein
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Niall M Mangan
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
  - Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL, USA
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA
- Gabriel J Rocklin
  - Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
10
Ye J, Li A, Zheng H, Yang B, Lu Y. Machine Learning Advances in Predicting Peptide/Protein-Protein Interactions Based on Sequence Information for Lead Peptides Discovery. Adv Biol (Weinh) 2023; 7:e2200232. [PMID: 36775876 DOI: 10.1002/adbi.202200232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 12/30/2022] [Indexed: 02/14/2023]
Abstract
Peptides have shown increasing advantages and significant clinical value in drug discovery and development. With the development of high-throughput technologies and artificial intelligence (AI), machine learning (ML) methods for discovering new lead peptides have been expanded and incorporated into rational drug design. Predictions of peptide-protein interactions (PepPIs) and protein-protein interactions (PPIs) are both opportunities and challenges in computational biology, which will help to better understand the mechanisms of disease and provide the impetus for the discovery of lead peptides. This paper comprehensively reviews computational models for PepPI and PPI predictions. It begins with an introduction of various databases of peptide ligands and target proteins. Then it discusses data formats and feature representations for proteins and peptides. Furthermore, classical ML methods and emerging deep learning (DL) methods that can be used to train prediction models of PepPI and PPI are classified into four categories, and their advantages and disadvantages are analyzed. To assess the relative performance of different models, different validation protocols and evaluation indexes are discussed. The goal of this review is to help researchers quickly get started to develop computational frameworks using these integrated resources and eventually promote the discovery of lead peptides.
Affiliation(s)
- Jiahao Ye
  - School of Medicine, Shanghai University, Shanghai, 200444, China
- An Li
  - Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
  - Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Hao Zheng
  - School of Medicine, Shanghai University, Shanghai, 200444, China
- Banghua Yang
  - School of Medicine, Shanghai University, Shanghai, 200444, China
- Yiming Lu
  - School of Medicine, Shanghai University, Shanghai, 200444, China
  - Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
  - Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
11
Rostam N, Ghosh S, Chow CFW, Hadarovich A, Landerer C, Ghosh R, Moon H, Hersemann L, Mitrea DM, Klein IA, Hyman AA, Toth-Petroczy A. CD-CODE: crowdsourcing condensate database and encyclopedia. Nat Methods 2023; 20:673-676. [PMID: 37024650 PMCID: PMC10172118 DOI: 10.1038/s41592-023-01831-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Accepted: 02/27/2023] [Indexed: 04/08/2023]
Abstract
The discovery of biomolecular condensates transformed our understanding of intracellular compartmentalization of molecules. To integrate interdisciplinary scientific knowledge about the function and composition of biomolecular condensates, we developed the crowdsourcing condensate database and encyclopedia (cd-code.org). CD-CODE is a community-editable platform, which includes a database of biomolecular condensates based on the literature, an encyclopedia of relevant scientific terms and a crowdsourcing web application. Our platform will accelerate the discovery and validation of biomolecular condensates, and facilitate efforts to understand their role in disease and as therapeutic targets.
Affiliation(s)
- Nadia Rostam
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Soumyadeep Ghosh
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Chi Fung Willis Chow
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, Dresden, Germany
- Anna Hadarovich
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Cedric Landerer
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Rajat Ghosh
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- HongKee Moon
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Lena Hersemann
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Anthony A Hyman
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, Dresden, Germany
- Agnes Toth-Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, Dresden, Germany
12
Ruffolo JA, Chu LS, Mahajan SP, Gray JJ. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat Commun 2023; 14:2389. [PMID: 37185622 PMCID: PMC10129313 DOI: 10.1038/s41467-023-38063-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 04/21/2022] [Accepted: 04/14/2023] [Indexed: 05/17/2023] Open
Abstract
Antibodies have the capacity to bind a diverse set of antigens, and they have become critical therapeutics and diagnostic molecules. The binding of antibodies is facilitated by a set of six hypervariable loops that are diversified through genetic recombination and mutation. Even with recent advances, accurate structural prediction of these loops remains a challenge. Here, we present IgFold, a fast deep learning method for antibody structure prediction. IgFold consists of a pre-trained language model trained on 558 million natural antibody sequences followed by graph networks that directly predict backbone atom coordinates. IgFold predicts structures of similar or better quality than alternative methods (including AlphaFold) in significantly less time (under 25 s). Accurate structure prediction on this timescale makes possible avenues of investigation that were previously infeasible. As a demonstration of IgFold's capabilities, we predicted structures for 1.4 million paired antibody sequences, providing structural insights to 500-fold more antibodies than have experimentally determined structures.
Affiliation(s)
- Jeffrey A Ruffolo
  - Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, 21218, USA
- Lee-Shin Chu
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
- Sai Pooja Mahajan
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
- Jeffrey J Gray
  - Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, 21218, USA
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
13
Hekkelman ML, de Vries I, Joosten RP, Perrakis A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods 2023; 20:205-213. [PMID: 36424442 PMCID: PMC9911346 DOI: 10.1038/s41592-022-01685-y] [Citation(s) in RCA: 110] [Impact Index Per Article: 110.0] [Received: 12/10/2021] [Accepted: 10/18/2022] [Indexed: 11/27/2022]
Abstract
Artificial intelligence-based protein structure prediction approaches have had a transformative effect on biomolecular sciences. The predicted protein models in the AlphaFold protein structure database, however, all lack coordinates for small molecules, essential for molecular structure or function: hemoglobin lacks bound heme; zinc-finger motifs lack zinc ions essential for structural integrity and metalloproteases lack metal ions needed for catalysis. Ligands important for biological function are absent too; no ADP or ATP is bound to any of the ATPases or kinases. Here we present AlphaFill, an algorithm that uses sequence and structure similarity to 'transplant' such 'missing' small molecules and ions from experimentally determined structures to predicted protein models. The algorithm was successfully validated against experimental structures. A total of 12,029,789 transplants were performed on 995,411 AlphaFold models and are available together with associated validation metrics in the alphafill.eu databank, a resource to help scientists make new hypotheses and design targeted experiments.
Affiliation(s)
- Maarten L. Hekkelman
  - Oncode Institute and Department of Biochemistry, The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Ida de Vries
  - Oncode Institute and Department of Biochemistry, The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Robbie P. Joosten
  - Oncode Institute and Department of Biochemistry, The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Anastassis Perrakis
  - Oncode Institute and Department of Biochemistry, The Netherlands Cancer Institute, Amsterdam, the Netherlands
14
Abstract
Protein structural families are groups of homologous proteins defined by the organization of secondary structure elements (SSEs). Nowadays, many families contain vast numbers of structures, and the SSEs can help with orientation within them. Communities around specific protein families have even developed specialized SSE annotations, always assigning the same name to the equivalent SSEs in homologous proteins. A detailed analysis of the groups of equivalent SSEs provides an overview of the studied family and enriches the analysis of any particular protein at hand. We developed a workflow for the analysis of the secondary structure anatomy of a protein family. We applied this analysis to the model family of cytochromes P450 (CYPs), a family of important biotransformation enzymes with an SSE annotation used community-wide. We report the occurrence, typical length and amino acid sequence for the equivalent SSE groups, the conservation/variability of these properties and their relationship to the substrate recognition sites. We also suggest a generic residue numbering scheme for the CYP family. Comparing the bacterial and eukaryotic parts of the family highlights the significant differences and reveals a well-known anomalous group of bacterial CYPs with some typically eukaryotic features. Our workflow for SSE annotation for CYP and other families can be freely used at https://sestra.ncbr.muni.cz.
Affiliation(s)
- Adam Midlik
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, 625 00, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, 625 00, Czech Republic
- Veronika Navrátilová
  - Department of Physical Chemistry, Faculty of Science, Palacký University, Olomouc, 771 46, Czech Republic
- Taraka Ramji Moturu
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, 625 00, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, 625 00, Czech Republic
- Jaroslav Koča
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, 625 00, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, 625 00, Czech Republic
- Radka Svobodová
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, 625 00, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, 625 00, Czech Republic
- Karel Berka
  - Department of Physical Chemistry, Faculty of Science, Palacký University, Olomouc, 771 46, Czech Republic
15
Sorokina O, Mclean C, Croning MDR, Heil KF, Wysocka E, He X, Sterratt D, Grant SGN, Simpson TI, Armstrong JD. A unified resource and configurable model of the synapse proteome and its role in disease. Sci Rep 2021; 11:9967. [PMID: 33976238 DOI: 10.1038/s41598-021-88945-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/05/2021] [Accepted: 04/15/2021] [Indexed: 02/03/2023] Open
Abstract
Genes encoding synaptic proteins are highly associated with neuronal disorders, many of which show clinical co-morbidity. We integrated 58 published synaptic proteomic datasets that describe over 8000 proteins and combined them with direct protein-protein interactions and functional metadata to build a network resource that reveals the shared and unique protein components that underpin multiple disorders. All the data are provided in a flexible and accessible format to encourage custom use.
16
Abstract
A preprint by Tarke et al. suggests a negligible impact of SARS-CoV-2 variant mutations on T cell reactivity in convalescent individuals and mRNA vaccine recipients.
Affiliation(s)
- Aljawharah Alrubayyi
  - OxMS Preprint Journal Club, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
- Dimitra Peppa
  - OxMS Preprint Journal Club, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
17
Lawson CL, Kryshtafovych A, Adams PD, Afonine PV, Baker ML, Barad BA, Bond P, Burnley T, Cao R, Cheng J, Chojnowski G, Cowtan K, Dill KA, DiMaio F, Farrell DP, Fraser JS, Herzik MA, Hoh SW, Hou J, Hung LW, Igaev M, Joseph AP, Kihara D, Kumar D, Mittal S, Monastyrskyy B, Olek M, Palmer CM, Patwardhan A, Perez A, Pfab J, Pintilie GD, Richardson JS, Rosenthal PB, Sarkar D, Schäfer LU, Schmid MF, Schröder GF, Shekhar M, Si D, Singharoy A, Terashi G, Terwilliger TC, Vaiana A, Wang L, Wang Z, Wankowicz SA, Williams CJ, Winn M, Wu T, Yu X, Zhang K, Berman HM, Chiu W. Cryo-EM model validation recommendations based on outcomes of the 2019 EMDataResource challenge. Nat Methods 2021; 18:156-164. [PMID: 33542514 PMCID: PMC7864804 DOI: 10.1038/s41592-020-01051-w] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 06/11/2020] [Accepted: 12/21/2020] [Indexed: 01/30/2023]
Abstract
This paper describes outcomes of the 2019 Cryo-EM Model Challenge. The goals were to (1) assess the quality of models that can be produced from cryogenic electron microscopy (cryo-EM) maps using current modeling software, (2) evaluate reproducibility of modeling results from different software developers and users and (3) compare performance of current metrics used for model evaluation, particularly Fit-to-Map metrics, with focus on near-atomic resolution. Our findings demonstrate the relatively high accuracy and reproducibility of cryo-EM models derived by 13 participating teams from four benchmark maps, including three forming a resolution series (1.8 to 3.1 Å). The results permit specific recommendations to be made about validating near-atomic cryo-EM structures both in the context of individual experiments and structure data archives such as the Protein Data Bank. We recommend the adoption of multiple scoring parameters to provide full and objective annotation and assessment of the model, reflective of the observed cryo-EM map density.
Affiliation(s)
- Catherine L. Lawson
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ USA
- Andriy Kryshtafovych
  - Genome Center, University of California, Davis, CA USA
- Paul D. Adams
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA USA
  - Department of Bioengineering, University of California Berkeley, Berkeley, CA USA
- Pavel V. Afonine
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA USA
- Matthew L. Baker
  - Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston, Houston, TX USA
- Benjamin A. Barad
  - Department of Integrated Computational Structural Biology, The Scripps Research Institute, La Jolla, CA USA
- Paul Bond
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Tom Burnley
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
- Grzegorz Chojnowski
  - European Molecular Biology Laboratory, c/o DESY, Hamburg, Germany
- Kevin Cowtan
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Ken A. Dill
  - Laufer Center, Stony Brook University, Stony Brook, NY USA
- Frank DiMaio
  - Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA USA
- Daniel P. Farrell
  - Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA USA
- James S. Fraser
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA USA
- Mark A. Herzik
  - Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA USA
- Soon Wen Hoh
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, MO USA
- Li-Wei Hung
  - Los Alamos National Laboratory, Los Alamos, NM USA
- Maxim Igaev
  - Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Agnel P. Joseph
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN USA
  - Department of Computer Science, Purdue University, West Lafayette, IN USA
- Dilip Kumar
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX USA
- Sumit Mittal
  - Biodesign Institute, Arizona State University, Tempe, AZ USA
  - School of Advanced Sciences and Languages, VIT Bhopal University, Bhopal, India
- Bohdan Monastyrskyy
  - Genome Center, University of California, Davis, CA USA
- Mateusz Olek
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Colin M. Palmer
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Ardan Patwardhan
  - The European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Alberto Perez
  - Department of Chemistry, University of Florida, Gainesville, FL USA
- Jonas Pfab
  - Division of Computing & Software Systems, University of Washington, Bothell, WA USA
- Grigore D. Pintilie
  - Department of Bioengineering, Stanford University, Stanford, CA USA
- Jane S. Richardson
  - Department of Biochemistry, Duke University, Durham, NC USA
- Peter B. Rosenthal
  - Structural Biology of Cells and Viruses Laboratory, Francis Crick Institute, London, UK
- Daipayan Sarkar
  - Department of Biological Sciences, Purdue University, West Lafayette, IN USA
  - Biodesign Institute, Arizona State University, Tempe, AZ USA
- Luisa U. Schäfer
  - Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and Jülich Centre for Structural Biology (JuStruct), Forschungszentrum Jülich, Jülich, Germany
- Michael F. Schmid
  - Division of CryoEM and Bioimaging, SSRL, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA USA
- Gunnar F. Schröder
  - Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and Jülich Centre for Structural Biology (JuStruct), Forschungszentrum Jülich, Jülich, Germany
  - Physics Department, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mrinal Shekhar
  - Biodesign Institute, Arizona State University, Tempe, AZ USA
  - Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Dong Si
  - Division of Computing & Software Systems, University of Washington, Bothell, WA USA
- Abishek Singharoy
  - Biodesign Institute, Arizona State University, Tempe, AZ USA
- Genki Terashi
  - Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Andrea Vaiana
  - Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Liguo Wang
  - Department of Biological Structure, University of Washington, Seattle, WA USA
- Zhe Wang
  - The European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Stephanie A. Wankowicz
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA USA
  - Biophysics Graduate Program, University of California, San Francisco, CA USA
- Martyn Winn
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
- Xiaodi Yu
  - SMPS, Janssen Research and Development, Spring House, PA USA
- Kaiming Zhang
  - Department of Bioengineering, Stanford University, Stanford, CA USA
- Helen M. Berman
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ USA
  - Department of Biological Sciences and Bridge Institute, University of Southern California, Los Angeles, CA USA
- Wah Chiu
  - Department of Bioengineering, Stanford University, Stanford, CA USA
  - Division of CryoEM and Bioimaging, SSRL, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA USA
18
Dupree EJ, Crimmins BS, Holsen TM, Darie CC. Proteomic Analysis of the Lake Trout (Salvelinus namaycush) Liver Identifies Proteins from Evolutionarily Close and -Distant Fish Relatives. Proteomics 2019; 19:e1800429. [PMID: 31578773 DOI: 10.1002/pmic.201800429] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/03/2018] [Revised: 09/12/2019] [Indexed: 12/20/2022]
Abstract
Lake trout are used as bioindicators for toxics exposure in the Great Lakes ecosystem. Here the first lake trout (Salvelinus namaycush) liver proteomics study is performed and searched against specific NCBI and UniProtKB databases: Salvelinus, Salmonidae, Actinopterygii, and Oncorhynchus mykiss, and the more distant relative, Danio rerio. In biological replicate 1, technical replicate 1 (BR1TR1), a large number of lake trout liver proteins are not in the Salvelinus protein database, suggesting that lake trout liver proteins have homology to some proteins from the Salmonidae family and Actinopterygii class, and to Oncorhynchus mykiss and Danio rerio, two more highly studied fish. In the NCBI search, 4194 proteins are identified: 3069 proteins in Actinopterygii, 1617 in Salmonidae, 68 in Salvelinus, 568 in Oncorhynchus mykiss, and 946 in Danio rerio protein databases. Similar results are observed in the UniProtKB searches of BR1TR1, as well as in a technical replicate (BR1TR2), and then in a second biological replicate experiment, with two technical replicates (BR2TR1 and BR2TR2). This study opens the possibility of identifying evolutionary relationships (i.e., adaptive mutations) between various groups (i.e., zebrafish, rainbow trout, Salmonidae, Salvelinus and lake trout) through evolutionary proteomics. Data are available via PRIDE (PXD011924).
Affiliation(s)
- Emmalyn J Dupree
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, Potsdam, NY, 13699-5810, USA
- Bernard S Crimmins
  - Department of Environmental Engineering, Clarkson University, Potsdam, NY, 13699-5708, USA
  - AEACS, LLC, New Kensington, PA, 15068, USA
- Thomas M Holsen
  - Department of Environmental Engineering, Clarkson University, Potsdam, NY, 13699-5708, USA
- Costel C Darie
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, Potsdam, NY, 13699-5810, USA
19
Mier P, Andrade-Navarro MA. Toward completion of the Earth's proteome: an update a decade later. Brief Bioinform 2019; 20:463-470. [PMID: 29040399 DOI: 10.1093/bib/bbx127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/12/2017] [Revised: 09/08/2017] [Indexed: 12/13/2022] Open
Abstract
Protein databases are steadily growing, driven by the spread of new, more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the inability to process and curate sequence entries as fast as they are created. To understand these trends and aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth mostly because of a focus on increasing the level of annotation of its sequences, its counterpart TrEMBL, much less limited by curation steps, is still in a phase of accelerated growth. However, we predict that before 2020, almost all entries deposited in UniProtKB will be homologous to known proteins. We propose that new sequencing projects can be made more useful if they are driven to sequencing voids, parts of the tree of life far from already sequenced species or model organisms. We show these voids are present in the Archaea and Eukarya domains of life. The near certainty that new protein sequence entries will be redundant leads to the consideration that most of the protein diversity on Earth has already been described, which we estimate to be around 3.75 million proteins, revising down the prediction we made a decade ago.
Affiliation(s)
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Gresemundweg, Mainz, Germany
20
Tabakmakher VM, Krylov NA, Kuzmenkov AI, Efremov RG, Vassilevski AA. Kalium 2.0, a comprehensive database of polypeptide ligands of potassium channels. Sci Data 2019; 6:73. [PMID: 31133708 PMCID: PMC6536513 DOI: 10.1038/s41597-019-0074-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/06/2018] [Accepted: 03/29/2019] [Indexed: 12/31/2022] Open
Abstract
Potassium channels are the most diverse group of ion channels in humans. They play vital roles in numerous physiological processes, and their malfunction gives rise to a range of pathologies. In addition to small molecules, there is a wide selection of several hundred polypeptide ligands binding to potassium channels, the majority of which have been isolated from animal venoms. Until recently, only scorpion toxins received focused attention, being systematically assembled in the manually curated Kalium database, but there is a diversity of well-characterized potassium channel ligands originating from other sources. To address this issue, here we present the updated and improved Kalium 2.0, which covers virtually all known polypeptide ligands of potassium channels and reviews all available pharmacological data. In addition to this expansion, we have introduced several new features to the database, including posttranslational modification annotation, indication of ligand mode of action, BLAST search, and the possibility of data export.
Affiliation(s)
- Valentin M Tabakmakher
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
  - School of Biomedicine, Far Eastern Federal University, Vladivostok, 690950, Russia
- Nikolay A Krylov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
  - National Research University Higher School of Economics, Moscow, 101000, Russia
- Alexey I Kuzmenkov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
- Roman G Efremov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
  - National Research University Higher School of Economics, Moscow, 101000, Russia
  - Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow Oblast, 141700, Russia
- Alexander A Vassilevski
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
  - Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow Oblast, 141700, Russia
21
Shao C, Liu Z, Yang H, Wang S, Burley SK. Outlier analyses of the Protein Data Bank archive using a probability-density-ranking approach. Sci Data 2018; 5:180293. [PMID: 30532050 PMCID: PMC6289109 DOI: 10.1038/sdata.2018.293] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/29/2018] [Accepted: 11/12/2018] [Indexed: 02/02/2023] Open
Abstract
Outlier analyses are central to scientific data assessments. Conventional outlier identification methods do not work effectively for Protein Data Bank (PDB) data, which are characterized by heavy skewness and the presence of bounds and/or long tails. We have developed a data-driven nonparametric method to identify outliers in PDB data based on kernel probability density estimation. Unlike conventional outlier analyses based on location and scale, Probability Density Ranking can be used for robust assessments of distance from other observations. Analyzing PDB data from the vantage points of probability and frequency enables proper outlier identification, which is important for quality control during deposition-validation-biocuration of new three-dimensional structure data. Ranking of Probability Density also permits use of Most Probable Range as a robust measure of data dispersion that is more compact than Interquartile Range. The Probability-Density-Ranking approach can be employed to analyze outliers and data-spread on any large data set with continuous distribution.
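The idea of ranking observations by estimated probability density can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the Silverman bandwidth rule, the 5% outlier fraction, the 50% coverage, and all function names below are assumptions chosen for illustration, and the range is simplified to the min/max of the densest points (which ignores possible multimodality).

```python
import math


def gaussian_kde(data, bandwidth=None):
    """Return a function estimating the density at x (Gaussian kernel)."""
    n = len(data)
    if bandwidth is None:
        # Assumption: Silverman's rule of thumb; the abstract states no rule.
        mean = sum(data) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
        bandwidth = 1.06 * std * n ** (-0.2)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data) / norm

    return density


def density_ranking(data, bandwidth=None):
    """Pair each observation with its density, sorted lowest density first."""
    density = gaussian_kde(data, bandwidth)
    return sorted(((v, density(v)) for v in data), key=lambda pair: pair[1])


def flag_outliers(data, fraction=0.05, bandwidth=None):
    """Flag the lowest-density fraction of observations (at least one)."""
    ranked = density_ranking(data, bandwidth)
    k = max(1, int(len(data) * fraction))
    return [v for v, _ in ranked[:k]]


def most_probable_range(data, coverage=0.5, bandwidth=None):
    """Span of the highest-density `coverage` share of the observations."""
    ranked = density_ranking(data, bandwidth)
    keep = ranked[len(ranked) - math.ceil(coverage * len(ranked)):]
    values = [v for v, _ in keep]
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical skewed sample: a tight cluster plus one distant value.
    sample = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 9.0]
    print(flag_outliers(sample))
    print(most_probable_range(sample))
```

Because the ranking depends only on how densely the neighbourhood of each point is populated, it does not assume symmetry around a mean, which is the property the abstract highlights for skewed or bounded data.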
Affiliation(s)
- Chenghua Shao
- RCSB Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Zonghong Liu
- Department of Statistics and Biostatistics, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
| | - Huanwang Yang
- RCSB Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Statistics and Biostatistics, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
| | - Stephen K. Burley
- RCSB Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
- RCSB Protein Data Bank, San Diego Supercomputer Center and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
| |
|
22
|
Sanchez-Garcia R, Sorzano COS, Carazo JM, Segura J. 3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures. Molecules 2017; 22:E2230. [PMID: 29244774 DOI: 10.3390/molecules22122230] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/31/2017] [Revised: 12/11/2017] [Accepted: 12/13/2017] [Indexed: 11/16/2022] Open
Abstract
Many studies have used position-specific scoring matrices (PSSM) profiles to characterize residues in protein structures and to predict a broad range of protein features. Moreover, PSSM profiles of Protein Data Bank (PDB) entries have been recalculated in many works for different purposes. Although the computational cost of calculating a single PSSM profile is affordable, many statistical studies or machine learning-based methods used thousands of profiles to achieve their goals, thereby leading to a substantial increase of the computational cost. In this work we present a new database compiling PSSM profiles for the proteins of the PDB. Currently, the database contains 333,532 protein chain profiles involving 123,135 different PDB entries.
|
23
|
Dessimoz C, Škunca N, Thomas PD. CAFA and the open world of protein function predictions. Trends Genet 2013; 29:609-10. [PMID: 24138813 DOI: 10.1016/j.tig.2013.09.005] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 09/03/2013] [Accepted: 09/17/2013] [Indexed: 11/22/2022]
|
24
|
Abstract
Prism (protein interactions by structural matching) is a system that employs a novel prediction algorithm for protein-protein interactions. It adopts a bottom-up approach that combines structure and sequence conservation in protein interfaces. The algorithm seeks possible binary interactions between proteins through structure similarity and evolutionary conservation of known interfaces. It is composed of a database containing protein interface structures derived from the Protein Data Bank (PDB) and predicted protein-protein interactions. It also provides related information about the proteins and an interactive protein interface viewer. In the current version, 3799 structurally nonredundant interfaces are used to predict the interactions among 6170 proteins. A substantial number of interactions are verified in two publicly available interaction databases (DIP and BIND). As the verified interactions demonstrate the suitability of our approach, unverified ones may point to undiscovered interactions. Prism can be accessed through a user-friendly website (http://prism.ccbb.ku.edu.tr) and it will be updated regularly as new protein structures become available in the PDB. Users may browse through the nonredundant dataset of representative interfaces on which the prediction algorithm depends, retrieve the list of structures similar to these interfaces, or see the results of interaction predictions for a particular protein. Another service provided is the interactive prediction. This is done by running the algorithm for the user input structures.
|