1
Sicilia C, Corral-Lugo A, Smialowski P, McConnell MJ, Martín-Galiano AJ. Unsupervised Machine Learning Organization of the Functional Dark Proteome of Gram-Negative "Superbugs": Six Protein Clusters Amenable for Distinct Scientific Applications. ACS Omega 2022; 7:46131-46145. [PMID: 36570227 PMCID: PMC9774411 DOI: 10.1021/acsomega.2c04076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 10/06/2022] [Indexed: 06/17/2023]
Abstract
Uncharacterized proteins have been underutilized as targets for the development of novel therapeutics for difficult-to-treat bacterial infections. To facilitate the exploration of these proteins, 2819 predicted, uncharacterized proteins (19.1% of the total) from reference strains of multidrug Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa species were organized using an unsupervised k-means machine learning algorithm. Classification using normalized values for protein length, pI, hydrophobicity, degree of conservation, structural disorder, and %AT of the coding gene rendered six natural clusters. Cluster proteins showed different trends regarding operon membership, expression, presence of unknown function domains, and interactomic relevance. Clusters 2, 4, and 5 were enriched with highly disordered proteins, nonworkable membrane proteins, and likely spurious proteins, respectively. Clusters 1, 3, and 6 showed closer distances to known antigens, antibiotic targets, and virulence factors. Up to 21.8% of proteins in these clusters were structurally covered by modeling, which allowed assessment of druggability and discontinuous B-cell epitopes. Five proteins (4 in Cluster 1) were potential druggable targets for antibiotherapy. Eighteen proteins (11 in Cluster 6) were strong B-cell and T-cell immunogen candidates for vaccine development. Conclusively, we provide a feature-based schema to fractionate the functional dark proteome of critical pathogens for fundamental and biomedical purposes.
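The abstract above describes k-means clustering on normalized values of six protein features. The following is a minimal sketch of that general technique, not the authors' pipeline: the feature values are invented toy numbers, the function names `zscore_columns` and `kmeans` are hypothetical, z-scoring is only one possible normalization (the abstract does not say which was used), and two clusters are used instead of the six reported in the paper.

```python
import math
import random

# Feature order assumed for illustration: length, pI, hydrophobicity,
# conservation, disorder, %AT of the coding gene. All values are invented.
toy_proteins = [
    [120, 5.1, -0.40, 0.90, 0.05, 38.0],
    [135, 5.4, -0.35, 0.85, 0.08, 40.0],
    [110, 4.9, -0.45, 0.92, 0.04, 37.0],
    [480, 9.2, 0.60, 0.30, 0.45, 55.0],
    [510, 9.5, 0.70, 0.25, 0.50, 57.0],
    [455, 9.0, 0.55, 0.35, 0.40, 54.0],
]


def zscore_columns(rows):
    """Scale every feature column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


normalized = zscore_columns(toy_proteins)
labels, centroids = kmeans(normalized, k=2)
print(labels)
```

Normalizing first matters because k-means uses Euclidean distance: without it, a feature on a large scale such as protein length would dominate features on small scales such as disorder fraction.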
Affiliation(s)
- Carlos Sicilia
- Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Andrés Corral-Lugo
- Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Pawel Smialowski
- Core Facility Bioinformatics, Biomedical Center Munich, Faculty of Medicine, Ludwig Maximilians Universität München, Munich 80539, Germany
- Institute of Stem Cell Research, Helmholtz Center Munich, Planegg-Martinsried 82152, Germany
- Michael J. McConnell
- Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Antonio J. Martín-Galiano
- Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
2
ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun 2022; 13:4348. [PMID: 35896542 PMCID: PMC9329459 DOI: 10.1038/s41467-022-32007-7] [Citation(s) in RCA: 96] [Impact Index Per Article: 48.0] [Received: 04/01/2022] [Accepted: 07/13/2022] [Indexed: 11/29/2022] [Open Access]
Abstract
Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in Transformer-based architectures has enabled the implementation of language models capable of generating text with human-like capabilities. Here, motivated by this success, we describe ProtGPT2, a language model trained on the protein space that generates de novo protein sequences following the principles of natural ones. The generated proteins display natural amino acid propensities, while disorder predictions indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2-sequences yields well-folded non-idealized structures with embodiments and large loops and reveals topologies not captured in current structure databases. ProtGPT2 generates sequences in a matter of seconds and is freely available. Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Here the authors apply some of the latest advances in natural language processing, generative Transformers, to train ProtGPT2, a language model that explores unseen regions of the protein space while designing proteins with nature-like properties.
3
Vanni C, Schechter MS, Acinas SG, Barberán A, Buttigieg PL, Casamayor EO, Delmont TO, Duarte CM, Eren AM, Finn RD, Kottmann R, Mitchell A, Sánchez P, Siren K, Steinegger M, Gloeckner FO, Fernàndez-Guerra A. Unifying the known and unknown microbial coding sequence space. eLife 2022; 11:67667. [PMID: 35356891 PMCID: PMC9132574 DOI: 10.7554/elife.67667] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/18/2021] [Accepted: 03/30/2022] [Indexed: 12/02/2022] [Open Access]
Abstract
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data. It is estimated that scientists do not know what half of microbial genes actually do. When these genes are discovered in microorganisms grown in the lab or found in environmental samples, it is not possible to identify what their roles are. Many of these genes are excluded from further analyses for these reasons, meaning that the study of microbial genes tends to be limited to genes that have already been described. These limitations hinder research into microbiology, because information from newly discovered genes cannot be integrated to better understand how these organisms work. 
Experiments to understand what role these genes have in the microorganisms are labor-intensive, so new analytical strategies are needed. To do this, Vanni et al. developed a new framework to categorize genes with unknown roles, and a computational workflow to integrate them into traditional analyses. When this approach was applied to over 400 million microbial genes (both with known and unknown roles), it showed that the share of genes with unknown functions is only about 30 per cent, smaller than previously thought. The analysis also showed that these genes are very diverse, revealing a huge space for future research and potential applications. Combining their approach with experimental data, Vanni et al. were able to identify a gene with a previously unknown purpose that could be involved in antibiotic resistance. This system could be useful for other scientists studying microorganisms to get a more complete view of microbial systems. In future, it may also be used to analyze the genetics of other organisms, such as plants and animals.
Affiliation(s)
- Chiara Vanni
- Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Silvia G Acinas
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar-CMIMA (CSIC), Barcelona, Spain
- Albert Barberán
- Department of Environmental Science, University of Arizona, Tucson, United States
- Pier Luigi Buttigieg
- Helmholtz Centre for Polar and Marine Research, Alfred Wegener Institute, Bremerhaven, Germany
- Emilio O Casamayor
- Center for Advanced Studies of Blanes CEAB-CSIC, Spanish Council for Research, Blanes, Spain
- Tom O Delmont
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Paris, France
- Carlos M Duarte
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- A Murat Eren
- Department of Medicine, University of Chicago, Chicago, United States
- Robert D Finn
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, United Kingdom
- Renzo Kottmann
- Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Alex Mitchell
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, United Kingdom
- Pablo Sánchez
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar-CMIMA (CSIC), Barcelona, Spain
- Kimmo Siren
- Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Frank Oliver Gloeckner
- MARUM, Helmholtz Center for Polar and Marine Research, University of Bremen, Bremen, Germany
- Antonio Fernàndez-Guerra
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
4
Overall CM. A Flickering Light at the End of the Pandemic Tunnel. J Proteome Res 2021; 20:5223-5226. [PMID: 34856807 DOI: 10.1021/acs.jproteome.1c00866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Christopher M Overall
- Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, The University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
5
Affiliation(s)
- Christopher M Overall
- Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, The University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
6
Dark Proteome Database: Studies on Disorder. High Throughput 2020; 9:ht9030015. [PMID: 32629790 PMCID: PMC7563470 DOI: 10.3390/ht9030015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2020] [Revised: 06/17/2020] [Accepted: 06/18/2020] [Indexed: 12/17/2022] [Open Access]
Abstract
There is a misconception that intrinsic disorder in proteins is equivalent to darkness. The present study aims to establish, in the scope of the Swiss-Prot and Dark Proteome databases, the relationship between disorder and darkness. Three distinct predictors were used to calculate the disorder of Swiss-Prot proteins. The analysis of the results obtained with the used predictors and visualization paradigms resulted in the same conclusion that was reached before: disorder is mostly unrelated to darkness.
7
Uversky VN. Torches, Candles, Lamps, Lanterns, Flashlights, Spotlights, Night Vision Goggles… You Need Them All to See in Darkness. Proteomics 2020; 19:e1900085. [PMID: 30829430 DOI: 10.1002/pmic.201900085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Articles assembled in the second part of this Special Issue describe some experimental and computational approaches for the structural and functional characterization of intrinsically disordered proteins. Since these tools represent specialized gear for the focused analysis of various aspects of dark proteome, they can be viewed as torches, candles, lamps, lanterns, flashlights, spotlights, night vision goggles, and other means needed to see in darkness.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow, 142290, Russia
8
Uversky VN. Bringing Darkness to Light: Intrinsic Disorder as a Means to Dig into the Dark Proteome. Proteomics 2019; 18:e1800352. [PMID: 30334344 DOI: 10.1002/pmic.201800352] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Moscow Region, Russia
9
Jauset T, Beaulieu ME. Bioactive cell penetrating peptides and proteins in cancer: a bright future ahead. Curr Opin Pharmacol 2019; 47:133-140. [PMID: 31048179 DOI: 10.1016/j.coph.2019.03.014] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/31/2019] [Revised: 03/26/2019] [Accepted: 03/27/2019] [Indexed: 02/05/2023]
Abstract
Peptides and proteins bear an extraordinary therapeutic potential to effectively and selectively target many components of cells currently considered undruggable. However, their intracellular delivery remains a critical challenge. Cell penetrating peptides and protein domains (CPPs) can be employed to translocate therapeutic polypeptides through the cellular membrane. Here, we describe examples of linear peptides and proteins, bicyclic macropeptides and nanobodies that target key players in cancer development, with intrinsic and engineered cell penetrating ability. We also describe current solutions to the main challenges to their clinical viability.
Affiliation(s)
- Toni Jauset
- Peptomyc, Edifici Cellex, Hospital Vall d'Hebron, Barcelona, 08035, Spain
- Marie-Eve Beaulieu
- Peptomyc, Edifici Cellex, Hospital Vall d'Hebron, Barcelona, 08035, Spain
10
Perdigão N, Rosa A. Dark Proteome Database: Studies on Dark Proteins. High Throughput 2019; 8:ht8020008. [PMID: 30934744 PMCID: PMC6630768 DOI: 10.3390/ht8020008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/01/2018] [Revised: 03/12/2019] [Accepted: 03/15/2019] [Indexed: 12/27/2022] [Open Access]
Abstract
The dark proteome, as we define it, is the part of the proteome where 3D structure has not been observed either by homology modeling or by experimental characterization in the protein universe. From the 550,116 proteins available in Swiss-Prot (as of July 2016), 43.2% of the eukarya universe and 49.2% of the virus universe are part of the dark proteome. In bacteria and archaea, the percentage of the dark proteome presence is significantly less, at 12.6% and 13.3% respectively. In this work, we present a necessary step to complete the dark proteome picture by introducing the map of the dark proteome in the human and in other model organisms of special importance to mankind. The most significant result is that around 40% to 50% of the proteome of these organisms are still in the dark, where the higher percentages belong to higher eukaryotes (mouse and human organisms). Due to the amount of darkness present in the human organism being more than 50%, deeper studies were made, including the identification of ‘dark’ genes that are responsible for the production of so-called dark proteins, as well as the identification of the ‘dark’ tissues where dark proteins are overrepresented, namely, the heart, cervical mucosa, and natural killer cells. This is a step forward in the direction of gaining a deeper knowledge of the human dark proteome.
Affiliation(s)
- Nelson Perdigão
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.
- Instituto de Sistemas e Robótica, 1049-001 Lisbon, Portugal.
- Agostinho Rosa
- Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.
- Instituto de Sistemas e Robótica, 1049-001 Lisbon, Portugal.
11
Paik YK, Lane L, Kawamura T, Chen YJ, Cho JY, LaBaer J, Yoo JS, Domont G, Corrales F, Omenn GS, Archakov A, Encarnación-Guevara S, Lui S, Salekdeh GH, Cho JY, Kim CY, Overall CM. Launching the C-HPP neXt-CP50 Pilot Project for Functional Characterization of Identified Proteins with No Known Function. J Proteome Res 2018; 17:4042-4050. [PMID: 30269496 PMCID: PMC6693327 DOI: 10.1021/acs.jproteome.8b00383] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/18/2022]
Abstract
An important goal of the Human Proteome Organization (HUPO) Chromosome-centric Human Proteome Project (C-HPP) is to correctly define the number of canonical proteins encoded by their cognate open reading frames on each chromosome in the human genome. When identified with high confidence of protein evidence (PE), such proteins are termed PE1 proteins in the online database resource, neXtProt. However, proteins that have not been identified unequivocally at the protein level but that have other evidence suggestive of their existence (PE2-4) are termed missing proteins (MPs). The number of MPs has been reduced from 5511 in 2012 to 2186 in 2018 (neXtProt 2018-01-17 release). Although the annotation of the human proteome has made significant progress, the "parts list" alone does not inform function. Indeed, 1937 proteins representing ∼10% of the human proteome have no function either annotated from experimental characterization or predicted by homology to other proteins. Specifically, these 1937 "dark proteins" of the so-called dark proteome are composed of 1260 functionally uncharacterized but identified PE1 proteins, designated as uPE1, plus 677 MPs from categories PE2-PE4, which also have no known or predicted function and are termed uMPs. At the HUPO-2017 Annual Meeting, the C-HPP officially adopted the uPE1 pilot initiative, with 14 participating international teams later committing to demonstrate the feasibility of the functional characterization of large numbers of dark proteins (CP), starting first with 50 uPE1 proteins, in a stepwise chromosome-centric organizational manner. The second aim of the feasibility phase to characterize protein (CP) functions of 50 uPE1 proteins, termed the neXt-CP50 initiative, is to utilize a variety of approaches and workflows according to individual team expertise, interest, and resources so as to enable the C-HPP to recommend experimentally proven workflows to the proteome community within 3 years. 
The results from this pilot will not only be the cornerstone of a larger characterization initiative but also enhance understanding of the human proteome and integrated cellular networks for the discovery of new mechanisms of pathology, mechanistically informative biomarkers, and rational drug targets.
Affiliation(s)
- Young-Ki Paik
- Yonsei Proteome Research Center and Department of Integrative Omics, Yonsei University, Sudaemoon-ku, Seoul, Korea
- Lydie Lane
- CALIPHO group, Swiss Institute of Bioinformatics & Department of Microbiology and Molecular medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Takeshi Kawamura
- Proteomics Laboratory, Isotope Science Center, The University of Tokyo, Bunkyo-Ku, Tokyo 113-0032 Japan
- Yu-Ju Chen
- Institute of Chemistry Academia Sinica, 128 Academia Road Sec. 2, Nankang Taipei 115 Taiwan
- Je-Yoel Cho
- Research Institute for Veterinary Science, College of Veterinary Medicine, Seoul University, 1 Gwanak-, Gwanak-gu, 151-742 Seoul, South Korea
- Joshua LaBaer
- McAllister Ave. Arizona State University, Tempe, Arizona, 85287-5001, USA
- Jong Shin Yoo
- Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Korea
- Gilberto Domont
- Federal University of Rio de Janeiro Institute of Chemistry, Rio de Janeiro, RJ Brazil
- Fernando Corrales
- Functional Proteomics Laboratory National Center of Biotechnology, CSIC 28049 Madrid, Spain
- Gilbert S. Omenn
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
- Siqi Lui
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China
- Ghasem Hosseini Salekdeh
- Department of Molecular Systems Biology, Royan Institute for Stem Cell Biology and Technology, 1665659911, Tehran, Iran
- Department of Molecular Sciences, Macquarie University, Sydney, Australia
- Jin-Young Cho
- Yonsei Proteome Research Center and Department of Integrative Omics, Yonsei University, Sudaemoon-ku, Seoul, Korea
- Chae-Yeon Kim
- Yonsei Proteome Research Center and Department of Integrative Omics, Yonsei University, Sudaemoon-ku, Seoul, Korea
- Christopher M. Overall
- Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, University of British Columbia, Vancouver, Canada
12
Kulkarni P, Uversky VN. Intrinsically Disordered Proteins: The Dark Horse of the Dark Proteome. Proteomics 2018; 18:e1800061. [DOI: 10.1002/pmic.201800061] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 05/24/2018] [Revised: 09/07/2018] [Indexed: 12/27/2022]
Affiliation(s)
- Prakash Kulkarni
- Department of Medical Oncology and Therapeutics Research; City of Hope National Medical Center; Duarte CA 91010 USA
- Vladimir N. Uversky
- Department of Molecular Medicine; Morsani College of Medicine; University of South Florida; Tampa FL 33612 USA
- Laboratory of New methods in Biology; Institute for Biological Instrumentation; Russian Academy of Sciences; Pushchino Moscow Region 142290 Russia
13
Hu G, Wang K, Song J, Uversky VN, Kurgan L. Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity. Proteomics 2018; 18:e1800243. [PMID: 30198635 DOI: 10.1002/pmic.201800243] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/04/2018] [Revised: 08/30/2018] [Indexed: 12/14/2022]
Abstract
Growth rate of the protein sequence universe dramatically exceeds the speed of expansion for the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses to systematically dissect an interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination is performed. It has been found that Archaean and Bacterial proteomes have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in the intrinsic disorder. The disorder, structural coverage, and propensity are mapped for structural determination onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.
Affiliation(s)
- Gang Hu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, 33612, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Russia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
14
Melaine N, Com E, Bellaud P, Guillot L, Lagarrigue M, Morrice NA, Guével B, Lavigne R, Velez de la Calle JF, Dojahn J, Pineau C. Deciphering the Dark Proteome: Use of the Testis and Characterization of Two Dark Proteins. J Proteome Res 2018; 17:4197-4210. [DOI: 10.1021/acs.jproteome.8b00387] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Affiliation(s)
- Nathalie Melaine
- Univ Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail)—UMR S 1085, F-35042 Rennes cedex, France
- Protim, Univ Rennes, F-35042 Rennes, France
- Emmanuelle Com
- Univ Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail)—UMR S 1085, F-35042 Rennes cedex, France
- Protim, Univ Rennes, F-35042 Rennes, France
- Pascale Bellaud
- H2P2 Core Facility, UMS BioSit, Univ Rennes, Rennes F-35040, France
- Laetitia Guillot
- Univ Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail)—UMR S 1085, F-35042 Rennes cedex, France
- Protim, Univ Rennes, F-35042 Rennes, France
- Mélanie Lagarrigue
- Univ Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail)—UMR S 1085, F-35042 Rennes cedex, France
- Protim, Univ Rennes, F-35042 Rennes, France
- Nick A. Morrice
- Sciex, Phoenix House Lakeside Drive Centre Park, Warrington WA1 1RX, U.K
- Blandine Guével
- Univ Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail)—UMR S 1085, F-35042 Rennes cedex, France
- Protim, Univ Rennes, F-35042 Rennes, France
- Régis Lavigne
- Univ Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail)—UMR S 1085, F-35042 Rennes cedex, France
- Protim, Univ Rennes, F-35042 Rennes, France
- Jörg Dojahn
- Sciex, Landwehrstr. 54, 64293 Darmstadt, Germany
- Charles Pineau
- Univ Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail)—UMR S 1085, F-35042 Rennes cedex, France
- Protim, Univ Rennes, F-35042 Rennes, France