1
Roterman I, Stapor K, Konieczny L. Engagement of intrinsic disordered proteins in protein-protein interaction. Front Mol Biosci 2023; 10:1230922. [PMID: 37583961; PMCID: PMC10423874; DOI: 10.3389/fmolb.2023.1230922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 07/11/2023] [Indexed: 08/17/2023] Open
Abstract
Intrinsically disordered proteins (IDPs) attract the attention of many researchers engaged in protein structure analysis. The main criteria used in their identification are lack of secondary structure and significant structural variability. This variability takes forms that cannot be identified by the X-ray technique. In the present study, different criteria were used to assess the status of IDPs and their fragments recognized as intrinsically disordered regions (IDRs). The status of the hydrophobic core in proteins identified as IDPs and in their complexes was assessed. The status of IDRs as components of the ordering structure resulting from the construction of the hydrophobic core was also assessed. The hydrophobic core is understood as a structure encompassing the entire molecule in the form of a centrally located high concentration of hydrophobicity and a shell with a gradually decreasing level of hydrophobicity until it reaches a level close to zero on the protein surface. It is a model assuming that the protein folding process follows a micellization pattern aiming at exposing polar residues on the surface, with the simultaneous isolation of hydrophobic amino acids from the polar aqueous environment. The use of the model of hydrophobicity distribution in proteins in the form of the 3D Gaussian distribution described on the protein particle introduces the possibility of assessing the degree of similarity to the assumed micelle-like distribution and also enables the identification of deviations and mismatch between the actual distribution and the idealized distribution. The FOD (fuzzy oil drop) model and its modified FOD-M version allow for the quantitative assessment of these differences and the assessment of the relationship of these areas to the protein function. In the present work, the sections of IDRs in protein complexes classified as IDPs are analyzed.
The classification "disordered" in the structural sense (lack of secondary structure or high flexibility) does not always entail a mismatch with the structure of the hydrophobic core. In particular, the interface area, often consisting of IDRs, in many analyzed complexes shows the compliance of the hydrophobicity distribution with the idealized distribution, which proves that matching to the structure of the hydrophobic core does not require secondary structure ordering.
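The micelle-like hydrophobicity model described in this abstract can be illustrated with a small sketch. The Python below is not the authors' FOD implementation: the toy coordinates, the sigma values, the observed hydrophobicity values, the function names, and the choice of Kullback-Leibler divergence as the mismatch measure are all assumptions made for illustration. It only shows how an idealized 3D Gaussian profile might be evaluated at a few positions and compared with an observed profile.

```python
import math

def gaussian_profile(coords, sigmas):
    """Idealized hydrophobicity at each position: a 3D Gaussian centred on the
    origin (assumed to be the molecule's centre), normalized to sum to 1."""
    sx, sy, sz = sigmas
    raw = [
        math.exp(-(x * x / (2 * sx * sx) + y * y / (2 * sy * sy) + z * z / (2 * sz * sz)))
        for x, y, z in coords
    ]
    total = sum(raw)
    return [v / total for v in raw]

def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def kl_divergence(observed, target):
    """Kullback-Leibler divergence (in bits) of observed from target.
    Used here as an assumed measure of mismatch between the two profiles."""
    return sum(o * math.log2(o / t) for o, t in zip(observed, target) if o > 0)

# Hypothetical toy data: four positions (in angstroms) and made-up observed values.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 6.0, 0.0), (0.0, 0.0, 9.0)]
sigmas = (4.0, 4.0, 4.0)  # assumed spread of the Gaussian along each axis

idealized = gaussian_profile(coords, sigmas)
observed = normalize([0.9, 0.6, 0.3, 0.1])
mismatch = kl_divergence(observed, idealized)

print("idealized:", [round(v, 3) for v in idealized])
print("mismatch (bits):", round(mismatch, 4))
```

In this sketch a mismatch of zero means the observed profile follows the idealized one exactly, and larger values indicate a stronger deviation from the centric, micelle-like distribution.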
Affiliation(s)
- Irena Roterman
  - Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Kraków, Poland
- Katarzyna Stapor
  - Department of Applied Informatics, Faculty of Automatic, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Leszek Konieczny
  - Chair of Medical Biochemistry, Medical College, Jagiellonian University, Kraków, Poland
2
Wilson DM, Deacon AM, Duncton MAJ, Pellicena P, Georgiadis MM, Yeh AP, Arvai AS, Moiani D, Tainer JA, Das D. Fragment- and structure-based drug discovery for developing therapeutic agents targeting the DNA Damage Response. Progress in Biophysics and Molecular Biology 2021; 163:130-142. [PMID: 33115610; PMCID: PMC8666131; DOI: 10.1016/j.pbiomolbio.2020.10.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/01/2020] [Revised: 10/13/2020] [Accepted: 10/23/2020] [Indexed: 12/12/2022]
Abstract
Cancer will directly affect the lives of over one-third of the population. The DNA Damage Response (DDR) is an intricate system involving damage recognition, cell cycle regulation, DNA repair, and ultimately cell fate determination, playing a central role in cancer etiology and therapy. Two primary therapeutic approaches involving DDR targeting are combinatorial treatments employing anticancer genotoxic agents, and synthetic lethality, which exploits a sporadic DDR defect as a mechanism for cancer-specific therapy. Whereas many DDR proteins have proven "undruggable", Fragment- and Structure-Based Drug Discovery (FBDD, SBDD) have advanced therapeutic agent identification and development. FBDD has led to 4 FDA-approved medicines (with ∼50 more drugs under preclinical and clinical development), while SBDD is estimated to have contributed to the development of >200. Protein X-ray crystallography-based fragment library screening, especially for elusive or "undruggable" targets, allows for simultaneous generation of hits plus details of protein-ligand interactions and binding sites (orthosteric or allosteric) that inform chemical tractability, downstream biology, and intellectual property. Using a novel high-throughput crystallography-based fragment library screening platform, we screened five diverse proteins, yielding hit rates of ∼2-8% and crystal structures from ∼1.8 to 3.2 Å. We consider current FBDD/SBDD methods and some exemplary results of efforts to design inhibitors against the DDR nucleases meiotic recombination 11 (MRE11, a.k.a. MRE11A), apurinic/apyrimidinic endonuclease 1 (APE1, a.k.a. APEX1), and flap endonuclease 1 (FEN1).
Affiliation(s)
- David M Wilson
  - Hasselt University, Biomedical Research Institute, Diepenbeek, Belgium; Boost Scientific, Heusden-Zolder, Belgium; XPose Therapeutics Inc., San Carlos, CA, USA
- Ashley M Deacon
  - Accelero Biostructures Inc., San Francisco, CA, USA; XPose Therapeutics Inc., San Carlos, CA, USA
- Millie M Georgiadis
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA; XPose Therapeutics Inc., San Carlos, CA, USA
- Andrew P Yeh
  - Accelero Biostructures Inc., San Francisco, CA, USA
- Andrew S Arvai
  - Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Davide Moiani
  - Department of Cancer Biology, MD Anderson Cancer Center, Houston, TX, USA; Department of Molecular and Cellular Oncology, MD Anderson Cancer Center, Houston, TX, USA
- John A Tainer
  - Department of Cancer Biology, MD Anderson Cancer Center, Houston, TX, USA; Department of Molecular and Cellular Oncology, MD Anderson Cancer Center, Houston, TX, USA
- Debanu Das
  - Accelero Biostructures Inc., San Francisco, CA, USA; XPose Therapeutics Inc., San Carlos, CA, USA
3
Bepler T, Berger B. Learning the protein language: Evolution, structure, and function. Cell Syst 2021; 12:654-669.e3. [PMID: 34139171; PMCID: PMC8238390; DOI: 10.1016/j.cels.2021.05.017] [Citation(s) in RCA: 212] [Impact Index Per Article: 53.0] [Received: 01/16/2021] [Revised: 05/20/2021] [Accepted: 05/20/2021] [Indexed: 02/06/2023]
Abstract
Language models have recently emerged as a powerful machine-learning approach for distilling information from massive protein sequence databases. From readily available sequence data alone, these models discover evolutionary, structural, and functional organization across protein space. Using language models, we can encode amino-acid sequences into distributed vector representations that capture their structural and functional properties, as well as evaluate the evolutionary fitness of sequence variants. We discuss recent advances in protein language modeling and their applications to downstream protein property prediction problems. We then consider how these models can be enriched with prior biological knowledge and introduce an approach for encoding protein structural knowledge into the learned representations. The knowledge distilled by these models allows us to improve downstream function prediction through transfer learning. Deep protein language models are revolutionizing protein biology. They suggest new ways to approach protein and therapeutic design. However, further developments are needed to encode strong biological priors into protein language models and to increase their accessibility to the broader community.
Affiliation(s)
- Tristan Bepler
  - Simons Machine Learning Center, New York Structural Biology Center, New York, NY, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
4
Susanti D, Frazier MC, Mukhopadhyay B. A Genetic System for Methanocaldococcus jannaschii: An Evolutionary Deeply Rooted Hyperthermophilic Methanarchaeon. Front Microbiol 2019; 10:1256. [PMID: 31333590; PMCID: PMC6616113; DOI: 10.3389/fmicb.2019.01256] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/18/2018] [Accepted: 05/20/2019] [Indexed: 12/20/2022] Open
Abstract
Phylogenetically deeply rooted methanogens belonging to the genus Methanocaldococcus living in deep-sea hydrothermal vents derive energy exclusively from hydrogenotrophic methanogenesis, one of the oldest respiratory metabolisms on Earth. These hyperthermophilic, autotrophic archaea synthesize their biomolecules from inorganic substrates and perform high temperature biocatalysis producing methane, a valuable fuel and potent greenhouse gas. The information processing and stress response systems of archaea are highly homologous to those of the eukaryotes. For this broad relevance, Methanocaldococcus jannaschii, the first hyperthermophilic chemolithotrophic organism that was isolated from a deep-sea hydrothermal vent, was also the first archaeon and third organism for which the whole genome sequence was determined. The research that followed uncovered much novel information in multiple fields, including those described above. M. jannaschii was found to carry ancient redox control systems, precursors of dissimilatory sulfate reduction enzymes, and a eukaryotic-like protein translocation system. It provided a platform for structural genomics and tools for incorporating unnatural amino acids into proteins. However, the assignments of in vivo relevance to these findings or interrogations of unknown aspects of M. jannaschii through genetic manipulations remained out of reach, as the organism was genetically intractable. This report presents tools and methods that remove this block. It is now possible to knock out or modify a gene in M. jannaschii and genetically fuse a gene with an affinity tag sequence, thereby allowing facile isolation of a protein with M. jannaschii-specific attributes. These tools have helped to genetically validate the role of a novel coenzyme F420-dependent sulfite reductase in conferring resistance to sulfite in M. jannaschii and to demonstrate that the organism possesses a deazaflavin-dependent system for neutralizing oxygen.
Affiliation(s)
- Dwi Susanti
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, United States
- Mary C Frazier
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, United States
- Biswarup Mukhopadhyay
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, United States; Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States; Virginia Tech Carilion School of Medicine, Virginia Tech, Blacksburg, VA, United States
5
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life. Sci Rep 2015; 5:14717. [PMID: 26434770; PMCID: PMC4592975; DOI: 10.1038/srep14717] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2015] [Accepted: 09/07/2015] [Indexed: 11/14/2022] Open
Abstract
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and remains to a large extent, heavily biased, focusing on culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, iii) the discovery rate of novel folds among sequences uncovered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained tertiary structure level novelty.
6
Abstract
Proteins are macromolecules that serve a cell’s myriad processes and functions in all living organisms via dynamic interactions with other proteins, small molecules and cellular components. Genetic variations in the protein-encoding regions of the human genome account for >85% of all known Mendelian diseases, and play an influential role in shaping complex polygenic diseases. Proteins also serve as the predominant target class for the design of small molecule drugs to modulate their activity. Knowledge of the shape and form of proteins, by means of their three-dimensional structures, is therefore instrumental to understanding their roles in disease and their potentials for drug development. In this chapter we outline, with the wide readership of non-structural biologists in mind, the various experimental and computational methods available for protein structure determination. We summarize how the wealth of structure information, contributed to a large extent by the technological advances in structure determination to date, serves as a useful tool to decipher the molecular basis of genetic variations for disease characterization and diagnosis, particularly in the emerging era of genomic medicine, and becomes an integral component in the modern day approach towards rational drug development.
Affiliation(s)
- Nelson L.S. Tang
  - Dept. of Chemical Pathology and Lab. of Genetics of Disease Suscept., The Chinese University of Hong Kong, Hong Kong, People's Republic of China
- Terence Poon
  - Department of Paediatrics and Proteomics Laboratory, The Chinese University of Hong Kong, Hong Kong, People's Republic of China
7
Das D, Murzin AG, Rawlings ND, Finn RD, Coggill P, Bateman A, Godzik A, Aravind L. Structure and computational analysis of a novel protein with metallopeptidase-like and circularly permuted winged-helix-turn-helix domains reveals a possible role in modified polysaccharide biosynthesis. BMC Bioinformatics 2014; 15:75. [PMID: 24646163; PMCID: PMC4000134; DOI: 10.1186/1471-2105-15-75] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/20/2013] [Accepted: 03/04/2014] [Indexed: 11/10/2022] Open
Abstract
Background: CA_C2195 from Clostridium acetobutylicum is a protein of unknown function. Sequence analysis predicted that part of the protein contained a metallopeptidase-related domain. There are over 200 homologs of similar size in large sequence databases such as UniProt, with pairwise sequence identities in the range of ~40-60%. CA_C2195 was chosen for crystal structure determination for structure-based function annotation of novel protein sequence space. Results: The structure confirmed that CA_C2195 contained an N-terminal metallopeptidase-like domain. The structure revealed two extra domains: an α+β domain inserted in the metallopeptidase-like domain and a C-terminal circularly permuted winged-helix-turn-helix domain. Conclusions: Based on our sequence and structural analyses using the crystal structure of CA_C2195 we provide a view into the possible functions of the protein. From contextual information from gene-neighborhood analysis, we propose that rather than being a peptidase, CA_C2195 and its homologs might play a role in biosynthesis of a modified cell-surface carbohydrate in conjunction with several sugar-modification enzymes. These results provide the groundwork for the experimental verification of the function.
Affiliation(s)
- Debanu Das
  - Joint Center for Structural Genomics, La Jolla, CA, USA
8
Production of bulk chemicals via novel metabolic pathways in microorganisms. Biotechnol Adv 2012; 31:925-35. [PMID: 23280013; DOI: 10.1016/j.biotechadv.2012.12.008] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 10/04/2012] [Revised: 12/09/2012] [Accepted: 12/23/2012] [Indexed: 02/05/2023]
Abstract
Metabolic engineering has been playing important roles in developing high performance microorganisms capable of producing various chemicals and materials from renewable biomass in a sustainable manner. Synthetic and systems biology are also contributing significantly to the creation of novel pathways and the whole cell-wide optimization of metabolic performance, respectively. In order to expand the spectrum of chemicals that can be produced biotechnologically, it is necessary to broaden the metabolic capacities of microorganisms. Expanding the metabolic pathways for biosynthesizing the target chemicals requires not only the enumeration of a series of known enzymes, but also the identification of biochemical gaps whose corresponding enzymes might not actually exist in nature; this issue is the focus of this paper. First, pathway prediction tools, effectively combining reactions that lead to the production of a target chemical, are analyzed in terms of logics representing chemical information, and designing and ranking the proposed metabolic pathways. Then, several approaches for potentially filling in the gaps of the novel metabolic pathway are suggested along with relevant examples, including the use of promiscuous enzymes that flexibly utilize different substrates, design of novel enzymes for non-natural reactions, and exploration of hypothetical proteins. Finally, strain optimization by systems metabolic engineering in the context of novel metabolic pathways constructed is briefly described. It is hoped that this review paper will provide logical ways of efficiently utilizing 'big' biological data to design and develop novel metabolic pathways for the production of various bulk chemicals that are currently produced from fossil resources.
9
Cheng C, Shaw N, Zhang X, Zhang M, Ding W, Wang BC, Liu ZJ. Structural view of a non Pfam singleton and crystal packing analysis. PLoS One 2012; 7:e31673. [PMID: 22363703; PMCID: PMC3282739; DOI: 10.1371/journal.pone.0031673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/28/2011] [Accepted: 01/11/2012] [Indexed: 11/19/2022] Open
Abstract
Background: Comparative genomic analysis has revealed that in each genome a large number of open reading frames have no homologues in other species. Such singleton genes have attracted the attention of biochemists and structural biologists as a potential untapped source of new folds. Cthe_2751 is a 15.8 kDa singleton from an anaerobic, hyperthermophile Clostridium thermocellum. To gain insights into the architecture of the protein and obtain clues about its function, we decided to solve the structure of Cthe_2751. Results: The protein crystallized in 4 different space groups that diffracted X-rays to 2.37 Å (P3(1)21), 2.17 Å (P2(1)2(1)2(1)), 3.01 Å (P4(1)22), and 2.03 Å (C222(1)) resolution, respectively. Crystal packing analysis revealed that the 3-D packing of Cthe_2751 dimers in P4(1)22 and C222(1) is similar with only a rotational difference of 2.69° around the C axes. A new method developed to quantify the differences in packing of dimers in crystals from different space groups corroborated the findings of crystal packing analysis. Cthe_2751 is an all α-helical protein with a central hydrophobic core providing thermal stability via π:cation and π:π interactions. A ProFunc analysis retrieved a very low match with a splicing endonuclease, suggesting a role for the protein in the processing of nucleic acids. Conclusions: Non-Pfam singleton Cthe_2751 folds into a known all α-helical fold. The structure has increased sequence coverage of non-Pfam proteins such that more protein sequences can be amenable to modelling. Our work on crystal packing analysis provides a new method to analyze dimers of the protein crystallized in different space groups. The utility of such an analysis can be expanded to oligomeric structures of other proteins, especially receptors and signaling molecules, many of which are known to function as oligomers.
Affiliation(s)
- Chongyun Cheng
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Neil Shaw
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
  - College of Stem Cell and Molecular Clinical Medicine, Kunming Medical University, Kunming, China
- Xuejun Zhang
  - Department of Immunology, Tianjin Medical University, Tianjin, China
- Min Zhang
  - School of Life Sciences, Anhui University, Hefei, Anhui, China
- Wei Ding
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Bi-Cheng Wang
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, United States of America
- Zhi-Jie Liu
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
  - College of Stem Cell and Molecular Clinical Medicine, Kunming Medical University, Kunming, China
10
Pang B, Zhao N, Becchi M, Korkin D, Shyu CR. Accelerating large-scale protein structure alignments with graphics processing units. BMC Res Notes 2012; 5:116. [PMID: 22357132; PMCID: PMC3309952; DOI: 10.1186/1756-0500-5-116] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 09/01/2011] [Accepted: 02/22/2012] [Indexed: 11/24/2022] Open
Abstract
Background: Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU.
Affiliation(s)
- Bin Pang
  - Informatics Institute, University of Missouri, Columbia, MO, USA
11
Sael L, Chitale M, Kihara D. Structure- and sequence-based function prediction for non-homologous proteins. J Struct Funct Genomics 2012; 13:111-23. [PMID: 22270458; DOI: 10.1007/s10969-012-9126-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 09/12/2011] [Accepted: 01/10/2012] [Indexed: 01/14/2023]
Abstract
The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel with experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. Combined with experimental methods, we hope that computational methods will make a leading contribution to the functional elucidation of the protein structures.
Affiliation(s)
- Lee Sael
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
12
13
Hinz U. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase. Cell Mol Life Sci 2010; 67:1049-64. [PMID: 20043185; PMCID: PMC2835715; DOI: 10.1007/s00018-009-0229-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 08/14/2009] [Revised: 12/01/2009] [Accepted: 12/07/2009] [Indexed: 11/12/2022]
Abstract
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website (http://www.uniprot.org/). It also evokes precautions that are necessary for successful predictions and extrapolations.
Affiliation(s)
- Ursula Hinz
  - Swiss-Prot Group, Swiss Institute of Bioinformatics, 1 rue Michel Servet, 1211, Geneva, Switzerland
14
Brocks JJ, Banfield J. Unravelling ancient microbial history with community proteogenomics and lipid geochemistry. Nat Rev Microbiol 2009; 7:601-9. [PMID: 19609261; DOI: 10.1038/nrmicro2167] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 01/21/2023]
Abstract
Our window into the Earth's ancient microbial past is narrow and obscured by missing data. However, we can glean information about ancient microbial ecosystems using fossil lipids (biomarkers) that are extracted from billion-year-old sedimentary rocks. In this Opinion article, we describe how environmental genomics and related methodologies will give molecular fossil research a boost, by increasing our knowledge about how evolutionary innovations in microorganisms have changed the surface of planet Earth.
Affiliation(s)
- Jochen J Brocks
  - Research School of Earth Sciences, and Centre for Macroevolution and Macroecology, The Australian National University, Canberra, ACT 0200, Australia
15
Abstract
The large-scale structural biology projects that target human proteins focus predominantly on the catalytic domains of potential therapeutic targets and the domains of human proteins that mediate protein-protein and protein-small-molecule interactions. Their main scientific objective is to elucidate the molecular basis for specificity and selectivity of function within large protein families of therapeutic interest, such as kinases, phosphatases, and proteins involved in epigenetic regulation. Half of the unique human protein structures determined in the past three years derive from these initiatives.
Affiliation(s)
- Aled Edwards
  - Banting and Best Department of Medical Research, University of Toronto, Ontario M5G 1L6, Canada
16
Abstract
ORFan genes can constitute a large fraction of a bacterial genome, but due to their lack of homologs, their functions have remained largely unexplored. To determine if particular features of ORFan-encoded proteins promote their presence in a genome, we analyzed properties of ORFans that originated over a broad evolutionary timescale. We also compared ORFan genes to another class of acquired genes, heterogeneous occurrence in prokaryotes (HOPs), which have homologs in other bacteria. A total of 54 ORFan and HOP genes selected from different phylogenetic depths in the Escherichia coli lineage were cloned, expressed, purified, and subjected to circular dichroism (CD) spectroscopy. A majority of genes could be expressed, but only 18 yielded sufficient soluble protein for spectral analysis. Of these, half were significantly alpha-helical, three were predominantly beta-sheet, and six were of intermediate/indeterminate structure. Although a higher proportion of HOPs yielded soluble proteins with resolvable secondary structures, ORFans resembled HOPs with regard to most of the other features tested. Overall, we found that those ORFan and HOP genes that have persisted in the E. coli lineage were more likely to encode soluble and folded proteins, more likely to display environmental modulation of their gene expression, and by extrapolation, are more likely to be functional.
Affiliation(s)
- Hema Prasad Narra
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, AZ 85721, USA
17
|
Abstract
A decade of structural genomics, the large-scale determination of protein structures, has generated a wealth of data and many important lessons for structural biology and for future large-scale projects. These lessons include confirmation that it is possible to construct large-scale facilities that can determine the structures of a hundred or more proteins per year, that these structures can be of high quality, and that they can have an important impact. Technology development has played a critical role in structural genomics; the difficulties at each step of determining the structure of a particular protein can be quantified; and validation of technologies is nearly as important as the technologies themselves. Finally, rapid deposition of data in public databases has increased the impact and usefulness of the data, and international cooperation has advanced the field and improved data sharing.
18
|
Gherardini PF, Helmer-Citterich M. Structure-based function prediction: approaches and applications. Briefings in Functional Genomics and Proteomics 2008; 7:291-302. [PMID: 18599513 DOI: 10.1093/bfgp/eln030] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022]
Abstract
The ever-increasing number of protein structures determined by structural genomics projects has spurred much interest in the development of methods for structure-based function prediction. Existing methods can be roughly classified into two groups. Some use a comparative approach, looking for the presence of structural motifs possibly associated with a known biochemical function. Others try to identify functional patches on the surface of a protein using only its physicochemical characteristics. This review covers both kinds of approaches to structure-based function prediction as well as their use in real-world cases. The main issues and limitations in using protein structure to predict function are also discussed. These are mainly the assessment of the statistical significance of structural similarities and the extent to which these methods depend on the accuracy and availability of structural data.
Affiliation(s)
- Pier Federico Gherardini
- Department of Biology, Centre for Molecular Bioinformatics, University of Tor Vergata, Rome, Italy.
19
|
Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS One 2008; 3:e2136. [PMID: 18461181 PMCID: PMC2362850 DOI: 10.1371/journal.pone.0002136] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 03/13/2008] [Accepted: 03/25/2008] [Indexed: 12/01/2022] Open
Abstract
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) assess their geometric and evolutionary similarity to other structures; and (c) transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively (in one case even identifying an annotation error), while maintaining sensitivity (~60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
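The two ideas that the abstract credits for the gain in specificity, keeping only reciprocal matches and transferring an annotation only when a plurality is reached, can be illustrated with a minimal Python sketch. This is not the authors' ETA implementation: the function names, the dictionary layout, the tie-handling rule (a tie counts as no plurality), and the toy protein labels and annotations are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Optional, Set


def reciprocal_matches(hits: Dict[str, Set[str]], query: str) -> Set[str]:
    """Keep only targets hit by the query whose own template also hits the query.

    `hits[p]` is the (hypothetical) set of structures matched by protein p's template.
    """
    return {t for t in hits.get(query, set()) if query in hits.get(t, set())}


def plurality_annotation(matches: Set[str], annotations: Dict[str, str]) -> Optional[str]:
    """Return the annotation held by strictly more matches than any other, else None."""
    votes = Counter(annotations[m] for m in matches if m in annotations)
    if not votes:
        return None  # no annotated matches, nothing to transfer
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # assumed rule: a tie means no plurality
    return ranked[0][0]


# Toy data (hypothetical): query Q hits A, B, C, D, but D's template does not hit Q back.
hits = {"Q": {"A", "B", "C", "D"}, "A": {"Q"}, "B": {"Q"}, "C": {"Q"}, "D": set()}
annotations = {"A": "hydrolase", "B": "hydrolase", "C": "transferase", "D": "transferase"}

reciprocal = reciprocal_matches(hits, "Q")
transferred = plurality_annotation(reciprocal, annotations)
```

In this toy case the one-way match D would tie the vote (two against two) and block any transfer; after the reciprocal filter removes it, "hydrolase" wins two votes to one.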
Affiliation(s)
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Tuan A. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- David M. Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America