1
|
Structural and Functional Insights into CP2c Transcription Factor Complexes. Int J Mol Sci 2022; 23:ijms23126369. [PMID: 35742810 PMCID: PMC9223585 DOI: 10.3390/ijms23126369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/09/2022] [Revised: 06/04/2022] [Accepted: 06/05/2022] [Indexed: 02/04/2023] Open
Abstract
CP2c, also known as TFCP2, α-CP2, LSF, and LBP-1c, is a prototypic member of the transcription factor (TF) CP2 subfamily involved in diverse ubiquitous and tissue/stage-specific cellular processes and in human malignancies including cancer. Despite its importance, many fundamental regulatory mechanisms of CP2c are still unclear. Here, we uncover unprecedented structural and functional aspects of CP2c using DSP crosslinking and Western blot in addition to conventional methods. We found that a monomeric form of a CP2c homotetramer (tCP2c; [C4]) binds to the known CP2c-binding DNA motif (CNRG-N(5~6)-CNRG), whereas a dimeric form of a CP2c, CP2b, and PIAS1 heterohexamer ([C2B2P2]2) binds to the three consecutive CP2c half-sites or two staggered CP2c binding motifs, where the [C4] exerts a pioneering function for recruiting the [C2B2P2]2 to the target. All CP2c exists as a [C4], or as a [C2B2P2]2 or [C2B2P2]4 in the nucleus. Importantly, one additional cytosolic heterotetrameric CP2c and CP2a complex, ([C2A2]), exerts some homeostatic regulation of the nuclear complexes. These data indicate that these findings are essential for the transcriptional regulation of CP2c in cells within relevant timescales, providing clues not only for the transcriptional regulation mechanism by CP2c but also for future therapeutics targeting CP2c function.
|
2
|
Computational Prediction of Intrinsically Disordered Proteins Based on Protein Sequences and Convolutional Neural Networks. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2021:4455604. [PMID: 34992646 PMCID: PMC8727116 DOI: 10.1155/2021/4455604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2021] [Accepted: 12/08/2021] [Indexed: 11/17/2022]
Abstract
Intrinsically disordered proteins (IDPs) possess at least one region that lacks a single stable structure in vivo, which allows them to play an important role in a variety of biological functions. We propose a prediction method for IDPs based on convolutional neural networks (CNNs) and feature selection. A combination of sequence and evolutionary properties is used to describe the differences between disordered and ordered regions. In particular, to highlight the correlation between the target residue and adjacent residues, multiple windows are selected to preprocess the protein sequence through the selected properties. The shorter windows reflect the characteristics of the central residue, and the longer windows reflect the characteristics of its surroundings. Moreover, to highlight the specificity of sequence and evolutionary properties, the two kinds of properties are preprocessed separately. The preprocessed properties are then combined into feature matrices as the input of the constructed CNN. Our method is trained and tested on the DisProt database. The simulation results show that the proposed method can predict IDPs effectively, and the performance is competitive in comparison with IsUnstruct and ESpritz.
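The multi-window preprocessing described in this abstract can be pictured with a minimal sketch: a per-residue property is averaged over centered windows of several lengths, giving one feature column per window. The property scale `TOY_SCALE`, the window sizes, and the function names are assumptions made for illustration; the abstract does not specify the actual properties or window lengths used.

```python
from typing import Dict, List, Sequence

# Hypothetical per-residue property scale (illustrative values only,
# not taken from the paper).
TOY_SCALE: Dict[str, float] = {"A": 1.0, "G": 0.0, "P": -1.0, "S": 0.5}


def window_average(values: Sequence[float], window: int) -> List[float]:
    """Average values over a centered window, truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def multi_window_features(seq: str, windows: Sequence[int] = (1, 3, 5)) -> List[List[float]]:
    """Return one feature row per residue, with one column per window size."""
    # Residues missing from the toy scale are given a neutral value of 0.0.
    values = [TOY_SCALE.get(res, 0.0) for res in seq]
    columns = [window_average(values, w) for w in windows]
    return [list(row) for row in zip(*columns)]
```

Short windows keep the value close to that of the central residue, while longer windows smooth over its neighbors, which mirrors the distinction the abstract draws between the two.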
|
3
|
Bondos SE, Dunker AK, Uversky VN. On the roles of intrinsically disordered proteins and regions in cell communication and signaling. Cell Commun Signal 2021; 19:88. [PMID: 34461937 PMCID: PMC8404256 DOI: 10.1186/s12964-021-00774-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 12/20/2022] Open
Abstract
For proteins, the sequence → structure → function paradigm applies primarily to enzymes, transmembrane proteins, and signaling domains. This paradigm is not universal: in addition to structured proteins, intrinsically disordered proteins and regions (IDPs and IDRs) also carry out crucial biological functions. For these proteins, the sequence → IDP/IDR ensemble → function paradigm applies primarily to signaling and regulatory proteins and regions. Often, in order to carry out function, IDPs or IDRs cooperatively interact, either intra- or inter-molecularly, with structured proteins or other IDPs, or intermolecularly with nucleic acids. In this IDP/IDR thematic collection published in Cell Communication and Signaling, thirteen articles are presented that describe IDP/IDR signaling molecules from a variety of organisms, from humans to fruit flies and tardigrades ("water bears"), and that describe how these proteins and regions contribute to the function and regulation of cell signaling. Collectively, these papers exhibit the diverse roles of disorder in responding to a wide range of signals so as to orchestrate an array of organismal processes. They also show that disorder contributes to signaling in a broad spectrum of species, ranging from micro-organisms to plants and animals.
Affiliation(s)
- Sarah E Bondos
  - Department of Molecular and Cellular Medicine, Texas A&M Health Science Center, College Station, TX, 77843, USA.
- A Keith Dunker
  - Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
  - Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Russia.
|
4
|
Identification of Intrinsically Disordered Protein Regions Based on Deep Neural Network-VGG16. ALGORITHMS 2021. [DOI: 10.3390/a14040107] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/12/2023]
Abstract
The accurate identification of intrinsically disordered proteins or protein regions is of great importance, as they are involved in critical biological processes and related to various human diseases. In this paper, we develop a deep neural network that is based on the well-known VGG16. Our deep neural network is trained using 1450 proteins from the dataset DIS1616, and the trained neural network is tested on the remaining 166 proteins. Our trained neural network is also tested on the blind test sets R80 and MXD494 to further demonstrate the performance of our model. The MCC value of our trained deep neural network is 0.5132 on the test set DIS166, 0.5270 on the blind test set R80, and 0.4577 on the blind test set MXD494. All of these MCC values exceed the corresponding values of existing prediction methods.
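For readers unfamiliar with the metric reported here, the Matthews correlation coefficient (MCC) can be computed from per-residue confusion counts as in the minimal sketch below. The function names and the convention of returning 0.0 when the denominator vanishes are assumptions for illustration; the abstract does not describe how the authors handled that case.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn) for binary labels, where 1 = disordered and 0 = ordered."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why it is commonly preferred over accuracy when disordered residues are a minority class.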
|
5
|
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022] Open
Abstract
Homology modeling is a method for building protein 3D structures using protein primary sequence and utilizing prior knowledge gained from structural similarities with other proteins. The homology modeling process is done in sequential steps where sequence/structure alignment is optimized, then a backbone is built and later, side-chains are added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives where the community is sharing computational resources, whereas RosettaCommons is an example of an initiative where a community is sharing a codebase for the development of computational algorithms. Foldit is another initiative where participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, deep machine learning on contact maps was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we will take the reader on a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Yazan Haddad
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Vojtech Adam
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Zbynek Heger
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
|
6
|
Goh GKM, Dunker AK, Foster JA, Uversky VN. A Novel Strategy for the Development of Vaccines for SARS-CoV-2 (COVID-19) and Other Viruses Using AI and Viral Shell Disorder. J Proteome Res 2020; 19:4355-4363. [PMID: 33006287 PMCID: PMC7640981 DOI: 10.1021/acs.jproteome.0c00672] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/30/2020] [Indexed: 12/29/2022]
Abstract
A model that predicts levels of coronavirus (CoV) respiratory and fecal-oral transmission potentials based on the shell disorder has been built using neural network (artificial intelligence, AI) analysis of the percentage of disorder (PID) in the nucleocapsid, N, and membrane, M, proteins of the inner and outer viral shells, respectively. Using primarily the PID of N, SARS-CoV-2 is grouped as having intermediate levels of both respiratory and fecal-oral transmission potentials. Related studies, using similar methodologies, have found strong positive correlations between virulence and inner shell disorder among numerous viruses, including Nipah, Ebola, and Dengue viruses. There is some evidence that this is also true for SARS-CoV-2 and SARS-CoV, which have N PIDs of 48% and 50%, and case-fatality rates of 0.5-5% and 10.9%, respectively. The underlying relationship between virulence and respiratory potentials has to do with the viral loads of vital organs and body fluids, respectively. Viruses can spread by respiratory means only if the viral loads in saliva and mucus exceed certain minima. Similarly, a patient is likelier to die when the viral load overwhelms vital organs. Greater disorder in inner shell proteins has been known to play important roles in the rapid replication of viruses by enhancing the efficiency pertaining to protein-protein/DNA/RNA/lipid bindings. This paper suggests a novel strategy in attenuating viruses involving comparison of disorder patterns of inner shells (N) of related viruses to identify residues and regions that could be ideal for mutation. The M protein of SARS-CoV-2 has one of the lowest M PID values (6%) in its family, and therefore, this virus has one of the hardest outer shells, which makes it resistant to antimicrobial enzymes in body fluid. While this is likely responsible for its greater contagiousness, the risks of creating an attenuated virus with a more disordered M are discussed.
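As a rough illustration of the percentage of disorder (PID) quantity used in this abstract, the sketch below computes the share of residues whose predicted disorder score reaches a cutoff. The 0.5 threshold, the score format, and the function name are assumptions for illustration; the abstract does not state which disorder predictor or cutoff the authors used.

```python
from typing import Sequence


def percent_disorder(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Percentage of residues with a disorder score at or above the threshold.

    `scores` holds one hypothetical per-residue disorder score in [0, 1];
    the default threshold of 0.5 is an assumed convention.
    """
    if not scores:
        raise ValueError("scores must contain at least one residue")
    disordered = sum(1 for s in scores if s >= threshold)
    return 100.0 * disordered / len(scores)
```

Under this definition, a protein with two of four residues scored as disordered has a PID of 50%, comparable in scale to the N PID values of 48% and 50% quoted above.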
Affiliation(s)
- A. Keith Dunker
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- James A. Foster
  - Department of Biological Sciences, University of Idaho, Moscow, Idaho 83844, United States
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho 83844, United States
- Vladimir N. Uversky
  - Department of Molecular Medicine, USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33620, United States
  - Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Moscow region 142290, Russia
|
7
|
Zhou J, Oldfield CJ, Yan W, Shen B, Dunker A. Identification of Intrinsic Disorder in Complexes from the Protein Data Bank. ACS OMEGA 2020; 5:17883-17891. [PMID: 32743159 PMCID: PMC7391252 DOI: 10.1021/acsomega.9b03927] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/17/2019] [Accepted: 03/18/2020] [Indexed: 02/08/2023]
Abstract
Background: Intrinsically disordered proteins or regions (IDPs or IDRs) lack stable structures in solution, yet often fold upon binding with partners. IDPs or IDRs are highly abundant in all proteomes and represent a significant modification of sequence → structure → function paradigm. The Protein Data Bank (PDB) includes complexes containing disordered segments bound to globular proteins, but the molecular mechanisms of such binding interactions remain largely unknown. Results: In this study, we present the results of various disorder predictions on a nonredundant set of PDB complexes. In contrast to their structural appearances, many PDB proteins were predicted to be disordered when separated from their binding partners. These predicted-to-be-disordered proteins were observed to form structures depending upon various factors, including heterogroup binding, protein/DNA/RNA binding, disulfide bonds, and ion binding. Conclusions: This study collects many examples of disorder-to-order transition in IDP complex formation, thus revealing the unusual structure–function relationships of IDPs and providing an additional support for the newly proposed paradigm of the sequence → IDP/IDR ensemble → function.
Affiliation(s)
- Jianhong Zhou
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Christopher J. Oldfield
  - Computer Science Department, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Wenying Yan
  - School of Biology & Basic Medical Sciences, Soochow University, Suzhou 215123, China
- Bairong Shen
  - Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- A. Keith Dunker
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
|
8
|
Yan J, Cheng J, Kurgan L, Uversky VN. Structural and functional analysis of "non-smelly" proteins. Cell Mol Life Sci 2020; 77:2423-2440. [PMID: 31486849 PMCID: PMC11105052 DOI: 10.1007/s00018-019-03292-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/29/2019] [Revised: 08/21/2019] [Accepted: 08/28/2019] [Indexed: 01/09/2023]
Abstract
Cysteine and aromatic residues are major structure-promoting residues. We assessed the abundance, structural coverage, and functional characteristics of the "non-smelly" proteins, i.e., proteins that do not contain cysteine residues (C-depleted) or cysteine and aromatic residues (CFYWH-depleted), across 817 proteomes from all domains of life. The analysis revealed that although these proteomes contained significant levels of the C-depleted proteins, with prokaryotes being significantly more enriched in such proteins than eukaryotes, the CFYWH-depleted proteins were relatively rare, accounting for about 0.05% of proteomes. Furthermore, CFYWH-depleted proteins were virtually never found in PDB. Depletion in cysteine and in aromatic residues was associated with the substantially increased intrinsic disorder levels across all domains of life. Archaeal and eukaryotic organisms with higher levels of the C-depleted proteins were shown to have higher levels of the intrinsic disorder and lower levels of structural coverage. We also showed that the "non-smelly" proteins typically did not independently fold into monomeric structures, and instead, they fold by interacting with nucleic acids as constituents of the ribosome and nucleosome complexes. They were shown to be involved in translation, transcription, nucleosome assembly, transmembrane transport, and protein folding functions, all of which are known to be associated with the intrinsic disorder. Our data suggested that, in general, structure of monomeric proteins is crucially dependent on the presence of cysteine and aromatic residues.
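The two depletion classes defined in this abstract can be checked directly from a one-letter amino acid sequence, as in the minimal sketch below. The function name and the returned labels are assumptions for illustration, and the sketch reports only the stricter class when both apply, since every CFYWH-depleted protein is also C-depleted.

```python
def depletion_class(seq: str) -> str:
    """Classify a protein sequence as CFYWH-depleted, C-depleted, or neither.

    CFYWH-depleted: contains no C, F, Y, W, or H residues.
    C-depleted: contains no C residues (but at least one of F, Y, W, H).
    """
    residues = set(seq.upper())
    if not residues & set("CFYWH"):
        return "CFYWH-depleted"
    if "C" not in residues:
        return "C-depleted"
    return "neither"
```

Applied across a proteome, counting the sequences in each class gives the kind of abundance figures the abstract reports, such as CFYWH-depleted proteins accounting for about 0.05% of proteomes.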
Affiliation(s)
- Jing Yan
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA.
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd., MDC07, Tampa, FL, 33612, USA.
  - Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia.
|
9
|
Ghadermarzi S, Li X, Li M, Kurgan L. Sequence-Derived Markers of Drug Targets and Potentially Druggable Human Proteins. Front Genet 2019; 10:1075. [PMID: 31803227 PMCID: PMC6872670 DOI: 10.3389/fgene.2019.01075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/30/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022] Open
Abstract
Recent research shows that majority of the druggable human proteome is yet to be annotated and explored. Accurate identification of these unexplored druggable proteins would facilitate development, screening, repurposing, and repositioning of drugs, as well as prediction of new drug–protein interactions. We contrast the current drug targets against the datasets of non-druggable and possibly druggable proteins to formulate markers that could be used to identify druggable proteins. We focus on the markers that can be extracted from protein sequences or names/identifiers to ensure that they can be applied across the entire human proteome. These markers quantify key features covered in the past works (topological features of PPIs, cellular functions, and subcellular locations) and several novel factors (intrinsic disorder, residue-level conservation, alternative splicing isoforms, domains, and sequence-derived solvent accessibility). We find that the possibly druggable proteins have significantly higher abundance of alternative splicing isoforms, relatively large number of domains, higher degree of centrality in the protein-protein interaction networks, and lower numbers of conserved and surface residues, when compared with the non-druggable proteins. We show that the current drug targets and possibly druggable proteins share involvement in the catalytic and signaling functions. However, unlike the drug targets, the possibly druggable proteins participate in the metabolic and biosynthesis processes, are enriched in the intrinsic disorder, interact with proteins and nucleic acids, and are localized across the cell. To sum up, we formulate several markers that can help with finding novel druggable human proteins and provide interesting insights into the cellular functions and subcellular locations of the current drug targets and potentially druggable proteins.
Affiliation(s)
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Xingyi Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
10
|
Abstract
Since the proposal of Anfinsen’s thermodynamic hypothesis in 1963, our understanding of protein folding and dynamics has gained significant appreciation of its nuance and complexity. Intrinsically disordered proteins, chameleonic sequences, morpheeins, and metamorphic proteins have broadened the protein folding paradigm. Here, we discuss noncanonical protein folding patterns, with an emphasis on metamorphic proteins, and we review known metamorphic proteins that occur naturally and that have been engineered in the laboratory. Finally, we discuss research areas surrounding metamorphic proteins that are primed for future exploration, including evolution, drug discovery, and the quest for previously unrecognized metamorphs. As we enter an age where we are capable of complex bioinformatic searches and de novo protein design, we are primed to search for previously unrecognized metamorphic proteins and to design our own metamorphs to act as targeted, switchable drugs; biosensors; and more.
Affiliation(s)
- Acacia F. Dishman
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, United States
- Brian F. Volkman
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, United States
|
11
|
Meng F, Murray GF, Kurgan L, Donahue HJ. Functional and structural characterization of osteocytic MLO-Y4 cell proteins encoded by genes differentially expressed in response to mechanical signals in vitro. Sci Rep 2018; 8:6716. [PMID: 29712973 PMCID: PMC5928037 DOI: 10.1038/s41598-018-25113-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/15/2017] [Accepted: 04/09/2018] [Indexed: 12/29/2022] Open
Abstract
The anabolic response of bone to mechanical load is partially the result of osteocyte response to fluid flow-induced shear stress. Understanding signaling pathways activated in osteocytes exposed to fluid flow could identify novel signaling pathways involved in the response of bone to mechanical load. Bioinformatics allows for a unique perspective and provides key first steps in understanding these signaling pathways. We examined proteins encoded by genes differentially expressed in response to fluid flow in murine osteocytic MLO-Y4 cells. We considered structural and functional characteristics including putative intrinsic disorder, evolutionary conservation, interconnectedness in protein-protein interaction networks, and cellular localization. Our analysis suggests that proteins encoded by fluid flow activated genes have lower than expected conservation, are depleted in intrinsic disorder, maintain typical levels of connectivity for the murine proteome, and are found in the cytoplasm and extracellular space. Pathway analyses reveal that these proteins are associated with cellular response to stress, chemokine and cytokine activity, enzyme binding, and osteoclast differentiation. The lower than expected disorder of proteins encoded by flow activated genes suggests they are relatively specialized.
Affiliation(s)
- Fanchi Meng
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Graeme F Murray
  - Bone Engineering, Science and Technology (BEST) Laboratory, Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, United States of America.
- Henry J Donahue
  - Bone Engineering, Science and Technology (BEST) Laboratory, Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, Virginia, United States of America.
|
12
|
Gao J, Wu Z, Hu G, Wang K, Song J, Joachimiak A, Kurgan L. Survey of Predictors of Propensity for Protein Production and Crystallization with Application to Predict Resolution of Crystal Structures. Curr Protein Pept Sci 2018; 19:200-210. [PMID: 28933304 PMCID: PMC7001581 DOI: 10.2174/1389203718666170921114437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/23/2017] [Revised: 09/14/2017] [Accepted: 09/14/2017] [Indexed: 11/22/2022]
Abstract
Selection of proper targets for the X-ray crystallography will benefit biological research community immensely. Several computational models were proposed to predict propensity of successful protein production and diffraction quality crystallization from protein sequences. We reviewed a comprehensive collection of 22 such predictors that were developed in the last decade. We found that almost all of these models are easily accessible as webservers and/or standalone software and we demonstrated that some of them are widely used by the research community. We empirically evaluated and compared the predictive performance of seven representative methods. The analysis suggests that these methods produce quite accurate propensities for the diffraction-quality crystallization. We also summarized results of the first study of the relation between these predictive propensities and the resolution of the crystallizable proteins. We found that the propensities predicted by several methods are significantly higher for proteins that have high resolution structures compared to those with the low resolution structures. Moreover, we tested a new meta-predictor, MetaXXC, which averages the propensities generated by the three most accurate predictors of the diffraction-quality crystallization. MetaXXC generates putative values of resolution that have modest levels of correlation with the experimental resolutions and it offers the lowest mean absolute error when compared to the seven considered methods. We conclude that protein sequences can be used to fairly accurately predict whether their corresponding protein structures can be solved using X-ray crystallography. Moreover, we also ascertain that sequences can be used to reasonably well predict the resolution of the resulting protein crystals.
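Two operations named in this abstract, averaging the propensities of several predictors and scoring predictions by mean absolute error, can be sketched as follows. The function names and example inputs are assumptions for illustration, and the sketch does not reproduce how MetaXXC maps an averaged propensity to a putative resolution, which the abstract does not describe.

```python
from typing import Sequence


def average_propensity(propensities: Sequence[float]) -> float:
    """Unweighted mean of the propensities produced by several predictors."""
    if not propensities:
        raise ValueError("at least one propensity is required")
    return sum(propensities) / len(propensities)


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute difference between predicted and experimental values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```

A meta-predictor of this averaging kind tends to damp the errors of individual methods when those errors are not strongly correlated, which is one plausible reading of why the abstract reports the lowest mean absolute error for MetaXXC.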
Affiliation(s)
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Gang Hu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Jiangning Song
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, Australia
- Andrzej Joachimiak
  - Midwest Center for Structural Genomics, Argonne, USA
  - Structural Biology Center, Biosciences, Argonne National Laboratory, Argonne, USA
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, USA
|
13
|
Ereño-Orbea J, Sicard T, Cui H, Carson J, Hermans P, Julien JP. Structural Basis of Enhanced Crystallizability Induced by a Molecular Chaperone for Antibody Antigen-Binding Fragments. J Mol Biol 2017; 430:322-336. [PMID: 29277294 DOI: 10.1016/j.jmb.2017.12.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/22/2017] [Revised: 11/30/2017] [Accepted: 12/13/2017] [Indexed: 12/20/2022]
Abstract
Monoclonal antibodies constitute one of the largest groups of drugs to treat cancers and immune disorders, and are guiding the design of vaccines against infectious diseases. Fragments antigen-binding (Fabs) have been preferred over monoclonal antibodies for the structural characterization of antibody-antigen complexes due to their relatively low flexibility. Nonetheless, Fabs often remain challenging to crystallize because of the surface characteristics of complementarity-determining regions and the residual flexibility in the hinge region between the variable and constant domains. Here, we used a variable heavy-chain (VHH) domain specific for the human kappa light chain to assist in the structure determination of three therapeutic Fabs that were recalcitrant to crystallization on their own. We show that this ligand alters the surface properties of the antibody-ligand complex and lowers its aggregation temperature to favor crystallization. The VHH crystallization chaperone also restricts the flexible hinge of Fabs to a narrow range of angles, and does so independently of the variable region. Our findings contribute a valuable approach to antibody structure determination and provide biophysical insight into the principles that govern the crystallization of macromolecules.
Affiliation(s)
- June Ereño-Orbea
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
- Taylor Sicard
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4; Department of Biochemistry, University of Toronto, Toronto, ON, Canada M5S 1A8
- Hong Cui
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
- Jacob Carson
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4
- Pim Hermans
  - BAC, BV, part of Thermo Fisher Scientific, Leiden, the Netherlands
- Jean-Philippe Julien
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, Canada M5G 0A4; Department of Biochemistry, University of Toronto, Toronto, ON, Canada M5S 1A8; Department of Immunology, University of Toronto, Toronto, ON, Canada M5S 1A8.
|
14
|
Halliwell LM, Jathoul AP, Bate JP, Worthy HL, Anderson JC, Jones DD, Murray JAH. ΔFlucs: Brighter Photinus pyralis firefly luciferases identified by surveying consecutive single amino acid deletion mutations in a thermostable variant. Biotechnol Bioeng 2017; 115:50-59. [PMID: 28921549 DOI: 10.1002/bit.26451] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/06/2017] [Revised: 09/08/2017] [Accepted: 09/11/2017] [Indexed: 11/05/2022]
Abstract
The bright bioluminescence catalyzed by Photinus pyralis firefly luciferase (Fluc) enables a vast array of life science research such as bioimaging in live animals and sensitive in vitro diagnostics. The effectiveness of such applications is improved using engineered enzymes that to date have been constructed using amino acid substitutions. We describe ΔFlucs: consecutive single amino acid deletion mutants within six loop structures of the bright and thermostable ×11 Fluc. Deletion mutations are a promising avenue to explore new sequence and functional space and isolate novel mutant phenotypes. However, this method is often overlooked and to date there have been no surveys of the effects of consecutive single amino acid deletions in Fluc. We constructed a large semi-rational ΔFluc library and isolated significantly brighter enzymes after finding ×11 Fluc activity was largely tolerant to deletions. Targeting an "omega-loop" motif (T352-G360) significantly enhanced activity, altered kinetics, reduced Km for D-luciferin, altered emission colors, and altered substrate specificity for redshifted analog DL-infraluciferin. Experimental and in silico analyses suggested that remodeling of the Ω-loop affects active-site hydrophobicity to increase light yields. This work demonstrates the further potential of deletion mutations, which can generate useful Fluc mutants and broaden the palette of the biomedical and biotechnological bioluminescence enzyme toolbox.
Affiliation(s)
- Amit P Jathoul
- School of Biosciences, University of Cardiff, Cardiff, UK
- Jack P Bate
- School of Biosciences, University of Cardiff, Cardiff, UK
- D Dafydd Jones
- School of Biosciences, University of Cardiff, Cardiff, UK
|
15
|
Comparative analysis of amino acid composition in the active site of nirk gene encoding copper-containing nitrite reductase (CuNiR) in bacterial spp. Comput Biol Chem 2016; 67:102-113. [PMID: 28068515 DOI: 10.1016/j.compbiolchem.2016.12.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/11/2015] [Revised: 06/13/2016] [Accepted: 12/29/2016] [Indexed: 11/22/2022]
Abstract
The nirk gene encodes the copper-containing nitrite reductase (CuNiR), a key catalytic enzyme in the environmental denitrification process that helps to produce nitric oxide from nitrite. The molecular mechanism of the denitrification process is complex, and here a theoretical investigation was conducted to determine the sequence information and amino acid composition of the active site of the CuNiR enzyme using various bioinformatics tools. Ten FASTA-formatted sequences were retrieved from the NCBI database, and domain identification, disordered-region identification, and phylogenetic analyses were performed on these sequences. Comparative modeling of the proteins was performed with the Modeller 9v14 program and visualized with PyMOL. Validated protein models were deposited in the Protein Model Database (PMDB) (PMDB id: PM0080150 to PM0080159). Active sites of the nirk-encoded CuNiR enzyme were identified with the CASTp server. PROCHECK showed significant scores for four protein models in the most favored regions of the Ramachandran plot. Active-site and cavity prediction showed that the amino acids glycine, alanine, histidine, aspartic acid, glutamic acid, threonine, and glutamine were common to the four predicted protein models. The present in silico study anticipates that the results of the active-site analyses will pave the way for further experimental research on the complex denitrification mechanism of the selected species.
|
16
|
Rahman KS, Chowdhury EU, Sachse K, Kaltenboeck B. Inadequate Reference Datasets Biased toward Short Non-epitopes Confound B-cell Epitope Prediction. J Biol Chem 2016; 291:14585-99. [PMID: 27189949 PMCID: PMC4938180 DOI: 10.1074/jbc.m116.729020] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/23/2016] [Revised: 05/03/2016] [Indexed: 11/06/2022] Open
Abstract
X-ray crystallography has shown that an antibody paratope typically binds 15-22 amino acids (aa) of an epitope, of which 2-5 randomly distributed amino acids contribute most of the binding energy. In contrast, researchers typically choose short peptide antigens for B-cell epitope mapping in antibody binding assays. Furthermore, short 6-11-aa epitopes, and in particular non-epitopes, are over-represented in published B-cell epitope datasets that are commonly used for development of B-cell epitope prediction approaches from protein antigen sequences. We hypothesized that such suboptimal length peptides result in weak antibody binding and cause false-negative results. We tested the influence of peptide antigen length on antibody binding by analyzing data on more than 900 peptides used for B-cell epitope mapping of immunodominant proteins of Chlamydia spp. We demonstrate that short 7-12-aa peptides of B-cell epitopes bind antibodies poorly; thus, epitope mapping with short peptide antigens falsely classifies many B-cell epitopes as non-epitopes. We also show in published datasets of confirmed epitopes and non-epitopes a direct correlation between length of peptide antigens and antibody binding. Elimination of short, ≤11-aa epitope/non-epitope sequences improved datasets for evaluation of in silico B-cell epitope prediction. Achieving up to 86% accuracy, protein disorder tendency is the best indicator of B-cell epitope regions for chlamydial and published datasets. For B-cell epitope prediction, the most effective approach is plotting disorder of protein sequences with the IUPred-L scale, followed by antibody reactivity testing of 16-30-aa peptides from peak regions. This strategy overcomes the well-known inaccuracy of in silico B-cell epitope prediction from primary protein sequences.
Affiliation(s)
- Kh Shamsur Rahman
- Department of Pathobiology, Auburn University, Auburn, Alabama 36849
- Konrad Sachse
- Federal Institute for Animal Health, D-07743 Jena, Germany
- Bernhard Kaltenboeck
- Department of Pathobiology, Auburn University, Auburn, Alabama 36849
|
17
|
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) do not adopt a well-defined folded structure under physiological conditions. Instead, these proteins exist as heterogeneous and dynamical conformational ensembles. IDPs are widespread in eukaryotic proteomes and are involved in fundamental biological processes, mostly related to regulation and signaling. At the same time, disordered regions often pose significant challenges to the structure determination process, which generally requires highly homogeneous protein samples. In this book chapter, we provide a brief overview of protein disorder, describe various bioinformatics resources that have been developed in recent years for their characterization, and give a general outline of their applications in various types of structural genomics projects. Traditionally, disordered segments were filtered out to optimize the yield of structure determination pipelines. However, it is becoming increasingly clear that the structural characterization of proteins cannot be complete without the incorporation of intrinsically disordered regions.
Affiliation(s)
- Marco Punta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
18
|
Dunker AK, Oldfield CJ. Back to the Future: Nuclear Magnetic Resonance and Bioinformatics Studies on Intrinsically Disordered Proteins. Adv Exp Med Biol 2015; 870:1-34. [PMID: 26387098 DOI: 10.1007/978-3-319-20164-1_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 01/08/2023]
Abstract
From the 1970s to the present, regions of missing electron density in protein structures determined by X-ray diffraction and the characterization of the functions of these regions have suggested that not all protein regions depend on prior 3D structure to carry out function. Motivated by these observations, in early 1996 we began to use bioinformatics approaches to study these intrinsically disordered proteins (IDPs) and IDP regions. At just about the same time, several laboratory groups began to study a collection of IDPs and IDP regions using nuclear magnetic resonance. The temporal overlap of the bioinformatics and NMR studies played a significant role in the development of our understanding of IDPs. Here the goal is to recount some of this history and to project from this experience possible directions for future work.
Affiliation(s)
- A Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 46202, Indianapolis, IN, USA.
- Christopher J Oldfield
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 46202, Indianapolis, IN, USA.
|
19
|
Mizianty MJ, Fan X, Yan J, Chalmers E, Woloschuk C, Joachimiak A, Kurgan L. Covering complete proteomes with X-ray structures: a current snapshot. Acta Crystallogr D Biol Crystallogr 2014; 70:2781-93. [PMID: 25372670 PMCID: PMC4220968 DOI: 10.1107/s1399004714019427] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 04/23/2014] [Accepted: 08/27/2014] [Indexed: 12/23/2022]
Abstract
Structural genomics programs have developed and applied structure-determination pipelines to a wide range of protein targets, facilitating the visualization of macromolecular interactions and the understanding of their molecular and biochemical functions. The fundamental question of whether three-dimensional structures of all proteins and all functional annotations can be determined using X-ray crystallography is investigated. A first-of-its-kind large-scale analysis of crystallization propensity for all proteins encoded in 1953 fully sequenced genomes was performed. It is shown that current X-ray crystallographic knowhow combined with homology modeling can provide structures for 25% of modeling families (protein clusters for which structural models can be obtained through homology modeling), with at least one structural model produced for each Gene Ontology functional annotation. The coverage varies between superkingdoms, with 19% for eukaryotes, 35% for bacteria and 49% for archaea, and with those of viruses following the coverage values of their hosts. It is shown that the crystallization propensities of proteomes from the taxonomic superkingdoms are distinct. The use of knowledge-based target selection is shown to substantially increase the ability to produce X-ray structures. It is demonstrated that the human proteome has one of the highest attainable coverage values among eukaryotes, and GPCR membrane proteins suitable for X-ray structure determination were determined.
Affiliation(s)
- Marcin J. Mizianty
- Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Xiao Fan
- Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Jing Yan
- Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Eric Chalmers
- Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Christopher Woloschuk
- Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Andrzej Joachimiak
- Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, IL 60439, USA
- Lukasz Kurgan
- Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
|
20
|
Intrinsically disordered proteins undergo and assist folding transitions in the proteome. Arch Biochem Biophys 2012; 531:80-9. [PMID: 23142500 DOI: 10.1016/j.abb.2012.09.010] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 08/29/2012] [Revised: 09/17/2012] [Accepted: 09/20/2012] [Indexed: 11/20/2022]
Abstract
The common notion in the protein world holds that proteins are synthesized as a linear polypeptide chain, followed by folding into a unique, functional 3D-structure. As outlined in many articles of this volume, this is in fact the case for a great proportion of the proteome. Many proteins and protein domains, however, are intrinsically disordered (IDPs), i.e., they cannot fold on their own, but often undergo a folding transition in the presence of a binding partner. This binding-induced folding process shows strong conceptual parallels with the folding of globular proteins, in a sense that it can proceed via two routes, either induction of the folded conformation from an initial random state or selection of a pre-formed state already present in the ensemble. In addition, we show that IDPs not only undergo folding themselves, they also assist the folding process of other proteins as chaperones, and even contribute to the quality control processes of the cell, in which irreparably misfolded proteins are recognized and tagged for proteasomal degradation. These various mechanisms suggest that structural disorder, in a biological context, is linked with protein folding in several ways, in which both the IDP and its partner may undergo reciprocal structural transitions.
|
21
|
Abstract
Background: Intrinsically disordered proteins (IDPs) and regions (IDRs) perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder are achieving per-residue accuracies over 80%. In a genome-wide study of intrinsic disorder in the human genome we observed a big difference in predicted disorder content between confirmed and putative human proteins. We investigated a hypothesis that this discrepancy is not correct, and that it is due to incorrectly annotated parts of the putative protein sequences that exhibit some similarities to confirmed IDRs, which leads to high predicted disorder content. Methods: To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic errors of gene finding algorithms. We developed a procedure to create synthetic peptide sequences by translation of non-coding regions of genomic sequences and translation of coding regions with incorrect codon alignment. Results: Application of the developed predictor to putative human protein sequences showed that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher levels of disorder content than correctly assigned regions. This partially, albeit not completely, explains the observed discrepancy in predicted disorder content between confirmed and putative human proteins. Conclusions: Our findings provide the first evidence that current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates may be biased.
Affiliation(s)
- Uros Midic
- Fels Institute for Cancer Research & Molecular Biology, Temple University School of Medicine, 3307 N, Broad St, Philadelphia, PA 19140, USA.
|
22
|
Das RK, Mao AH, Pappu RV. Unmasking Functional Motifs Within Disordered Regions of Proteins. Sci Signal 2012; 5:pe17. [DOI: 10.1126/scisignal.2003091] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/02/2022]
|
23
|
Montelione GT. The Protein Structure Initiative: achievements and visions for the future. F1000 Biol Rep 2012; 4:7. [PMID: 22500193 PMCID: PMC3318194 DOI: 10.3410/b4-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Indexed: 12/27/2022]
Abstract
The Protein Structure Initiative (PSI) was established in 2000 by the National Institute of General Medical Sciences with the long-term goal of providing 3D (three-dimensional) structural information for most proteins in nature. As advances in genomic sequencing, bioinformatics, homology modelling, and methods for rapid determination of 3D structures of proteins by X-ray crystallography and nuclear magnetic resonance (NMR) converged, it was proposed that our understanding of the biology of protein structure and evolution could be greatly enabled by ‘genomic-scale’ protein structure determination. Over the past 12 years, the PSI has evolved from a testing bed for new methods of sample and structure production to a core component of a wide range of biology programs.
Affiliation(s)
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
|
24
|
Kumar S. Homology modeling and consensus protein disorder prediction of human filamin. Bioinformation 2011; 6:366-9. [PMID: 21904422 PMCID: PMC3163912 DOI: 10.6026/97320630006366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/12/2011] [Accepted: 07/14/2011] [Indexed: 11/23/2022] Open
Abstract
Filamins are dimeric actin-binding proteins participating in the organization of the actin-based cytoskeleton. Their modular domain organization is made up of an N-terminal actin-binding domain composed of two CH domains followed by flexible rod regions that consist of 24 Ig-like domains. Homology modeling was used to model human filamin using Modeller 9v5. Assessment of the resulting model with Verify 3D and PROCHECK showed that the final model is reliable. The conformational disorder predictions for human filamin residues were also mapped onto the validated structure of human filamin. Prediction of protein disorder in filamin structures will help structural biologists to find suitable targets for analysis and to understand protein function.
Affiliation(s)
- Suresh Kumar
- Department of Bioinformatics, School of Biotechnology and Health Sciences, Karunya University, Coimbatore - 641114, Tamil Nadu, India
|
25
|
Protein disorder--a breakthrough invention of evolution? Curr Opin Struct Biol 2011; 21:412-8. [PMID: 21514145 DOI: 10.1016/j.sbi.2011.03.014] [Citation(s) in RCA: 112] [Impact Index Per Article: 8.6] [Received: 02/14/2011] [Revised: 03/29/2011] [Accepted: 03/29/2011] [Indexed: 11/21/2022]
Abstract
As an operational definition, we refer to regions in proteins that do not adopt regular three-dimensional structures in isolation as disordered regions. An antipode to disorder would be 'well-structured' rather than 'ordered'. Here, we argue for the following three hypotheses. Firstly, it is more useful to picture disorder as a distinct phenomenon in structural biology than as an extreme example of protein flexibility. Secondly, there are many very different flavors of protein disorder; nevertheless, it seems advantageous to portray the universe of all possible proteins in terms of two main types: well-structured and disordered. There might be a third type 'other' but we have so far no positive evidence for this. Thirdly, nature uses protein disorder as a tool to adapt to different environments. Protein disorder is evolutionarily conserved and this maintenance of disorder is highly nontrivial. Increasingly integrating protein disorder into the toolbox of a living cell was a crucial step in the evolution from simple bacteria to complex eukaryotes. We need new advanced computational methods to study this new milestone in the advance of protein biology.
|
26
|
Adkins NL, Georgel PT. MeCP2: structure and function. This paper is one of a selection of papers published in a Special Issue entitled 31st Annual International Asilomar Chromatin and Chromosomes Conference, and has undergone the Journal’s usual peer review process. Biochem Cell Biol 2011; 89:1-11. [DOI: 10.1139/o10-112] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Indexed: 01/05/2023] Open
Abstract
Despite a vast body of literature linking chromatin structure to regulation of gene expression, the role of architectural proteins in higher order chromatin transitions required for transcription activation and repression has remained an under-studied field. To demonstrate the current knowledge of the role of such proteins, we have focused our attention on the methylated DNA binding and chromatin-associated protein MeCP2. Structural studies using chromatin assembled in vitro have revealed that MeCP2 can associate with nucleosomes in an N-terminus dependent manner and efficiently condense nucleosome arrays. The present review attempts to match MeCP2 structural domains, or lack thereof, and specific chromatin features needed for the proper recruitment of MeCP2 to its multiple functions as either activator or repressor. We specifically focused on MeCP2’s role in Rett syndrome, a neurological disorder associated with specific MeCP2 mutations.
Affiliation(s)
- Nicholas L. Adkins
- Byrd Biotechnology Building, Department of Biological Sciences, Marshall University, 1 John Marshall Drive, Huntington, WV 25755, USA
- Philippe T. Georgel
- Byrd Biotechnology Building, Department of Biological Sciences, Marshall University, 1 John Marshall Drive, Huntington, WV 25755, USA
|
27
|
Graebsch A, Roche S, Kostrewa D, Söding J, Niessing D. Of bits and bugs--on the use of bioinformatics and a bacterial crystal structure to solve a eukaryotic repeat-protein structure. PLoS One 2010; 5:e13402. [PMID: 20976240 PMCID: PMC2954813 DOI: 10.1371/journal.pone.0013402] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 07/06/2010] [Accepted: 09/24/2010] [Indexed: 11/19/2022] Open
Abstract
Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key for success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins.
Affiliation(s)
- Almut Graebsch
- Institute of Structural Biology, Helmholtz Zentrum München, Munich, Germany
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
- Stéphane Roche
- Institute of Structural Biology, Helmholtz Zentrum München, Munich, Germany
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
- Dirk Kostrewa
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
- Johannes Söding
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
- Dierk Niessing
- Institute of Structural Biology, Helmholtz Zentrum München, Munich, Germany
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
|
28
|
Babnigg G, Joachimiak A. Predicting protein crystallization propensity from protein sequence. J Struct Funct Genomics 2010; 11:71-80. [PMID: 20177794 DOI: 10.1007/s10969-010-9080-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 11/25/2009] [Accepted: 02/05/2010] [Indexed: 10/19/2022]
Abstract
The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein's propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties was computed for over 1,300 proteins that expressed well but were insoluble, and for approximately 720 unique proteins that resulted in X-ray structures. The correlation of the protein's isoelectric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full-length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses we have identified a set of attributes correlating with a protein's propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on these. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor.
Affiliation(s)
- György Babnigg
- Midwest Center for Structural Genomics, Biosciences Division, Argonne National Laboratory, 9700 S Cass Ave., Argonne, IL 60439, USA.
|
29
|
Busche AEL, Aranko AS, Talebzadeh-Farooji M, Bernhard F, Dötsch V, Iwaï H. Segmental isotopic labeling of a central domain in a multidomain protein by protein trans-splicing using only one robust DnaE intein. Angew Chem Int Ed Engl 2009; 48:6128-31. [PMID: 19591176 DOI: 10.1002/anie.200901488] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Indexed: 11/07/2022]
Affiliation(s)
- Alena E L Busche
- Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance and Cluster of Excellence Frankfurt Macromolecular Complexes (CEF), University of Frankfurt, 60438 Frankfurt, Germany
|
30
|
Busche A, Aranko A, Talebzadeh-Farooji M, Bernhard F, Dötsch V, Iwaï H. Segmental Isotopic Labeling of a Central Domain in a Multidomain Protein by Protein Trans-Splicing Using Only One Robust DnaE Intein. Angew Chem Int Ed Engl 2009. [DOI: 10.1002/ange.200901488] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
|
31
|
Midic U, Oldfield CJ, Dunker AK, Obradovic Z, Uversky VN. Protein disorder in the human diseasome: unfoldomics of human genetic diseases. BMC Genomics 2009; 10 Suppl 1:S12. [PMID: 19594871 PMCID: PMC2709255 DOI: 10.1186/1471-2164-10-s1-s12] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Intrinsically disordered proteins lack stable structure under physiological conditions, yet carry out many crucial biological functions, especially functions associated with regulation, recognition, signaling and control. Recently, human genetic diseases and related genes were organized into a bipartite graph (Goh KI, Cusick ME, Valle D, Childs B, Vidal M, et al. (2007) The human disease network. Proc Natl Acad Sci U S A 104: 8685-8690). This diseasome network revealed several significant features such as the common genetic origin of many diseases. METHODS AND FINDINGS We analyzed the abundance of intrinsic disorder in these diseasome network proteins by means of several prediction algorithms, and we analyzed the functional repertoires of these proteins based on prior studies relating disorder to function. Our analyses revealed that (i) Intrinsic disorder is common in proteins associated with many human genetic diseases; (ii) Different disease classes vary in the IDP contents of their associated proteins; (iii) Molecular recognition features, which are relatively short loosely structured protein regions within mostly disordered sequences and which gain structure upon binding to partners, are common in the diseasome, and their abundance correlates with the intrinsic disorder level; (iv) Some disease classes have a significant fraction of genes affected by alternative splicing, and the alternatively spliced regions in the corresponding proteins are predicted to be highly disordered; and (v) Correlations were found among the various diseasome graph-related properties and intrinsic disorder. CONCLUSION These observations provide the basis for the construction of the human-genetic-disease-associated unfoldome.
Affiliation(s)
- Uros Midic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Christopher J Oldfield
- Center for Computational Biology and Bioinformatics, Indiana University School of Informatics, Indianapolis, IN 46202, USA
- A Keith Dunker
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Zoran Obradovic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA
- Vladimir N Uversky
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Institute for Intrinsically Disordered Protein Research, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
|
32
|
Price WN, Chen Y, Handelman SK, Neely H, Manor P, Karlin R, Nair R, Liu J, Baran M, Everett J, Tong SN, Forouhar F, Swaminathan SS, Acton T, Xiao R, Luft JR, Lauricella A, DeTitta GT, Rost B, Montelione GT, Hunt JF. Understanding the physical properties that control protein crystallization by analysis of large-scale experimental data. Nat Biotechnol 2009; 27:51-7. [PMID: 19079241 DOI: 10.1038/nbt.1514] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Indexed: 11/09/2022]
Abstract
Crystallization is the most serious bottleneck in high-throughput protein-structure determination by diffraction methods. We have used data mining of the large-scale experimental results of the Northeast Structural Genomics Consortium and experimental folding studies to characterize the biophysical properties that control protein crystallization. This analysis leads to the conclusion that crystallization propensity depends primarily on the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. We identify specific sequence features that correlate with crystallization propensity and that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the amino acid-sequence properties of human versus eubacterial proteins, which likely reflect differences in biophysical properties, including crystallization propensity. Our thermodynamic measurements do not generally support previous claims regarding correlations between sequence properties and protein stability.
Affiliation(s)
- W Nicholson Price
- Northeast Structural Genomics Consortium, Columbia University, New York, New York 10027, USA
|
33
|
Xue B, Oldfield CJ, Dunker AK, Uversky VN. CDF it all: consensus prediction of intrinsically disordered proteins based on various cumulative distribution functions. FEBS Lett 2009; 583:1469-74. [PMID: 19351533 DOI: 10.1016/j.febslet.2009.03.070] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.5] [Received: 01/28/2009] [Revised: 03/18/2009] [Accepted: 03/27/2009] [Indexed: 11/29/2022]
Abstract
Many biologically active proteins are intrinsically disordered. A reasonable understanding of the disorder status of these proteins may be beneficial for better understanding of their structures and functions. The disorder contents of disordered proteins vary dramatically, with two extremes being fully ordered and fully disordered proteins. Often, it is necessary to perform a binary classification and classify a whole protein as ordered or disordered. Here, an improved error estimation technique was applied to develop the cumulative distribution function (CDF) algorithms for several established disorder predictors. A consensus binary predictor, based on the artificial neural networks, NN-CDF, was developed by using output of the individual CDFs. The consensus method outperforms the individual predictors by 4-5% in the averaged accuracy.
Affiliation(s)
- Bin Xue
- Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 410 W. 10th Street, HS 5009, Indianapolis, IN 46202, USA
|
34
|
Markley JL, Aceti DJ, Bingman CA, Fox BG, Frederick RO, Makino SI, Nichols KW, Phillips GN, Primm JG, Sahu SC, Vojtik FC, Volkman BF, Wrobel RL, Zolnai Z. The Center for Eukaryotic Structural Genomics. Journal of Structural and Functional Genomics 2009; 10:165-79. [PMID: 19130299 PMCID: PMC2705709 DOI: 10.1007/s10969-008-9057-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 09/18/2008] [Accepted: 12/12/2008] [Indexed: 10/29/2022]
Abstract
The Center for Eukaryotic Structural Genomics (CESG) is a "specialized" or "technology development" center supported by the Protein Structure Initiative (PSI). CESG's mission is to develop improved methods for the high-throughput solution of structures from eukaryotic proteins, with a very strong weighting toward human proteins of biomedical relevance. During the first three years of PSI-2, CESG selected targets representing 601 proteins from Homo sapiens, 33 from mouse, 10 from rat, 139 from Galdieria sulphuraria, 35 from Arabidopsis thaliana, 96 from Cyanidioschyzon merolae, 80 from Plasmodium falciparum, 24 from yeast, and about 25 from other eukaryotes. Notably, 30% of all structures of human proteins solved by the PSI Centers were determined at CESG. Whereas eukaryotic proteins generally are considered to be much more challenging targets than prokaryotic proteins, the technology now in place at CESG yields success rates that are comparable to those of the large production centers that work primarily on prokaryotic proteins. We describe here the technological innovations that underlie CESG's platforms for bioinformatics and laboratory information management, target selection, protein production, and structure determination by X-ray crystallography or NMR spectroscopy.
Affiliation(s)
- John L Markley
- Center for Eukaryotic Structural Genomics, Biochemistry Department, University of Wisconsin-Madison, Madison, WI 53706, USA.
|
35
|
Han P, Zhang X, Norton RS, Feng ZP. Large-scale prediction of long disordered regions in proteins using random forests. BMC Bioinformatics 2009; 10:8. [PMID: 19128505 PMCID: PMC2637845 DOI: 10.1186/1471-2105-10-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 08/30/2008] [Accepted: 01/07/2009] [Indexed: 12/02/2022] Open
Abstract
Background Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and in high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful in predicting short disordered regions. Combining prediction of short and long disordered regions will dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies. Results A new algorithm, IUPforest-L, for predicting long disordered regions using the random forest learning model is proposed in this paper. IUPforest-L is based on the Moreau-Broto auto-correlation function of amino acid indices (AAIs) and other physicochemical features of the primary sequences. In 10-fold cross validation tests, IUPforest-L can achieve an area of 89.5% under the receiver operating characteristic (ROC) curve. Compared with existing disorder predictors, IUPforest-L has high prediction accuracy and is efficient for predicting long disordered regions in large-scale proteomes. Conclusion The random forest model based on the auto-correlation functions of the AAIs within a protein fragment and other physicochemical features could effectively detect long disordered regions in proteins. A new predictor, IUPforest-L, was developed to batch predict long disordered regions in proteins, and the server can be accessed from
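As a rough illustration of the kind of feature this abstract describes, the Moreau-Broto auto-correlation of an amino acid index at lag d can be sketched as the average product of index values taken d residues apart. This is not the authors' implementation: the Kyte-Doolittle hydropathy scale, the function name `moreau_broto`, and the normalization by (N - d) are assumptions chosen only for illustration, since the abstract does not say which indices or normalization IUPforest-L uses.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used here only as an example amino acid index (assumption).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moreau_broto(seq: str, index: Dict[str, float], max_lag: int) -> List[float]:
    """Return normalized Moreau-Broto auto-correlation values for lags 1..max_lag.

    AC(d) = (1 / (N - d)) * sum_{i=0}^{N-d-1} P(i) * P(i + d),
    where P(i) is the index value of residue i and N is the sequence length.
    """
    values = [index[aa] for aa in seq]  # raises KeyError on residues missing from the index
    n = len(values)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    result = []
    for d in range(1, max_lag + 1):
        total = sum(values[i] * values[i + d] for i in range(n - d))
        result.append(total / (n - d))
    return result


if __name__ == "__main__":
    # Hypothetical fragment, not taken from the paper.
    print(moreau_broto("MSEEKKPSPLIVAG", KYTE_DOOLITTLE, max_lag=3))
```

A vector of such values over several lags and several indices could then serve as input features for a random forest, which is the general design the abstract outlines.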
Affiliation(s)
- Pengfei Han
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia.
|
36
|
Abstract
Background We have previously shown that using multiple prediction methods improves the accuracy of disorder predictions. It is, however, a time-consuming procedure, since individual outputs of multiple predictions have to be retrieved, compared to each other and a comprehensive view of the results can only be obtained through a manual, fastidious, non-automated procedure. We herein describe a new web metaserver, MeDor, which allows fast, simultaneous analysis of a query sequence by multiple predictors and provides a graphical interface with a unified view of the outputs. Results MeDor was developed in Java and is freely available and downloadable at: . Presently, MeDor provides a HCA plot and runs a secondary structure prediction, a prediction of signal peptides and transmembrane regions and a set of disorder predictions. MeDor also enables the user to customize the output and to retrieve the sequence of specific regions of interest. Conclusion As MeDor outputs can be printed, saved, commented and modified further on, this offers a dynamic support for the analysis of protein sequences that is instrumental for delineating domains amenable to structural and functional studies.
Affiliation(s)
- Philippe Lieutaud
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, 163 Avenue de Luminy, Case 932, 13288 Marseille Cedex 09, France.
|
37
|
Haquin S, Oeuillet E, Pajon A, Harris M, Jones AT, van Tilbeurgh H, Markley JL, Zolnai Z, Poupon A. Data management in structural genomics: an overview. Methods Mol Biol 2008; 426:49-79. [PMID: 18542857 DOI: 10.1007/978-1-60327-058-8_4] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Indexed: 02/11/2023]
Abstract
Data management has been identified as a crucial issue in all large-scale experimental projects. In this type of project, many different persons manipulate multiple objects in different locations; thus, unless complete and accurate records are maintained, it is extremely difficult to understand exactly what has been done, when it was done, who did it, and what exact protocol was used. All of this information is essential for use in publications, reusing successful protocols, determining why a target has failed, and validating and optimizing protocols. Although data management solutions have been in place for certain focused activities (e.g., genome sequencing and microarray experiments), they are just emerging for more widespread projects, such as structural genomics, metabolomics, and systems biology as a whole. The complexity of experimental procedures, and the diversity and high rate of development of protocols used in a single center, or across various centers, have important consequences for the design of information management systems. Because procedures are carried out by both machines and hand, the system must be capable of handling data entry both from robotic systems and by means of a user-friendly interface. The information management system needs to be flexible so it can handle changes in existing protocols or newly added protocols. Because no commercial information management systems have had the needed features, most structural genomics groups have developed their own solutions. This chapter discusses the advantages of using a LIMS (laboratory information management system), for day-to-day management of structural genomics projects, and also for data mining. 
This chapter reviews different solutions currently in place or under development with emphasis on three systems developed by the authors: Xtrack, Sesame (developed at the Center for Eukaryotic Structural Genomics under the US Protein Structural Genomics Initiative), and HalX (developed at the Yeast Structural Genomics Laboratory, in collaboration with the European SPINE project).
Affiliation(s)
- Sabrina Haquin
- Yeast Structural Genomics, IBBMC, Université Paris-Sud, Orsay, France
|
38
|
Schlessinger A, Liu J, Rost B. Natively unstructured loops differ from other loops. PLoS Comput Biol 2008; 3:e140. [PMID: 17658943 PMCID: PMC1924875 DOI: 10.1371/journal.pcbi.0030140] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 09/27/2006] [Accepted: 06/05/2007] [Indexed: 11/24/2022] Open
Abstract
Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks. The details of protein structures are important for function. Regions that do not adopt any regular structure in isolation (natively unstructured or disordered regions) initially appeared as a curious exception to this structure–function paradigm. It has become increasingly clear that unstructured regions are fundamental to many roles and that they are particularly important for multicellular organisms. Structural biology is just beginning to apprehend the stunning diversity of these roles. Here, we focused on unstructured regions dominated by a particular type of loop, namely the natively unstructured one. We developed a method that succeeded in the distinction between well-structured and natively unstructured loops. For the development, we did not use any experimental data for unstructured regions; when tested on experimental data, the method performed surprisingly well. Due to its different premises, the method captured very different aspects of unstructured regions than other methods that we tested. We applied the new method to two different problems. The first was the identification of proteins that may be difficult targets for structure determination. The second was the identification of worm proteins that have many interaction partners (more than seven) and unstructured regions. Surprisingly, we found unstructured regions of the loopy type in more than 50% of all the promiscuous worm proteins.
Affiliation(s)
- Avner Schlessinger
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
|
39
|
Bulashevska A, Eils R. Using Bayesian multinomial classifier to predict whether a given protein sequence is intrinsically disordered. J Theor Biol 2008; 254:799-803. [PMID: 18611404 DOI: 10.1016/j.jtbi.2008.05.040] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/19/2007] [Revised: 05/19/2008] [Accepted: 05/19/2008] [Indexed: 10/21/2022]
Abstract
Intrinsically disordered proteins (IDPs) lack a well-defined three-dimensional structure under physiological conditions. Intrinsic disorder is a common phenomenon, particularly in multicellular eukaryotes, and is responsible for important protein functions including regulation and signaling. Many disease-related proteins are likely to be intrinsically disordered or to have disordered regions. In this paper, a new predictor model based on the Bayesian classification methodology is introduced to predict, for a given protein or protein region, whether it is intrinsically disordered or ordered using only its primary sequence. The method incorporates length-dependent amino acid compositional differences of disordered regions by including separate statistical representations for short, middle and long disordered regions. The predictor was trained on a constructed data set of protein regions with known structural properties. In a jackknife test, the predictor achieved a sensitivity of 89.2% for disordered and 81.4% for ordered regions. Our method outperformed several reported predictors when evaluated on the previously published data set of Prilusky et al. [2005. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics 21 (16), 3435-3438]. A further strength of our approach is its ease of implementation.
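The general idea of a Bayesian multinomial classifier over amino acid composition can be sketched as follows. This is a minimal sketch under stated assumptions, not the published predictor: the function names `train` and `classify`, the Laplace smoothing constant, and the toy training sequences are all hypothetical, and the length-dependent representations for short, middle and long regions mentioned in the abstract are omitted.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Model = Dict[str, Tuple[float, Dict[str, float]]]


def train(examples: Iterable[Tuple[str, str]], alpha: float = 1.0) -> Model:
    """Estimate a log prior and smoothed per-residue log probabilities for each class."""
    counts: Dict[str, Counter] = {}
    n_per_class: Counter = Counter()
    for seq, label in examples:
        counts.setdefault(label, Counter()).update(c for c in seq if c in AMINO_ACIDS)
        n_per_class[label] += 1
    total_examples = sum(n_per_class.values())
    model: Model = {}
    for label, c in counts.items():
        # Laplace (add-alpha) smoothing so unseen residues keep a nonzero probability.
        denom = sum(c.values()) + alpha * len(AMINO_ACIDS)
        log_probs = {aa: math.log((c[aa] + alpha) / denom) for aa in AMINO_ACIDS}
        model[label] = (math.log(n_per_class[label] / total_examples), log_probs)
    return model


def classify(model: Model, seq: str) -> str:
    """Return the class with the highest posterior log score for the sequence."""
    def score(label: str) -> float:
        log_prior, log_probs = model[label]
        return log_prior + sum(log_probs[aa] for aa in seq if aa in AMINO_ACIDS)
    return max(model, key=score)


if __name__ == "__main__":
    # Toy, made-up training fragments; real training data would come from annotated structures.
    toy = [
        ("SPSPEEKKSPQQ", "disordered"),
        ("EEKKSSPPGGQQ", "disordered"),
        ("LIVFWLIVFCAV", "ordered"),
        ("VVILFYWCLLAI", "ordered"),
    ]
    model = train(toy)
    print(classify(model, "SPEKSPEK"), classify(model, "LIVLIVFW"))
```

Working in log space avoids numerical underflow on long sequences, and the multinomial coefficient is dropped because it is identical for every class.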
Affiliation(s)
- Alla Bulashevska
- Department of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany.
|
40
|
Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics 2008; 9 Suppl 1:S1. [PMID: 18366598 PMCID: PMC2386051 DOI: 10.1186/1471-2164-9-s1-s1] [Citation(s) in RCA: 438] [Impact Index Per Article: 27.4] [Indexed: 12/13/2022] Open
Abstract
Background Proteins are involved in many interactions with other proteins leading to networks that regulate and control a wide variety of physiological processes. Some of these proteins, called hub proteins or hubs, bind to many different protein partners. Protein intrinsic disorder, via diversity arising from structural plasticity or flexibility, provides a means for hubs to associate with many partners (Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN: Flexible Nets: The roles of intrinsic disorder in protein interaction networks. FEBS J 2005, 272:5129-5148). Results Here we present a detailed examination of two divergent examples: 1) p53, which uses different disordered regions to bind to different partners and which also has several individual disordered regions that each bind to multiple partners, and 2) 14-3-3, which is a structured protein that associates with many different intrinsically disordered partners. For both examples, three-dimensional structures of multiple complexes reveal that the flexibility and plasticity of intrinsically disordered protein regions as well as induced-fit changes in the structured regions are both important for binding diversity. Conclusions These data support the conjecture that hub proteins often utilize intrinsic disorder to bind to multiple partners and provide detailed information about induced fit in structured regions.
|
41
|
Huang YJ, Hang D, Lu LJ, Tong L, Gerstein MB, Montelione GT. Targeting the human cancer pathway protein interaction network by structural genomics. Mol Cell Proteomics 2008; 7:2048-60. [PMID: 18487680 DOI: 10.1074/mcp.m700550-mcp200] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022] Open
Abstract
Structural genomics provides an important approach for characterizing and understanding systems biology. As a step toward better integrating protein three-dimensional (3D) structural information in cancer systems biology, we have constructed a Human Cancer Pathway Protein Interaction Network (HCPIN) by analysis of several classical cancer-associated signaling pathways and their physical protein-protein interactions. Many well known cancer-associated proteins play central roles as "hubs" or "bottlenecks" in the HCPIN. At least half of HCPIN proteins are either directly associated with or interact with multiple signaling pathways. Although some 45% of residues in these proteins are in sequence segments that meet criteria sufficient for approximate homology modeling (Basic Local Alignment Search Tool (BLAST) E-value <10(-6)), only approximately 20% of residues in these proteins are structurally covered using high accuracy homology modeling criteria (i.e. BLAST E-value <10(-6) and at least 80% sequence identity) or by actual experimental structures. The HCPIN Website provides a comprehensive description of this biomedically important multipathway network together with experimental and homology models of HCPIN proteins useful for cancer biology research. To complement and enrich cancer systems biology, the Northeast Structural Genomics Consortium is targeting >1000 human proteins and protein domains from the HCPIN for sample production and 3D structure determination. The long range goal of this effort is to provide a comprehensive 3D structure-function database for human cancer-associated proteins and protein complexes in the context of their interaction networks. The network-based target selection (BioNet) approach described here is an example of a general strategy for targeting co-functioning proteins by structural genomics projects.
Affiliation(s)
- Yuanpeng Janet Huang
- Department of Molecular Biology and Biochemistry, Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey 08854, USA
|
42
|
Ishida T, Kinoshita K. Prediction of disordered regions in proteins based on the meta approach. Bioinformatics 2008; 24:1344-8. [PMID: 18426805 DOI: 10.1093/bioinformatics/btn195] [Citation(s) in RCA: 212] [Impact Index Per Article: 13.3] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Intrinsically disordered regions in proteins have no unique stable structures without their partner molecules, thus these regions sometimes prevent high-quality structure determination. Furthermore, proteins with disordered regions are often involved in important biological processes, and the disordered regions are considered to play important roles in molecular interactions. Therefore, identifying disordered regions is important to obtain high-resolution structural information and to understand the functional aspects of these proteins. RESULTS We developed a new prediction method for disordered regions in proteins based on the meta approach and implemented a web-server for this prediction method named 'metaPrDOS'. The method predicts the disorder tendency of each residue using support vector machines from the prediction results of the seven independent predictors. Evaluation of the meta approach was performed using the CASP7 prediction targets to avoid an overestimation due to the inclusion of proteins used in the training set of some component predictors. As a result, the meta approach achieved higher prediction accuracy than all methods participating in CASP7.
Affiliation(s)
- Takashi Ishida
- Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
|
43
|
Bannen RM, Bingman CA, Phillips GN. Effect of low-complexity regions on protein structure determination. Journal of Structural and Functional Genomics 2008; 8:217-26. [PMID: 18302007 DOI: 10.1007/s10969-008-9039-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/03/2007] [Accepted: 02/05/2008] [Indexed: 11/24/2022]
Abstract
It has been previously shown that protein sequences containing a quasi-repetitive assortment of amino acids are common in genomes and databases such as Swiss-Prot but are under-represented in the structure-based Protein Data Bank (PDB). Structural genomics groups have been using the absence of these "low-complexity" sequences for several years as a way to select proteins that have a good chance of successful structure determination. In this study, we examine the data deposited in the PDB as well as the available data from structural genomics groups in TargetDB and PepcDB to reveal interesting trends that could be taken into consideration when using low-complexity sequences as part of the target selection process.
Affiliation(s)
- Ryan M Bannen
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53711, USA
|
44
|
Abstract
Intrinsically unstructured regions in proteins have been associated with numerous important biological cellular functions. As measuring native disorder experimentally is technically challenging, computational methods for prediction of disordered regions in a protein have gained much interest in recent years. As part of the seventh Critical Assessment of Techniques for Protein Structure Prediction (CASP7), we have assessed 19 methods for disorder prediction based on their results for 96 target proteins. Prediction accuracy was assessed using detailed numerical comparison between the predicted disorder and the experimental structures. On average, methods participating in CASP7 have improved accuracy in comparison to the previous assessment in CASP6. Overall, however, no improvement over the best methods in CASP6 was observed in CASP7. Significant differences between different prediction methods were identified with regard to their sensitivity and specificity in correctly predicting ordered and disordered residues based on a protein target sequence, which is of relevance for practical applications of these computational tools.
|
45
|
Slabinski L, Jaroszewski L, Rodrigues APC, Rychlewski L, Wilson IA, Lesley SA, Godzik A. The challenge of protein structure determination--lessons from structural genomics. Protein Sci 2008; 16:2472-82. [PMID: 17962404 DOI: 10.1110/ps.073037907] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.9] [Indexed: 10/22/2022]
Abstract
The experimental determination of protein structure is marred by a high failure rate at many stages. With the availability of large quantities of data from high-throughput structure determination in structural genomics centers, we can now learn to recognize protein features correlated with failures; thus, we can recognize proteins more likely to succeed and eventually learn how to modify those that are less likely to succeed. Here, we identify several protein features that correlate strongly with successful protein production and crystallization and combine them into a single score that assesses "crystallization feasibility." The formula derived here was tested with a jackknife procedure and validated on independent benchmark sets. The "crystallization feasibility" score described here is being applied to target selection in the Joint Center for Structural Genomics, and is now contributing to increasing the success rate, lowering the costs, and shortening the time for protein structure determination. Analyses of PDB depositions suggest that very similar features also play a role in non-high-throughput structure determination, suggesting that this crystallization feasibility score would also be of significant interest to structural biology, as well as to molecular and biochemistry laboratories.
Affiliation(s)
- Lukasz Slabinski
- Joint Center for Structural Genomics, Bioinformatics Core, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
|
46
|
Abstract
The recent advance in our understanding of the relation of protein structure and function cautions that many proteins, or regions of proteins, exist and function without a well-defined three-dimensional structure. These intrinsically disordered/unstructured proteins (IDP/IUP) are frequent in proteomes and carry out essential functions, but their lack of stable structures hampers efforts of solving structures at high resolution by x-ray crystallography and/or NMR. Thus, filtering such proteins/regions out of high-throughput structural genomics pipelines would be of significant benefit in terms of cost and success rate. This chapter outlines the theoretical background of structural disorder, and provides practical advice on the application of advanced bioinformatic predictors to this end, that is to recognize fully/mostly disordered proteins or regions, which are incompatible with structure determination. An emphasis is also given to a somewhat different approach, in which ordered/disordered regions are explicitly delineated to the end of making constructs amenable for structure determination even when disordered regions are present.
Affiliation(s)
- Zsuzsanna Dosztányi
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest, Hungary
|
47
|
Slabinski L, Jaroszewski L, Rychlewski L, Wilson IA, Lesley SA, Godzik A. XtalPred: a web server for prediction of protein crystallizability. Bioinformatics 2007; 23:3403-5. [PMID: 17921170 DOI: 10.1093/bioinformatics/btm477] [Citation(s) in RCA: 218] [Impact Index Per Article: 12.8] [Indexed: 11/14/2022]
Abstract
XtalPred is a web server for prediction of protein crystallizability. The prediction is made by comparing several features of the protein with distributions of these features in TargetDB and combining the results into an overall probability of crystallization. XtalPred provides: (1) a detailed comparison of the protein's features to the corresponding distribution from TargetDB; (2) a summary of protein features and predictions that indicate problems that are likely to be encountered during protein crystallization; (3) prediction of ligands; and (4) (optional) lists of close homologs from complete microbial genomes that are more likely to crystallize. AVAILABILITY: The XtalPred web server is freely available for academic users at http://ffas.burnham.org/XtalPred
|
48
|
Abstract
PrDOS is a server that predicts the disordered regions of a protein from its amino acid sequence (http://prdos.hgc.jp). The server accepts a single protein amino acid sequence, in either plain text or FASTA format. The prediction system is composed of two predictors: a predictor based on local amino acid sequence information and one based on template proteins. The server combines the results of the two predictors and returns a two-state prediction (order/disorder) and a disorder probability for each residue. The prediction results are sent by e-mail, and the server also provides a web-interface to check the results.
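The combination step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how PrDOS merges its two predictors, so the weighted average, the `weight` and `threshold` parameters, and the function name below are all assumptions made for illustration, not the server's actual method.

```python
from typing import List, Tuple


def combine_disorder_predictions(
    local_scores: List[float],
    template_scores: List[float],
    weight: float = 0.5,      # hypothetical mixing weight, not from PrDOS
    threshold: float = 0.5,   # hypothetical order/disorder cutoff
) -> Tuple[List[float], List[str]]:
    """Merge two per-residue disorder scores into a probability and a state.

    Returns one probability per residue and a two-state label,
    "D" for disorder and "O" for order.
    """
    if len(local_scores) != len(template_scores):
        raise ValueError("both predictors must score every residue")
    # Assumed combination rule: a simple weighted average of the two scores.
    probabilities = [
        weight * a + (1.0 - weight) * b
        for a, b in zip(local_scores, template_scores)
    ]
    states = ["D" if p >= threshold else "O" for p in probabilities]
    return probabilities, states


if __name__ == "__main__":
    probs, states = combine_disorder_predictions([0.9, 0.2], [0.7, 0.4])
    print(probs, "".join(states))
```

Returning both the per-residue probability and the thresholded label mirrors the two outputs the abstract lists, so a caller can apply a different cutoff without rerunning the predictors.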
Affiliation(s)
- Takashi Ishida
- Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan.
|
49
|
Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T. POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 2007; 23:2046-53. [PMID: 17545177 DOI: 10.1093/bioinformatics/btm302] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION: Recent experimental and theoretical studies have revealed several proteins containing sequence segments that are unfolded under physiological conditions. These segments are called disordered regions. They are actively investigated because of their possible involvement in various biological processes, such as cell signaling and transcriptional and translational regulation. Additionally, disordered regions can represent a major obstacle to high-throughput proteome analysis and often need to be removed from experimental targets. The accurate prediction of long disordered regions is thus expected to provide annotations that are useful for a wide range of applications. RESULTS: We developed Prediction Of Order and Disorder by machine LEarning (POODLE-L; L stands for long), a Support Vector Machine (SVM)-based method for predicting long disordered regions using 10 kinds of simple physico-chemical properties of amino acids. POODLE-L assembles the output of 10 two-level SVM predictors into a final prediction of disordered regions. The performance of POODLE-L for predicting long disordered regions, with a Matthews correlation coefficient of 0.658, was the highest when compared with eight well-established publicly available disordered region predictors. AVAILABILITY: POODLE-L is freely available at http://mbs.cbrc.jp/poodle/poodle-l.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
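The Matthews correlation coefficient quoted in this abstract is computed from a two-class confusion matrix. The sketch below shows the standard formula applied to per-residue order/disorder labels; the function names and the toy labels are illustrative assumptions and are not taken from the POODLE-L paper or its benchmark.

```python
import math
from typing import Sequence


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from confusion-matrix counts; returns 0.0 if undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def mcc_from_labels(true: Sequence[int], pred: Sequence[int]) -> float:
    """MCC for per-residue labels, where 1 = disordered and 0 = ordered."""
    if len(true) != len(pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(true, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    return matthews_corrcoef(tp, tn, fp, fn)


if __name__ == "__main__":
    # Toy example with made-up labels, for illustration only.
    print(mcc_from_labels([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

The coefficient ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it better suited than raw accuracy when ordered residues heavily outnumber disordered ones.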
|
50
|
Knappenberger JA, Lecomte JTJ. Loop anchor modification causes the population of an alternative native state in an SH3-like domain. Protein Sci 2007; 16:863-79. [PMID: 17456740 PMCID: PMC2206634 DOI: 10.1110/ps.062469507] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
Abstract
Many stably folded proteins are proposed to contain long, unstructured loops. A series of hybrid proteins (EbE1-4) containing the folded scaffold of photosystem I accessory protein E (PsaE), an SH3-like protein, and the 40-residue heme-binding loop of cytochrome b5 was created to inspect the dependence of thermodynamic and kinetic parameters on the residues at the interface of folded and flexible regions. Compared to the simplest hybrid (EbE1), the chimeras differed by Gly insertions (EbE2, EbE3) or an asymmetric four-residue restructuring of loop termini (EbE4). NMR spectroscopy indicated that the chimeras retained the PsaE topology; native and unfolded state solubilities, however, were affected to varying degrees. Thermal and chemical denaturation experiments revealed that the EbE2 and EbE1 constructs resulted in a modest destabilization of the PsaE core, whereas apparent stability was increased by >5 kJ/mol in EbE4. EbE3 aggregated at µM concentrations and was not studied in detail. EbE4 populated two native states (N1 and N2), which differed by hydrophobic core packing and C-terminal interactions. At room temperature, the population ratio (approximately 3-4:1) favored the state whose spectroscopic properties most resembled those of PsaE (N1). EbE4 also demonstrated altered folding kinetics, displaying multiple slow phases related to the population of intermediates and possibly N2. It was concluded that loop anchors can affect protein properties, including stability, via short-range effects on local structure and long-range communication with the packed hydrophobic core. Modification of the attachment points appears to be a possible stepping stone in the transition from one three-dimensional structure to another.
Affiliation(s)
- Jane A Knappenberger
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|