1
Mathony J, Aschenbrenner S, Becker P, Niopek D. Dissecting the Determinants of Domain Insertion Tolerance and Allostery in Proteins. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2303496. [PMID: 37562980 PMCID: PMC10558690 DOI: 10.1002/advs.202303496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 07/21/2023] [Indexed: 08/12/2023]
Abstract
Domain insertion engineering is a promising approach to recombine the functions of evolutionarily unrelated proteins. Insertion of light-switchable receptor domains into a selected effector protein, for instance, can yield allosteric effectors with light-dependent activity. However, the parameters that determine domain insertion tolerance and allostery are poorly understood. Here, an unbiased screen is used to systematically assess the domain insertion permissibility of several evolutionarily unrelated proteins. Training machine learning models on the resulting data makes it possible to dissect features informative for domain insertion tolerance and reveals sequence conservation statistics as the strongest indicators of suitable insertion sites. Finally, extending the experimental pipeline toward the identification of switchable hybrids results in opto-chemogenetic derivatives of the transcription factor AraC that function as single-protein Boolean logic gates. The study reveals determinants of domain insertion tolerance and yields multimodally switchable proteins with unique functional properties.
Affiliation(s)
- Jan Mathony
- Center for Synthetic Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
- Department of Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
- Institute of Pharmacy and Molecular Biotechnology (IPMB), Faculty of Engineering Sciences, Heidelberg University, 69120 Heidelberg, Germany
- Sabine Aschenbrenner
- Institute of Pharmacy and Molecular Biotechnology (IPMB), Faculty of Engineering Sciences, Heidelberg University, 69120 Heidelberg, Germany
- Philipp Becker
- Center for Synthetic Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
- Department of Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Dominik Niopek
- Institute of Pharmacy and Molecular Biotechnology (IPMB), Faculty of Engineering Sciences, Heidelberg University, 69120 Heidelberg, Germany
2
Iqbal S, Li F, Akutsu T, Ascher DB, Webb GI, Song J. Assessing the performance of computational predictors for estimating protein stability changes upon missense mutations. Brief Bioinform 2021; 22:6289890. [PMID: 34058752 DOI: 10.1093/bib/bbab184] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/08/2021] [Revised: 04/07/2021] [Accepted: 04/21/2021] [Indexed: 11/14/2022]
Abstract
Understanding how a mutation might affect protein stability is of significant importance to protein engineering and for understanding protein evolution and genetic diseases. While a number of computational tools have been developed to predict the effect of missense mutations on protein stability, they are known to exhibit large biases imparted in part by the data used to train and evaluate them. Here, we provide a comprehensive overview of predictive tools, which has provided an evolving insight into the importance and relevance of features that can discern the effects of mutations on protein stability. A diverse selection of these freely available tools was benchmarked using a large mutation-level blind dataset of 1342 experimentally characterised mutations across 130 proteins from ThermoMutDB, a second test dataset encompassing 630 experimentally characterised mutations across 39 proteins from iStable2.0 and a third blind test dataset consisting of 268 mutations in 27 proteins from the newly published ProThermDB. The performance of the methods was further evaluated with respect to the site of mutation, the type of mutant residue and by varying the pH and temperature. Additionally, the classification performance was evaluated by classifying the mutations as stabilizing (∆∆G ≥ 0) or destabilizing (∆∆G < 0). The results reveal that the performance of the predictors is affected by the site of mutation and the type of mutant residue. Further, the results show very low performance for pH values 6-8 and temperature higher than 65 for all predictors except iStable2.0 on the S630 dataset. To illustrate how stability and structure change upon single point mutation, we considered four stabilizing, two destabilizing and two stabilizing mutations from two proteins, namely the toxin protein and bovine liver cytochrome.
Overall, the results on S268, S630 and S1342 datasets show that the performance of the integrated predictors is better than the mechanistic or individual machine learning predictors. We expect that this paper will provide useful guidance for the design and development of next-generation bioinformatic tools for predicting protein stability changes upon mutations.
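The sign-based classification described in the abstract can be sketched in a few lines. This is only an illustration of the stated convention (∆∆G ≥ 0 stabilizing, ∆∆G < 0 destabilizing); the function names and the example values are hypothetical and are not taken from the benchmark datasets.

```python
def classify_ddg(ddg: float) -> str:
    """Label a mutation using the sign convention stated in the abstract."""
    return "stabilizing" if ddg >= 0 else "destabilizing"


def classification_accuracy(measured, predicted):
    """Fraction of mutations whose predicted class matches the measured class."""
    pairs = list(zip(measured, predicted))
    if not pairs:
        raise ValueError("no mutations to compare")
    agree = sum(classify_ddg(m) == classify_ddg(p) for m, p in pairs)
    return agree / len(pairs)


# Made-up ddG values (kcal/mol) for four hypothetical mutations.
measured = [1.2, -0.5, 0.0, -2.1]
predicted = [0.8, 0.3, -0.1, -1.7]
accuracy = classification_accuracy(measured, predicted)
```

Note that a predictor can have a small numerical error and still land in the wrong class when the measured value is close to zero, as with the third mutation here.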
Affiliation(s)
- Shahid Iqbal
- Computer System Engineering from Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Pakistan
- Fuyi Li
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, the University of Melbourne, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Victoria 3800, Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Australia
3
Kon Kam King G, Papaspiliopoulos O, Ruggiero M. Exact inference for a class of hidden Markov models on general state spaces. Electron J Stat 2021. [DOI: 10.1214/21-ejs1841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Matteo Ruggiero
- University of Torino and Collegio Carlo Alberto, Corso Unione Sovietica 218/bis, 10134, Torino, Italy
4
Farag S, Bleich RM, Shank EA, Isayev O, Bowers AA, Tropsha A. Inter-Modular Linkers play a crucial role in governing the biosynthesis of non-ribosomal peptides. Bioinformatics 2020; 35:3584-3591. [PMID: 30785185 DOI: 10.1093/bioinformatics/btz127] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/13/2018] [Revised: 02/12/2019] [Accepted: 02/17/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION Non-ribosomal peptide synthetases (NRPSs) are modular enzymatic machines that catalyze the ribosome-independent production of structurally complex small peptides, many of which have important clinical applications as antibiotics, antifungals and anti-cancer agents. Several groups have tried to expand natural product diversity by intermixing different NRPS modules to create synthetic peptides. This approach has not been as successful as anticipated, suggesting that these modules are not fully interchangeable. RESULTS We explored whether Inter-Modular Linkers (IMLs) impact the ability of NRPS modules to communicate during the synthesis of NRPs. We developed a parser to extract 39 804 IMLs from both well annotated and putative NRPS biosynthetic gene clusters from 39 232 bacterial genomes and established the first IMLs database. We analyzed these IMLs and identified a striking relationship between IMLs and the amino acid substrates of their adjacent modules. More than 92% of the identified IMLs connect modules that activate a particular pair of substrates, suggesting that significant specificity is embedded within these sequences. We therefore propose that incorporating the correct IML is critical when attempting combinatorial biosynthesis of novel NRPS. AVAILABILITY AND IMPLEMENTATION The IMLs database as well as the NRPS-Parser have been made available on the web at https://nrps-linker.unc.edu. The entire source code of the project is hosted in GitHub repository (https://github.com/SWFarag/nrps-linker). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sherif Farag
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Rachel M Bleich
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Elizabeth A Shank
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Olexandr Isayev
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Albert A Bowers
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alexander Tropsha
- Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
5
Milano T, Angelaccio S, Tramonti A, Di Salvo ML, Contestabile R, Pascarella S. Structural properties of the linkers connecting the N- and C- terminal domains in the MocR bacterial transcriptional regulators. Biochimie Open 2016; 3:8-18. [PMID: 29450126 PMCID: PMC5801912 DOI: 10.1016/j.biopen.2016.07.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/30/2016] [Accepted: 07/10/2016] [Indexed: 12/03/2022]
Abstract
Peptide inter-domain linkers are peptide segments covalently linking two adjacent domains within a protein. Linkers play a variety of structural and functional roles in naturally occurring proteins. In this work we analyze the sequence properties of the predicted linker regions of the bacterial transcriptional regulators belonging to the recently discovered MocR subfamily of the GntR regulators. Analyses were carried out on the MocR sequences taken from the phyla Actinobacteria, Firmicutes, Alpha-, Beta- and Gammaproteobacteria. The results suggest that MocR linkers display phylum-specific characteristics and unique features different from those already described for other classes of inter-domain linkers. Their average length is significantly higher: 31.8 ± 14.3 residues, reaching a maximum of about 150 residues. Compositional propensities displayed general and phylum-specific trends. Pro dominates in all linkers. Dyad propensity analysis indicates Pro–Pro as the most frequent amino acid pair in all linkers. Physicochemical properties of the linker regions were assessed using amino acid indices relative to different features: in general, MocR linkers are flexible, hydrophilic and display propensity for β-turn or coil conformations. Linker sequences are hypervariable: only similarities between MocR linkers from organisms related at the level of species or genus could be found with sequence searches. The results shed light on the properties of the linker regions of the new MocR subfamily of bacterial regulators and may provide knowledge-based rules for designing artificial linkers with desired properties. An overview of the structural properties of MocR inter-domain linkers is reported. Linker length distribution is heterogeneous in different phyla. Linkers are flexible, hydrophilic and have coil conformation propensity. Pro and Pro–Pro dyads are very frequent in all the linkers.
MocR linkers display a few properties different from those reported for other linkers.
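Counting adjacent amino acid pairs (dyads) across a set of linker sequences can be sketched as follows. This shows raw dyad counts only; the propensity normalization used in the study is not described in the abstract and is not reproduced here. The function name and the two example sequences are hypothetical.

```python
from collections import Counter


def dyad_counts(sequences):
    """Count overlapping adjacent residue pairs over all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        counts.update(seq[i:i + 2] for i in range(len(seq) - 1))
    return counts


# Two made-up linker fragments, used only to exercise the function.
linkers = ["APPPG", "SPPE"]
counts = dyad_counts(linkers)
top_dyad, top_count = counts.most_common(1)[0]
```

Overlapping windows are used, so a run of three prolines contributes two Pro-Pro dyads.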
Affiliation(s)
- Teresa Milano
- Dipartimento di Scienze biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, 00185 Roma, Italy
- Sebastiana Angelaccio
- Dipartimento di Scienze biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, 00185 Roma, Italy
- Angela Tramonti
- Istituto di Biologia e Patologia Molecolari, Consiglio Nazionale delle Ricerche, 00185 Roma, Italy
- Martino Luigi Di Salvo
- Dipartimento di Scienze biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, 00185 Roma, Italy
- Roberto Contestabile
- Dipartimento di Scienze biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, 00185 Roma, Italy
- Stefano Pascarella
- Dipartimento di Scienze biochimiche "A. Rossi Fanelli", Sapienza Università di Roma, 00185 Roma, Italy
6
Chatterjee P, Basu S, Zubek J, Kundu M, Nasipuri M, Plewczynski D. PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach. J Mol Model 2016; 22:72. [PMID: 26969678 PMCID: PMC4788683 DOI: 10.1007/s00894-016-2933-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/30/2015] [Accepted: 02/17/2016] [Indexed: 01/04/2023]
Abstract
The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
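One plausible reading of an n-star consensus is that a residue is called a linker when at least n classifiers agree on that label. The sketch below implements that reading together with the F-measure. The interpretation of "n-star", the function names and the toy votes are all assumptions for illustration and are not taken from the PDP-CON source code.

```python
def n_star_consensus(votes_per_classifier, n):
    """Combine per-residue 0/1 labels (1 = linker): call 1 if at least n classifiers vote 1."""
    lengths = {len(votes) for votes in votes_per_classifier}
    if len(lengths) != 1:
        raise ValueError("all classifiers must label the same number of residues")
    return [1 if sum(column) >= n else 0 for column in zip(*votes_per_classifier)]


def f_measure(truth, predicted):
    """Harmonic mean of precision and recall for the positive (linker) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, predicted))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, predicted))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy votes from three hypothetical classifiers over four residues.
votes = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
two_star = n_star_consensus(votes, n=2)
three_star = n_star_consensus(votes, n=3)
score = f_measure([1, 1, 0, 0], two_star)
```

Raising n trades recall for precision: here the 3-star consensus calls no linker residues at all, because no residue has unanimous support.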
Affiliation(s)
- Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia, Kolkata, 700152, India
- Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Julian Zubek
- Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland; Center of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- Mahantapas Kundu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Dariusz Plewczynski
- Center of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland; Faculty of Pharmacy, Medical University of Warsaw, Warsaw, Poland
7
Chen SA, Lee TY, Ou YY. Incorporating significant amino acid pairs to identify O-linked glycosylation sites on transmembrane proteins and non-transmembrane proteins. BMC Bioinformatics 2010; 11:536. [PMID: 21034461 PMCID: PMC2989983 DOI: 10.1186/1471-2105-11-536] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 06/06/2010] [Accepted: 10/29/2010] [Indexed: 11/16/2022]
Abstract
Background While occurring enzymatically in biological systems, O-linked glycosylation affects protein folding, localization and trafficking, protein solubility, antigenicity, biological activity, as well as cell-cell interactions on membrane proteins. Catalytic enzymes involve glycotransferases, sugar-transferring enzymes and glycosidases which trim specific monosaccharides from precursors to form intermediate structures. Due to the difficulty of experimental identification, several works have used computational methods to identify glycosylation sites. Results By investigating glycosylated sites that contain various motifs between Transmembrane (TM) and non-Transmembrane (non-TM) proteins, this work presents a novel method, GlycoRBF, that implements radial basis function (RBF) networks with significant amino acid pairs (SAAPs) for identifying O-linked glycosylated serine and threonine on TM proteins and non-TM proteins. Additionally, a membrane topology is considered for reducing the false positives on glycosylated TM proteins. Based on an evaluation using five-fold cross-validation, the consideration of a membrane topology can reduce 31.4% of the false positives when identifying O-linked glycosylation sites on TM proteins. Via an independent test, GlycoRBF outperforms previous O-linked glycosylation site prediction schemes. Conclusion A case study of Cyclic AMP-dependent transcription factor ATF-6 alpha was presented to demonstrate the effectiveness of GlycoRBF. Web-based GlycoRBF, which can be accessed at http://GlycoRBF.bioinfo.tw, can identify O-linked glycosylated serine and threonine effectively and efficiently. Moreover, the structural topology of Transmembrane (TM) proteins with glycosylation sites is provided to users. The stand-alone version of GlycoRBF is also available for high throughput data analysis.
Affiliation(s)
- Shu-An Chen
- Department of Computer Science and Engineering, Yuan Ze University, Chungli 320, Taiwan
8
Liang G, Zhao W. Using factor analysis scales of generalized amino acid information for prediction and characteristic analysis of β-turns in proteins based on a support vector machine model. Sci China Chem 2010. [DOI: 10.1007/s11426-010-0165-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
9
Ebina T, Toh H, Kuroda Y. Loop-length-dependent SVM prediction of domain linkers for high-throughput structural proteomics. Biopolymers 2009; 92:1-8. [PMID: 18844295 DOI: 10.1002/bip.21105] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/11/2022]
Abstract
The prediction of structural domains in novel protein sequences is becoming of practical importance. One important area of application is the development of computer-aided techniques for identifying, at a low cost, novel protein domain targets for large-scale functional and structural proteomics. Here, we report a loop-length-dependent support vector machine (SVM) prediction of domain linkers, which are loops separating two structural domains. (DLP-SVM is freely available at: http://www.tuat.ac.jp/~domserv/cgi-bin/DLP-SVM.cgi.) We constructed three loop-length-dependent SVM predictors of domain linkers (SVM-All, SVM-Long and SVM-Short), and also built SVM-Joint, which combines the results of SVM-Short and SVM-Long into a single consolidated prediction. The performances of SVM-Joint were, in most aspects, the highest, with a sensitivity of 59.7% and a specificity of 43.6%, which indicated that the specificity and the sensitivity were improved by over 2 and 3% respectively, when loop-length-dependent characteristics were taken into account. Furthermore, the sensitivity and specificity of SVM-Joint were, respectively, 37.6 and 17.4% higher than those of a random guess, and also superior to those of previously reported domain linker predictors. These results indicate that SVMs can be used to predict domain linkers, and that loop-length-dependent characteristics are useful for improving SVM prediction performances.
Affiliation(s)
- Teppei Ebina
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Naka-machi, Koganei-shi, Tokyo 184-8588, Japan
10
Pang CNI, Lin K, Wouters MA, Heringa J, George RA. Identifying foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids Res 2007; 36:578-88. [PMID: 18056079 PMCID: PMC2241846 DOI: 10.1093/nar/gkm1070] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Structural genomics initiatives aim to elucidate representative 3D structures for the majority of protein families over the next decade, but many obstacles must be overcome. The correct design of constructs is extremely important since many proteins will be too large or contain unstructured regions and will not be amenable to crystallization. It is therefore essential to identify regions in protein sequences that are likely to be suitable for structural study. Scooby-Domain is a fast and simple method to identify globular domains in protein sequences. Domains are compact units of protein structure and their correct delineation will aid structural elucidation through a divide-and-conquer approach. Scooby-Domain predictions are based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method employs an A*-search to identify sequence regions that form a globular structure and those that are unstructured. On a test set of 173 proteins with consensus CATH and SCOP domain definitions, Scooby-Domain has a sensitivity of 50% and an accuracy of 29%, which is better than current state-of-the-art methods. The method does not rely on homology searches and, therefore, can identify previously unknown domains.
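The hydrophobic signal that such a method works from can be illustrated with a sliding-window average over a sequence. This is a minimal sketch and not the Scooby-Domain algorithm: it omits the domain-length statistics and the A*-search, and the choice of the Kyte-Doolittle hydropathy scale, the window size and the example sequence are assumptions made here for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydrophobicity(sequence, window):
    """Mean hydropathy of every contiguous window of the given length."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# Hypothetical 10-residue sequence scanned with a 5-residue window.
profile = window_hydrophobicity("MKVLAAGIVG", 5)
```

A domain-finding method would scan many window sizes and compare each window's mean against what is observed for real domains of that length; the fixed window above only shows the basic signal.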
Affiliation(s)
- Chi N I Pang
- Structural & Computational Biology Program, Victor Chang Cardiac Research Institute, Sydney, Australia
11
Bernardes JS, Dávila AMR, Costa VS, Zaverucha G. Improving model construction of profile HMMs for remote homology detection through structural alignment. BMC Bioinformatics 2007; 8:435. [PMID: 17999748 PMCID: PMC2245980 DOI: 10.1186/1471-2105-8-435] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 03/19/2007] [Accepted: 11/09/2007] [Indexed: 11/14/2022]
Abstract
BACKGROUND Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. RESULTS We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two tailed t-test. CONCLUSION We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
Affiliation(s)
- Juliana S Bernardes
- COPPE, Programa de Engenharia de Sistemas e Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Vítor S Costa
- DCC-FCUP e LIACC, Universidade do Porto, Porto, Portugal
- Gerson Zaverucha
- COPPE, Programa de Engenharia de Sistemas e Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
12
Domain selection combined with improved cloning strategy for high throughput expression of higher eukaryotic proteins. BMC Biotechnol 2007; 7:45. [PMID: 17663785 PMCID: PMC1950093 DOI: 10.1186/1472-6750-7-45] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/18/2006] [Accepted: 07/30/2007] [Indexed: 12/02/2022]
Abstract
Background Expression of higher eukaryotic genes as soluble, stable recombinant proteins is still a bottleneck step in biochemical and structural studies of novel proteins today. Correct identification of stable domains/fragments within the open reading frame (ORF), combined with proper cloning strategies, can greatly enhance the success rate when higher eukaryotic proteins are expressed as these domains/fragments. Furthermore, a HTP cloning pipeline incorporated with bioinformatics domain/fragment selection methods will be beneficial to studies of structure and function genomics/proteomics. Results With bioinformatics tools, we developed a domain/domain boundary prediction (DDBP) method, which was trained by available experimental data. Combined with an improved cloning strategy, DDBP had been applied to 57 proteins from C. elegans. Expression and purification results showed there was a 10-fold increase in terms of obtaining purified proteins. Based on the DDBP method, the improved GATEWAY cloning strategy and a robotic platform, we constructed a high throughput (HTP) cloning pipeline, including PCR primer design, PCR, BP reaction, transformation, plating, colony picking and entry clones extraction, which have been successfully applied to 90 C. elegans genes, 88 Brucella genes, and 188 human genes. More than 97% of the targeted genes were obtained as entry clones. This pipeline has a modular design and can adopt different operations for a variety of cloning/expression strategies. Conclusion The DDBP method and improved cloning strategy were satisfactory. The cloning pipeline, combined with our recombinant protein HTP expression pipeline and the crystal screening robots, constitutes a complete platform for structure genomics/proteomics. This platform will increase the success rate of purification and crystallization dramatically and promote the further advancement of structure genomics/proteomics.
13
Emmert-Streib F, Mushegian A. A topological algorithm for identification of structural domains of proteins. BMC Bioinformatics 2007; 8:237. [PMID: 17608939 PMCID: PMC1933582 DOI: 10.1186/1471-2105-8-237] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 02/28/2007] [Accepted: 07/03/2007] [Indexed: 11/10/2022]
Abstract
Background Identification of the structural domains of proteins is important for our understanding of the organizational principles and mechanisms of protein folding, and for insights into protein function and evolution. Algorithmic methods of dissecting protein of known structure into domains developed so far are based on an examination of multiple geometrical, physical and topological features. Successful as many of these approaches are, they employ a lot of heuristics, and it is not clear whether they illuminate any deep underlying principles of protein domain organization. Other well-performing domain dissection methods rely on comparative sequence analysis. These methods are applicable to sequences with known and unknown structure alike, and their success highlights a fundamental principle of protein modularity, but this does not directly improve our understanding of protein spatial structure. Results We present a novel graph-theoretical algorithm for the identification of domains in proteins with known three-dimensional structure. We represent the protein structure as an undirected, unweighted and unlabeled graph whose nodes correspond to the secondary structure elements and edges represent physical proximity of at least one pair of alpha carbon atoms from two elements. Domains are identified as constrained partitions of the graph, corresponding to sets of vertices obtained by the maximization of the cycle distributions found in the graph. When a partition is found, the algorithm is iteratively applied to each of the resulting subgraphs. The decision to accept or reject a tentative cut position is based on a specific classifier. The algorithm is applied iteratively to each of the resulting subgraphs and terminates automatically if partitions are no longer accepted. The distribution of cycles is the only type of information on which the decision about protein dissection is based. 
Despite the barebone simplicity of the approach, our algorithm approaches the best heuristic algorithms in accuracy. Conclusion Our graph-theoretical algorithm uses only topological information present in the protein structure itself to find the domains and does not rely on any geometrical or physical information about protein molecule. Perhaps unexpectedly, these drastic constraints on resources, which result in a seemingly approximate description of protein structures and leave only a handful of parameters available for analysis, do not lead to any significant deterioration of algorithm accuracy. It appears that protein structures can be rigorously treated as topological rather than geometrical objects and that the majority of information about protein domains can be inferred from the coarse-grained measure of pairwise proximity between elements of secondary structure elements.
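The graph representation described in the abstract (nodes are secondary structure elements, an edge means at least one pair of alpha carbons from the two elements is in physical proximity) can be sketched as follows. Only the graph construction is shown; the cycle-based partitioning and the classifier are not reproduced. The 7.0 Å cutoff, the function name and the coordinates are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def build_sse_graph(sse_coords, cutoff=7.0):
    """Build an undirected, unweighted graph over secondary structure elements.

    sse_coords maps an element name to a list of (x, y, z) alpha-carbon
    coordinates. The cutoff (in angstroms) is a hypothetical value.
    """
    graph = {name: set() for name in sse_coords}
    for a, b in combinations(sse_coords, 2):
        close = any(
            dist(p, q) <= cutoff
            for p in sse_coords[a]
            for q in sse_coords[b]
        )
        if close:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Made-up coordinates: H1 and E1 are near each other, E2 is far away.
coords = {
    "H1": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "E1": [(0.0, 5.0, 0.0)],
    "E2": [(50.0, 0.0, 0.0)],
}
graph = build_sse_graph(coords)
```

The adjacency-set representation keeps the graph unlabeled and unweighted, matching the abstract's description that only topology is used.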
Affiliation(s)
- Frank Emmert-Streib
- Stowers Institute for Medical Research, 1000 E. 50th Street, Kansas City, MO 64110, USA
- University of Washington, 1705 NE Pacific St, Box 355065, Seattle WA 98195-5065, USA
- Arcady Mushegian
- Stowers Institute for Medical Research, 1000 E. 50th Street, Kansas City, MO 64110, USA
- University of Kansas Medical Center, Kansas City, KS 66160, USA
14
Dong Q, Wang X, Lin L, Xu Z. Domain boundary prediction based on profile domain linker propensity index. Comput Biol Chem 2006; 30:127-33. [PMID: 16531120 DOI: 10.1016/j.compbiolchem.2006.01.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 10/23/2005] [Revised: 12/29/2005] [Accepted: 01/08/2006] [Indexed: 11/19/2022]
Abstract
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multi-domain proteins but also for the experimental structure determination. In this work, a novel index at the profile level is presented, namely, the profile domain linker propensity index (PDLI), which uses the evolutionary information of profiles for domain linker prediction. The frequency profiles are directly calculated from the multiple sequence alignments outputted by PSI-BLAST and converted into binary profiles with a probability threshold. PDLI is then obtained by the frequencies of binary profiles in domain linkers as compared to those in domains. A smooth and normalized numeric profile is generated for any amino acid sequences from which the domain linkers can be predicted. Testing on the Structural Classification of Proteins (SCOP) database and CASP6 targets shows that PDLI outperforms other indexes at the amino acid level.
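The two steps named in the abstract, thresholding a frequency profile into a binary profile and comparing how often each binary profile occurs in linkers versus domains, can be sketched as below. The log-ratio form, the pseudocount, the threshold value and the three-letter toy alphabet are assumptions for illustration; the abstract does not give the exact PDLI formula.

```python
import math
from collections import Counter


def to_binary_profile(frequencies, threshold=0.25):
    """Convert per-position residue frequencies to a 0/1 tuple (threshold is hypothetical)."""
    return tuple(int(f > threshold) for f in frequencies)


def propensity_index(linker_profiles, domain_profiles, pseudocount=1.0):
    """Log ratio of each binary profile's smoothed frequency in linkers versus domains."""
    linker_counts = Counter(linker_profiles)
    domain_counts = Counter(domain_profiles)
    keys = set(linker_counts) | set(domain_counts)
    linker_total = len(linker_profiles) + pseudocount * len(keys)
    domain_total = len(domain_profiles) + pseudocount * len(keys)
    return {
        key: math.log(
            ((linker_counts[key] + pseudocount) / linker_total)
            / ((domain_counts[key] + pseudocount) / domain_total)
        )
        for key in keys
    }


# Toy profiles over a three-letter alphabet instead of 20 amino acids.
a = to_binary_profile([0.6, 0.3, 0.1])
b = to_binary_profile([0.1, 0.1, 0.8])
index = propensity_index([a, a, b], [a, b, b])
```

A positive value marks a binary profile seen more often in linkers than in domains; smoothing and normalizing such values along a sequence would give the numeric profile the abstract mentions.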
Affiliation(s)
- Qiwen Dong
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, PR China.