1
Grigorev V, Tinkov O, Grigoreva L, Rasdolsky A. Structural fractal analysis of the active sites of acetylcholinesterase from various organisms. J Mol Graph Model 2022; 116:108265. [PMID: 35816907 DOI: 10.1016/j.jmgm.2022.108265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 06/24/2022] [Accepted: 06/29/2022] [Indexed: 12/15/2022]
Abstract
Acetylcholinesterase (AChE) is the object of many studies because it plays an important role in the vital activity of organisms. In particular, when new AChE inhibitors are developed, much attention is paid to the specificity of their action. One approach to studying this specificity is to compare AChE taken from various organisms. In this work, crystallographic data are used to investigate the active sites of AChE (ASAs) in the free (uncomplexed) state for the following five organisms: Homo sapiens (HS), Mus musculus (MM), Torpedo californica (TC), Electrophorus electricus (EE), and Drosophila melanogaster (DM). The structural fractal analysis (SFA) proposed by us earlier is used as the research method. This method is based on the calculation and comparison of the fractal dimensions of molecular structures. SFA demonstrates that there are no significant structural differences between the active sites of human AChE and other AChEs. However, differences are found for the MM/EE pair. Further analysis of individual amino acid residues (AARs) has revealed two different areas of the active sites. Ser203, Trp236, Phe338, and Tyr341 are found to belong to a variable region, and the remaining AARs belong to a conserved region of the ASAs. The fraction of "variability" is low, 0.8%.
Affiliation(s)
- Veniamin Grigorev
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Sciences, Severniy proezd 1, 142432, Chernogolovka, Moscow region, Russia.
- Oleg Tinkov
- Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Transnistrian State University, October 25 Str. 128, 3300, Tiraspol, Transdniestria, Republic of Moldova
- Ludmila Grigoreva
- Department of Fundamental Physical and Chemical Engineering, Moscow State University, Leninskiye Gory 1/51, 119991, Moscow, Russia
- Alexander Rasdolsky
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Sciences, Severniy proezd 1, 142432, Chernogolovka, Moscow region, Russia
2
Ling C, Wei X, Shen Y, Zhang H. Development and validation of multiple machine learning algorithms for the classification of G-protein-coupled receptors using molecular evolution model-based feature extraction strategy. Amino Acids 2021; 53:1705-1714. [PMID: 34562175 DOI: 10.1007/s00726-021-03080-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/08/2021] [Accepted: 09/13/2021] [Indexed: 11/25/2022]
Abstract
Machine learning is one of the most promising ways to predict the function of the rapidly growing, large-scale set of G-protein-coupled receptors (GPCRs). Prior research reveals that the key to determining the overall classification accuracy of GPCRs is extracting valuable features and filtering out redundancy. To achieve a more efficient classification model, we take the feature synonym problem into consideration and create a new method based on functional word clustering and integration. By evaluating the evolutionary correlation between features using the transition scores in mature molecular substitution matrices, candidate features are clustered into synonym groups. Each group of the clustered features is then integrated and represented by a unique key functional word. These retained key functional words are used to form a feature knowledge base. The original GPCR sequences are then converted into feature vectors based on a feature re-extraction strategy according to the features in the knowledge base before the training and testing stage. We create multiple machine learning models based on Naïve Bayesian (NB), random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP) algorithms. The established models are applied to classify two public data sets containing 8354 and 12,731 GPCRs, respectively. These models achieve significant performance in almost all evaluation criteria in comparison with state-of-the-art methods. This work demonstrated the potential of the novel feature extraction strategy and provided an effective theoretical design for the hierarchical classification of GPCRs.
Affiliation(s)
- Cheng Ling
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Xiaolin Wei
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Yitian Shen
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Haoyu Zhang
- School of Information Engineering, Zhejiang Ocean University, Zhoushan, China.
3
Leinweber M, Fober T, Freisleben B. GPU-Based Point Cloud Superpositioning for Structural Comparisons of Protein Binding Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:740-752. [PMID: 27845672 DOI: 10.1109/tcbb.2016.2625793] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
In this paper, we present a novel approach to solve the labeled point cloud superpositioning problem for performing structural comparisons of protein binding sites. The solution is based on a parallel evolution strategy that operates on large populations and runs on GPU hardware. The proposed evolution strategy reduces the likelihood of getting stuck in a local optimum of the multimodal real-valued optimization problem represented by labeled point cloud superpositioning. The performance of the GPU-based parallel evolution strategy is compared to a previously proposed CPU-based sequential approach for labeled point cloud superpositioning, indicating that the GPU-based parallel evolution strategy leads to qualitatively better results and significantly shorter runtimes, with speed improvements of up to a factor of 1,500 for large populations. Binary classification tests based on the ATP, NADH, and FAD protein subsets of CavBase, a database containing putative binding sites, show average classification rate improvements from about 92 percent (CPU) to 96 percent (GPU). Further experiments indicate that the proposed GPU-based labeled point cloud superpositioning approach can be superior to traditional protein comparison approaches based on sequence alignments.
4
Li M, Ling C, Xu Q, Gao J. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments. Amino Acids 2017; 50:255-266. [PMID: 29151135 DOI: 10.1007/s00726-017-2512-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/24/2017] [Accepted: 11/14/2017] [Indexed: 10/18/2022]
Abstract
Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results show that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN.
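As a rough illustration of the N-gram step mentioned in this abstract, the sketch below turns an amino-acid sequence into a fixed-length vector of relative N-gram frequencies. It is not the authors' implementation: the function name `ngram_vector`, the plain frequency encoding, and the 20-letter alphabet are assumptions made for this example, and the published method may construct its input vectors differently.

```python
from collections import Counter
from itertools import product

# Standard 20-letter amino-acid alphabet (assumed for this illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def ngram_vector(sequence, n=2):
    """Return relative frequencies of all overlapping n-grams in `sequence`.

    The vector has 20**n entries, ordered lexicographically over AMINO_ACIDS.
    N-grams containing letters outside the alphabet are ignored.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts[g] for g in vocab)
    return [counts[g] / total if total else 0.0 for g in vocab]

vec = ngram_vector("ACAC", n=2)  # bigrams: AC, CA, AC
```

A vector of this kind has a fixed length regardless of sequence length, which is what makes it usable as input to a downstream classifier.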
Affiliation(s)
- Man Li
- Department of Computer Science and Technology, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Cheng Ling
- Department of Computer Science and Technology, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China.
- Qi Xu
- Department of Computer Science and Technology, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Jingyang Gao
- Department of Computer Science and Technology, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
5
Krotzky T, Grunwald C, Egerland U, Klebe G. Large-scale mining for similar protein binding pockets: with RAPMAD retrieval on the fly becomes real. J Chem Inf Model 2014; 55:165-79. [PMID: 25474400 DOI: 10.1021/ci5005898] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 01/25/2023]
Abstract
Determination of structural similarities between protein binding pockets is an important challenge in in silico drug design. It can help to understand selectivity considerations, predict unexpected ligand cross-reactivity, and support the putative annotation of function to orphan proteins. To this end, Cavbase was developed as a tool for the automated detection, storage, and classification of putative protein binding sites. In this context, binding sites are characterized as sets of pseudocenters, which denote surface-exposed physicochemical properties, and can be used to enable mutual binding site comparisons. However, these comparisons tend to be computationally very demanding and often lead to very slow computations of the similarity measures. In this study, we propose RAPMAD (RApid Pocket MAtching using Distances), a new evaluation formalism for Cavbase entries that allows for ultrafast similarity comparisons. Protein binding sites are represented by sets of distance histograms that are both generated and compared with linear complexity. Attaining a speed of more than 20 000 comparisons per second, screenings across large data sets and even entire databases become easily feasible. We demonstrate the discriminative power and the short runtime by performing several classification and retrieval experiments. RAPMAD attains better success rates than the comparison formalism originally implemented into Cavbase or several alternative approaches developed in recent time, while requiring only a fraction of their runtime. The practical use of our method is finally proven by a successful prospective virtual screening study that aims for the identification of novel inhibitors of the NMDA receptor.
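The idea of describing a pocket by distance histograms that are built and compared in linear time can be illustrated with a minimal sketch. This is not the RAPMAD implementation: the choice of the centroid as the reference point, the bin width, the maximum distance, the overlap score, and the function names are all assumptions made for this example.

```python
import math

def distance_histogram(points, bin_width=1.0, max_dist=20.0):
    """Normalised histogram of point-to-centroid distances.

    `points` is a non-empty list of (x, y, z) tuples, e.g. pseudocenter
    coordinates. One pass computes the centroid and one pass fills the
    bins, so the cost is linear in the number of points.
    """
    n = len(points)
    centroid = (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )
    n_bins = int(max_dist / bin_width)
    hist = [0.0] * n_bins
    for p in points:
        d = math.dist(p, centroid)
        idx = min(int(d / bin_width), n_bins - 1)  # clamp far points into the last bin
        hist[idx] += 1.0 / n
    return hist

def histogram_similarity(h1, h2):
    """Overlap score in [0, 1]: one minus half the L1 distance (linear in bin count)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

pocket = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
shifted = [(x + 5, y + 5, z + 5) for x, y, z in pocket]
score = histogram_similarity(distance_histogram(pocket), distance_histogram(shifted))
```

Because only distances to the centroid enter the histogram, a translated copy of a pocket yields the same histogram, so no superposition step is needed before comparison.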
Affiliation(s)
- Timo Krotzky
- Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, Marbacher Weg 6-10, 35032 Marburg, Germany
6
Krotzky T, Fober T, Hüllermeier E, Klebe G. Extended Graph-Based Models for Enhanced Similarity Search in Cavbase. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:878-890. [PMID: 26356860 DOI: 10.1109/tcbb.2014.2325020] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
To calculate similarities between molecular structures, measures based on the maximum common subgraph are frequently applied. For the comparison of protein binding sites, these measures are not fully appropriate since graphs representing binding sites on a detailed atomic level tend to get very large. In combination with an NP-hard problem, a large graph leads to a computationally demanding task. Therefore, for the comparison of binding sites, a less detailed coarse graph model is used building upon so-called pseudocenters. Consistently, a loss of structural data is caused since many atoms are discarded and no information about the shape of the binding site is considered. This is usually resolved by performing subsequent calculations based on additional information. These steps are usually quite expensive, making the whole approach very slow. The main drawback of a graph-based model solely based on pseudocenters, however, is the loss of information about the shape of the protein surface. In this study, we propose a novel and efficient modeling formalism that does not increase the size of the graph model compared to the original approach, but leads to graphs containing considerably more information assigned to the nodes. More specifically, additional descriptors considering surface characteristics are extracted from the local surface and attributed to the pseudocenters stored in Cavbase. These properties are evaluated as additional node labels, which lead to a gain of information and allow for much faster but still very accurate comparisons between different structures.
7
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Leonid Pereyaslavets
- Department of Structural Biology, Stanford University, Stanford, California 94305
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, California 94305
8
Kakumani R, Devabhaktuni V, Ahmad M. A two-stage neural network based technique for protein secondary structure prediction. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2009; 2008:1355-8. [PMID: 19162919 DOI: 10.1109/iembs.2008.4649416] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
Protein secondary structure prediction is one of the most important research areas in bioinformatics. In this paper, we propose a two-stage protein secondary structure prediction technique, implemented using neural network models. The first neural network stage of the proposed technique associates the input protein sequence to a bin containing its corresponding homologues. The second stage predicts the secondary structure of the input sequence utilizing a neural prediction model specific to the bin obtained from stage one. The strategy of binning allows for simplified and accurate neural models. This technique is implemented on the RS126 dataset and its prediction accuracy is compared with that of the standard PHD approach.
Affiliation(s)
- Rajasekhar Kakumani
- Department of Electrical and Computer Engineering, Concordia University, 1455 de Maisonneuve Blvd. West, Montreal, H3G1M8, Quebec, Canada.
9
Tseng YY, Dundas J, Liang J. Predicting protein function and binding profile via matching of local evolutionary and geometric surface patterns. J Mol Biol 2009; 387:451-64. [PMID: 19154742 PMCID: PMC2670802 DOI: 10.1016/j.jmb.2008.12.072] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 09/16/2008] [Revised: 12/19/2008] [Accepted: 12/23/2008] [Indexed: 11/25/2022]
Abstract
Inferring protein functions from structures is a challenging task, as a large number of orphan protein structures from structural genomics projects are now solved without their biochemical functions characterized. For proteins binding to similar substrates or ligands and carrying out similar functions, their binding surfaces are under similar physicochemical constraints, and hence the sets of allowed and forbidden residue substitutions are similar. However, it is difficult to isolate such selection pressure due to protein function from selection pressure due to protein folding, and evolutionary relationship reflected by global sequence and structure similarities between proteins is often unreliable for inferring protein function. We have developed a method, called pevoSOAR (pocket-based evolutionary search of amino acid residues), for predicting protein functions by solving the problem of uncovering the amino acid residue substitution pattern due to protein function and separating it from the amino acid substitution pattern due to protein folding. We incorporate evolutionary information specific to an individual binding region and match local surfaces on a large scale with millions of precomputed protein surfaces to identify those with similar functions. Our pevoSOAR method also generates a probabilistic model called the computed binding profile that characterizes protein-binding activities that may involve multiple substrates or ligands. We show that our method can be used to predict enzyme functions with accuracy. Our method can also assess enzyme binding specificity and promiscuity. In an objective large-scale test of 100 enzyme families with thousands of structures, our predictions are found to be sensitive and specific: At the stringent specificity level of 99.98%, we can correctly predict enzyme functions for 80.55% of the proteins. The overall area under the receiver operating characteristic curve measuring the performance of our prediction is 0.955, close to the perfect value of 1.00. The best Matthews coefficient is 86.6%. Our method also works well in predicting the biochemical functions of orphan proteins from structural genomics projects.
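For readers unfamiliar with the Matthews coefficient quoted in this abstract, the sketch below computes it from the four entries of a binary confusion matrix using the standard definition. The function name and the convention of returning 0.0 for an undefined denominator are choices made for this example; the counts in the usage line are made-up numbers, not data from the paper.

```python
import math

def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal sum is zero,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, for illustration only.
mcc = matthews_coefficient(tp=90, tn=85, fp=15, fn=10)
```

Unlike plain accuracy, this measure stays informative when the positive and negative classes differ greatly in size, which is typical for function-prediction benchmarks.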
Affiliation(s)
- Yan Yuan Tseng
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, IL 60607-7052, USA
10
Nimrod G, Schushan M, Steinberg DM, Ben-Tal N. Detection of functionally important regions in "hypothetical proteins" of known structure. Structure 2009; 16:1755-63. [PMID: 19081051 DOI: 10.1016/j.str.2008.10.017] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Received: 04/16/2008] [Revised: 10/16/2008] [Accepted: 10/19/2008] [Indexed: 10/21/2022]
Abstract
Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.
Affiliation(s)
- Guy Nimrod
- Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
11
Pai RD, Zhang W, Schuwirth BS, Hirokawa G, Kaji H, Kaji A, Cate JHD. Structural Insights into ribosome recycling factor interactions with the 70S ribosome. J Mol Biol 2008; 376:1334-47. [PMID: 18234219 PMCID: PMC2712656 DOI: 10.1016/j.jmb.2007.12.048] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 10/09/2007] [Revised: 12/11/2007] [Accepted: 12/19/2007] [Indexed: 11/25/2022]
Abstract
At the end of translation in bacteria, ribosome recycling factor (RRF) is used together with elongation factor G to recycle the 30S and 50S ribosomal subunits for the next round of translation. In x-ray crystal structures of RRF with the Escherichia coli 70S ribosome, RRF binds to the large ribosomal subunit in the cleft that contains the peptidyl transferase center. Upon binding of either E. coli or Thermus thermophilus RRF to the E. coli ribosome, the tip of ribosomal RNA helix 69 in the large subunit moves away from the small subunit toward RRF by 8 Å, thereby disrupting a key contact between the small and large ribosomal subunits termed bridge B2a. In the ribosome crystals, the ability of RRF to destabilize bridge B2a is influenced by crystal packing forces. Movement of helix 69 involves an ordered-to-disordered transition upon binding of RRF to the ribosome. The disruption of bridge B2a upon RRF binding to the ribosome seen in the present structures reveals one of the key roles that RRF plays in ribosome recycling, the dissociation of 70S ribosomes into subunits. The structures also reveal contacts between domain II of RRF and protein S12 in the 30S subunit that may also play a role in ribosome recycling.
Affiliation(s)
- Raj D Pai
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
12
Singh A, Kushwaha HR, Sharma P. Molecular modelling and comparative structural account of aspartyl beta-semialdehyde dehydrogenase of Mycobacterium tuberculosis (H37Rv). J Mol Model 2008; 14:249-63. [PMID: 18236087 DOI: 10.1007/s00894-008-0267-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 07/09/2007] [Accepted: 01/03/2008] [Indexed: 11/29/2022]
Abstract
Aspartyl beta-semialdehyde dehydrogenase (ASADH) is an important enzyme, occupying the first branch position of the biosynthetic pathway of the aspartate family of amino acids in bacteria, fungi and higher plants. It catalyses reversible dephosphorylation of L-beta-aspartyl phosphate (betaAP) to L-aspartate-beta-semialdehyde (ASA), a key intermediate in the biosynthesis of diaminopimelic acid (DAP), an essential component of cross linkages in bacterial cell walls. Since the aspartate pathway is unique to plants and bacteria, and ASADH is the key enzyme in this pathway, it becomes an attractive target for antimicrobial agent development. Therefore, with the objective of deducing comparative structural models, we have described a molecular model emphasizing the uniqueness of ASADH from Mycobacterium tuberculosis (H37Rv) that should generate insights into the structural distinctiveness of this protein as compared to structurally resolved ASADH from other bacterial species. We find that mtASADH exhibits structural features common to bacterial ASADH, while other structural motifs are not present. Structural analysis of various domains in mtASADH reveals structural conservation among all bacterial ASADH proteins. The results suggest that the probable mechanism of action of the mtASADH enzyme might be the same as that of other bacterial ASADH. Analysis of the structure of mtASADH will shed light on its mechanism of action and may help in designing suitable antagonists against this enzyme that could control the growth of Mycobacterium tuberculosis.
Affiliation(s)
- Anupama Singh
- Centre of Computational Biology and Bioinformatics (CCBB), School of Information Technology, Jawaharlal Nehru University, New Delhi, 110067, India
13
Song J, Yuan Z, Tan H, Huber T, Burrage K. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure. Bioinformatics 2007; 23:3147-54. [PMID: 17942444 DOI: 10.1093/bioinformatics/btm505] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. RESULTS: We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. AVAILABILITY: The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
Affiliation(s)
- Jiangning Song
- Advanced Computational Modelling Centre, The University of Queensland, Brisbane, QLD 4072, Australia
14
Quantitative assessment of relationship between sequence similarity and function similarity. BMC Genomics 2007; 8:222. [PMID: 17620139 PMCID: PMC1949826 DOI: 10.1186/1471-2164-8-222] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Received: 07/05/2006] [Accepted: 07/09/2007] [Indexed: 11/16/2022]
Abstract
Background: Comparative sequence analysis is considered the first step towards annotating new proteins in genome annotation. However, sequence comparison may lead to creation and propagation of function assignment errors. Thus, it is important to perform a thorough analysis of the quality of sequence-based function assignment using large-scale data in a systematic way. Results: We present an analysis of the relationship between sequence similarity and function similarity for the proteins in four model organisms, i.e., Arabidopsis thaliana, Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster. Using a measure of functional similarity based on the three categories of Gene Ontology (GO) classifications (biological process, molecular function, and cellular component), we quantified the correlation between functional similarity and sequence similarity measured by sequence identity or statistical significance of the alignment and compared such a correlation against randomly chosen protein pairs. Conclusion: Various sequence-function relationships were identified from BLAST versus PSI-BLAST, sequence identity versus Expectation Value, GO indices versus semantic similarity approaches, and within genome versus between genome comparisons, for the three GO categories. Our study provides a benchmark to estimate the confidence in assignment of functions purely based on sequence similarity.
15
Saini HK, Fischer D. FRalanyzer: a tool for functional analysis of fold-recognition sequence-structure alignments. Nucleic Acids Res 2007; 35:W499-502. [PMID: 17537819 PMCID: PMC1933221 DOI: 10.1093/nar/gkm367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
We describe FRalanyzer (Fold Recognition alignment analyzer), a new web tool to visually inspect sequence–structure alignments in order to predict functionally important residues in a query sequence of unknown function. This tool is aimed at helping to infer functional relationships between a query sequence and a template structure, and is particularly useful in analyzing fold recognition (FR) results. Because similar folds do not necessarily share the same function, it is not always straightforward to infer a function from an FR result alone. Manual inspection of the FR sequence–structure alignment is often required in order to search for conservation of functionally important residues. FRalanyzer automates parts of this time-consuming process. FRalanyzer takes as input a sequence–structure alignment, automatically searches annotated databases, displays functionally significant residues and highlights the functionally important positions that are identical in the alignment. FRalanyzer can also be used with sequence–structure alignments obtained by other methods, and with structure–structure alignments obtained from structural comparison of newly determined 3D-structures of unknown function. FRalanyzer is available at http://fralanyzer.cse.buffalo.edu/.
Affiliation(s)
- Harpreet Kaur Saini
- Computer Science and Engineering Department, 201 Bell Hall University at Buffalo, Buffalo, NY 14260, USA.
16
Nagano N, Noguchi T, Akiyama Y. Systematic comparison of catalytic mechanisms of hydrolysis and transfer reactions classified in the EzCatDB database. Proteins 2007; 66:147-59. [PMID: 17039546 DOI: 10.1002/prot.21193] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
Catalytic mechanisms of 270 enzymes from 131 superfamilies, mainly hydrolases and transferases, were analyzed based on their enzyme structures. A method of systematic comparison and classification of the catalytic reactions was developed. Hydrolysis and transfer reactions closely resemble one another, displaying common mechanisms, single displacement, and double displacement. These displacement mechanisms might be further subclassified according to the type of catalytic factors and nucleophilic substitution involved. Several types of catalytic factors exist: nucleophile, acid, base, stabilizer, modulator, cofactors. Nucleophilic substitution might be categorized as S(N)1/S(N)2 (or dissociative/associative) reactions. The classification indicates that some mechanisms favor particular types of catalytic factors. In hydrolyses of amide bonds and phosphoric ester bonds, mechanisms with single displacement tend to use inorganic cofactors such as zinc and magnesium ions as important catalysts, whereas those with double displacement frequently do not use such cofactors. In contrast, hydrolyses of O-glycoside bond rarely use such cofactors, with one exception. The trypsin-like hydrolytic reaction, which is catalyzed by the classic catalytic triad comprising serine/histidine/aspartate, can be considered as a "super-reaction" because it is observed in at least three nonhomologous enzymes, whereas most reactions are singlets without any nonhomologous enzymes. By dividing complex reactions into several reactions, correlations between active site structures and catalytic functions can be suggested. This classification method is applicable to other reactions such as elimination and isomerization. Furthermore, it will facilitate annotation of enzyme functions from 3D patterns of enzyme active sites. The classification is available at http://mbs.cbrc.jp/EzCatDB/RLCP/index.html.
Affiliation(s)
- Nozomi Nagano
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan.
17
Saha RP, Chakrabarti P. Molecular modeling and characterization of Vibrio cholerae transcription regulator HlyU. BMC Struct Biol 2006; 6:24. [PMID: 17116251 PMCID: PMC1665450 DOI: 10.1186/1472-6807-6-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 08/17/2006] [Accepted: 11/20/2006] [Indexed: 11/15/2022]
Abstract
Background The SmtB/ArsR family of prokaryotic metal-regulatory transcriptional repressors represses the expression of operons linked to stress-inducing concentrations of heavy metal ions, while derepression results from direct binding of metal ions by these 'metal-sensor' proteins. The HlyU protein from Vibrio cholerae is the positive regulator of the haemolysin gene; it also plays an important role in regulating the expression of the virulence genes. Although its biochemical properties are understood, its structure and relationship to other protein families remain unknown. Results We find that HlyU exhibits structural features common to the SmtB/ArsR family of transcriptional repressors. Analysis of the modeled structure of HlyU reveals that it does not have the key metal-sensing residues which are unique to the SmtB/ArsR family of repressors, yet its tertiary structure is very similar to that of the family members. HlyU is the only member that exerts positive control on transcription, while all the other members of the family are repressors. An evolutionary analysis with other SmtB/ArsR family members suggests that HlyU probably arose by gene duplication and mutational events that led to the emergence of this protein from an ancestral transcriptional repressor through the loss of the metal-binding sites. Conclusion The study indicates that the same protein family can contain both a positive regulator of transcription and repressors, the exact function being controlled by the absence or presence of metal-binding sites.
Affiliation(s)
- Rudra P Saha
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
- Pinak Chakrabarti
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
18
Glaser F, Rosenberg Y, Kessel A, Pupko T, Ben-Tal N. The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures. Proteins 2006; 58:610-7. [PMID: 15614759 DOI: 10.1002/prot.20305] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.0] [Indexed: 11/12/2022]
Abstract
The HSSP (Homology-Derived Secondary Structure of Proteins) database provides multiple sequence alignments (MSAs) for proteins of known three-dimensional (3D) structure in the Protein Data Bank (PDB). The database also contains an estimate of the degree of evolutionary conservation at each amino acid position. This estimate, which is based on the relative entropy, correlates with the functional importance of the position; evolutionarily conserved positions (i.e., positions with limited variability and low entropy) are often important to maintain the 3D structure and biological function(s) of the protein. We recently developed the Rate4Site algorithm for scoring amino acid conservation based on their calculated evolutionary rate. This algorithm takes into account the phylogenetic relationships between the homologs and the stochastic nature of the evolutionary process. Here we present the ConSurf-HSSP database of Rate4Site estimates of the evolutionary rates of the amino acid positions, calculated using HSSP's MSAs. The database provides precalculated evolutionary rates for nearly all of the PDB. These rates are projected, using a color code, onto the protein structure, and can be viewed online using the ConSurf server interface. To exemplify the database, we analyzed in detail the conservation pattern obtained for pyruvate kinase and compared the results with those observed using the relative entropy scores of the HSSP database. Reassuringly, the main functional region of the enzyme is detectable using both conservation scores. Interestingly, the ConSurf-HSSP calculations mapped additional functionally important regions, which are moderately conserved and were overlooked by the original HSSP estimate. The ConSurf-HSSP database is available online (http://consurf-hssp.tau.ac.il).
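The entropy-based conservation estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the HSSP or Rate4Site implementation: it assumes a uniform background over the 20 standard amino acids, ignores gaps, and uses hypothetical function names (`column_relative_entropy`, `conservation_profile`) chosen here for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_relative_entropy(column, background=None):
    """Relative entropy (bits) of one alignment column versus a background.

    Assumption: a uniform background is used when none is given.
    Characters outside the background alphabet (gaps, 'X') are ignored.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = [r for r in column if r in background]
    if not residues:
        return 0.0
    total = len(residues)
    score = 0.0
    for aa, n in Counter(residues).items():
        p = n / total
        score += p * math.log2(p / background[aa])
    return score

def conservation_profile(msa):
    """Score every column of an MSA given as equal-length strings."""
    return [column_relative_entropy(col) for col in zip(*msa)]

# Toy alignment: columns 0 and 1 are invariant, columns 2 and 3 vary.
toy_msa = ["ACDA", "ACEC", "ACFD", "ACGE"]
profile = conservation_profile(toy_msa)
```

Higher scores mark columns whose composition departs most from the background, so invariant columns score highest. Unlike Rate4Site as described above, this sketch takes no account of the phylogenetic relationships between the homologs.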
Affiliation(s)
- Fabian Glaser
- Department of Biochemistry, Tel Aviv University, Tel Aviv, Israel
19
Mika S, Rost B. Protein-protein interactions more conserved within species than across species. PLoS Comput Biol 2006; 2:e79. [PMID: 16854211 PMCID: PMC1513270 DOI: 10.1371/journal.pcbi.0020079] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.4] [Received: 11/18/2005] [Indexed: 11/21/2022]
Abstract
Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. Another about 55,000 pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for our two surprising results from our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. 
In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology/ The IntAct database contains about ten large-scale data sets of protein–protein interactions. Each set contains thousands of experimentally observed pair interactions. Most pairs were observed in yeast (Saccharomyces cerevisiae), fly (Drosophila melanogaster), and worm (Caenorhabditis elegans). These organisms are often perceived as models in the sense that one can infer that two mouse proteins interact if one experimentally observes the two corresponding proteins in worm to interact. Here, the authors analyzed in detail how the sequence signals of physical protein–protein interactions are conserved. It is a common assumption that protein–protein interactions can easily be inferred through homology transfer from one model organism to another organism of interest. Here, the authors demonstrated that such homology transfers are only accurate at unexpectedly high levels of sequence identity. Even more surprisingly, homology transfers of protein–protein interactions are significantly more reliable for protein pairs from the same species than for two protein pairs from different organisms. The observation that interactions were much more conserved within than across species was valid for all levels of sequence similarity, i.e. for very similar as well as for more diverged interologs.
Affiliation(s)
- Sven Mika
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
20
Affiliation(s)
- Marvin L Tanzer
- Department of Cell Biology and Anatomy, University of Arizona Health Sciences Center, PO Box 86535, Tucson, AZ, 85754-6535, USA
21
Abstract
MOTIVATION Arabidopsis thaliana has the best understood plant genome, yet approximately one-third of its genes still have no functional annotation at all from either MIPS or TAIR. We have applied our Data Mining Prediction (DMP) method to the problem of predicting the functional classes of these protein sequences. This method uses a hybrid machine-learning/data-mining approach to identify patterns in the bioinformatic data about sequences that are predictive of function. We use data about sequence, predicted secondary structure, predicted structural domain, InterPro patterns, sequence similarity profile and expression data. RESULTS We predicted the functional class of a high percentage of the Arabidopsis genes with currently unknown function. These predictions are interpretable and have good test accuracies. We describe in detail seven of the rules produced.
Affiliation(s)
- A Clare
- Department of Computer Science, University of Wales Aberystwyth SY23 3DB, UK.
22
Vries JK, Munshi R, Tobi D, Klein-Seetharaman J, Benos PV, Bahar I. A sequence alignment-independent method for protein classification. Appl Bioinformatics 2005; 3:137-48. [PMID: 15693739 DOI: 10.2165/00822942-200403020-00008] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/02/2022]
Abstract
Annotation of the rapidly accumulating body of sequence data relies heavily on the detection of remote homologues and functional motifs in protein families. The most popular methods rely on sequence alignment. These include programs that use a scoring matrix to compare the probability of a potential alignment with random chance and programs that use curated multiple alignments to train profile hidden Markov models (HMMs). Related approaches depend on bootstrapping multiple alignments from a single sequence. However, alignment-based programs have limitations. They make the assumption that contiguity is conserved between homologous segments, which may not be true in genetic recombination or horizontal transfer. Alignments also become ambiguous when sequence similarity drops below 40%. This has kindled interest in classification methods that do not rely on alignment. An approach to classification without alignment based on the distribution of contiguous sequences of four amino acids (4-grams) was developed. Interest in 4-grams stemmed from the observation that almost all theoretically possible 4-grams (20^4 = 160,000) occur in natural sequences and the majority of 4-grams are uniformly distributed. This implies that the probability of finding identical 4-grams by random chance in unrelated sequences is low. A Bayesian probabilistic model was developed to test this hypothesis. For each protein family in Pfam-A and PIR-PSD, a feature vector called a probe was constructed from the set of 4-grams that best characterised the family. In rigorous jackknife tests, unknown sequences from Pfam-A and PIR-PSD were compared with the probes for each family. A classification result was deemed a true positive if the probe match with the highest probability was in first place in a rank-ordered list. This was achieved in 70% of cases. Analysis of false positives suggested that the precision might approach 85% if selected families were clustered into subsets.
Case studies indicated that the 4-grams in common between an unknown and the best matching probe correlated with functional motifs from PRINTS. The results showed that remote homologues and functional motifs could be identified from an analysis of 4-gram patterns.
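The idea of matching an unknown sequence against per-family 4-gram probes can be sketched as follows. This is a deliberately simplified stand-in: it ranks families by the raw count of shared 4-grams rather than by the Bayesian probabilistic model the abstract describes, and the names `four_grams`, `build_probe` and `classify`, the `top_k` cut-off and the toy sequences are all hypothetical.

```python
from collections import Counter

def four_grams(seq, n=4):
    """Return the set of contiguous n-residue substrings of a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

def build_probe(family_seqs, top_k=50):
    """Build a family 'probe' from the 4-grams seen in most member sequences.

    Assumption: the most frequently shared 4-grams stand in for the
    4-grams that "best characterise" the family.
    """
    counts = Counter()
    for seq in family_seqs:
        counts.update(four_grams(seq))
    return {gram for gram, _ in counts.most_common(top_k)}

def classify(seq, probes):
    """Rank families by the number of 4-grams shared with the query."""
    grams = four_grams(seq)
    scores = {family: len(grams & probe) for family, probe in probes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy data, invented for illustration only.
probes = {
    "famA": build_probe(["MKTAYIAKQR", "MKTAYIAKQL"]),
    "famB": build_probe(["GSSHHHHHHS", "GSSHHHHHHG"]),
}
ranking = classify("AMKTAYIAK", probes)
best_family = ranking[0][0]
```

In the jackknife setting described above, a prediction would count as a true positive when the correct family is at the top of such a rank-ordered list.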
Affiliation(s)
- John K Vries
- Department of Molecular Genetics and Biochemistry, School of Medicine, Center for Computational Biology and Bioinformatics, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15213, USA.
23
Magliery TJ, Regan L. Sequence variation in ligand binding sites in proteins. BMC Bioinformatics 2005; 6:240. [PMID: 16194281 PMCID: PMC1261162 DOI: 10.1186/1471-2105-6-240] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Received: 03/15/2005] [Accepted: 09/30/2005] [Indexed: 11/25/2022]
Abstract
Background The recent explosion in the availability of complete genome sequences has led to the cataloging of tens of thousands of new proteins and putative proteins. Many of these proteins can be structurally or functionally categorized from sequence conservation alone. In contrast, little attention has been given to the meaning of poorly-conserved sites in families of proteins, which are typically assumed to be of little structural or functional importance. Results Recently, using statistical free energy analysis of tetratricopeptide repeat (TPR) domains, we observed that positions in contact with peptide ligands are more variable than surface positions in general. Here we show that statistical analysis of TPRs, ankyrin repeats, Cys2His2 zinc fingers and PDZ domains accurately identifies specificity-determining positions by their sequence variation. Sequence variation is measured as deviation from a neutral reference state, and we present probabilistic and information theory formalisms that improve upon recently suggested methods such as statistical free energies and sequence entropies. Conclusion Sequence variation has been used to identify functionally-important residues in four selected protein families. With TPRs and ankyrin repeats, protein families that bind highly diverse ligands, the effect is so pronounced that sequence "hypervariation" alone can be used to predict ligand binding sites.
Affiliation(s)
- Thomas J Magliery
- Department of Molecular Biophysics & Biochemistry, Yale University, P.O. Box 208114, New Haven, CT 06520-8114, USA
- Present address: Department of Chemistry and Department of Biochemistry, The Ohio State University, 100 W. 18Ave., Columbus, OH 43210, USA
- Lynne Regan
- Department of Molecular Biophysics & Biochemistry, Yale University, P.O. Box 208114, New Haven, CT 06520-8114, USA
- Department of Chemistry, Yale University, New Haven, CT, USA
24
Abstract
MOTIVATION Despite the continuing advance in the experimental determination of protein structures, the gap between the number of known protein sequences and structures continues to increase. Prediction methods can bridge this sequence-structure gap only partially. Better predictions of non-local contacts between residues could improve comparative modeling, fold recognition and could assist in the experimental structure determination. RESULTS Here, we introduced PROFcon, a novel contact prediction method that combines information from alignments, from predictions of secondary structure and solvent accessibility, from the region between two residues and from the average properties of the entire protein. In contrast to some other methods, PROFcon predicted short and long proteins at similar levels of accuracy. As expected, PROFcon was clearly less accurate when tested on sparse evolutionary profiles, that is, on families with few homologs. Prediction accuracy was highest for proteins belonging to the SCOP alpha/beta class. PROFcon compared favorably with state-of-the-art prediction methods at the CASP6 meeting. While the performance may still be perceived as low, our method clearly pushed the mark higher. Furthermore, predictions are already accurate enough to seed predictions of global features of protein structure.
Affiliation(s)
- Marco Punta
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University 650 West 168th Street BB217, New York, NY 10032, USA.
25
Namboori S, Mhatre N, Sujatha S, Srinivasan N, Pandit SB. Enhanced functional and structural domain assignments using remote similarity detection procedures for proteins encoded in the genome of Mycobacterium tuberculosis H37Rv. J Biosci 2005; 29:245-59. [PMID: 15381846 DOI: 10.1007/bf02702607] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown. We have used sensitive profile-based search procedures to assign functional and structural domains to infer functions of gene products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families. Functions are predicted for 78% of the encoded gene products. For 69% of these, functions can be inferred by domain assignments. The functions for the rest are deduced from their homology to proteins of known function. Superfamily relationships between families of unknown and known structures have increased structural information by approximately 11%. Remote similarity detection methods have enabled domain assignments for 1325 'hypothetical proteins'. The most populated families in MTB are involved in lipid metabolism and in the entry and survival of the bacillus in the host. Interestingly, for 353 proteins, which we refer to as MTB-specific, no homologues have been identified. Numerous previously unannotated hypothetical proteins have been assigned domains, and some of these could be possible chemotherapeutic targets. MTB-specific proteins might include factors responsible for virulence. Importantly, these assignments could be valuable for experimental endeavors. The detailed results are publicly available at http://hodgkin.mbu.iisc.ernet.in/~dots.
Affiliation(s)
- Seema Namboori
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012
26
Marabotti A, D'Auria S, Rossi M, Facchiano AM. Theoretical model of the three-dimensional structure of a sugar-binding protein from Pyrococcus horikoshii: structural analysis and sugar-binding simulations. Biochem J 2004; 380:677-84. [PMID: 15015939 PMCID: PMC1224218 DOI: 10.1042/bj20031876] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Received: 12/04/2003] [Revised: 03/11/2004] [Accepted: 03/12/2004] [Indexed: 11/17/2022]
Abstract
The three-dimensional structure of a sugar-binding protein from the thermophilic archaeon Pyrococcus horikoshii has been predicted by a homology modelling procedure and investigated for its stability and its ability to bind different sugars. The model was created by using as templates the three-dimensional structures of a maltodextrin-binding protein from Pyrococcus furiosus, a trehalose-maltose-binding protein from Thermococcus litoralis and a maltodextrin-binding protein from Escherichia coli. Following the suggestions from the CASP (Critical Assessment of Structure Prediction) meetings, the homology modelling strategy was applied by assessing an accurate multiple sequence alignment, based on the high structural conservation in the family of ATP-binding cassette transporters to which all these proteins belong. The model has been deposited in the Protein Data Bank with the code 1R25. Consistent with the origin of the protein, several characteristics in the organization of the secondary-structure elements and in the distribution of polar and non-polar amino acids are very similar to those of thermophilic proteins, compared with proteins from mesophilic organisms, and are analysed in detail. Finally, a simulation of the binding of several sugars in the binding site of this protein is presented, and interactions with amino acids are highlighted in detail.
Affiliation(s)
- Anna Marabotti
- Laboratory of Bioinformatics, Institute of Food Science, Italian National Research Council, Via Roma 52A/C, 83100 Avellino, Italy
27
Toriumi C, Imai K. An identification method for altered proteins in tissues utilizing fluorescence derivatization, liquid chromatography, tandem mass spectrometry, and a database-searching algorithm. Anal Chem 2004; 75:3725-30. [PMID: 14572036 DOI: 10.1021/ac020693x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is now widely used as a tool for proteomic studies. For the sensitive determination of proteins in 2D-PAGE, fluorescence derivatization of primary amino moieties of proteins with cyanine dyes was recently developed. However, the proteins can precipitate if completely derivatized, because the hydrophobicity of the reagents and the loss of the hydrophilic primary amino moieties lower the solubility of the resulting derivatives. Thus, in this paper, a water-soluble and thiol-specific fluorogenic reagent, ammonium 7-fluoro-2,1,3-benzoxadiazole-4-sulfonate, was adopted for the derivatization of proteins in tissues either with or without stimulation. The method then proceeds through separation of the derivatives by liquid chromatography with fluorescence detection, isolation of only the altered proteins, enzymatic digestion of the isolated proteins, and identification of the proteins by liquid chromatography/MS/MS with the database-searching algorithm. By using this method, we identified the altered expression of five increased proteins (e.g., pancreatic polypeptide) as well as three decreased proteins (e.g., insulin 2) in the islets of Langerhans in Wistar rats 2 days after they were subcutaneously administered dexamethasone.
Affiliation(s)
- Chifuyu Toriumi
- Laboratory of Bio-Analytical Chemistry, Graduate School of Pharmaceutical Sciences, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
28
Guo J, Chen H, Sun Z, Lin Y. A novel method for protein secondary structure prediction using dual-layer SVM and profiles. Proteins 2004; 54:738-43. [PMID: 14997569 DOI: 10.1002/prot.10634] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.5] [Indexed: 11/08/2022]
Abstract
A high-performance method was developed for protein secondary structure prediction based on the dual-layer support vector machine (SVM) and position-specific scoring matrices (PSSMs). SVM is a new machine learning technology that has been successfully applied in solving problems in the field of bioinformatics. The SVM's performance is usually better than that of traditional machine learning approaches. The performance was further improved by combining PSSM profiles with the SVM analysis. The PSSMs were generated from PSI-BLAST profiles, which contain important evolution information. The final prediction results were generated from the second SVM layer output. On the CB513 data set, the three-state overall per-residue accuracy, Q3, reached 75.2%, while segment overlap (SOV) accuracy increased to 80.0%. On the CB396 data set, the Q3 of our method reached 74.0% and the SOV reached 78.1%. A web server utilizing the method has been constructed and is available at http://www.bioinfo.tsinghua.edu.cn/pmsvm.
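For readers unfamiliar with the Q3 figure quoted in this abstract, the three-state per-residue accuracy is the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. The sketch below computes it under the assumption that both assignments are given as equal-length strings over the letters H, E and C; the function name `q3_accuracy` is hypothetical, and the more involved segment overlap (SOV) measure is not covered.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues with a correctly predicted three-state label.

    Assumption: both arguments are equal-length strings of 'H' (helix),
    'E' (strand) and 'C' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Toy example: one of eight residues is mispredicted.
score = q3_accuracy("HHHEECCC", "HHHEEECC")  # 87.5
```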
Affiliation(s)
- Jian Guo
- Institute of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing, China
29
Conrad C, Vianna C, Schultz C, Thal DR, Ghebremedhin E, Lenz J, Braak H, Davies P. Molecular evolution and genetics of the Saitohin gene and tau haplotype in Alzheimer's disease and argyrophilic grain disease. J Neurochem 2004; 89:179-88. [PMID: 15030402 DOI: 10.1046/j.1471-4159.2004.02320.x] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022]
Abstract
A single nucleotide polymorphism that results in an amino acid change (Q7R) has been identified in the Saitohin (STH) gene and was initially found to be over-represented in the homozygous state in subjects with late-onset Alzheimer's disease (AD). More extensive studies provide limited support for the association with AD, but confirm an association of the Q allele with progressive supranuclear palsy and argyrophilic grain disease. A homologous sequence was found in the appropriate location of the rat and mouse tau genes, but there was no open reading frame allowing STH expression in these species, suggesting relatively recent evolution of this gene. In some non-human primates, the STH gene was identified, and this was found to differ from the human gene at two of 128 amino acids. All primates in which the STH gene was identified were homozygous for the R allele of STH, suggesting this is the ancestral allele. This observation was surprising, in that the Q allele is more common in human populations, and raises the possibility that natural selection has operated to favor individuals carrying this allele. The STH polymorphism is part of the tau gene haplotype, of which two major variants exist in human populations, the Q being part of the H1 haplotype and the R part of the H2 haplotype. More detailed studies confirm the H2 haplotype to be the ancestral tau gene. This situation is reminiscent of the evolution of the apolipoprotein (ApoE) gene, another locus that is potentially important for the risk of development of AD.
Affiliation(s)
- Chris Conrad
- Department of Pathology, Albert Einstein College of Medicine, Bronx, New York, USA.
30
Abstract
Although bioinformatics achieved prominence because of its central role in genome data storage, management and analysis, its focus has shifted as the life sciences exploit these data. In pharmacology, genomic, transcriptomic and proteomic data are being used in the quest for drugs that fulfill unmet medical needs, are disease modifying or curative and are more effective and safer than current drugs. Bioinformatics is used in drug target identification and validation and in the development of biomarkers and toxicogenomic and pharmacogenomic tools to maximize the therapeutic benefit of drugs. Now that the 'parts list' of cellular signalling pathways is available, integrated computational and experimental programmes are being developed, with the goal of enabling in silico pharmacology by linking the genome, transcriptome and proteome to cellular pathophysiology.
Affiliation(s)
- Paul A Whittaker
- Novartis Respiratory Research Centre, Wimblehurst Road, Horsham, West Sussex RH12 5AB, UK.
31
Abstract
Protein-protein interactions are facilitated by a myriad of residue-residue contacts on the interacting proteins. Identifying the site of interaction in the protein is a key for deciphering its functional mechanisms, and is crucial for drug development. Many studies indicate that the compositions of contacting residues are unique. Here, we describe a neural network that identifies protein-protein interfaces from sequence. For the most strongly predicted sites (in 34 of 333 proteins), 94% of the predictions were confirmed experimentally. When 70% of our predictions were right, we correctly predicted at least one interaction site in 20% of the complexes (66/333). These results indicate that the prediction of some interaction sites from sequence alone is possible. Incorporating evolutionary and predicted structural information may improve our method. However, even at this early stage, our tool might already assist wet-lab biology.
Affiliation(s)
- Yanay Ofran
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.
32
Lan N, Montelione GT, Gerstein M. Ontologies for proteomics: towards a systematic definition of structure and function that scales to the genome level. Curr Opin Chem Biol 2003; 7:44-54. [PMID: 12547426 DOI: 10.1016/s1367-5931(02)00020-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022]
Abstract
A principal aim of post-genomic biology is elucidating the structures, functions and biochemical properties of all gene products in a genome. However, to adequately comprehend such a large amount of information we need new descriptions of proteins that scale to the genomic level. In short, we need a unified ontology for proteomics. Much progress has been made towards this end, including a variety of approaches to systematic structural and functional classification and initial work towards developing standardized, unified descriptions for protein properties. In relation to function, there is a particularly great diversity of approaches, involving placing a protein in structured hierarchies or more-generalized networks and a recent approach based on circumscribing a protein's function through systematic enumeration of molecular interactions.
Affiliation(s)
- Ning Lan
- Department of Molecular Biophysics, New Haven, CT 06520, USA.
33
McDonald JD, Andriolo M, Calì F, Mirisola M, Puglisi-Allegra S, Romano V, Sarkissian CN, Smith CB. The phenylketonuria mouse model: a meeting review. Mol Genet Metab 2002; 76:256-61. [PMID: 12208130 DOI: 10.1016/s1096-7192(02)00115-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023]
Affiliation(s)
- J David McDonald
- Department of Biological Sciences, Wichita State University, Kansas, USA.
34
Hart AL, Stagg AJ, Frame M, Graffner H, Glise H, Falk P, Kamm MA. The role of the gut flora in health and disease, and its modification as therapy. Aliment Pharmacol Ther 2002; 16:1383-93. [PMID: 12182739 DOI: 10.1046/j.1365-2036.2002.01310.x] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.7] [Indexed: 01/01/2023]
Abstract
The gut flora is a vast interior ecosystem whose nature is only beginning to be unravelled, due to the emergence of sophisticated molecular tools. Techniques such as 16S ribosomal RNA analysis, polymerase chain reaction amplification and the use of DNA microarrays now facilitate rapid identification and characterization of species resistant to conventional culture and possibly unknown species. Life-long cross-talk between the host and the gut flora determines whether health is maintained or disease intervenes. An understanding of these bacteria-bacteria and bacteria-host immune and epithelial cell interactions is likely to lead to a greater insight into disease pathogenesis. Studies of single organism-epithelial interactions have revealed the large range of metabolic processes that gut bacteria may influence. In inflammatory bowel diseases, bacteria drive the inflammatory process, and genetic predisposition to disease identified to date, such as the recently described NOD2/CARD15 gene variants, may relate to altered bacterial recognition. Extra-intestinal disorders, such as atopy and arthritis, may also have an altered gut milieu as their basis. Clinical evidence is emerging that the modification of this internal environment, using either antibiotics or probiotic bacteria, is beneficial in preventing and treating disease. This natural and apparently safe approach holds great appeal.
Affiliation(s)
- A L Hart
- St. Mark's Hospital, Harrow, Middlesex, UK
35
Wang HW, Sharp TV, Koumi A, Koentges G, Boshoff C. Characterization of an anti-apoptotic glycoprotein encoded by Kaposi's sarcoma-associated herpesvirus which resembles a spliced variant of human survivin. EMBO J 2002; 21:2602-15. [PMID: 12032073 PMCID: PMC126038 DOI: 10.1093/emboj/21.11.2602] [Citation(s) in RCA: 129] [Impact Index Per Article: 5.6] [Indexed: 01/08/2023] Open
Abstract
We have investigated the expression and function of a novel protein encoded by open reading frame (ORF) K7 of Kaposi's sarcoma-associated herpesvirus (KSHV). Computational analyses revealed that K7 is structurally related to survivin-ΔEx3, a splice variant of human survivin that protects cells from apoptosis by an undefined mechanism. Both K7 and survivin-ΔEx3 contain a mitochondrial-targeting sequence, an N-terminal region of a BIR (baculovirus IAP repeat) domain and a putative BH2 (Bcl-2 homology)-like domain. These features suggested that K7 is a new viral anti-apoptotic protein and survivin-ΔEx3 is its likely cellular homologue. We show that K7 is a glycoprotein, which can inhibit apoptosis and anchor to intracellular membranes where Bcl-2 resides. K7 does not associate with Bax, but does bind to Bcl-2 via its putative BH2 domain. In addition, K7 binds to active caspase-3 via its BIR domain and thus inhibits the activity of caspase-3. The BH2 domain of K7 is crucial for the inhibition of caspase-3 activity and is therefore essential for its anti-apoptotic function. Furthermore, K7 bridges Bcl-2 and activated caspase-3 into a protein complex. K7 therefore appears to be an adaptor protein and part of an anti-apoptotic complex that presents effector caspases to Bcl-2, enabling Bcl-2 to inhibit caspase activity. These data also suggest that survivin-ΔEx3 might function by a similar mechanism to that of K7. We denote K7 as vIAP (viral inhibitor-of-apoptosis protein).
MESH Headings
- Alternative Splicing
- Amino Acid Sequence
- Apoptosis
- Blotting, Northern
- Caspase 3
- Caspases/metabolism
- Cell Line
- Chromosomal Proteins, Non-Histone/chemistry
- Cloning, Molecular
- DNA, Complementary/metabolism
- Endoplasmic Reticulum/metabolism
- Glutathione Transferase/metabolism
- Glycoproteins/metabolism
- Herpesvirus 8, Human/genetics
- Herpesvirus 8, Human/metabolism
- Humans
- Inhibitor of Apoptosis Proteins
- Microscopy, Fluorescence
- Microtubule-Associated Proteins
- Mitochondria/metabolism
- Models, Biological
- Models, Molecular
- Molecular Sequence Data
- Mutation
- Neoplasm Proteins
- Oligonucleotide Array Sequence Analysis
- Open Reading Frames
- Phylogeny
- Protein Binding
- Protein Structure, Tertiary
- Proto-Oncogene Proteins c-bcl-2/metabolism
- Sequence Homology, Amino Acid
- Software
- Subcellular Fractions/metabolism
- Survivin
- Transfection
Affiliation(s)
- Chris Boshoff
- The Cancer Research UK Viral Oncology Group, Wolfson Institute for Biomedical Research, Cruciform Building, University College London, London WC1E 6BT, UK
36
Stahura FL, Bajorath J. Bio- and chemo-informatics beyond data management: crucial challenges and future opportunities. Drug Discov Today 2002; 7:S41-7. [PMID: 12047879 DOI: 10.1016/s1359-6446(02)02271-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Bio- and chemo-informatics are now thought to be crucial to the success and integration of biotechnology and drug discovery. Research in this area has expanded to go beyond data- and information-management. Here, we review exemplary areas, such as target identification and validation, virtual screening, and prediction of downstream characteristics of leads, where further research will play a key role in progressing the field.
Affiliation(s)
- Florence L Stahura
- Albany Molecular Research, Bothell Research Center, (AMRI-BRC), 18804 North Creek Parkway, Bothell, WA 98011, USA
37
Dieckman L, Gu M, Stols L, Donnelly MI, Collart FR. High throughput methods for gene cloning and expression. Protein Expr Purif 2002; 25:1-7. [PMID: 12071692 DOI: 10.1006/prep.2001.1602] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.9] [Indexed: 11/22/2022]
Abstract
We outline a high throughput process for the production of bacterial expression clones using automated liquid handlers. The protocol consists of a series of interlinked methods representing liquid manipulations or incubations on various stations of the automation system. The methods employ the ligation-independent cloning approach that enables the simultaneous production of plasmids for different expression systems. The current cloning protocol spans 3 days with a linear throughput of 400 targets per production run. This automated approach enables the production of large numbers of bacterial expression clones and ultimately purified proteins. Although they were developed for structural genomics, these molecular protocols can also be applied in high throughput strategies such as those used for site-specific mutagenesis or protein interaction studies.
Affiliation(s)
- Lynda Dieckman
- Biosciences Division, Argonne National Laboratory, Argonne, IL 60439, USA
38
Abstract
The genomes of over 60 organisms from all three kingdoms of life are now entirely sequenced. In many respects, the inventory of proteins used in different kingdoms appears surprisingly similar. However, eukaryotes differ from other kingdoms in that they use many long proteins, and have more proteins with coiled-coil helices and with regions abundant in regular secondary structure. Particular structural domains are used in many pathways. Nevertheless, one domain tends to occur only once in one particular pathway. Many proteins do not have close homologues in different species (orphans) and there could even be folds that are specific to one species. This view implies that protein fold space is discrete. An alternative model suggests that structure space is continuous and that modern proteins evolved by aggregating fragments of ancient proteins. Either way, after having harvested proteomes by applying standard tools, the challenge now seems to be to develop better methods for comparative proteomics.
Affiliation(s)
- Burkhard Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street, BB217, New York, NY 10032, USA.
39
Ueberle B, Frank R, Herrmann R. The proteome of the bacterium Mycoplasma pneumoniae: comparing predicted open reading frames to identified gene products. Proteomics 2002; 2:754-64. [PMID: 12112859 DOI: 10.1002/1615-9861(200206)2:6<754::aid-prot754>3.0.co;2-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
An existing proteome map of the bacterium Mycoplasma pneumoniae comprising proteins from 224 genes was extended to 305 genes. This corresponds to about 44% of the 688 proposed genome-sequence-derived open reading frames (ORFs). The newly assigned gene products were enriched, separated by one-dimensional or two-dimensional (2-D) gel electrophoresis and identified by mass spectrometry. The enrichment procedures included differential centrifugation, anion and cation exchange chromatography, affinity chromatography with heparin as a ligand and isolation of biotinylated proteins by binding to immobilized streptavidin. A comparative analysis of the identified proteins from 305 genes with the as yet unverified 383 ORFs concerning isoelectric point, molecular weight and number of transmembrane segments revealed that proteins with more than three predicted transmembrane segments and an isoelectric point above 10.5 are most likely not to be separated by 2-D gel electrophoresis. The mutual benefits of genomics and proteomics were shown by the identification of an as yet unannotated 128-amino-acid protein.
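The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract (224, 305, 688 and 383); the variable names are illustrative and not taken from the paper.

```python
# Gene and ORF counts as quoted in the abstract above.
total_orfs = 688   # proposed ORFs derived from the genome sequence
identified = 305   # genes with an identified product in the extended map
previous = 224     # genes covered by the earlier proteome map

coverage_pct = 100 * identified / total_orfs   # share of ORFs now identified
unverified = total_orfs - identified           # ORFs still without a product
newly_assigned = identified - previous         # genes added by this study

print(f"coverage: {coverage_pct:.1f}%")
print(f"unverified ORFs: {unverified}")
print(f"newly assigned genes: {newly_assigned}")
```

The result agrees with the "about 44%" and "383 ORFs" stated in the abstract, and implies 81 newly assigned genes.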
Affiliation(s)
- Barbara Ueberle
- Zentrum für Molekulare Biologie, Universität Heidelberg, Heidelberg, Germany
40
Conrad C, Vianna C, Freeman M, Davies P. A polymorphic gene nested within an intron of the tau gene: implications for Alzheimer's disease. Proc Natl Acad Sci U S A 2002; 99:7751-6. [PMID: 12032355 PMCID: PMC124341 DOI: 10.1073/pnas.112194599] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022] Open
Abstract
A previously undescribed gene, Saitohin (STH), has been discovered in the intron between exons 9 and 10 of the human tau gene. STH is an intronless gene that encodes a 128-aa protein with no clear homologs. The tissue expression of STH is similar to that of tau, a gene that is implicated in many neurodegenerative disorders. In humans, a single nucleotide polymorphism that results in an amino acid change (Q7R) has been identified in STH and was used in a case control study. The Q7R polymorphism appears to be over-represented in the homozygous state in late onset Alzheimer's disease subjects.
Affiliation(s)
- Chris Conrad
- Department of Pathology, F526, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA.
41
Abstract
OBJECTIVES: To review the advances in clinically useful molecular biological techniques and to identify their applications in clinical practice, as presented at the Tenth Annual William Beaumont Hospital DNA Symposium. DATA SOURCES: The 11 manuscripts submitted were reviewed and their major findings were compared with literature on the same topic. STUDY SELECTION: Manuscripts address creative thinking techniques applied to DNA discovery, extraction of DNA from clotted blood, the role of mitochondrial dysfunction in neurodegenerative disorders, and molecular methods to identify human lymphocyte antigen class I and class II loci. Two other manuscripts review current issues in molecular microbiology, including detection of hepatitis C virus and biological warfare. The last 5 manuscripts describe current issues in molecular cardiovascular disease, including assessing thrombotic risk, genomic analysis, gene therapy, and a device for aiding in cardiac angiogenesis. DATA SYNTHESIS: Novel problem-solving techniques have been used in the past and will be required in the future in DNA discovery. The extraction of DNA from clotted blood demonstrates a potential cost-effective strategy. Cybrids created from mitochondrial DNA-depleted cells and mitochondrial DNA from a platelet donor have been useful in defining the role mitochondria play in neurodegeneration. Mitochondrial depletion has been reported as a genetically inherited disorder or after human immunodeficiency virus therapy. Hepatitis C viral detection by qualitative, quantitative, or genotyping techniques is useful clinically. Preparedness for potential biological warfare is a responsibility of all clinical laboratorians. Thrombotic risk in cardiovascular disorders may be assessed by coagulation screening assays and further defined by mutation analysis for specific genes for prothrombin and factor V Leiden. Gene therapy for reducing arteriosclerotic risk has been hindered primarily by complications introduced by the vectors used to introduce the therapeutic genes. Neovascularization in cardiac muscle with occluded vessels represents a promising method for recovery of viable tissue following ischemia. CONCLUSIONS: The sequence of the human genome was reported by 2 groups in February 2001. The postgenomic era will emphasize the use of microarrays and database software for genomic and proteomic screening in the search for useful clinical assays. The number of molecular pathologic techniques and assays will expand as additional disease-associated mutations are defined. Gene therapy and tissue engineering will represent successful therapeutic adjuncts.
Affiliation(s)
- Frederick L Kiechle
- Department of Clinical Pathology, William Beaumont Hospital, Royal Oak, MI 48073-6769, USA.
42
Abstract
Rapid progress in structural biology and whole-genome sequencing technology means that, for many protein families, structural and evolutionary information are readily available. Recent developments demonstrate how this information can be integrated to identify canonical determinants of protein structure and function. Among these determinants, those residues that are on protein surfaces are especially likely to form binding sites and are the logical choice for further mutational analysis and drug targeting.
Affiliation(s)
- Olivier Lichtarge
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, Texas 77030, USA.
43
Liu L, Vo A, Liu G, McKeehan WL. Novel complex integrating mitochondria and the microtubular cytoskeleton with chromosome remodeling and tumor suppressor RASSF1 deduced by in silico homology analysis, interaction cloning in yeast, and colocalization in cultured cells. In Vitro Cell Dev Biol Anim 2002; 38:582-94. [PMID: 12762840 PMCID: PMC3225227 DOI: 10.1290/1543-706x(2002)38<582:ncimat>2.0.co;2] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
Availability of the complete sequence of the human genome and sequence homology analysis has accelerated new protein discovery and clues to protein function. Protein-protein interaction cloning suggests multisubunit complexes and pathways. Here, we combine these molecular approaches with cultured cell colocalization analysis to suggest a novel complex and a pathway that integrate the mitochondrial location and the microtubular cytoskeleton with chromosome remodeling, apoptosis, and tumor suppression based on a novel leucine-rich pentatricopeptide repeat-motif-containing protein (LRPPRC) that copurified with the fibroblast growth factor receptor complex. One round of interaction cloning and sequence homology analysis defined a primary LRPPRC complex with novel subunits cat eye syndrome chromosome region candidate 2 (CECR2), ubiquitously expressed transcript (UXT), and chromosome 19 open reading frames 5 (C19ORF5) but still of unknown function. Immuno, deoxyribonucleic acid (DNA), and green fluorescent protein (GFP) tag colocalization analyses revealed that LRPPRC appears in both cytosol and nuclei of cultured cells, colocalizes with mitochondria and beta-tubulin rather than with alpha-actin in the cytosol of interphase cells, and exhibits phase-dependent organization around separating chromosomes in mitotic cells. GFP-tagged CECR2B was strictly nuclear and colocalized with condensed DNA in apoptotic cells. GFP-tagged UXT and GFP-tagged C19ORF5 appeared in both cytosol and nuclei and colocalized with LRPPRC and beta-tubulin. Cells exhibiting nuclear C19ORF5 were apoptotic. Screening for interactive substrates with the primary LRPPRC substrates in the human liver complementary DNA library revealed that CECR2B interacted with chromatin-associated TFIID-associated protein TAFII30 and ribonucleic acid splicing factor SRP40, UXT bridged to CBP/p300-binding factor CITED2 and kinetochore-associated factor BUB3, and C19ORF5 complexed with mitochondria-associated NADH dehydrogenase I and cytochrome c oxidase I. C19ORF5 also interacted with RASSF1, providing a bridge to apoptosis and tumor suppression.
44
Knudsen B, Miyamoto MM. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc Natl Acad Sci U S A 2001; 98:14512-7. [PMID: 11734650 PMCID: PMC64713 DOI: 10.1073/pnas.251526398] [Citation(s) in RCA: 88] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022] Open
Abstract
Changes in protein function can lead to changes in the selection acting on specific residues. This can often be detected as evolutionary rate changes at the sites in question. A maximum-likelihood method for detecting evolutionary rate shifts at specific protein positions is presented. The method determines significance values of the rate differences to give a sound statistical foundation for the conclusions drawn from the analyses. A statistical test for detecting slowly evolving sites is also described. The methods are applied to a set of Myc proteins for the identification of both conserved sites and those with changing evolutionary rates. Those positions with conserved and changing rates are related to the structures and functions of their proteins. The results are compared with an earlier Bayesian method, thereby highlighting the advantages of the new likelihood ratio tests.
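For readers unfamiliar with the general idea of a likelihood ratio test, the sketch below shows the textbook form: twice the log-likelihood difference between two nested models, compared against a chi-square distribution. It is not the authors' implementation. The single degree of freedom, the chi-square null distribution, the function name and the log-likelihood values are all assumptions made for illustration; the paper may use a different null distribution.

```python
import math

def likelihood_ratio_test(loglik_null, loglik_alt):
    """Generic LRT for nested models differing by ONE free parameter (assumed).

    Returns the test statistic and a p-value from a chi-square
    distribution with 1 degree of freedom.
    """
    # Statistic: twice the gain in log-likelihood from the richer model.
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Chi-square (1 df) survival function: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods for one alignment position:
# null = same rate in both subtrees, alt = separate rates.
stat, p = likelihood_ratio_test(loglik_null=-1234.6, loglik_alt=-1230.1)
print(f"statistic = {stat:.2f}, p = {p:.4f}")
```

A small p-value would indicate that allowing a rate shift fits the position significantly better than a single rate, under the stated assumptions.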
Affiliation(s)
- B Knudsen
- Bioinformatics Research Center, University of Aarhus, Høegh Guldbergsgade 10, Building 090, DK-8000 Århus C, Denmark.
45
Bajorath J. Rational drug discovery revisited: interfacing experimental programs with bio- and chemo-informatics. Drug Discov Today 2001; 6:989-995. [PMID: 11576865 DOI: 10.1016/s1359-6446(01)01961-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
Over the past few years, bio- and chemo-informatics have rapidly evolved as related yet distinct disciplines. In drug discovery, it is increasingly recognized that combining and integrating these approaches is crucial for their successful application. In addition, the use of complementary experimental and informatics techniques increases the chances of success in many stages of the discovery process, from the identification of novel targets and elucidation of their functions to the discovery and development of lead compounds with desired properties. This review highlights recent trends that emphasize the role of integrated bio- and chemo-informatics research in drug discovery and discusses representative concepts and methodologies.
Affiliation(s)
- J Bajorath
- Albany Molecular Research, Bothell Research Center, 18804 North Creek Parkway, Bothell, WA 98011, USA
46
Abstract
Structural genomics projects aim to provide an experimental or computational three-dimensional model structure for all of the tractable macromolecules that are encoded by complete genomes. To this end, pilot centres worldwide are now exploring the feasibility of large-scale structure determination. Their experimental structures and computational models are expected to yield insight into the molecular function and mechanism of thousands of proteins. The pervasiveness of this information is likely to change the use of structure in molecular biology and biochemistry.
Affiliation(s)
- S E Brenner
- Department of Plant and Microbial Biology, University of California, 461A Koshland Hall, Berkeley, California 94720-3102, USA.
47
Greenbaum D, Luscombe NM, Jansen R, Qian J, Gerstein M. Interrelating different types of genomic data, from proteome to secretome: 'oming in on function. Genome Res 2001; 11:1463-8. [PMID: 11544189 DOI: 10.1101/gr.207401] [Citation(s) in RCA: 114] [Impact Index Per Article: 4.8] [Indexed: 11/24/2022]
Abstract
With the completion of genome sequences, the current challenge for biology is to determine the functions of all gene products and to understand how they contribute to making an organism viable. For the first time, biological systems can be viewed as being finite, with a limited set of molecular parts. However, the full range of biological processes controlled by these parts is extremely complex. Thus, a key approach in genomic research is to divide the cellular contents into distinct sub-populations, which are often given an "-omic" term. For example, the proteome is the full complement of proteins encoded by the genome, and the secretome is the part of it secreted from the cell. Carrying this further, we suggest the term "translatome" to describe the members of the proteome weighted by their abundance, and the "functome" to describe all the functions carried out by these. Once the individual sub-populations are defined and analyzed, we can then try to reconstruct the full organism by interrelating them, eventually allowing for a full and dynamic view of the cell. All this is, of course, made possible because of the increasing amount of large-scale data resulting from functional genomics experiments. However, there are still many difficulties resulting from the noisiness and complexity of the information. To some degree, these can be overcome through averaging with broad proteomic categories such as those implicit in functional and structural classifications. For illustration, we discuss one example in detail, interrelating transcript and cellular protein populations (transcriptome and translatome). Further information is available at http://bioinfo.mbb.yale.edu/what-is-it.
Affiliation(s)
- D Greenbaum
- Department of Genetics, Yale University, New Haven, Connecticut 06520-8114, USA
48