1
Yu ZZ, Peng CX, Liu J, Zhang B, Zhou XG, Zhang GJ. DomBpred: Protein Domain Boundary Prediction Based on Domain-Residue Clustering Using Inter-Residue Distance. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:912-922. [PMID: 35594218 DOI: 10.1109/tcbb.2022.3175905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and limitations in dealing with multi-domain proteins. In this study, we develop a sequence-based protein domain boundary prediction method, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain protein or a multi-domain protein through a designed effective sequence metric based on a constructed single-domain sequence library. For a multi-domain protein, a domain-residue clustering algorithm inspired by the Ising model is proposed to cluster spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edge of each cluster are then tuned by the secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points to generate the domain boundary. DomBpred is tested on the large-scale FUpred test set comprising 2549 proteins. Experimental results show that DomBpred performs better than the state-of-the-art methods in classifying whether protein sequences are composed of single or multiple domains, with a Matthews correlation coefficient of 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance and normalised domain overlap scores of DomBpred are 0.523 and 0.824, respectively, which are 5.0% and 4.2% higher than those of the best comparison method. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method. The executable program is freely available at https://github.com/iobio-zjut/DomBpred and the online server at http://zhanglab-bioinf.com/DomBpred/.
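Several abstracts in this list report single- versus multi-domain classification quality as a Matthews correlation coefficient. As a reading aid, the following minimal sketch computes that coefficient from confusion-matrix counts using the standard formula. The function name and the toy counts are illustrative assumptions and are not taken from any of the cited papers.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Here "positive" could mean "multi-domain" and "negative" "single-domain";
    that labelling is an assumption made for illustration only.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy counts (hypothetical, not data from the cited studies).
print(matthews_corrcoef(tp=8, tn=9, fp=1, fn=2))
```

The coefficient ranges from -1 (every call wrong) through 0 (no better than chance) to 1 (every call correct), which makes it less sensitive to class imbalance than plain accuracy.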
2
Sánchez BJ, Mubaid S, Busque S, de los Santos Y, Ashour K, Sadek J, Lian X, Khattak S, Di Marco S, Gallouzi IE. The formation of HuR/YB1 complex is required for the stabilization of target mRNA to promote myogenesis. Nucleic Acids Res 2023; 51:1375-1392. [PMID: 36629268 PMCID: PMC9943665 DOI: 10.1093/nar/gkac1245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 12/14/2022] [Indexed: 01/12/2023] Open
Abstract
mRNA stability is the mechanism by which cells protect transcripts allowing their expression to execute various functions that affect cell metabolism and fate. It is well-established that RNA binding proteins (RBPs) such as HuR use their ability to stabilize mRNA targets to modulate vital processes such as muscle fiber formation (myogenesis). However, the machinery and the mechanisms regulating mRNA stabilization are still elusive. Here, we identified Y-Box binding protein 1 (YB1) as an indispensable HuR binding partner for mRNA stabilization and promotion of myogenesis. Both HuR and YB1 bind to 409 common mRNA targets, 147 of which contain a U-rich consensus motif in their 3' untranslated region (3'UTR) that can also be found in mRNA targets in other cell systems. YB1 and HuR form a heterodimer that associates with the U-rich consensus motif to stabilize key promyogenic mRNAs. The formation of this complex involves a small domain in HuR (227-234) that if mutated prevents HuR from reestablishing myogenesis in siHuR-treated muscle cells. Together our data uncover that YB1 is a key player in HuR-mediated stabilization of pro-myogenic mRNAs and provide the first indication that the mRNA stability mechanism is as complex as other key cellular processes such as mRNA decay and translation.
Affiliation(s)
- Brenda Janice Sánchez
- KAUST Smart-Health Initiative King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia,KAUST Biological Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia,Dept. of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G1Y6, Canada,Rosalind & Morris Goodman Cancer Institute, McGill University, 1160 Pine Avenue, Montreal, QC H3A1A3, Canada
- Souad Mubaid
- Dept. of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G1Y6, Canada,Rosalind & Morris Goodman Cancer Institute, McGill University, 1160 Pine Avenue, Montreal, QC H3A1A3, Canada
- Sandrine Busque
- Dept. of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G1Y6, Canada,Rosalind & Morris Goodman Cancer Institute, McGill University, 1160 Pine Avenue, Montreal, QC H3A1A3, Canada
- Yossef Lopez de los Santos
- KAUST Biological Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia
- Kholoud Ashour
- Dept. of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G1Y6, Canada,Rosalind & Morris Goodman Cancer Institute, McGill University, 1160 Pine Avenue, Montreal, QC H3A1A3, Canada
- Jason Sadek
- Dept. of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G1Y6, Canada,Rosalind & Morris Goodman Cancer Institute, McGill University, 1160 Pine Avenue, Montreal, QC H3A1A3, Canada
- Xian Jin Lian
- Dept. of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G1Y6, Canada,Rosalind & Morris Goodman Cancer Institute, McGill University, 1160 Pine Avenue, Montreal, QC H3A1A3, Canada
- Shahryar Khattak
- KAUST Smart-Health Initiative King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia,KAUST Biological Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia
- Sergio Di Marco
- KAUST Smart-Health Initiative King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia,KAUST Biological Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia,Dept. of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G1Y6, Canada,Rosalind & Morris Goodman Cancer Institute, McGill University, 1160 Pine Avenue, Montreal, QC H3A1A3, Canada
3
Wang L, Zhong H, Xue Z, Wang Y. Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM. Bioinform Adv 2022; 2:vbac060. [PMID: 36699417 PMCID: PMC9710680 DOI: 10.1093/bioadv/vbac060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/22/2022] [Revised: 07/01/2022] [Accepted: 08/30/2022] [Indexed: 01/28/2023]
Abstract
Motivation: Protein domains are the basic units of proteins that can fold, function and evolve independently. Protein domain boundary partition plays an important role in protein structure prediction, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Although many methods have been developed to predict domain boundaries from protein sequence over the past two decades, there is still much room for improvement. Results: In this article, a novel domain boundary prediction tool called Res-Dom was developed, which is based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. We used deep residual neural networks to extract higher-order residue-related information. In addition, we used a pre-trained protein language model called ESM to extract sequence embedded features, which can summarize sequence context information more abundantly. To improve the global representation of these deep residual networks, a Bi-LSTM network was also designed to consider long-range interactions between residues. Res-Dom was then tested on an independent test set including 342 proteins and generated correct single-domain and multi-domain classifications with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlapping score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most of the recently developed state-of-the-art domain prediction methods. Availability and implementation: All source code, datasets and model are available at http://isyslab.info/Res-Dom/.
Affiliation(s)
- Lei Wang
- Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China.,School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Haolin Zhong
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
- Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China.,School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yan Wang
- Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China.,School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
4
Grinkevich VV, Vema A, Fawkner K, Issaeva N, Andreotti V, Dickinson ER, Hedström E, Spinnler C, Inga A, Larsson LG, Karlén A, Wilhelm M, Barran PE, Okorokov AL, Selivanova G, Zawacka-Pankau JE. Novel Allosteric Mechanism of Dual p53/MDM2 and p53/MDM4 Inhibition by a Small Molecule. Front Mol Biosci 2022; 9:823195. [PMID: 35720128 PMCID: PMC9198586 DOI: 10.3389/fmolb.2022.823195] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/29/2021] [Accepted: 04/26/2022] [Indexed: 01/26/2023] Open
Abstract
Restoration of the p53 tumor suppressor for personalised cancer therapy is a promising treatment strategy. However, several high-affinity MDM2 inhibitors have shown substantial side effects in clinical trials. Thus, elucidation of the molecular mechanisms of action of p53 reactivating molecules with alternative functional principles is of the utmost importance. Here, we report the discovery of a novel allosteric mechanism of p53 reactivation through targeting the p53 N-terminus, which promotes inhibition of both p53/MDM2 (murine double minute 2) and p53/MDM4 interactions. Using biochemical assays and molecular docking, we identified the binding site of two p53 reactivating molecules, RITA (reactivation of p53 and induction of tumor cell apoptosis) and protoporphyrin IX (PpIX). Ion mobility-mass spectrometry revealed that the binding of RITA to serine 33 and serine 37 is responsible for inducing the allosteric shift in p53, which shields the MDM2 binding residues of p53 and prevents its interactions with MDM2 and MDM4. Our results point to an alternative mechanism of blocking p53 interaction with MDM2 and MDM4 and may pave the way for the development of novel allosteric inhibitors of p53/MDM2 and p53/MDM4 interactions.
Affiliation(s)
- Vera V. Grinkevich
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Aparna Vema
- Division of Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Uppsala, Sweden
- Karin Fawkner
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Natalia Issaeva
- Department of Otolaryngology/Head and Neck Surgery, UNC-Chapel Hill, Chapel Hill, NC, United States
- Virginia Andreotti
- IRCCS Ospedale Policlinico San Martino, Genetics of Rare Cancers, Genoa, Italy
- Eleanor R. Dickinson
- Manchester Institute of Biotechnology, The School of Chemistry, The University of Manchester, Manchester, United Kingdom
- Elisabeth Hedström
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Clemens Spinnler
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Alberto Inga
- Department CIBIO, University of Trento, Trento, Italy
- Lars-Gunnar Larsson
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Anders Karlén
- Division of Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Uppsala, Sweden
- Margareta Wilhelm
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- Perdita E. Barran
- Manchester Institute of Biotechnology, The School of Chemistry, The University of Manchester, Manchester, United Kingdom
- Andrei L. Okorokov
- Wolfson Institute for Biomedical Research, University College London, London, United Kingdom
- Galina Selivanova
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden. Correspondence: Galina Selivanova; Joanna E. Zawacka-Pankau
- Joanna E. Zawacka-Pankau
- Department of Medicine, Huddinge, Center for Hematology and Regenerative Medicine, Karolinska Institute, Stockholm, Sweden. Correspondence: Galina Selivanova; Joanna E. Zawacka-Pankau
5
Cretin G, Galochkina T, Vander Meersche Y, de Brevern AG, Postic G, Gelly JC. SWORD2: hierarchical analysis of protein 3D structures. Nucleic Acids Res 2022; 50:W732-W738. [PMID: 35580056 PMCID: PMC9252838 DOI: 10.1093/nar/gkac370] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/25/2022] [Revised: 04/19/2022] [Accepted: 04/29/2022] [Indexed: 11/27/2022] Open
Abstract
Understanding the functions and origins of proteins requires splitting these macromolecules into fragments that could be independent in terms of folding, activity, or evolution. For that purpose, structural domains are the typical level of analysis, but shorter segments, such as subdomains and supersecondary structures, are insightful as well. Here, we propose SWORD2, a web server for exploring how an input protein structure may be decomposed into ‘Protein Units’ that can be hierarchically assembled to delimit structural domains. For each partitioning solution, the relevance of the identified substructures is estimated through different measures. This multilevel analysis is achieved by integrating our previous work on domain delineation, ‘protein peeling’ and model quality assessment. We hope that SWORD2 will be useful to biologists searching for key regions in their proteins of interest and to bioinformaticians building datasets of protein structures. The web server is freely available online: https://www.dsimb.inserm.fr/SWORD2.
Affiliation(s)
- Gabriel Cretin
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France.,Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Tatiana Galochkina
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France.,Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Yann Vander Meersche
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France.,Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France.,Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Guillaume Postic
- Université Paris-Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
- Jean-Christophe Gelly
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France.,Laboratoire d'Excellence GR-Ex, 75015 Paris, France
6
Mulnaes D, Golchin P, Koenig F, Gohlke H. TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning. J Chem Theory Comput 2021; 17:4599-4613. [PMID: 34161735 DOI: 10.1021/acs.jctc.1c00129] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Protein domains are independent, functional, and stable structural units of proteins. Accurate protein domain boundary prediction plays an important role in understanding protein structure and evolution, as well as for protein structure prediction. Current domain boundary prediction methods differ in terms of boundary definition, methodology, and training databases resulting in disparate performance for different proteins. We developed TopDomain, an exhaustive metapredictor, that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented TopDomainTMC, which accurately predicts whether domain parsing is necessary for the target protein.
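The F1 scores quoted above count a predicted boundary as correct when it falls within a tolerance window (±20 or ±10 residues) of a true boundary. The sketch below shows one way such a tolerance-based F1 could be computed. The greedy one-to-one matching rule, the function name and the example positions are assumptions for illustration; the abstract does not describe the exact matching protocol used by TopDomain.

```python
def boundary_f1(predicted, true, tolerance=20):
    """F1 score for boundary positions with a +/- tolerance window.

    Assumption: each true boundary can be matched by at most one prediction,
    using a simple greedy nearest-neighbour match.
    """
    unmatched = sorted(true)
    tp = 0
    for p in sorted(predicted):
        # Nearest true boundary that has not been matched yet.
        best = min(unmatched, key=lambda t: abs(t - p), default=None)
        if best is not None and abs(best - p) <= tolerance:
            tp += 1
            unmatched.remove(best)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical boundary positions (residue indices), not data from the paper.
print(boundary_f1([100, 250], [110, 400], tolerance=20))
```

Tightening the tolerance can only keep or lower the score, which is consistent with the ±10 figures in the abstract being lower than the ±20 figures.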
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Pegah Golchin
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Filip Koenig
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany.,John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
7
Li G, Zhou X, Li Z, Liu Y, Liu D, Miao Y, Wan Q, Zhang R. Significantly improving the thermostability of a hyperthermophilic GH10 family xylanase XynAF1 by semi-rational design. Appl Microbiol Biotechnol 2021; 105:4561-4576. [PMID: 34014347 DOI: 10.1007/s00253-021-11340-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/20/2021] [Revised: 04/16/2021] [Accepted: 05/09/2021] [Indexed: 11/28/2022]
Abstract
Xylanases have a broad range of applications in industrial biotechnologies, which require the enzymes to resist high-temperature environments. The majority of xylanases have maximum activity at moderate temperatures, which limits their potential applications in industry. In this study, a thermophilic GH10 family xylanase XynAF1 from the high-temperature composting strain Aspergillus fumigatus Z5 was characterized and engineered to further improve its thermostability. XynAF1 has an optimal reaction temperature of 90 °C. The crystal structure of XynAF1 was obtained by X-ray diffraction after heterologous expression, purification, and crystallization. The high-resolution X-ray crystallographic structure of the protein-product complex was obtained by soaking the apo-state crystal with xylotetraose. Structure analysis indicated that XynAF1 has a rigid skeleton, which helps to maintain the hyperthermophilic characteristic. The homologous structure analysis and the catalytic center mutant construction of XynAF1 indicated that the conserved catalytic center contributed to the high optimum catalytic temperature. The amino acids on the surface of xylanase XynAF1 that might influence the enzyme thermostability were identified by the structure analysis. Combining the rational design with the saturation mutation at the high B-value regions, the integrative mutant XynAF1-AC with a 6-fold increase of thermostability was finally obtained. This study efficiently improved the thermostability of a GH10 family xylanase by semi-rational design, which provided a new biocatalyst for high-temperature biotechnological applications. KEY POINTS: • Obtained the crystal structure of GH10 family hyperthermophilic xylanase XynAF1. • Shed light on the understanding of the GH10 family xylanase thermophilic mechanism. • Constructed a 6-fold increased thermostability recombinant xylanase.
Affiliation(s)
- Guangqi Li
- Key Laboratory of Microbial Resources Collection and Preservation, Ministry of Agriculture, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, 100081, People's Republic of China.,Jiangsu Provincial Key Lab for Organic Solid Waste Utilization, National Engineering Research Center for Organic-based Fertilizers, Jiangsu Collaborative Innovation Center for Solid Organic Waste Resource Utilization, Nanjing Agricultural University, Nanjing, 210095, People's Republic of China
- Xuan Zhou
- National Agricultural Technology Extension and Service Center, Beijing, 100125, People's Republic of China
- Zhihong Li
- College of Science, Nanjing Agricultural University, Nanjing, 210095, People's Republic of China
- Yunpeng Liu
- Key Laboratory of Microbial Resources Collection and Preservation, Ministry of Agriculture, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, 100081, People's Republic of China
- Dongyang Liu
- Jiangsu Provincial Key Lab for Organic Solid Waste Utilization, National Engineering Research Center for Organic-based Fertilizers, Jiangsu Collaborative Innovation Center for Solid Organic Waste Resource Utilization, Nanjing Agricultural University, Nanjing, 210095, People's Republic of China
- Youzhi Miao
- Jiangsu Provincial Key Lab for Organic Solid Waste Utilization, National Engineering Research Center for Organic-based Fertilizers, Jiangsu Collaborative Innovation Center for Solid Organic Waste Resource Utilization, Nanjing Agricultural University, Nanjing, 210095, People's Republic of China
- Qun Wan
- College of Science, Nanjing Agricultural University, Nanjing, 210095, People's Republic of China.,The Key Laboratory of Plant Immunity, Nanjing Agricultural University, Nanjing, 210095, People's Republic of China
- Ruifu Zhang
- Key Laboratory of Microbial Resources Collection and Preservation, Ministry of Agriculture, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, 100081, People's Republic of China.,Jiangsu Provincial Key Lab for Organic Solid Waste Utilization, National Engineering Research Center for Organic-based Fertilizers, Jiangsu Collaborative Innovation Center for Solid Organic Waste Resource Utilization, Nanjing Agricultural University, Nanjing, 210095, People's Republic of China
8
Wang Y, Zhang H, Zhong H, Xue Z. Protein domain identification methods and online resources. Comput Struct Biotechnol J 2021; 19:1145-1153. [PMID: 33680357 PMCID: PMC7895673 DOI: 10.1016/j.csbj.2021.01.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/08/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 01/03/2023] Open
Abstract
Protein domains are the basic units of proteins that can fold, function, and evolve independently. Knowledge of protein domains is critical for protein classification, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Thus, over the past two decades, a number of protein domain identification approaches have been developed, and a variety of protein domain databases have also been constructed. This review divides protein domain prediction methods into two categories, namely sequence-based and structure-based. These methods are introduced in detail, and their advantages and limitations are compared. Furthermore, this review also provides a comprehensive overview of popular online protein domain sequence and structure databases. Finally, we discuss potential improvements of these prediction methods.
Affiliation(s)
- Yan Wang
- Institute of Medical Artificial Intelligence, Binzhou Medical College, Yantai, Shandong 264003, China
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Hang Zhang
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Haolin Zhong
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
9
Zheng W, Zhou X, Wuyun Q, Pearce R, Li Y, Zhang Y. FUpred: detecting protein domains through deep-learning-based contact map prediction. Bioinformatics 2020; 36:3749-3757. [PMID: 32227201 DOI: 10.1093/bioinformatics/btaa217] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 11/06/2019] [Revised: 02/27/2020] [Accepted: 03/25/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. RESULTS: We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. AVAILABILITY AND IMPLEMENTATION: https://zhanglab.ccmb.med.umich.edu/FUpred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
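The core idea stated above, choosing boundaries that keep contacts inside domains and few contacts between them, can be illustrated with a small sketch for a single cut into two continuous domains. The scoring rule used here (inter-domain contacts divided by the product of the two segment lengths) is a simplified stand-in chosen for illustration and is not FUpred's actual scoring function; the function name, the `min_len` parameter and the toy contact map are likewise assumptions.

```python
def best_cut(contact_map, min_len=2):
    """Find the cut that separates residues [0, cut) from [cut, L) with the
    lowest inter-domain contact density.

    contact_map is a symmetric L x L list of lists holding 0/1 contacts.
    Returns (cut, score), or None when the chain is too short to split.
    """
    n = len(contact_map)
    best = None
    for cut in range(min_len, n - min_len + 1):
        # Contacts that would cross the proposed boundary.
        inter = sum(contact_map[i][j] for i in range(cut) for j in range(cut, n))
        # Normalise so that very unbalanced cuts are not trivially favoured.
        score = inter / (cut * (n - cut))
        if best is None or score < best[1]:
            best = (cut, score)
    return best


def toy_map():
    """Hypothetical 6-residue map: two compact blocks joined by one contact."""
    pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    m = [[0] * 6 for _ in range(6)]
    for i, j in pairs:
        m[i][j] = m[j][i] = 1
    return m


print(best_cut(toy_map(), min_len=1))
```

Because the total number of contacts in the map is fixed, lowering the inter-domain count is the same as raising the intra-domain count, so a single normalised term is enough for this two-domain case. Discontinuous domains, which the abstract highlights, would need a search over pairs of cut points and are not covered by this sketch.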
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Qiqige Wuyun
- Computer Science and Engineering Department, Michigan State University, East Lansing, MI 48824, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109.,School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
10
Shi Q, Chen W, Huang S, Jin F, Dong Y, Wang Y, Xue Z. DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network. Bioinformatics 2020; 35:5128-5136. [PMID: 31197306 DOI: 10.1093/bioinformatics/btz464] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 06/05/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been proven to improve prediction performance. However, how to simultaneously model the local and global interactions to further improve domain boundary prediction is still a challenging problem. RESULTS: This article employs a hybrid deep learning method that combines convolutional neural network and gated recurrent unit models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt balanced Random Forest for classification to deal with the high imbalance of samples and the high dimensionality of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. AVAILABILITY AND IMPLEMENTATION: The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiang Shi
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Weiya Chen
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Siqi Huang
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Fanglin Jin
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yinghao Dong
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yan Wang
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhidong Xue
  - School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
11
Joseph AM, Pohl AE, Ball TJ, Abram TG, Johnson DK, Geisbrecht BV, Shames SR. The Legionella pneumophila Metaeffector Lpg2505 (MesI) Regulates SidI-Mediated Translation Inhibition and Novel Glycosyl Hydrolase Activity. Infect Immun 2020; 88:e00853-19. [PMID: 32122942 DOI: 10.1128/IAI.00853-19] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/15/2019] [Accepted: 02/27/2020] [Indexed: 12/19/2022] Open
Abstract
Legionella pneumophila, the etiological agent of Legionnaires' disease, employs an arsenal of hundreds of Dot/Icm-translocated effector proteins to facilitate replication within eukaryotic phagocytes. Several effectors, called metaeffectors, function to regulate the activity of other Dot/Icm-translocated effectors during infection. The metaeffector Lpg2505 is essential for L. pneumophila intracellular replication only when its cognate effector, SidI, is present. SidI is a cytotoxic effector that interacts with the host translation factor eEF1A and potently inhibits eukaryotic protein translation by an unknown mechanism. Here, we evaluated the impact of Lpg2505 on SidI-mediated phenotypes and investigated the mechanism of SidI function. We determined that Lpg2505 binds with nanomolar affinity to SidI and suppresses SidI-mediated inhibition of protein translation. SidI binding to eEF1A and Lpg2505 is not mutually exclusive, and the proteins bind distinct regions of SidI. We also discovered that SidI possesses GDP-dependent glycosyl hydrolase activity and that this activity is regulated by Lpg2505. We have therefore renamed Lpg2505 MesI (metaeffector of SidI). This work reveals novel enzymatic activity for SidI and provides insight into how intracellular replication of L. pneumophila is regulated by a metaeffector.
12
Greener JG, Kandathil SM, Jones DT. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints. Nat Commun 2019; 10:3977. [PMID: 31484923 PMCID: PMC6726615 DOI: 10.1038/s41467-019-11994-0] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 04/01/2019] [Accepted: 08/14/2019] [Indexed: 01/30/2023] Open
Abstract
The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, confident models for 25% of these so-called dark families were produced in under a week on a small 200 core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.
Affiliation(s)
- Joe G Greener
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
  - The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
  - The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- David T Jones
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
  - The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
13
Dulcey CE, López de Los Santos Y, Létourneau M, Déziel E, Doucet N. Semi-rational evolution of the 3-(3-hydroxyalkanoyloxy)alkanoate (HAA) synthase RhlA to improve rhamnolipid production in Pseudomonas aeruginosa and Burkholderia glumae. FEBS J 2019; 286:4036-4059. [PMID: 31177633 DOI: 10.1111/febs.14954] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/07/2018] [Revised: 04/12/2019] [Accepted: 06/06/2019] [Indexed: 12/15/2022]
Abstract
The 3-(3-hydroxyalkanoyloxy)alkanoate (HAA) synthase RhlA is an essential enzyme involved in the biosynthesis of HAAs in Pseudomonas and Burkholderia species. RhlA modulates the aliphatic chain length in rhamnolipids, conferring distinct physicochemical properties to these biosurfactants exhibiting promising industrial and pharmaceutical value. A detailed molecular understanding of substrate specificity and catalytic performance in RhlA could offer protein engineering tools to develop designer variants involved in the synthesis of novel rhamnolipid mixtures for tailored eco-friendly products. However, current directed evolution progress remains limited due to the absence of high-throughput screening methodologies and lack of an experimentally resolved RhlA structure. In the present work, we used comparative modeling and chimeric-based approaches to perform a comprehensive semi-rational mutagenesis of RhlA from Pseudomonas aeruginosa. Our extensive RhlA mutational variants and chimeric hybrids between the Pseudomonas and Burkholderia homologs illustrate selective modulation of rhamnolipid alkyl chain length in both Pseudomonas aeruginosa and Burkholderia glumae. Our results also demonstrate the implication of a putative cap-domain motif that covers the catalytic site of the enzyme and provides substrate specificity to RhlA. This semi-rational mutant-based survey reveals promising 'hot-spots' for the modulation of RL congener patterns and potential control of enzyme activity, in addition to uncovering residue positions that modulate substrate selectivity between the Pseudomonas and Burkholderia functional homologs. DATABASE: Model data are available in the PMDB database under the accession number PM0081867.
Affiliation(s)
- Carlos Eduardo Dulcey
  - Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Yossef López de Los Santos
  - Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Myriam Létourneau
  - Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Eric Déziel
  - Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
- Nicolas Doucet
  - Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, Laval, Canada
  - PROTEO, the Québec Network for Research on Protein Function, Engineering, and Applications, Université Laval, Canada
14
Wang Y, Wang J, Li R, Shi Q, Xue Z, Zhang Y. ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly. Nucleic Acids Res 2017; 45:W400-W407. [PMID: 28498994 PMCID: PMC5793814 DOI: 10.1093/nar/gkx410] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/03/2017] [Accepted: 04/28/2017] [Indexed: 12/21/2022] Open
Abstract
We develop a hierarchical pipeline, ThreaDomEx, for both continuous domain (CD) and discontinuous domain (DCD) structure predictions. Starting from a query sequence, ThreaDomEx first threads it through the PDB to identify multiple structure templates, from which a profile of domain conservation score (DC-score) is derived for domain-segment assignment. To further detect DCDs that consist of separated segments along the sequence, a boundary-clustering algorithm is used to refine the DCD-linker locations. In cases where the templates do not contain DCDs, a domain-segment assembly process, guided by symmetry comparison, is applied for further DCD detection. ThreaDomEx was tested on a set of 1111 proteins and achieved a normalized domain overlap score of 89.3% compared to experimental data, which is significantly higher than other state-of-the-art methods. It also recalls 26.7% of DCDs with 72.7% precision on the proteins for which threading failed to detect any DCDs. The server provides facilities for users to interactively refine the domain models by adjusting the DC-score threshold, deleting and adding domain linkers, and assembling domain segments. These facilities are particularly helpful for hard targets on which current methods have low accuracy, but where human-expert knowledge and experimental insights can be used to refine models. The ThreaDomEx server is available at http://zhanglab.ccmb.med.umich.edu/ThreaDomEx.
Affiliation(s)
- Yan Wang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jian Wang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Ruiming Li
  - School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Qiang Shi
  - School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
15
García-Mauriño SM, Díaz-Quintana A, Rivero-Rodríguez F, Cruz-Gallardo I, Grüttner C, Hernández-Vellisca M, Díaz-Moreno I. A putative RNA binding protein from Plasmodium vivax apicoplast. FEBS Open Bio 2017; 8:177-188. [PMID: 29435408 PMCID: PMC5794462 DOI: 10.1002/2211-5463.12351] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/03/2017] [Revised: 11/03/2017] [Accepted: 11/14/2017] [Indexed: 01/30/2023] Open
Abstract
Malaria is caused by Apicomplexa protozoans from the Plasmodium genus entering the bloodstream of humans and animals through the bite of female mosquitoes. The annotation of the Plasmodium vivax genome revealed a putative RNA binding protein (apiRBP) that was predicted to be trafficked into the apicoplast, a plastid organelle unique to Apicomplexa protozoans. Although a 3D structural model of the apiRBP corresponds to a noncanonical RNA recognition motif with an additional C‐terminal α‐helix (α3), preliminary protein production trials were unsuccessful. Theoretical solvation analysis of the apiRBP model highlighted an exposed hydrophobic region clustering around α3. Hence, we used a C‐terminal GFP‐fused chimera to stabilize the highly insoluble apiRBP and determined its ability to bind U‐rich stretches of RNA. The affinity of apiRBP toward such RNAs is highly dependent on ionic strength, suggesting that formation of the apiRBP–RNA complex is driven by electrostatic interactions. Altogether, apiRBP represents an attractive tool for apicoplast transcriptional studies and for antimalarial drug design.
Affiliation(s)
- Sofía M García-Mauriño
  - Instituto de Investigaciones Químicas (IIQ), Centro de Investigaciones Científicas Isla de la Cartuja (cicCartuja), Universidad de Sevilla, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Antonio Díaz-Quintana
  - Instituto de Investigaciones Químicas (IIQ), Centro de Investigaciones Científicas Isla de la Cartuja (cicCartuja), Universidad de Sevilla, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Francisco Rivero-Rodríguez
  - Instituto de Investigaciones Químicas (IIQ), Centro de Investigaciones Científicas Isla de la Cartuja (cicCartuja), Universidad de Sevilla, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Christian Grüttner
  - Instituto de Investigaciones Químicas (IIQ), Centro de Investigaciones Científicas Isla de la Cartuja (cicCartuja), Universidad de Sevilla, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Marian Hernández-Vellisca
  - Instituto de Investigaciones Químicas (IIQ), Centro de Investigaciones Científicas Isla de la Cartuja (cicCartuja), Universidad de Sevilla, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Irene Díaz-Moreno
  - Instituto de Investigaciones Químicas (IIQ), Centro de Investigaciones Científicas Isla de la Cartuja (cicCartuja), Universidad de Sevilla, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
16
Gamage DG, Varma Y, Meitzler JL, Morissette R, Ness TJ, Hendrickson TL. The soluble domains of Gpi8 and Gaa1, two subunits of glycosylphosphatidylinositol transamidase (GPI-T), assemble into a complex. Arch Biochem Biophys 2017; 633:58-67. [DOI: 10.1016/j.abb.2017.09.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/07/2017] [Revised: 09/06/2017] [Accepted: 09/07/2017] [Indexed: 11/23/2022]
17
Sanders K, Lin CL, Smith AJ, Cronin N, Fisher G, Eftychidis V, McGlynn P, Savery NJ, Wigley DB, Dillingham MS. The structure and function of an RNA polymerase interaction domain in the PcrA/UvrD helicase. Nucleic Acids Res 2017; 45:3875-3887. [PMID: 28160601 PMCID: PMC5397179 DOI: 10.1093/nar/gkx074] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 09/23/2016] [Accepted: 01/25/2017] [Indexed: 11/14/2022] Open
Abstract
The PcrA/UvrD helicase functions in multiple pathways that promote bacterial genome stability including the suppression of conflicts between replication and transcription and facilitating the repair of transcribed DNA. The reported ability of PcrA/UvrD to bind and backtrack RNA polymerase (1,2) might be relevant to these functions, but the structural basis for this activity is poorly understood. In this work, we define a minimal RNA polymerase interaction domain in PcrA, and report its crystal structure at 1.5 Å resolution. The domain adopts a Tudor-like fold that is similar to other RNA polymerase interaction domains, including that of the prototype transcription-repair coupling factor Mfd. Removal or mutation of the interaction domain reduces the ability of PcrA/UvrD to interact with and to remodel RNA polymerase complexes in vitro. The implications of this work for our understanding of the role of PcrA/UvrD at the interface of DNA replication, transcription and repair are discussed.
Affiliation(s)
- Kelly Sanders
  - DNA:Protein Interactions Unit, School of Biochemistry, Biomedical Sciences Building, University of Bristol, Bristol BS8 1TD, UK
- Chia-Liang Lin
  - Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London SW3 6JB, UK and Section of Structural Biology, Department of Medicine, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Abigail J Smith
  - DNA:Protein Interactions Unit, School of Biochemistry, Biomedical Sciences Building, University of Bristol, Bristol BS8 1TD, UK
- Nora Cronin
  - Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London SW3 6JB, UK and Section of Structural Biology, Department of Medicine, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Gemma Fisher
  - DNA:Protein Interactions Unit, School of Biochemistry, Biomedical Sciences Building, University of Bristol, Bristol BS8 1TD, UK
- Peter McGlynn
  - Department of Biology, University of York, Wentworth Way, York YO10 5DD, UK
- Nigel J Savery
  - DNA:Protein Interactions Unit, School of Biochemistry, Biomedical Sciences Building, University of Bristol, Bristol BS8 1TD, UK
- Dale B Wigley
  - Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London SW3 6JB, UK and Section of Structural Biology, Department of Medicine, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Mark S Dillingham
  - DNA:Protein Interactions Unit, School of Biochemistry, Biomedical Sciences Building, University of Bristol, Bristol BS8 1TD, UK
18
Sheu MJ, Hsieh MJ, Chou YE, Wang PH, Yeh CB, Yang SF, Lee HL, Liu YF. Effects of ADAMTS14 genetic polymorphism and cigarette smoking on the clinicopathologic development of hepatocellular carcinoma. PLoS One 2017; 12:e0172506. [PMID: 28231306 PMCID: PMC5322915 DOI: 10.1371/journal.pone.0172506] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/01/2016] [Accepted: 02/05/2017] [Indexed: 01/12/2023] Open
Abstract
Background ADAMTS14 is a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family, proteolytic enzymes with a variety of ancillary domains in the C-terminal region for substrate specificity and enzyme localization via extracellular matrix association. However, whether ADAMTS14 genetic variants play a role in hepatocellular carcinoma (HCC) susceptibility remains unknown. Methodology/Principal findings Four non-synonymous single-nucleotide polymorphisms (nsSNPs) of the ADAMTS14 gene were examined in 680 controls and 340 patients with HCC. Among 141 HCC patients with smoking behaviour, we found significant associations of the rs12774070 (CA+AA vs CC) and rs61573157 (CT+TT vs CC) variants with the clinical stage of HCC (OR: 2.500 and 2.767; 95% CI: 1.148–5.446 and 1.096–6.483; P = 0.019 and 0.026, respectively) and tumour size (OR: 2.387 and 2.659; 95% CI: 1.098–5.188 and 1.055–6.704; P = 0.026 and 0.034, respectively), but not with lymph node metastasis or other clinical statuses. Moreover, an additional integrated in silico analysis proposed that rs12774070 and rs61573157 affected an essential post-translational O-glycosylation site within the 3rd thrombospondin type 1 repeat and a novel proline-rich region embedded within the C-terminal extension, respectively. Conclusions Taken together, our results suggest an involvement of ADAMTS14 SNPs rs12774070 and rs61573157 in liver tumorigenesis and implicate the ADAMTS14 gene polymorphism as a predictive factor during the progression of HCC.
Affiliation(s)
- Ming-Jen Sheu
  - Department of Gastroenterology and Hepatology, Chi Mei Medical Center, Tainan, Taiwan
- Ming-Ju Hsieh
  - Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan
  - Cancer Research Center, Changhua Christian Hospital, Changhua, Taiwan
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan
- Ying-Erh Chou
  - School of Medicine, Chung Shan Medical University, Taichung, Taiwan
  - Department of Medical Research, Chung Shan Medical University Hospital, Taichung, Taiwan
- Po-Hui Wang
  - Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan
  - Department of Obstetrics and Gynecology, Chung Shan Medical University Hospital, Taichung, Taiwan
- Chao-Bin Yeh
  - School of Medicine, Chung Shan Medical University, Taichung, Taiwan
  - Department of Emergency Medicine, Chung Shan Medical University Hospital, Taichung, Taiwan
- Shun-Fa Yang
  - Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan
  - Department of Medical Research, Chung Shan Medical University Hospital, Taichung, Taiwan
- Hsiang-Lin Lee
  - School of Medicine, Chung Shan Medical University, Taichung, Taiwan
  - Department of Surgery, Chung Shan Medical University Hospital, Taichung, Taiwan
- Yu-Fan Liu
  - Department of Biomedical Sciences, College of Medicine Sciences and Technology, Chung Shan Medical University, Taichung, Taiwan
  - Division of Allergy, Department of Pediatrics, Chung-Shan Medical University Hospital, Taichung, Taiwan
19
Ovchinnikov S, Kim DE, Wang RYR, Liu Y, DiMaio F, Baker D. Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta. Proteins 2016; 84 Suppl 1:67-75. [PMID: 26677056 PMCID: PMC5490371 DOI: 10.1002/prot.24974] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 07/06/2015] [Revised: 11/27/2015] [Accepted: 12/12/2015] [Indexed: 12/19/2022]
Abstract
We describe CASP11 de novo blind structure predictions made using the Rosetta structure prediction methodology with both automatic and human-assisted protocols. Model accuracy was generally improved using coevolution-derived residue-residue contact information as restraints during Rosetta conformational sampling and refinement, particularly when the number of sequences in the family was more than three times the length of the protein. The highlight was the human-assisted prediction of T0806, a large and topologically complex target with no homologs of known structure, which had unprecedented accuracy: <3.0 Å root-mean-square deviation (RMSD) from the crystal structure over 223 residues. For this target, we increased the amount of conformational sampling over our fully automated method by employing an iterative hybridization protocol. Our results clearly demonstrate, in a blind prediction scenario, that coevolution-derived contacts can considerably increase the accuracy of template-free structure modeling.
Affiliation(s)
- Sergey Ovchinnikov
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195
- David E Kim
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195
- Ray Yu-Ruei Wang
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195
- Yuan Liu
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195
- Frank DiMaio
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195
20
Xue Z, Jang R, Govindarajoo B, Huang Y, Wang Y. Extending Protein Domain Boundary Predictors to Detect Discontinuous Domains. PLoS One 2015; 10:e0141541. [PMID: 26502173 PMCID: PMC4621036 DOI: 10.1371/journal.pone.0141541] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/03/2015] [Accepted: 10/10/2015] [Indexed: 11/18/2022] Open
Abstract
A variety of protein domain predictors have been developed in recent years to predict protein domain boundaries, but most of them cannot predict discontinuous domains. Considering that nearly 40% of multidomain proteins contain one or more discontinuous domains, we have developed DomEx to enable domain boundary predictors to detect discontinuous domains by assembling the continuous domain segments. Discontinuous domains are predicted by matching the sequence profile of concatenated continuous domain segments with the profiles from a single-domain library derived from SCOP, CATH and Pfam. The matches are then filtered by similarity to library templates, a symmetric index score and a profile-profile alignment score. DomEx recalled 32.3% of discontinuous domains with 86.5% precision when tested on 97 non-homologous protein chains containing 58 continuous and 99 discontinuous domains, in which the predicted domain segments are within ±20 residues of the boundary definitions in CATH 3.5. Compared with our recently developed predictor, ThreaDom, which is the state-of-the-art tool to detect discontinuous domains, DomEx recalled 26.7% of discontinuous domains with 72.7% precision in a benchmark with 29 discontinuous-domain chains, where ThreaDom failed to predict any discontinuous domains. Furthermore, combined with ThreaDom, the method ranked number one among 10 predictors. The source code and datasets are available at https://github.com/xuezhidong/DomEx.
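To illustrate how a ±20-residue tolerance turns predicted positions into precision and recall figures of the kind quoted above, the following Python sketch scores predicted boundary positions against reference positions. It is not the DomEx code: the function name `boundary_precision_recall`, the greedy one-to-one matching, and the example positions are all assumptions made for this sketch.

```python
from typing import List, Tuple


def boundary_precision_recall(
    predicted: List[int], reference: List[int], tolerance: int = 20
) -> Tuple[float, float]:
    """Return (precision, recall) for predicted boundary positions.

    A prediction counts as a hit when it lies within `tolerance` residues
    of a reference boundary that has not already been matched.
    """
    unmatched = list(reference)
    hits = 0
    for pos in sorted(predicted):
        # Closest reference boundary that is still unmatched, if any.
        closest = min(unmatched, key=lambda ref: abs(ref - pos), default=None)
        if closest is not None and abs(closest - pos) <= tolerance:
            hits += 1
            unmatched.remove(closest)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical positions: only the first prediction is within 20 residues.
precision, recall = boundary_precision_recall([100, 250, 400], [110, 300])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With predictions at 100, 250 and 400 and reference boundaries at 110 and 300, only the first prediction falls within tolerance, so precision is 1/3 and recall is 1/2.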
Affiliation(s)
- Zhidong Xue
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
  - * E-mail: (ZX); (YW)
- Richard Jang
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, United States of America
- Brandon Govindarajoo
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, United States of America
- Yichu Huang
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Yan Wang
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
  - * E-mail: (ZX); (YW)
21
Morissette R, Chen W, Perritt AF, Dreiling JL, Arai AE, Sachdev V, Hannoush H, Mallappa A, Xu Z, McDonnell NB, Quezado M, Merke DP. Broadening the Spectrum of Ehlers Danlos Syndrome in Patients With Congenital Adrenal Hyperplasia. J Clin Endocrinol Metab 2015; 100:E1143-52. [PMID: 26075496 PMCID: PMC4525000 DOI: 10.1210/jc.2015-2232] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 12/26/2022]
Abstract
CONTEXT The contiguous gene deletion syndrome (CAH-X) was described in a subset (7%) of congenital adrenal hyperplasia (CAH) patients with a TNXA/TNXB chimera, resulting in deletions of CYP21A2, encoding 21-hydroxylase necessary for cortisol biosynthesis, and TNXB, encoding the extracellular matrix glycoprotein tenascin-X (TNX). This TNXA/TNXB chimera is characterized by a 120-bp deletion in exon 35 and results in TNXB haploinsufficiency, disrupted TGF-β signaling, and an Ehlers Danlos syndrome phenotype. OBJECTIVE The objective of the study was to determine the genetic status of TNXB and resulting protein defects in CAH patients with a CAH-X phenotype but not the previously described TNXA/TNXB chimera. Design, Settings, Participants, and Intervention: A total of 246 unrelated CAH patients were screened for TNXB defects. Genetic defects were investigated by Southern blotting, multiplex ligation-dependent probe amplification, Sanger, and next-generation sequencing. Dermal fibroblasts and tissue were used for immunoblotting, immunohistochemical, and coimmunoprecipitation experiments. MAIN OUTCOME MEASURES The genetic and protein status of tenascin-X in phenotypic CAH-X patients was measured. RESULTS Seven families harbor a novel TNXB missense variant c.12174C>G (p.C4058W) and a clinical phenotype consistent with hypermobility-type Ehlers Danlos syndrome. Fourteen CAH probands carry previously described TNXA/TNXB chimeras, and seven unrelated patients carry the novel TNXB variant, resulting in a CAH-X prevalence of 8.5%. This highly conserved pseudogene-derived variant in the TNX fibrinogen-like domain is predicted to be deleterious and disulfide bonded, results in reduced dermal elastin and fibrillin-1 staining and altered TGF-β1 binding, and represents a novel TNXA/TNXB chimera. Tenascin-X protein expression was normal in dermal fibroblasts, suggesting a dominant-negative effect. 
CONCLUSIONS CAH-X syndrome is commonly found in CAH due to 21-hydroxylase deficiency and may result from various etiological mechanisms.
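The 8.5% prevalence reported above is consistent with the counts given in the abstract, assuming the denominator is the 246 screened patients. The short Python check below reproduces the figure; the function and variable names are illustrative only.

```python
def prevalence_percent(cases: int, cohort: int) -> float:
    """Prevalence as a percentage of the screened cohort."""
    return 100.0 * cases / cohort


chimera_probands = 14        # previously described TNXA/TNXB chimeras
novel_variant_patients = 7   # unrelated patients with the novel TNXB variant
cohort_size = 246            # unrelated CAH patients screened (assumed denominator)

cah_x = prevalence_percent(chimera_probands + novel_variant_patients, cohort_size)
print(f"{cah_x:.1f}%")  # 8.5%
```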
22
Jing R, Sun J, Wang Y, Li M. Domain position prediction based on sequence information by using fuzzy mean operator. Proteins 2015; 83:1462-9. [PMID: 26009844 DOI: 10.1002/prot.24833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2015] [Revised: 04/23/2015] [Accepted: 05/17/2015] [Indexed: 11/09/2022]
Abstract
The prediction of protein domain regions is valuable for the study of protein structure and function. In this study, we propose a new method, composed of a fuzzy mean operator and region division, to predict the positions of domains in a target protein based on its sequence. The whole sequence is aligned and scored using the fuzzy mean operator, and the final domain region positions are determined by region division. A published benchmark is used for comparison with previous studies. In addition, we generate two extra datasets to examine the stability of this method. Finally, the prediction accuracy achieved by our method on the independent test dataset was up to 84.13%. We hope that this method will be useful for related research.
Affiliation(s)
- Runyu Jing
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
- Jing Sun
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
- Yuelong Wang
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
- Menglong Li
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
23
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) do not adopt a well-defined folded structure under physiological conditions. Instead, these proteins exist as heterogeneous and dynamical conformational ensembles. IDPs are widespread in eukaryotic proteomes and are involved in fundamental biological processes, mostly related to regulation and signaling. At the same time, disordered regions often pose significant challenges to the structure determination process, which generally requires highly homogeneous protein samples. In this book chapter, we provide a brief overview of protein disorder, describe various bioinformatics resources that have been developed in recent years for its characterization, and give a general outline of their applications in various types of structural genomics projects. Traditionally, disordered segments were filtered out to optimize the yield of structure determination pipelines. However, it is becoming increasingly clear that the structural characterization of proteins cannot be complete without the incorporation of intrinsically disordered regions.
Affiliation(s)
- Marco Punta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
24
Abstract
Motivation: Protein domains are subunits that can fold and evolve independently. Identification of domain boundary locations is often the first step in protein folding and function annotations. Most of the current methods deduce domain boundaries by sequence-based analysis, which has low accuracy. There is no efficient method for predicting discontinuous domains that consist of segments from separated sequence regions. As template-based methods are most efficient for protein 3D structure modeling, combining multiple threading alignment information should increase the accuracy and reliability of computational domain predictions. Result: We developed a new protein domain predictor, ThreaDom, which deduces domain boundary locations based on multiple threading alignments. The core of the method development is the derivation of a domain conservation score that combines information from template domain structures and terminal and internal alignment gaps. Tested on 630 non-redundant sequences, without using homologous templates, ThreaDom generates correct single- and multi-domain classifications in 81% of cases, where 78% have the domain linker assigned within ±20 residues. In a second test on 486 proteins with discontinuous domains, ThreaDom achieves an average precision of 84% and a recall of 65% in domain boundary prediction. Finally, ThreaDom was examined on 56 targets from CASP8 and had a domain overlap rate of 73, 87 and 85% with the target for Free Modeling, Hard multiple-domain and discontinuous domain proteins, respectively, which are significantly higher than those of most domain predictors in CASP8. Similar results were achieved on the targets from the most recent CASP9 and CASP10 experiments. Availability: http://zhanglab.ccmb.med.umich.edu/ThreaDom/. Contact: zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.
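The ±20-residue criterion and the precision/recall figures quoted above can be illustrated with a small scoring sketch. The abstract does not spell out the matching scheme, so this assumes that a predicted boundary counts as correct when it lies within `window` residues of any true boundary, and that a true boundary counts as recovered when any prediction falls within the same window. The function name and the example positions are hypothetical and are not taken from ThreaDom.

```python
def boundary_precision_recall(predicted, true, window=20):
    """Score predicted domain boundaries against true ones.

    Assumption (not from the source): a prediction is correct if it is
    within +/- `window` residues of any true boundary.
    """
    # Predictions that land near at least one true boundary
    correct_pred = sum(
        1 for p in predicted if any(abs(p - t) <= window for t in true)
    )
    # True boundaries that have at least one nearby prediction
    recovered_true = sum(
        1 for t in true if any(abs(p - t) <= window for p in predicted)
    )
    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = recovered_true / len(true) if true else 0.0
    return precision, recall


# Hypothetical example: one of two predictions is near a true boundary,
# and one of two true boundaries is recovered.
precision, recall = boundary_precision_recall([100, 250], [110, 400])
```

Widening `window` makes the criterion more lenient, so scores reported under different windows are not directly comparable.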
Affiliation(s)
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
25
Gwynn EJ, Smith AJ, Guy CP, Savery NJ, McGlynn P, Dillingham MS. The conserved C-terminus of the PcrA/UvrD helicase interacts directly with RNA polymerase. PLoS One 2013; 8:e78141. [PMID: 24147116 PMCID: PMC3797733 DOI: 10.1371/journal.pone.0078141] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 07/11/2013] [Accepted: 09/13/2013] [Indexed: 12/31/2022]
Abstract
UvrD-like helicases play diverse roles in DNA replication, repair and recombination pathways. An emerging body of evidence suggests that their different cellular functions are directed by interactions with partner proteins that target unwinding activity to appropriate substrates. Recent studies in E. coli have shown that UvrD can act as an accessory replicative helicase that resolves conflicts between the replisome and transcription complexes, but the mechanism is not understood. Here we show that the UvrD homologue PcrA interacts physically with B. subtilis RNA polymerase, and that an equivalent interaction is conserved in E. coli where UvrD, but not the closely related helicase Rep, also interacts with RNA polymerase. The PcrA-RNAP interaction is direct and independent of nucleic acids or additional mediator proteins. A disordered but highly conserved C-terminal region of PcrA, which distinguishes PcrA/UvrD from otherwise related enzymes such as Rep, is both necessary and sufficient for interaction with RNA polymerase.
Affiliation(s)
- Emma J. Gwynn
- DNA:Protein Interactions Unit, School of Biochemistry, University of Bristol, Bristol, United Kingdom
- Abigail J. Smith
- DNA:Protein Interactions Unit, School of Biochemistry, University of Bristol, Bristol, United Kingdom
- Colin P. Guy
- School of Medical Sciences, University of Aberdeen, Aberdeen, United Kingdom
- Nigel J. Savery
- DNA:Protein Interactions Unit, School of Biochemistry, University of Bristol, Bristol, United Kingdom
- Peter McGlynn
- School of Medical Sciences, University of Aberdeen, Aberdeen, United Kingdom
- Department of Biology, University of York, York, United Kingdom
- Mark S. Dillingham
- DNA:Protein Interactions Unit, School of Biochemistry, University of Bristol, Bristol, United Kingdom
26
Abstract
In recent years, there have been numerous unprecedented technological advances in the field of molecular biology; these include DNA sequencing, mass spectrometry of proteins, and microarray analysis of mRNA transcripts. Perhaps, however, it is the area of genomics, which has now generated the complete genome sequences of more than 100 poxviruses, that has had the greatest impact on the average virology researcher because the DNA sequence data is in constant use in many different ways by almost all molecular virologists. As this data resource grows, so does the importance of the availability of databases and software tools to enable the bench virologist to work with and make use of this (valuable/expensive) DNA sequence information. Thus, providing researchers with intuitive software to first select and reformat genomics data from large databases, second, to compare/analyze genomics data, and third, to view and interpret large and complex sets of results has become pivotal in enabling progress to be made in modern virology. This chapter is directed at the bench virologist and describes the software required for a number of common bioinformatics techniques that are useful for comparing and analyzing poxvirus genomes. In a number of examples, we also highlight the Viral Orthologous Clusters database system and integrated tools that we developed for the management and analysis of complete viral genomes.
Affiliation(s)
- Melissa Da Silva
- Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
27
Kint CI, Verstraeten N, Wens I, Liebens VR, Hofkens J, Versées W, Fauvart M, Michiels J. The Escherichia coli GTPase ObgE modulates hydroxyl radical levels in response to DNA replication fork arrest. FEBS J 2012; 279:3692-3704. [PMID: 22863262 DOI: 10.1111/j.1742-4658.2012.08731.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/28/2022]
Abstract
Obg proteins are universally conserved GTP-binding proteins that are essential for viability in bacteria. Homologs in different organisms are involved in various cellular processes, including DNA replication. The goal of this study was to analyse the structure-function relationship of Escherichia coli ObgE with regard to DNA replication in general and sensitivity to stalled replication forks in particular. Defined C-terminal chromosomal deletion mutants of obgE were constructed and tested for sensitivity to the replication inhibitor hydroxyurea. The ObgE C-terminal domain was shown to be dispensable for normal growth of E. coli. However, a region within this domain is involved in the cellular response to replication fork stress. In addition, a mutant obgE over-expression library was constructed by error-prone PCR and screened for increased hydroxyurea sensitivity. ObgE proteins with substitutions L159Q, G163V, P168V, G216A or R237C, located within distinct domains of ObgE, display dominant-negative effects leading to hydroxyurea hypersensitivity when over-expressed. These effects are abolished in strains with a single deletion of the iron transporter TonB or combined deletions of the toxin/antitoxin modules RelBE/MazEF, both of which have been shown to be involved in a pathway that stimulates hydroxyl radical formation following hydroxyurea treatment. Moreover, the observed dominant-negative effects are lost in the presence of the hydroxyl radical scavenger thiourea. Together, these results indicate involvement of hydroxyl radical toxicity in ObgE-mediated protection against replication fork stress.
Affiliation(s)
- Cyrielle I Kint
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
- Natalie Verstraeten
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
- Inez Wens
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
- Veerle R Liebens
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
- Johan Hofkens
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
- Wim Versées
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
- Maarten Fauvart
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
- Jan Michiels
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, Heverlee, Belgium; Department of Chemistry, Katholieke Universiteit Leuven, Heverlee, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Belgium; Department of Structural Biology, Vlaams Instituut voor Biotechnologie, Brussels, Belgium
28
Law YS, Gudimella R, Song BK, Ratnam W, Harikrishna JA. Molecular characterization and comparative sequence analysis of defense-related gene, Oryza rufipogon receptor-like protein kinase 1. Int J Mol Sci 2012; 13:9343-9362. [PMID: 22942769 PMCID: PMC3430300 DOI: 10.3390/ijms13079343] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/30/2012] [Revised: 07/06/2012] [Accepted: 07/06/2012] [Indexed: 11/16/2022]
Abstract
Many of the plant leucine-rich repeat receptor-like kinases (LRR-RLKs) have been found to regulate signaling during plant defense processes. In this study, we selected and sequenced an LRR-RLK gene, designated as Oryza rufipogon receptor-like protein kinase 1 (OrufRPK1), located within yield QTL yld1.1 from the wild rice Oryza rufipogon (accession IRGC105491). A 2055 bp coding region and two exons were identified. Southern blotting determined OrufRPK1 to be a single copy gene. Sequence comparison with cultivated rice orthologs (OsI219RPK1, OsI9311RPK1 and OsJNipponRPK1, respectively derived from O. sativa ssp. indica cv. MR219, O. sativa ssp. indica cv. 9311 and O. sativa ssp. japonica cv. Nipponbare) revealed the presence of 12 single nucleotide polymorphisms (SNPs) with five non-synonymous substitutions, and 23 insertion/deletion sites. The biological role of OrufRPK1 as a defense-related LRR-RLK is proposed on the basis of cDNA sequence characterization, domain subfamily classification, structural prediction of extracellular domains, cluster analysis and comparative gene expression.
Affiliation(s)
- Yee-Song Law
- Centre for Research in Biotechnology for Agriculture (CEBAR) and Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, 50603, Malaysia
- Ranganath Gudimella
- Centre for Research in Biotechnology for Agriculture (CEBAR) and Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, 50603, Malaysia
- Beng-Kah Song
- School of Science, Monash University Sunway Campus, Jalan Lagoon Selatan, Bandar Sunway, Selangor 46150, Malaysia
- Wickneswari Ratnam
- School of Environmental and Natural Resource Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia
- Jennifer Ann Harikrishna
- Centre for Research in Biotechnology for Agriculture (CEBAR) and Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, 50603, Malaysia
29
Abstract
Domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting protein domains from sequence information alone, so as to facilitate the structure prediction of proteins and speed up their functional annotation. However, although many efforts have been made in this regard, prediction of protein domains from sequence information remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), as well as by incorporating the features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28–40% higher than that of the existing method on the same benchmark dataset. Furthermore, an in-depth analysis revealed that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool for annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with high impact on domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, HIV protease cleavage sites, among many other important topics in protein science and biomedicine.
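The incremental feature selection (IFS) step named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the features have already been ranked (for example by mRMR) and that `evaluate` is a caller-supplied, hypothetical scoring callback, such as the cross-validated accuracy of a random forest trained on the given feature subset. The feature names and scores in the example are invented.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best prefix.

    ranked_features: feature names ordered from most to least important
                     (assumed to come from a ranking step such as mRMR).
    evaluate:        hypothetical callback mapping a feature subset to a score.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]      # top-k features in rank order
        score = evaluate(subset)
        if score > best_score:            # strict '>' keeps the smallest best subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy usage with invented scores keyed by subset size.
toy_scores = {1: 0.60, 2: 0.73, 3: 0.70}
subset, score = incremental_feature_selection(
    ["disorder", "charge", "codon_diversity"],
    lambda feats: toy_scores[len(feats)],
)
```

Only prefixes of the ranked list are evaluated, so the cost is linear in the number of features rather than exponential as in an exhaustive subset search.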
Affiliation(s)
- Bi-Qing Li
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Le-Le Hu
- Institute of Systems Biology, Shanghai University, Shanghai, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Kai-Yan Feng
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, China
- Gordon Life Science Institute, San Diego, California, United States of America
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California, United States of America
30
Gupta AB, Wee LE, Zhou YT, Hortsch M, Low BC. Cross-species analyses identify the BNIP-2 and Cdc42GAP homology (BCH) domain as a distinct functional subclass of the CRAL_TRIO/Sec14 superfamily. PLoS One 2012; 7:e33863. [PMID: 22479462 PMCID: PMC3313917 DOI: 10.1371/journal.pone.0033863] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 12/19/2011] [Accepted: 02/18/2012] [Indexed: 11/19/2022]
Abstract
The CRAL_TRIO protein domain, which is unique to the Sec14 protein superfamily, binds to a diverse set of small lipophilic ligands. Similar domains are found in a range of different proteins including neurofibromatosis type-1, a Ras GTPase-activating Protein (RasGAP) and Rho guanine nucleotide exchange factors (RhoGEFs). Proteins containing this structural protein domain exhibit a low sequence similarity and ligand specificity while maintaining an overall characteristic three-dimensional structure. We have previously demonstrated that the BNIP-2 and Cdc42GAP Homology (BCH) protein domain, which shares a low sequence homology with the CRAL_TRIO domain, can serve as a regulatory scaffold that binds to Rho, RhoGEFs and RhoGAPs to control various cell signalling processes. In this work, we investigate 175 BCH domain-containing proteins from a wide range of different organisms. A phylogenetic analysis with ∼100 CRAL_TRIO and similar domains from eight representative species indicates a clear distinction of BCH-containing proteins as a novel subclass within the CRAL_TRIO/Sec14 superfamily. BCH-containing proteins contain a hallmark sequence motif R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs (‘h’ is a large hydrophobic residue and ‘s’ is a small, weakly polar residue) and can be further subdivided into three unique subtypes associated with BNIP-2-N, macro- and RhoGAP-type protein domains. A previously unknown group of genes encoding ‘BCH-only’ domains is also identified in plants and arthropod species. Based on an analysis of their gene structure and their protein domain context, we hypothesize that BCH domain-containing genes evolved through gene duplication, intron insertions and domain swapping events. Furthermore, we explore the point of divergence between BCH and CRAL_TRIO proteins in relation to their ability to bind small GTPases, GAPs and GEFs and lipid ligands. Our study suggests a need for a more extensive analysis of previously uncharacterized BCH, ‘BCH-like’ and CRAL_TRIO-containing proteins and their significance in regulating signaling events involving small GTPases.
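The hallmark motif R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs quoted above translates naturally into a regular expression. The sketch below is illustrative only: the abstract describes ‘h’ and ‘s’ in words, so the concrete residue sets used here (`FILMVWY` for large hydrophobic, `AGST` for small and weakly polar) are assumptions, ‘x’ is taken to mean any residue, and the test sequence is synthetic rather than a real BCH domain.

```python
import re

# Assumed residue classes; the source only describes them qualitatively.
LARGE_HYDROPHOBIC = "FILMVWY"   # 'h' (assumption)
SMALL_WEAKLY_POLAR = "AGST"     # 's' (assumption)


def bch_motif_regex(h=LARGE_HYDROPHOBIC, s=SMALL_WEAKLY_POLAR):
    """Compile R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs as a regular expression."""
    h_cls = f"[{h}]"
    s_cls = f"[{s}]"
    # '.' stands for 'x' (any residue); {4} repeats the hydrophobic class.
    pattern = f"R[RK]{h_cls}[RK][RK]NL[RK].{h_cls}{{4}}HP{s_cls}"
    return re.compile(pattern)


def find_bch_motifs(sequence):
    """Return (start_index, matched_text) for each motif hit in a protein sequence."""
    return [(m.start(), m.group())
            for m in bch_motif_regex().finditer(sequence.upper())]


# Synthetic sequence built to contain one motif instance at 0-based index 2.
hits = find_bch_motifs("MARKLRKNLRALLVIHPSGG")
```

Because the residue classes are parameters, a stricter or looser definition of ‘h’ and ‘s’ can be tried without rewriting the pattern.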
Affiliation(s)
- Anjali Bansal Gupta
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Mechanobiology Institute, National University of Singapore, Singapore, Singapore
- Liang En Wee
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Yi Ting Zhou
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Michael Hortsch
- Department of Cell and Developmental Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Boon Chuan Low
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Mechanobiology Institute, National University of Singapore, Singapore, Singapore
31
Pentony MM, Winters P, Penfold-Brown D, Drew K, Narechania A, DeSalle R, Bonneau R, Purugganan MD. The plant proteome folding project: structure and positive selection in plant protein families. Genome Biol Evol 2012; 4:360-71. [PMID: 22345424 PMCID: PMC3318447 DOI: 10.1093/gbe/evs015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
Despite its importance, relatively little is known about the relationship between the structure, function, and evolution of proteins, particularly in land plant species. We have developed a database with predicted protein domains for five plant proteomes (http://pfp.bio.nyu.edu) and used both protein structural fold recognition and de novo Rosetta-based protein structure prediction to predict protein structure for Arabidopsis and rice proteins. Based on sequence similarity, we have identified ∼15,000 orthologous/paralogous protein family clusters among these species and used codon-based models to predict positive selection in protein evolution within 175 of these sequence clusters. Our results show that codons that display positive selection appear to be less frequent in helical and strand regions and are overrepresented in amino acid residues that are associated with a change in protein secondary structure. Like in other organisms, disordered protein regions also appear to have more selected sites. Structural information provides new functional insights into specific plant proteins and allows us to map positively selected amino acid sites onto protein structures and view these sites in a structural and functional context.
Affiliation(s)
- M M Pentony
- Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
32
Little NS, Quon T, Upton C. Prediction of a novel RNA binding domain in crocodilepox Zimbabwe Gene 157. Microb Inform Exp 2011; 1:12. [PMID: 22587704 PMCID: PMC3372294 DOI: 10.1186/2042-5783-1-12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/18/2011] [Accepted: 11/21/2011] [Indexed: 11/30/2022]
Abstract
Background Although the crocodilepox virus (CRV) is currently unclassified, phylogenetic analyses suggest that its closest known relatives are molluscum contagiosum virus (MCV) and the avipox viruses. The CRV genome is approximately 190 kb and contains a large number of unique genes in addition to the set of conserved Chordopoxvirus genes found in all such viruses. Upon sequencing the viral genome, others noted that this virus was also unusual because it lacks a series of common immuno-suppressive genes. However, the genome contains multiple genes of unknown function that are likely to function in reducing the anti-viral response of the host. Results By using sensitive database searches for similarity, we observed that gene 157 of CRV-strain Zimbabwe (CRV-ZWE) encodes a protein with a domain that is predicted to bind dsRNA. Domain characterization supported this prediction; we therefore tested the ability of the Robetta protein structure prediction server to model the amino acid sequence of this protein on a well-characterized RNA binding domain. The model generated by Robetta suggests that CRV-ZWE-157 does indeed contain an RNA binding domain; the model could be overlaid on the template protein structure with high confidence. Conclusion We hypothesize that CRV-ZWE-157 encodes a novel poxvirus RNA binding protein and suggest that as a non-core gene it may play a role in host-range determination or function to dampen host anti-viral responses. Potential targets for this CRV protein include the host interferon response and miRNA pathways.
Affiliation(s)
- Nicole S Little
- Biochemistry and Microbiology, University of Victoria, 213 Petch Building, Ring Road, Victoria, B.C., V8W 3P6, Canada
- Taylor Quon
- Biochemistry and Microbiology, University of Victoria, 213 Petch Building, Ring Road, Victoria, B.C., V8W 3P6, Canada
- Chris Upton
- Biochemistry and Microbiology, University of Victoria, 213 Petch Building, Ring Road, Victoria, B.C., V8W 3P6, Canada
33
Drew K, Winters P, Butterfoss GL, Berstis V, Uplinger K, Armstrong J, Riffle M, Schweighofer E, Bovermann B, Goodlett DR, Davis TN, Shasha D, Malmström L, Bonneau R. The Proteome Folding Project: proteome-scale prediction of structure and function. Genome Res 2011; 21:1981-94. [PMID: 21824995 DOI: 10.1101/gr.121475.111] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]
Abstract
The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions.
Affiliation(s)
- Kevin Drew
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003, USA
34
Eickholt J, Deng X, Cheng J. DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning. BMC Bioinformatics 2011; 12:43. [PMID: 21284866 PMCID: PMC3036623 DOI: 10.1186/1471-2105-12-43] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 10/15/2010] [Accepted: 02/01/2011] [Indexed: 11/17/2022]
Abstract
Background Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved. Results We developed a new method to integrate the classification power of machine learning with evolutionary signals embedded in protein families in order to improve protein domain boundary prediction. The method first extracts putative domain boundary signals from a multiple sequence alignment between a query sequence and its homologs. The putative sites are then classified and scored by support vector machines in conjunction with input features such as sequence profiles, secondary structures, solvent accessibilities around the sites and their positions. The method was evaluated on a domain benchmark by 10-fold cross-validation and 60% of true domain boundaries can be recalled at a precision of 60%. The trade-off between the precision and recall can be adjusted according to specific needs by using different decision thresholds on the domain boundary scores assigned by the support vector machines. Conclusions The good prediction accuracy and the flexibility of selecting domain boundary sites at different precision and recall values make our method a useful tool for protein structure determination and modelling. The method is available at http://sysbio.rnet.missouri.edu/dobo/.
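The precision/recall trade-off described above, in which the decision threshold on the boundary score is adjusted to suit specific needs, can be illustrated as follows. This is a generic sketch and does not use DoBo's code or data: the scores stand in for classifier outputs on putative boundary sites, the labels mark which sites are true boundaries, and all values are invented.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when sites scoring >= threshold are called boundaries.

    scores: per-site scores (hypothetical stand-ins for classifier outputs).
    labels: 1 if the site is a true domain boundary, else 0.
    """
    selected = [lab for s, lab in zip(scores, labels) if s >= threshold]
    true_positives = sum(selected)
    total_true = sum(labels)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / total_true if total_true else 0.0
    return precision, recall


# Invented scores for four putative sites, two of which are true boundaries.
site_scores = [0.9, 0.8, 0.4, 0.3]
site_labels = [1, 0, 1, 0]

strict = precision_recall_at(site_scores, site_labels, threshold=0.5)
lenient = precision_recall_at(site_scores, site_labels, threshold=0.35)
```

In this toy data, lowering the threshold admits the second true boundary and raises recall; on real data a lower threshold typically also admits more false positives, which is the trade-off the abstract refers to.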
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
35
Motono C, Nakata J, Koike R, Shimizu K, Shirota M, Amemiya T, Tomii K, Nagano N, Sakaya N, Misoo K, Sato M, Kidera A, Hiroaki H, Shirai T, Kinoshita K, Noguchi T, Ota M. SAHG, a comprehensive database of predicted structures of all human proteins. Nucleic Acids Res 2010; 39:D487-93. [PMID: 21051360 PMCID: PMC3013665 DOI: 10.1093/nar/gkq1057] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith–Waterman profile–profile alignment), global–local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.
Affiliation(s)
- Chie Motono
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan.
|
36
|
Edwards TE, Phan I, Abendroth J, Dieterich SH, Masoudi A, Guo W, Hewitt SN, Kelley A, Leibly D, Brittnacher MJ, Staker BL, Miller SI, Van Voorhis WC, Myler PJ, Stewart LJ. Structure of a Burkholderia pseudomallei trimeric autotransporter adhesin head. PLoS One 2010; 5. [PMID: 20862217 PMCID: PMC2942831 DOI: 10.1371/journal.pone.0012803] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 05/24/2010] [Accepted: 08/18/2010] [Indexed: 02/04/2023]
Abstract
Background Pathogenic bacteria adhere to the host cell surface using a family of outer membrane proteins called Trimeric Autotransporter Adhesins (TAAs). Although TAAs are highly divergent in sequence and domain structure, they are all conceptually comprised of a C-terminal membrane anchoring domain and an N-terminal passenger domain. Passenger domains consist of a secretion sequence, a head region that facilitates binding to the host cell surface, and a stalk region. Methodology/Principal Findings Pathogenic species of Burkholderia contain an overabundance of TAAs, some of which have been shown to elicit an immune response in the host. To understand the structural basis for host cell adhesion, we solved a 1.35 Å resolution crystal structure of a BpaA TAA head domain from Burkholderia pseudomallei, the pathogen that causes melioidosis. The structure reveals a novel fold of an intricately intertwined trimer. The BpaA head is composed of structural elements that have been observed in other TAA head structures as well as several elements of previously unknown structure predicted from low sequence homology between TAAs. These elements are typically up to 40 amino acids long and are not domains, but rather modular structural elements that may be duplicated or omitted through evolution, creating molecular diversity among TAAs. Conclusions/Significance The modular nature of BpaA, as demonstrated by its head domain crystal structure, and of TAAs in general provides insights into evolution of pathogen-host adhesion and may provide an avenue for diagnostics.
Affiliation(s)
- Thomas E Edwards
- Seattle Structural Genomics Center for Infectious Disease, Seattle, Washington, USA.
|
37
|
Abstract
The tertiary structure of proteins can reveal information that is hard to detect in a linear sequence. Knowing the tertiary structure is valuable when generating hypotheses and interpreting data. Unfortunately, the gap between the number of known protein sequences and their associated structures is widening. One way to bridge this gap is to use computer-generated structure models of proteins. Here we present concepts and online resources that can be used to identify structural domains in proteins and to create structure models of those domains.
Affiliation(s)
- Lars Malmström
- Institute for Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
|
38
|
Lenhart TR, Akins DR. Borrelia burgdorferi locus BB0795 encodes a BamA orthologue required for growth and efficient localization of outer membrane proteins. Mol Microbiol 2009; 75:692-709. [PMID: 20025662 DOI: 10.1111/j.1365-2958.2009.07015.x] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 11/30/2022]
Abstract
The outer membrane (OM) of the pathogenic diderm spirochete, Borrelia burgdorferi, contains integral beta-barrel outer membrane proteins (OMPs) in addition to its numerous outer surface lipoproteins. Very few OMPs have been identified in B. burgdorferi, and the protein machinery required for OMP assembly and OM localization is currently unknown. Essential OM BamA proteins have recently been characterized in Gram-negative bacteria that are central components of an OM beta-barrel assembly machine and are required for proper localization and insertion of bacterial OMPs. In the present study, we characterized a putative B. burgdorferi BamA orthologue encoded by open reading frame bb0795. Structural model predictions and cellular localization data indicate that the B. burgdorferi BB0795 protein contains an N-terminal periplasmic domain and a C-terminal, surface-exposed beta-barrel domain. Additionally, assays with an IPTG-regulatable bb0795 mutant revealed that BB0795 is required for B. burgdorferi growth. Furthermore, depletion of BB0795 results in decreased amounts of detectable OMPs in the B. burgdorferi OM. Interestingly, a decrease in the levels of surface-exposed lipoproteins was also observed in the mutant OMs. Collectively, our structural, cellular localization and functional data are consistent with the characteristics of other BamA proteins, indicating that BB0795 is a B. burgdorferi BamA orthologue.
Affiliation(s)
- Tiffany R Lenhart
- Department of Microbiology and Immunology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
|
39
|
Dosztanyi Z, Meszaros B, Simon I. Bioinformatical approaches to characterize intrinsically disordered/unstructured proteins. Brief Bioinform 2009; 11:225-43. [DOI: 10.1093/bib/bbp061] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Indexed: 11/12/2022]
|
40
|
Walsh I, Martin AJM, Mooney C, Rubagotti E, Vullo A, Pollastri G. Ab initio and homology based prediction of protein domains by recursive neural networks. BMC Bioinformatics 2009; 10:195. [PMID: 19558651 PMCID: PMC2711945 DOI: 10.1186/1471-2105-10-195] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/17/2008] [Accepted: 06/26/2009] [Indexed: 11/10/2022]
Abstract
Background Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures. Results We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative, but, not surprisingly, less than SCOP annotations. We test both systems trained on templates of all qualities, and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near perfect predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors down to essentially any level of template quality. We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single-domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single-domain recall of 80.3% (precision 78.1%).
The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%) again within ± 20 residues, and classify single domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single domain proteins predicted correctly. Conclusion The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at the address: and we plan on running them on a multi-genomic scale and make the results public in the near future.
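The boundary recall and precision "within ± 20 residues" reported above can be illustrated with a minimal Python sketch. The abstract does not state the exact matching rule, so the convention used here (a boundary counts as matched if any boundary in the other set lies within the tolerance) is an assumption, and the function name `boundary_precision_recall` is hypothetical.

```python
from typing import Sequence, Tuple


def boundary_precision_recall(
    predicted: Sequence[int],
    true: Sequence[int],
    tolerance: int = 20,
) -> Tuple[float, float]:
    """Return (precision, recall) for domain boundary positions.

    Assumed convention (not taken from the paper): a predicted boundary is
    correct if some true boundary lies within `tolerance` residues of it,
    and a true boundary is recalled if some predicted boundary lies within
    `tolerance` residues of it.
    """
    # Predicted boundaries that fall close enough to any true boundary.
    correct = sum(
        any(abs(p - t) <= tolerance for t in true) for p in predicted
    )
    # True boundaries that are hit by at least one prediction.
    recalled = sum(
        any(abs(t - p) <= tolerance for p in predicted) for t in true
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = recalled / len(true) if true else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative positions only, not data from the study.
    print(boundary_precision_recall([100, 250], [110, 400]))
```

With this convention a single prediction can satisfy several nearby true boundaries; a stricter one-to-one matching would give lower or equal scores.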
Affiliation(s)
- Ian Walsh
- School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland.
|
41
|
Salipante SJ, Rojas ME, Korkmaz B, Duan Z, Wechsler J, Benson KF, Person RE, Grimes HL, Horwitz MS. Contributions to neutropenia from PFAAP5 (N4BP2L2), a novel protein mediating transcriptional repressor cooperation between Gfi1 and neutrophil elastase. Mol Cell Biol 2009; 29:4394-405. [PMID: 19506020 DOI: 10.1128/MCB.00596-09] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022]
Abstract
"Neutropenia" refers to deficient numbers of neutrophils, the most abundant type of white blood cell. Two main forms of inherited neutropenia are cyclic neutropenia, in which neutrophil counts oscillate with a 21-day frequency, and severe congenital neutropenia, in which static neutropenia may evolve at times into leukemia. Mutations of ELA2, encoding the protease neutrophil elastase, can cause both disorders. Among other genes, severe congenital neutropenia can also result from mutations affecting the transcriptional repressor Gfi1, one of whose genetic targets is ELA2, suggesting that the two act through similar mechanisms. In order to identify components of a common pathway regulating neutrophil production, we conducted yeast two-hybrid screens with Gfi1 and neutrophil elastase and detected a novel protein, PFAAP5 (also known as N4BP2L2), interacting with both. Expression of PFAAP5 allows neutrophil elastase to potentiate the repression of Gfi1 target genes, as determined by reporter assays, RNA interference, chromatin immunoprecipitation, and impairment of neutrophil differentiation in HSCs with PFAAP5 depletion, thus delineating a mechanism through which neutrophil elastase could regulate its own synthesis. Our findings are consistent with theoretical models of cyclic neutropenia proposing that its periodicity can be explained through disturbance of a feedback circuit in which mature neutrophils inhibit cell proliferation, thereby homeostatically regulating progenitor populations.
|
42
|
Kirillova S, Kumar S, Carugo O. Protein domain boundary predictions: a structural biology perspective. Open Biochem J 2009; 3:1-8. [PMID: 19401756 PMCID: PMC2669640 DOI: 10.2174/1874091x00903010001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/06/2008] [Revised: 11/27/2008] [Accepted: 11/29/2008] [Indexed: 11/22/2022]
Abstract
Structural biology is one of the important fields in which computational tools for domain boundary prediction can be applied. Such tools can be used to design protein constructs that must be expressed in a stable and functional form and must produce diffraction-quality crystals. However, prediction of protein domain boundaries on the basis of amino acid sequences is still very problematic. In the present study, the performance of several computational approaches is compared. The statistical significance of most of the predictions is found to be rather poor. Nevertheless, when the number of domains is correctly predicted, domain boundaries are predicted within very few residues of their real location. It can be concluded that prediction methods cannot yet be used as routine tools in structural biology, though some of them are rather promising.
Affiliation(s)
- Svetlana Kirillova
- Department of Biomolecular Structural Chemistry, Max F. Perutz Laboratories, Vienna University, Campus Vienna, Biocenter 5, A-1030, Vienna
|
43
|
Abstract
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS and on a database that contains different domain definitions than those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.
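As a rough illustration of how per-residue boundary likelihoods can be turned into contiguous boundary regions, the following Python sketch groups consecutive residues whose likelihood reaches a cutoff. It does not reproduce the fuzzy mean operator itself; the function name `boundary_regions` and the `threshold` value are assumptions made only for this example.

```python
from typing import List, Sequence, Tuple


def boundary_regions(
    likelihood: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not from the abstract
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of contiguous residues
    whose boundary likelihood is at or above `threshold`."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(likelihood):
        if value >= threshold and start is None:
            # A new run of boundary-like residues begins here.
            start = i
        elif value < threshold and start is not None:
            # The current run ended at the previous residue.
            regions.append((start, i - 1))
            start = None
    if start is not None:
        # The sequence ended while still inside a run.
        regions.append((start, len(likelihood) - 1))
    return regions


if __name__ == "__main__":
    # Illustrative likelihoods only, using 0-based residue indices.
    print(boundary_regions([0.1, 0.6, 0.7, 0.2, 0.9]))
```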
Affiliation(s)
- Rajkumar Bondugula
- Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD 21702, USA.
|
44
|
Wu Y, Dousis AD, Chen M, Li J, Ma J. OPUS-Dom: applying the folding-based method VECFOLD to determine protein domain boundaries. J Mol Biol 2008; 385:1314-29. [PMID: 19026662 DOI: 10.1016/j.jmb.2008.10.093] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 08/27/2008] [Revised: 10/29/2008] [Accepted: 10/31/2008] [Indexed: 10/21/2022]
Abstract
In this article, we present a de novo method for predicting protein domain boundaries, called OPUS-Dom. The core of the method is a novel coarse-grained folding method, VECFOLD, which constructs low-resolution structural models from a target sequence by folding a chain of vectors representing the predicted secondary-structure elements. OPUS-Dom generates a large ensemble of folded structure decoys by VECFOLD and labels the domain boundaries of each decoy by a domain parsing algorithm. Consensus domain boundaries are then derived from the statistical distribution of the putative boundaries and three empirical sequence-based domain profiles. OPUS-Dom generally outperformed several state-of-the-art domain prediction algorithms over various benchmark protein sets. Even though each VECFOLD-generated structure contains large errors, collectively these structures provide a more robust delineation of domain boundaries. The success of OPUS-Dom suggests that the arrangement of protein domains is more a consequence of limited coordination patterns per domain arising from tertiary packing of secondary-structure segments, rather than sequence-specific constraints.
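The idea that individually noisy decoys can still yield a robust consensus boundary can be sketched as a simple voting scheme. This is not the OPUS-Dom procedure, which also combines three sequence-based domain profiles. It is a minimal Python sketch assuming each decoy contributes a list of putative boundary positions, with a hypothetical `window` parameter that pools nearby votes.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def consensus_boundary(
    decoy_boundaries: Iterable[Sequence[int]],
    window: int = 5,  # hypothetical pooling radius in residues
) -> Optional[int]:
    """Return the boundary position with the most votes within +/- window
    across all decoys, or None if no decoy proposes a boundary.

    Ties are broken in favour of the smallest residue position.
    """
    # Count how many times each position was proposed as a boundary.
    counts = Counter(b for decoy in decoy_boundaries for b in decoy)
    if not counts:
        return None

    def support(pos: int) -> int:
        # Total votes cast for positions near `pos`.
        return sum(c for b, c in counts.items() if abs(b - pos) <= window)

    # Iterating in sorted order makes the tie-break deterministic.
    return max(sorted(counts), key=support)


if __name__ == "__main__":
    # Illustrative decoy boundaries only, not data from the study.
    print(consensus_boundary([[100], [100], [103], [200]]))
```

Pooling votes over a window reflects the abstract's point that single decoys contain large errors while their collective distribution is more informative.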
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
|
45
|
Trevisan S, Borsa P, Botton A, Varotto S, Malagoli M, Ruperti B, Quaggiotti S. Expression of two maize putative nitrate transporters in response to nitrate and sugar availability. Plant Biol (Stuttg) 2008; 10:462-75. [PMID: 18557906 DOI: 10.1111/j.1438-8677.2008.00041.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/08/2023]
Abstract
A full-length cDNA encoding a putative high-affinity nitrate transporter (ZmNrt2.2) from maize was isolated and characterised, together with another previously identified transporter (ZmNrt2.1), in terms of phylogenesis, protein structure prediction and regulation of transcript accumulation in response to nitrate and sugar availability. The expression of both genes was evaluated by quantitative and semi-quantitative RT-PCR in response to nitrate and sugar supply and the in planta localisation of mRNA was studied by in situ hybridisation. Data obtained suggested similar genetic evolution and identical transmembrane structure prediction between the two deduced proteins, and differences in both regulation of their expression and mRNA localisation in response to nitrate, leading us to hypothesise a principal role for ZmNRT2.1 in the influx activity and the major involvement of ZmNRT2.2 in the xylem loading process. Our data suggest opposing sugar regulation of ZmNrt2.1 and ZmNrt2.2 transcription in the presence or absence of nitrate and the existence of both hexokinase-dependent and hexokinase-independent transduction mechanisms for the regulation of ZmNrt2.1 and ZmNrt2.2 expression by sugars.
Affiliation(s)
- S Trevisan
- Department of Agricultural Biotechnology, University of Padova, Legnaro, Italy
|
46
|
Hu X, Murata LB, Weichsel A, Brailey JL, Roberts SA, Nighorn A, Montfort WR. Allostery in recombinant soluble guanylyl cyclase from Manduca sexta. J Biol Chem 2008; 283:20968-77. [PMID: 18515359 DOI: 10.1074/jbc.m801501200] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/06/2022]
Abstract
Soluble guanylyl/guanylate cyclase (sGC), the primary biological receptor for nitric oxide, is required for proper development and health in all animals. We have expressed heterodimeric full-length and N-terminal fragments of Manduca sexta sGC in Escherichia coli, the first time this has been accomplished for any sGC, and have performed the first functional analyses of an insect sGC. Manduca sGC behaves much like its mammalian counterparts, displaying a 170-fold stimulation by NO and sensitivity to compound YC-1. YC-1 reduces the NO and CO off-rates for the approximately 100-kDa N-terminal heterodimeric fragment and increases the CO affinity by approximately 50-fold to 1.7 µM. Binding of NO leads to a transient six-coordinate intermediate, followed by release of the proximal histidine to yield a five-coordinate nitrosyl complex (k(6-5) = 12.8 s⁻¹). The conversion rate is insensitive to nucleotides, YC-1, and changes in NO concentration up to approximately 30 µM. NO release is biphasic in the absence of YC-1 (k(off1) = 0.10 s⁻¹ and k(off2) = 0.0015 s⁻¹); binding of YC-1 eliminates the fast phase but has little effect on the slower phase. Our data are consistent with a model for allosteric activation in which sGC undergoes a simple switch between two conformations, with an open or a closed heme pocket, integrating the influence of numerous effectors to give the final catalytic rate. Importantly, YC-1 binding occurs in the N-terminal two-thirds of the protein. Homology modeling and mutagenesis experiments suggest the presence of an H-NOX domain in the alpha subunit with importance for heme binding.
Affiliation(s)
- Xiaohui Hu
- Department of Biochemistry and Molecular Biophysics, and Arizona Research Laboratories, Division of Neurobiology, University of Arizona, Tucson, AZ 85721, USA
|
47
|
Abstract
Given the rapid growth in the number of sequences without known structures, it is becoming increasingly important to not only accurately define protein structural domains but also predict domain boundaries from the amino-acid sequence alone. In this article, we present a Back-Propagation (BP) neural network method using 9 different sequence profiles, based on chemical, physical, and statistical properties, to predict the domain boundary of two-domain proteins from one-dimensional sequences. We have achieved an accuracy of 69% with 10-fold cross-validation on a dataset of 238 nonredundant two-domain proteins that we built based on a common set from both SCOP and CATH classifications. The method has also been applied to a larger third-party dataset with 522 proteins, and an accuracy of 62% has been achieved. Our prediction results on both datasets are found to be significantly better than those from some other methods, such as DomCut and DGS on the same datasets, and also comparable to that from the PPRODO method, upon which the larger dataset was based. Our cross-validation results are also noticeably better than previous ones from other BP neural network methods, probably because we have used more property descriptors with significantly more training nodes in our neural network. Integration with the PPRODO method also indicates that the information obtained from our current approach is complementary to that available through multiple sequence alignments. Moreover, the relative importance of each property profile has been analyzed in detail.
Affiliation(s)
- Lei Ye
- Department of Computer Science, Zhejiang University, Hangzhou, China
|
48
|
Kolmos E, Schoof H, Plümer M, Davis SJ. Structural insights into the function of the core-circadian factor TIMING OF CAB2 EXPRESSION 1 (TOC1). J Circadian Rhythms 2008; 6:3. [PMID: 18298828 PMCID: PMC2292679 DOI: 10.1186/1740-3391-6-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 12/23/2007] [Accepted: 02/25/2008] [Indexed: 01/24/2023]
Abstract
BACKGROUND The plant circadian clock has at its core a feedback loop that includes TIMING OF CAB2 EXPRESSION 1 (TOC1). This protein has an as yet unknown biochemical activity. It has been noted that the extreme amino-terminus of this protein is distantly related in sequence to response regulators (RR), and thus TOC1 is a member of the so-called pseudo response regulator (PRR) family. In addition, the extreme carboxy-terminus has a small sequence stretch related to the other PRRs and CONSTANS (CO)-like proteins, and this peptide stretch has been termed the CCT (for CONSTANS, CONSTANS-LIKE, TOC1) domain. METHODS To extend further our understanding of the TOC1 protein, we performed a ROSETTA structural prediction on TOC1 orthologues from four plant species. Phylogenetic interpretations assisted in model construction. RESULTS From our models, we suggest that TOC1 is a three-domain protein: TOC1 has an amino-terminal signaling-domain related to response receivers, a carboxy-terminal domain that could participate both in metal binding and in transcriptional regulation, and a linker domain that connects the two. CONCLUSION The models we present should prove useful in future hypothesis-driven biochemical analyses to test the predictions that TOC1 is a multi-domain signaling component of the plant circadian clock.
Affiliation(s)
- Elsebeth Kolmos
- Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, D-50829 Cologne, Germany
- Heiko Schoof
- Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, D-50829 Cologne, Germany
- Michael Plümer
- Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, D-50829 Cologne, Germany
- Seth J Davis
- Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, D-50829 Cologne, Germany
|
49
|
Tress M, Cheng J, Baldi P, Joo K, Lee J, Seo JH, Lee J, Baker D, Chivian D, Kim D, Ezkurdia I. Assessment of predictions submitted for the CASP7 domain prediction category. Proteins 2008; 69 Suppl 8:137-51. [PMID: 17680686 DOI: 10.1002/prot.21675] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/07/2022]
Abstract
This paper details the assessment process and evaluation results for the Critical Assessment of Protein Structure Prediction (CASP7) domain prediction category. Domain predictions were assessed using the Normalized Domain Overlap score introduced in CASP6 and the accuracy of prediction of domain break points. The results of the analysis clearly demonstrate that the best methods are able to make consistently reliable predictions when the target has a structural template, although they are less good when the domain break occurs in a region not covered by a template. The conditions of the experiment meant that it was impossible to draw any conclusions about domain prediction for free modeling targets and it was also difficult to draw many distinctions between the best groups. Two thirds of the targets submitted were single domains and hence regarded as easy to predict. Even those targets defined as having multiple domains always had at least one domain with a similar template structure.
Affiliation(s)
- Michael Tress
- Structural and Biological Computation Programme, Spanish National Cancer Research Centre, Madrid, Spain.
|
50
|
Abstract
With each round of CASP (Critical Assessment of Techniques for Protein Structure Prediction), automated prediction servers have played an increasingly important role. Today, most protein structure prediction approaches in some way depend on automated methods for fold recognition or model building. The accuracy of server predictions has significantly increased over the last years, and, in CASP7, we observed a continuation of this trend. In the template-based modeling category, the best prediction server was ranked third overall, i.e. it outperformed all but two of the human participating groups. This server also ranked among the very best predictors in the free modeling category, being clearly beaten by only one human group. In the high accuracy (HA) subset of TBM, two of the top five groups were servers. This article summarizes the contribution of automated structure prediction servers in the CASP7 experiment, with emphasis on 3D structure prediction, as well as information on their prediction scope and public availability.
|