51
Fan X, Wang H, Zhao Y, Li Y, Tsui KL. An Adaptive Weight Learning-Based Multitask Deep Network for Continuous Blood Pressure Estimation Using Electrocardiogram Signals. SENSORS 2021; 21:s21051595. [PMID: 33668778 PMCID: PMC7956522 DOI: 10.3390/s21051595] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/08/2021] [Revised: 01/27/2021] [Accepted: 02/07/2021] [Indexed: 11/16/2022]
Abstract
Estimating blood pressure via combined analysis of electrocardiogram and photoplethysmography signals has attracted growing interest for continuously monitoring patients' health conditions. However, most wearable/portable monitoring devices acquire only one kind of physiological signal because of constraints on energy cost, device weight, size, etc. In this study, a novel adaptive weight learning-based multitask deep learning framework based on single-lead electrocardiogram signals is proposed for continuous blood pressure estimation. Specifically, the proposed method utilizes a 2-layer bidirectional long short-term memory network as the sharing layer, followed by three identical architectures of 2-layer fully connected networks for task-specific blood pressure estimation. To learn the importance of task-specific losses automatically, an adaptive weight learning scheme based on the trend of validation loss is proposed. Extensive experimental results on the Physionet Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II waveform database demonstrate that the proposed method using electrocardiogram signals obtains estimation performance of 0.12±10.83 mmHg, 0.13±5.90 mmHg, and 0.08±6.47 mmHg for systolic blood pressure, diastolic blood pressure, and mean arterial pressure, respectively. It can meet the requirements of the British Hypertension Society standard and US Association of Advancement of Medical Instrumentation standard with a considerable margin. Combined with a wearable/portable electrocardiogram device, the proposed model can be deployed to a healthcare system to provide a long-term continuous blood pressure monitoring service, which would help to reduce the incidence of malignant complications of hypertension.
Affiliation(s)
- Xiaomao Fan
- School of Computer Science, South China Normal University, Guangzhou 510631, China;
- Hailiang Wang
- School of Design, Hong Kong Polytechnic University, Hong Kong, China;
- Yang Zhao
- School of Data Science, City University of Hong Kong, Hong Kong, China;
- Ye Li
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Kwok Leung Tsui
- School of Data Science, City University of Hong Kong, Hong Kong, China;
52
Synergistic role of nucleotides and lipids for the self-assembly of Shs1 septin oligomers. Biochem J 2021; 477:2697-2714. [PMID: 32726433 DOI: 10.1042/bcj20200199] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/09/2020] [Revised: 07/07/2020] [Accepted: 07/09/2020] [Indexed: 12/25/2022]
Abstract
Budding yeast septins are essential for cell division and polarity. Septins assemble as palindromic linear octameric complexes. The function and ultra-structural organization of septins are finely governed by their molecular polymorphism. In particular, in budding yeast, the end subunit can be either Shs1 or Cdc11. We have dissected, here, for the first time, the behavior of the Shs1 protomer bound to membranes at nanometer resolution, in complex with the other septins. Using electron microscopy, we have shown that on membranes, Shs1 protomers self-assemble into rings, bundles, filaments or two-dimensional gauzes. Using a set of specific mutants we have demonstrated a synergistic role of both nucleotides and lipids for the organization and oligomerization of budding yeast septins. Besides, cryo-electron tomography assays show that vesicles are deformed by the interaction between Shs1 oligomers and lipids. The Shs1-Shs1 interface is stabilized by the presence of phosphoinositides, allowing the visualization of micrometer-long filaments formed by Shs1 protomers. In addition, molecular modeling experiments have revealed a potential molecular mechanism regarding the selectivity of septin subunits for phosphoinositide lipids.
53
Bian Y, Xie XQ. Generative chemistry: drug discovery with deep learning generative models. J Mol Model 2021; 27:71. [PMID: 33543405 PMCID: PMC10984615 DOI: 10.1007/s00894-021-04674-8] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 11/13/2020] [Accepted: 01/13/2021] [Indexed: 12/15/2022]
Abstract
The de novo design of molecular structures using deep learning generative models offers an encouraging approach to drug discovery in the face of the continuously increasing cost of new drug development. From the generation of original texts, images, and videos to the design of novel molecular structures from scratch, the creativity of deep learning generative models exhibits the height machine intelligence can achieve. The purpose of this paper is to review the latest advances in generative chemistry, which relies on generative modeling to expedite the drug discovery process. This review starts with a brief history of artificial intelligence in drug discovery to outline this emerging paradigm. Commonly used chemical databases, molecular representations, and tools in cheminformatics and machine learning are covered as the infrastructure for generative chemistry. Detailed discussion focuses on utilizing cutting-edge generative architectures, including recurrent neural networks, variational autoencoders, adversarial autoencoders, and generative adversarial networks, for compound generation. Challenges and future perspectives follow.
Affiliation(s)
- Yuemin Bian
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- Xiang-Qun Xie
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- NIH National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Drug Discovery Institute, University of Pittsburgh, 335 Sutherland Drive, 206 Salk Pavilion, Pittsburgh, PA, 15261, USA.
- Departments of Computational Biology and Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
54
Leitner PD, Vietor I, Huber LA, Valovka T. Fluorescent thermal shift-based method for detection of NF-κB binding to double-stranded DNA. Sci Rep 2021; 11:2331. [PMID: 33504856 PMCID: PMC7840993 DOI: 10.1038/s41598-021-81743-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/03/2020] [Accepted: 01/07/2021] [Indexed: 12/18/2022] Open
Abstract
The nuclear factor kappa B (NF-κB) family of dimeric transcription factors regulates a wide range of genes by binding to their specific DNA regulatory sequences. NF-κB is an important therapeutic target linked to a number of cancers as well as autoimmune and inflammatory diseases. Therefore, effective high-throughput methods for the detection of NF-κB DNA binding are essential for studying its transcriptional activity and for inhibitory drug screening. We describe here a novel fluorescence-based assay for quantitative detection of κB consensus double-stranded (ds) DNA binding by measuring the thermal stability of the NF-κB proteins. Specifically, DNA binding proficient NF-κB probes, consisting of the N-terminal p65/RelA (aa 1-306) and p50 (aa 1-367) regions, were designed using bioinformatic analysis of protein hydrophobicity, folding and sequence similarities. By measuring the SYPRO Orange fluorescence during thermal denaturation of the probes, we detected and quantified a shift in the melting temperatures (ΔTm) of p65/RelA and p50 produced by the dsDNA binding. The increase in Tm was proportional to the concentration of dsDNA with apparent dissociation constants (KD) of 2.228 × 10⁻⁶ M and 0.794 × 10⁻⁶ M, respectively. The use of withaferin A (WFA), dimethyl fumarate (DMF) and p-xyleneselenocyanate (p-XSC) verified the suitability of this assay for measuring dose-dependent antagonistic effects on DNA binding. In addition, the assay can be used to analyse the direct binding of inhibitors and their effects on structural stability of the protein probe. This may facilitate the identification and rational design of new drug candidates interfering with NF-κB functions.
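As a rough illustration of how a dose-dependent melting-temperature shift relates to an apparent dissociation constant, the sketch below evaluates a single-site saturation model, ΔTm = ΔTm_max · [DNA] / (KD + [DNA]). The model choice, the function name `delta_tm`, and the `DTM_MAX` value are assumptions made for illustration only; the abstract does not state how the authors fitted their data. Only the two KD values come from the abstract.

```python
def delta_tm(dna_conc_m: float, kd_m: float, delta_tm_max: float) -> float:
    """Predicted melting-temperature shift under a single-site saturation model.

    dna_conc_m   : dsDNA concentration in mol/L
    kd_m         : apparent dissociation constant in mol/L
    delta_tm_max : shift at saturating dsDNA (hypothetical value, in degrees C)
    """
    return delta_tm_max * dna_conc_m / (kd_m + dna_conc_m)


# Apparent KD values reported in the abstract (mol/L).
KD_P65 = 2.228e-6
KD_P50 = 0.794e-6

# Hypothetical plateau shift; NOT taken from the paper.
DTM_MAX = 10.0

if __name__ == "__main__":
    for conc in (0.5e-6, 1e-6, 2e-6, 4e-6, 8e-6):
        shift_p65 = delta_tm(conc, KD_P65, DTM_MAX)
        shift_p50 = delta_tm(conc, KD_P50, DTM_MAX)
        print(f"{conc:.1e} M  p65/RelA: {shift_p65:.2f}  p50: {shift_p50:.2f}")
```

Under this assumed model the shift reaches half of its plateau when the dsDNA concentration equals KD, and it is approximately linear in concentration well below KD.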
Affiliation(s)
- Peter D Leitner
- Institute of Cell Biology, Biocenter, Medical University of Innsbruck, Innrain 80-82, 6020, Innsbruck, Austria
- Austrian Drug Screening Institute, ADSI, Innsbruck, Austria
- Department of Biotechnology and Food Engineering, MCI Technik, Innsbruck, Austria
- Ilja Vietor
- Institute of Cell Biology, Biocenter, Medical University of Innsbruck, Innrain 80-82, 6020, Innsbruck, Austria
- Lukas A Huber
- Institute of Cell Biology, Biocenter, Medical University of Innsbruck, Innrain 80-82, 6020, Innsbruck, Austria
- Austrian Drug Screening Institute, ADSI, Innsbruck, Austria
- Taras Valovka
- Institute of Cell Biology, Biocenter, Medical University of Innsbruck, Innrain 80-82, 6020, Innsbruck, Austria.
- Department of Pediatrics I, Medical University of Innsbruck, Anichstrasse 35, 6020, Innsbruck, Austria.
55
Abstract
Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes, and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins and compare our findings with data from non-viral proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36 per cent gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to severe acute respiratory syndrome coronavirus 2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. We show that disorder prediction methods perform differently for viral and non-viral proteins, and that an ensemble approach can yield more robust and accurate predictions.
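The Matthews correlation coefficient mentioned above can be computed from per-residue confusion counts. The following is a minimal sketch, assuming residues are labelled "D" (disordered, treated as the positive class) or "O" (ordered); the label encoding and the helper names are illustrative and are not taken from the study.

```python
import math


def confusion_counts(truth: str, pred: str) -> tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn) over two equal-length strings of 'D'/'O' labels."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(truth, pred):
        if t == "D" and p == "D":
            tp += 1
        elif t == "O" and p == "O":
            tn += 1
        elif t == "O" and p == "D":
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    truth = "DDDDOOOOOO"  # toy reference annotation
    pred = "DDDOOOOOOD"   # toy predictor output
    print(matthews_corrcoef(*confusion_counts(truth, pred)))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement), which makes it less sensitive than plain accuracy to an imbalance between ordered and disordered residues.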
Affiliation(s)
- Gal Almog
- Department of Pathology & Laboratory Medicine, Western University, Dental Sciences Building, Rm. 4044 London, Ontario, Canada, N6A 5C1
- Abayomi S Olabode
- Department of Pathology & Laboratory Medicine, Western University, Dental Sciences Building, Rm. 4044 London, Ontario, Canada, N6A 5C1
- Art F Y Poon
- Department of Pathology & Laboratory Medicine, Western University, Dental Sciences Building, Rm. 4044 London, Ontario, Canada, N6A 5C1
- Department of Applied Mathematics, Western University, Middlesex College Room 255, 1151 Richmond Street London, Ontario, Canada, N6A 5B7
- Department of Microbiology & Immunology, Western University, 1151 Richmond Street London, Ontario, Canada, N6A 3K
56
Xu G, Ren T, Chen Y, Che W. A One-Dimensional CNN-LSTM Model for Epileptic Seizure Recognition Using EEG Signal Analysis. Front Neurosci 2021; 14:578126. [PMID: 33390878 PMCID: PMC7772824 DOI: 10.3389/fnins.2020.578126] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 06/30/2020] [Accepted: 11/10/2020] [Indexed: 11/13/2022] Open
Abstract
Frequent epileptic seizures cause damage to the human brain, resulting in memory impairment, mental decline, and so on. Therefore, it is important to detect epileptic seizures and provide medical treatment in a timely manner. Currently, medical experts recognize epileptic seizure activity through the visual inspection of electroencephalographic (EEG) signal recordings of patients based on their experience, which takes much time and effort. In view of this, this paper proposes a one-dimensional convolutional neural network-long short-term memory (1D CNN-LSTM) model for automatic recognition of epileptic seizures through EEG signal analysis. Firstly, the raw EEG signal data are pre-processed and normalized. Then, a 1D convolutional neural network (CNN) is designed to effectively extract the features of the normalized EEG sequence data. The extracted features are then processed by the LSTM layers in order to further extract the temporal features. After that, the output features are fed into several fully connected layers for final epileptic seizure recognition. The performance of the proposed 1D CNN-LSTM model is verified on the public UCI epileptic seizure recognition data set. Experimental results show that the proposed method achieves high recognition accuracies of 99.39% and 82.00% on the binary and five-class epileptic seizure recognition tasks, respectively. Comparisons with traditional machine learning methods, including k-nearest neighbors, support vector machines, and decision trees, and with other deep learning methods, including a standard deep neural network and a CNN, further verify the superiority of the proposed method.
Affiliation(s)
- Gaowei Xu
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
- Tianhe Ren
- School of Informatics, Xiamen University, Xiamen, China
- Yu Chen
- Department of Dermatology & STD, Nantong First People's Hospital, Nantong, China
- Wenliang Che
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
57
Anbo H, Amagai H, Fukuchi S. NeProc predicts binding segments in intrinsically disordered regions without learning binding region sequences. Biophys Physicobiol 2020; 17:147-154. [PMID: 33304713 PMCID: PMC7692026 DOI: 10.2142/biophysico.bsj-2020026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Accepted: 10/29/2020] [Indexed: 12/01/2022] Open
Abstract
Intrinsically disordered proteins are those proteins with intrinsically disordered regions. One of the unique characteristics of intrinsically disordered proteins is the existence of functional segments in intrinsically disordered regions. These segments are involved in binding to partner molecules, such as protein and DNA, and play important roles in signaling pathways and/or transcriptional regulation. Although there are databases that gather information on such disordered binding regions, data remain limited. Therefore, it is desirable to develop programs to predict the disordered binding regions without using data for the binding regions. We developed a program, NeProc, to predict the disordered binding regions, which can be regarded as intrinsically disordered regions with a structural propensity. We only used data for the structural domains and intrinsically disordered regions to detect such regions. NeProc accepts a query amino acid sequence converted into a position-specific score matrix, and uses two neural networks that employ different window sizes: a neural network of short windows and a neural network of long windows. The performance of NeProc was comparable to that of existing programs for disordered binding region prediction. This result suggests that the shortage of disordered binding region data can be overcome in the development of prediction programs for these binding regions. NeProc is available at http://flab.neproc.org/neproc/index.html.
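The two-window input scheme described above can be pictured with a small sketch that cuts a short and a long window, centred on one residue, out of a position-specific score matrix. The window sizes, the zero-padding at the sequence ends, and the function names are assumptions chosen for illustration; the abstract does not specify these details, and this is not NeProc's code.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # one row of scores per residue

SHORT_WINDOW = 5   # hypothetical size, must be odd
LONG_WINDOW = 11   # hypothetical size, must be odd


def window_features(pssm: Matrix, center: int, size: int) -> List[float]:
    """Flatten the rows of `pssm` within `size` residues centred on `center`.

    Positions that fall outside the sequence are padded with zeros.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    n_cols = len(pssm[0])
    features: List[float] = []
    for i in range(center - half, center + half + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])
        else:
            features.extend([0.0] * n_cols)
    return features


def two_window_inputs(pssm: Matrix, center: int) -> Tuple[List[float], List[float]]:
    """Return the (short-window, long-window) feature vectors for one residue."""
    return (
        window_features(pssm, center, SHORT_WINDOW),
        window_features(pssm, center, LONG_WINDOW),
    )


if __name__ == "__main__":
    toy_pssm = [[float(r), float(r) + 0.5] for r in range(8)]  # 8 residues, 2 columns
    short_vec, long_vec = two_window_inputs(toy_pssm, center=0)
    print(len(short_vec), len(long_vec))
```

Each vector would then feed its own network, so that one model sees local context and the other sees a wider stretch of the sequence.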
Affiliation(s)
- Hiroto Anbo
- Department of Life Science and Informatics, Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Gunma 371-0816, Japan
- Hiroki Amagai
- Department of Life Science and Informatics, Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Gunma 371-0816, Japan
- Satoshi Fukuchi
- Department of Life Science and Informatics, Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Gunma 371-0816, Japan
58
Katuwawala A, Kurgan L. Comparative Assessment of Intrinsic Disorder Predictions with a Focus on Protein and Nucleic Acid-Binding Proteins. Biomolecules 2020; 10:E1636. [PMID: 33291838 PMCID: PMC7762010 DOI: 10.3390/biom10121636] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 10/17/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 01/18/2023] Open
Abstract
With over 60 disorder predictors, users need help navigating the predictor selection task. We review 28 surveys of disorder predictors, showing that only 11 include assessment of predictive performance. We identify and address a few drawbacks of these past surveys. To this end, we release a novel benchmark dataset with reduced similarity to the training sets of the considered predictors. We use this dataset to perform a first-of-its-kind comparative analysis that targets two large functional families of disordered proteins that interact with proteins and with nucleic acids. We show that limiting sequence similarity between the benchmark and the training datasets has a substantial impact on predictive performance. We also demonstrate that predictive quality is sensitive to the use of the well-annotated order and inclusion of the fully structured proteins in the benchmark datasets, both of which should be considered in future assessments. We identify three predictors that provide favorable results using the new benchmark set. While we find that VSL2B offers the most accurate and robust results overall, ESpritz-DisProt and SPOT-Disorder perform particularly well for disordered proteins. Moreover, we find that predictions for the disordered protein-binding proteins suffer low predictive quality compared to generic disordered proteins and the disordered nucleic acids-binding proteins. This can be explained by the high disorder content of the disordered protein-binding proteins, which makes it difficult for the current methods to accurately identify ordered regions in these proteins. This finding motivates the development of a new generation of methods that would target these difficult-to-predict disordered proteins. We also discuss resources that support users in collecting and identifying high-quality disorder predictions.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA;
59
Izumi H, Nafie LA, Dukor RK. SSSCPreds: Deep Neural Network-Based Software for the Prediction of Conformational Variability and Application to SARS-CoV-2. ACS OMEGA 2020; 5:30556-30567. [PMID: 33283104 PMCID: PMC7687297 DOI: 10.1021/acsomega.0c04472] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/11/2020] [Accepted: 11/05/2020] [Indexed: 05/05/2023]
Abstract
Amino acid mutations that improve protein stability and rigidity can accompany increases in binding affinity. Therefore, conserved amino acids located on a protein surface may be successfully targeted by antibodies. The quantitative deep mutational scanning approach is an excellent technique to understand viral evolution, and the obtained data can be utilized to develop a vaccine. However, applying the approach to all proteins in general is difficult in terms of cost. To address this need, we report the construction of a deep neural network-based program for sequence-based prediction of supersecondary structure codes (SSSCs), called SSSCPrediction (SSSCPred). Further, to predict conformational flexibility or rigidity in proteins, a comparison program called SSSCPreds that consists of three deep neural network-based prediction systems (SSSCPred, SSSCPred100, and SSSCPred200) has also been developed. The results calculated with our algorithms show the degree of flexibility of the receptor-binding motif of the SARS-CoV-2 spike protein and the rigidity of the unique motif (SSSC: SSSHSSHHHH) at the S2 subunit, independently of the X-ray and cryo-EM structures. The fact that the sequence flexibility/rigidity map of SARS-CoV-2 RBD resembles the sequence-to-phenotype maps of ACE2-binding affinity and expression, which were experimentally obtained by deep mutational scanning, suggests that the identical SSSC sequences among the ones predicted by three deep neural network-based systems correlate well with the sequences with both lower ACE2-binding affinity and lower expression. The combined analysis of predicted and observed SSSCs with keyword-tagged datasets would be helpful in understanding the structural correlation to the examined system.
Affiliation(s)
- Hiroshi Izumi
- National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba West, 16-1 Onogawa, Tsukuba, Ibaraki 305-8569, Japan
- Laurence A. Nafie
- Department of Chemistry, Syracuse University, Syracuse, New York 13244-4100, United States
- BioTools Inc., 17546 SR 710 (Bee Line Hwy), Jupiter, Florida 33458, United States
- Rina K. Dukor
- BioTools Inc., 17546 SR 710 (Bee Line Hwy), Jupiter, Florida 33458, United States
60
Abstract
Chemometrics play a critical role in biosensor-based detection, analysis, and diagnosis. Nowadays, as a branch of artificial intelligence (AI), machine learning (ML) has achieved impressive advances. However, novel advanced ML methods, especially deep learning, which is famous for image analysis, facial recognition, and speech recognition, have remained relatively elusive to the biosensor community. Herein, how ML can be beneficial to biosensors is systematically discussed. The advantages and drawbacks of the most popular ML algorithms are summarized on the basis of sensing data analysis. In particular, deep learning methods such as convolutional neural network (CNN) and recurrent neural network (RNN) are emphasized. Diverse ML-assisted electrochemical biosensors, wearable electronics, SERS and other spectra-based biosensors, fluorescence biosensors and colorimetric biosensors are comprehensively discussed. Furthermore, biosensor networks and multibiosensor data fusion are introduced. This review will nicely bridge ML with biosensors, and greatly expand chemometrics for detection, analysis, and diagnosis.
Affiliation(s)
- Feiyun Cui
- Department of Chemical Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, Massachusetts 01609, United States
- Yun Yue
- Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- Yi Zhang
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States
- Ziming Zhang
- Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, United States
- H. Susan Zhou
- Department of Chemical Engineering, Worcester Polytechnic Institute, 100 Institute Road, Worcester, Massachusetts 01609, United States
61
HACS1 signaling adaptor protein recognizes a motif in the paired immunoglobulin receptor B cytoplasmic domain. Commun Biol 2020; 3:672. [PMID: 33188360 PMCID: PMC7666139 DOI: 10.1038/s42003-020-01397-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/14/2019] [Accepted: 10/22/2020] [Indexed: 12/30/2022] Open
Abstract
Hematopoietic adaptor containing SH3 and SAM domains-1 (HACS1) is a signaling protein with two juxtaposed protein–protein interaction domains and an intrinsically unstructured region that spans half the sequence. Here, we describe the interaction between the HACS1 SH3 domain and a sequence near the third immunoreceptor tyrosine-based inhibition motif (ITIM3) of the paired immunoglobulin receptor B (PIRB). In surface plasmon resonance binding assays using mouse and human PIRB ITIM3 phosphopeptides as ligands, the HACS1 SH3 domain and SHP2 N-terminal SH2 domain demonstrated comparable affinities in the micromolar range. Since the PIRB ITIM3 sequence represents an atypical ligand for an SH3 domain, we determined the NMR structure of the HACS1 SH3 domain and performed a chemical shift mapping study. This study showed that the binding site on the HACS1 SH3 domain for PIRB shares many of the same amino acids found in a canonical binding cleft normally associated with polyproline ligands. Molecular modeling suggests that the respective binding sites in PIRB ITIM3 for the HACS1 SH3 domain and the SHP2 SH2 domain are too close to permit simultaneous binding. As a result, the HACS1-PIRB partnership has the potential to amalgamate signaling pathways that influence both immune and neuronal cell fate. Kwan et al. show the interaction between the HACS1 SH3 domain and a sequence near the third immunoreceptor tyrosine-based inhibition motif of the paired immunoglobulin receptor B (PIRB). This study suggests that the HACS1-PIRB partnership has the potential to unite signaling pathways that regulate both immune and neuronal cell fate.
62
Chui AJ, Griswold AR, Taabazuing CY, Orth EL, Gai K, Rao SD, Ball DP, Hsiao JC, Bachovchin DA. Activation of the CARD8 Inflammasome Requires a Disordered Region. Cell Rep 2020; 33:108264. [PMID: 33053349 PMCID: PMC7594595 DOI: 10.1016/j.celrep.2020.108264] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 02/06/2020] [Revised: 06/25/2020] [Accepted: 09/22/2020] [Indexed: 12/23/2022] Open
Abstract
Several cytosolic pattern-recognition receptors (PRRs) form multiprotein complexes called canonical inflammasomes in response to intracellular danger signals. Canonical inflammasomes recruit and activate caspase-1 (CASP1), which in turn cleaves and activates inflammatory cytokines and gasdermin D (GSDMD), inducing pyroptotic cell death. Inhibitors of the dipeptidyl peptidases DPP8 and DPP9 (DPP8/9) activate both the human NLRP1 and CARD8 inflammasomes. NLRP1 and CARD8 have different N-terminal regions but have similar C-terminal regions that undergo autoproteolysis to generate two non-covalently associated fragments. Here, we show that DPP8/9 inhibition activates a proteasomal degradation pathway that targets disordered and misfolded proteins for destruction. CARD8’s N terminus contains a disordered region of ~160 amino acids that is recognized and destroyed by this degradation pathway, thereby freeing its C-terminal fragment to activate CASP1 and induce pyroptosis. Thus, CARD8 serves as an alarm to signal the activation of a degradation pathway for disordered and misfolded proteins. Inflammasomes are multiprotein complexes that detect intracellular danger signals and stimulate powerful immune responses. DPP8/9 inhibitors activate the CARD8 inflammasome through an unknown mechanism. Here, Chui et al. show that DPP8/9 inhibitors induce the degradation of many disordered and misfolded proteins. CARD8 has an N-terminal disordered region that is degraded upon DPP8/9 inhibition, triggering inflammasome formation.
Affiliation(s)
- Ashley J Chui
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Andrew R Griswold
- Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program, New York, NY 10065, USA
- Cornelius Y Taabazuing
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Elizabeth L Orth
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Kuo Gai
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Sahana D Rao
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Daniel P Ball
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jeffrey C Hsiao
- Pharmacology Program of the Weill Cornell Graduate School of Medical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Daniel A Bachovchin
- Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Pharmacology Program of the Weill Cornell Graduate School of Medical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
63
Pan Y, Zhou S, Guan J. Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach. BMC Bioinformatics 2020; 21:384. [PMID: 32938375 PMCID: PMC7495898 DOI: 10.1186/s12859-020-03675-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical to understanding the principles of protein-DNA interactions. There are already some computational methods that can accurately and efficiently predict a large number of hot residues. However, the insufficiency of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the employed features limit the performance of existing methods. RESULTS Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (the abbreviation of Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which together comprise 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones. CONCLUSIONS PreHots, which is based on a stacked ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with the existing methods, PreHots can achieve better prediction performance.
Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/ .
Affiliation(s)
- Yuliang Pan
- Department of Computer Science and Technology, Tongji University, No. 4800 Caoan Road, Shanghai, 201804, China
| | - Shuigeng Zhou
- Shanghai Key Laboratory of Intelligent Information Processing, and School of Computer Science, Fudan University, No. 220 Handan Road, Shanghai, 200433, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, No. 4800 Caoan Road, Shanghai, 201804, China.
|
64
|
Hiraoka M, Ishikawa A, Matsuzawa F, Aikawa SI, Sakurai A. A variant in the RP1L1 gene in a family with occult macular dystrophy in a predicted intrinsically disordered region. Ophthalmic Genet 2020; 41:599-605. [PMID: 32940107 DOI: 10.1080/13816810.2020.1821383] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/23/2022]
Abstract
SIGNIFICANCE The genetic variants responsible for occult macular dystrophy (OMD) were found in the predicted intrinsically disordered region (IDR) of the RP1L1 gene. PURPOSE We examined the phenotypes and genotypes of members of a family with OMD. In addition, the genetic characteristics of the RP1L1 gene in OMD were investigated. METHODS Whole-exome sequencing was applied to two affected family members, and Sanger sequencing was performed on three members. The structural properties of RP1L1 and its pathogenic variants were analyzed using the predictor of natural disordered regions (PONDR). RESULTS Two affected members showed moderate visual impairment and relative central scotoma. The spectral domain optical coherence tomography (SD-OCT) images showed an absence of the interdigitation zone (IZ) and ellipsoid zone (EZ) in one case, and an obscure EZ line in the other case. An RP1L1 variant (c.3593C>T, p.Ser1198Phe) was identified in two affected members but not in the unaffected member. The PONDR analysis showed that the region from p.1189 to p.1248 could be predicted to be an IDR in the RP1L1 molecule. The p.Ser1198Phe variant showed a significant reduction in PONDR score. CONCLUSIONS Although the major pathogenic variant of OMD is p.Arg45Trp, multiple reports indicate that the region between p.1194 and p.1201 is another hot spot of OMD. The PONDR analysis predicted that the RP1L1 molecule is one of the intrinsically disordered proteins. It is speculated that the region around p.1200 is essential for the normal function of the RP1L1 molecule, and that missense variants in that area cause the development of OMD.
Affiliation(s)
- Miki Hiraoka
- Department of Ophthalmology, Health Sciences University of Hokkaido, Sapporo, Hokkaido, Japan
| | - Aki Ishikawa
- Department of Medical Genetics and Genomics, Sapporo Medical University, Sapporo, Hokkaido, Japan
| | - Akihiro Sakurai
- Department of Medical Genetics and Genomics, Sapporo Medical University, Sapporo, Hokkaido, Japan
|
65
|
ODiNPred: comprehensive prediction of protein order and disorder. Sci Rep 2020; 10:14780. [PMID: 32901090 PMCID: PMC7479119 DOI: 10.1038/s41598-020-71716-1] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 04/11/2020] [Accepted: 08/10/2020] [Indexed: 12/13/2022] Open
Abstract
Structural disorder is widespread in eukaryotic proteins and is vital for their function in diverse biological processes. It is therefore highly desirable to be able to predict the degree of order and disorder from amino acid sequence. It is, however, notoriously difficult to predict the degree of local flexibility within structured domains and the presence and nuances of localized rigidity within intrinsically disordered regions. To identify such instances, we used the CheZOD database, which encompasses accurate, balanced, and continuous-valued quantification of protein (dis)order at amino acid resolution based on NMR chemical shifts. To computationally forecast the spectrum of protein disorder in the most comprehensive manner possible, we constructed the sequence-based protein order/disorder predictor ODiNPred, trained on an expanded version of CheZOD. ODiNPred applies a deep neural network comprising 157 unique sequence features to 1325 protein sequences together with the experimental NMR chemical shift data. Cross-validation for 117 protein sequences shows that ODiNPred better predicts the continuous variation in order along the protein sequence, suggesting that contemporary predictors are limited by the quality of training data. The inclusion of evolutionary features reduces the performance gap between ODiNPred and its peers, but analysis shows that it retains greater accuracy for the more challenging prediction of intermediate disorder.
|
66
|
Draberova H, Janusova S, Knizkova D, Semberova T, Pribikova M, Ujevic A, Harant K, Knapkova S, Hrdinka M, Fanfani V, Stracquadanio G, Drobek A, Ruppova K, Stepanek O, Draber P. Systematic analysis of the IL-17 receptor signalosome reveals a robust regulatory feedback loop. EMBO J 2020; 39:e104202. [PMID: 32696476 PMCID: PMC7459424 DOI: 10.15252/embj.2019104202] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/05/2019] [Revised: 06/13/2020] [Accepted: 06/17/2020] [Indexed: 12/24/2022] Open
Abstract
IL-17 mediates immune protection from fungi and bacteria, and it also promotes autoimmune pathologies. However, the regulation of signal transduction from the IL-17 receptor (IL-17R) has remained elusive. We developed a novel mass spectrometry-based approach to identify components of the IL-17R complex, followed by analysis of their roles using reverse genetics. Besides identifying the linear ubiquitin chain assembly complex (LUBAC) as an important signal-transducing component of IL-17R, we established that IL-17 signaling is regulated by a robust negative feedback loop mediated by TBK1 and IKKε. These kinases terminate IL-17 signaling by phosphorylating the adaptor ACT1, leading to the release of the essential ubiquitin ligase TRAF6 from the complex. NEMO recruits both kinases to the IL-17R complex, documenting that NEMO has an unprecedented negative function in IL-17 signaling, distinct from its role in NF-κB activation. Our study provides a comprehensive view of the molecular events of IL-17 signal transduction and its regulation.
Affiliation(s)
- Helena Draberova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Sarka Janusova
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Daniela Knizkova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Tereza Semberova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Michaela Pribikova
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Andrea Ujevic
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Karel Harant
- Laboratory of Mass Spectrometry, BIOCEV, Faculty of Science, Charles University, Prague, Czech Republic
| | - Sofija Knapkova
- Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
- Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
| | - Matous Hrdinka
- Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
- Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
| | - Viola Fanfani
- Institute of Quantitative Biology, Biochemistry, and Biotechnology, SynthSys, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
| | - Giovanni Stracquadanio
- Institute of Quantitative Biology, Biochemistry, and Biotechnology, SynthSys, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
| | - Ales Drobek
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Klara Ruppova
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Ondrej Stepanek
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
| | - Peter Draber
- Laboratory of Immunity & Cell Communication, BIOCEV, First Faculty of Medicine, Charles University, Vestec, Czech Republic
- Laboratory of Adaptive Immunity, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
|
67
|
Hernández-Segura T, Pastor N. Identification of an α-MoRF in the Intrinsically Disordered Region of the Escargot Transcription Factor. ACS OMEGA 2020; 5:18331-18341. [PMID: 32743208 PMCID: PMC7392517 DOI: 10.1021/acsomega.0c02051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2020] [Accepted: 07/02/2020] [Indexed: 06/11/2023]
Abstract
Molecular recognition features (MoRFs) are common in intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). MoRFs are in constant order-disorder structural transitions and adopt well-defined structures once they are bound to their targets. Here, we study Escargot (Esg), a transcription factor in Drosophila melanogaster that regulates multiple cellular functions, and consists of a disordered N-terminal domain and a group of zinc fingers at its C-terminal domain. We analyzed the N-terminal domain of Esg with disorder predictors and identified a region of 45 amino acids with high probability to form ordered structures, which we named S2. Through 54 μs of molecular dynamics (MD) simulations using CHARMM36 and implicit solvent (generalized Born/surface area (GBSA)), we characterized the conformational landscape of S2 and found an α-MoRF of ∼16 amino acids stabilized by key contacts within the helix. To test the importance of these contacts in the stability of the α-MoRF, we evaluated the effect of point mutations that would impair these interactions, running 24 μs of MD for each mutation. The mutations had mild effects on the MoRF, and in some cases, led to gain of residual structure through long-range contacts of the α-MoRF and the rest of the S2 region. As this could be an effect of the force field and solvent model we used, we benchmarked our simulation protocol by carrying out 32 μs of MD for the (AAQAA)3 peptide. The results of the benchmark indicate that the global amount of helix in shorter peptides like (AAQAA)3 is reasonably predicted. Careful analysis of the runs of S2 and its mutants suggests that the mutation to hydrophobic residues may have nucleated long-range hydrophobic and aromatic interactions that stabilize the MoRF. Finally, we have identified a set of residues that stabilize an α-MoRF in a region still without functional annotations in Esg.
Affiliation(s)
- Teresa Hernández-Segura
- Laboratorio de Dinámica de Proteínas, Centro de Investigación en Dinámica Celular-IICBA, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Chamilpa, 62209 Cuernavaca, México
- Doctorado en Ciencias CIDC-IICBA, Universidad Autónoma del Estado de Morelos, Cuernavaca 62209, Morelos, México
| | - Nina Pastor
- Laboratorio de Dinámica de Proteínas, Centro de Investigación en Dinámica Celular-IICBA, Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Chamilpa, 62209 Cuernavaca, México
|
68
|
Pei J, Kinch LN, Otwinowski Z, Grishin NV. Mutation severity spectrum of rare alleles in the human genome is predictive of disease type. PLoS Comput Biol 2020; 16:e1007775. [PMID: 32413045 PMCID: PMC7255613 DOI: 10.1371/journal.pcbi.1007775] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/23/2019] [Revised: 05/28/2020] [Accepted: 03/06/2020] [Indexed: 12/19/2022] Open
Abstract
The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and diseases. These single-amino acid variations (SAVs) are routinely found in whole genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network to differentiate disease-causing and benign SAVs based on a variety of protein sequence, structural and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has similar predictive power as some of the best available. We transformed DeepSAV scores of rare SAVs in the human population into a quantity termed "mutation severity measure" for each human protein-coding gene. It reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. Genes implicated in cancer, autism, and viral interaction are found by this measure as intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Zbyszek Otwinowski
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
|
69
|
SLX4 interacts with RTEL1 to prevent transcription-mediated DNA replication perturbations. Nat Struct Mol Biol 2020; 27:438-449. [PMID: 32398829 DOI: 10.1038/s41594-020-0419-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/20/2019] [Accepted: 03/17/2020] [Indexed: 12/20/2022]
Abstract
The SLX4 tumor suppressor is a scaffold that plays a pivotal role in several aspects of genome protection, including homologous recombination, interstrand DNA crosslink repair and the maintenance of common fragile sites and telomeres. Here, we unravel an unexpected direct interaction between SLX4 and the DNA helicase RTEL1, which, until now, were viewed as having independent and antagonistic functions. We identify cancer and Hoyeraal-Hreidarsson syndrome-associated mutations in SLX4 and RTEL1, respectively, that abolish SLX4-RTEL1 complex formation. We show that both proteins are recruited to nascent DNA, tightly co-localize with active RNA pol II, and that SLX4, in complex with RTEL1, promotes FANCD2/RNA pol II co-localization. Importantly, disrupting the SLX4-RTEL1 interaction leads to DNA replication defects in unstressed cells, which are rescued by inhibiting transcription. Our data demonstrate that SLX4 and RTEL1 interact to prevent replication-transcription conflicts and provide evidence that this is independent of the nuclease scaffold function of SLX4.
|
70
|
Niemeyer M, Moreno Castillo E, Ihling CH, Iacobucci C, Wilde V, Hellmuth A, Hoehenwarter W, Samodelov SL, Zurbriggen MD, Kastritis PL, Sinz A, Calderón Villalobos LIA. Flexibility of intrinsically disordered degrons in AUX/IAA proteins reinforces auxin co-receptor assemblies. Nat Commun 2020; 11:2277. [PMID: 32385295 PMCID: PMC7210949 DOI: 10.1038/s41467-020-16147-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/02/2019] [Accepted: 04/17/2020] [Indexed: 12/31/2022] Open
Abstract
Cullin RING-type E3 ubiquitin ligases SCF(TIR1/AFB1-5) and their AUX/IAA targets perceive the phytohormone auxin. The F-box protein TIR1 binds a surface-exposed degron in AUX/IAAs, promoting their ubiquitylation and rapid auxin-regulated proteasomal degradation. Here, by adopting biochemical, structural proteomics and in vivo approaches, we unveil how flexibility in AUX/IAAs and regions in TIR1 affect their conformational ensemble, allowing surface accessibility of degrons. We resolve TIR1·auxin·IAA7 and TIR1·auxin·IAA12 complex topology, and show that flexible intrinsically disordered regions (IDRs) in the degron’s vicinity cooperatively position AUX/IAAs on TIR1. We identify essential residues at the TIR1 N- and C-termini, which provide non-native interaction interfaces with IDRs and the folded PB1 domain of AUX/IAAs. We thereby establish a role for IDRs in modulating auxin receptor assemblies. By securing AUX/IAAs on two opposite surfaces of TIR1, IDR diversity supports locally tailored positioning for targeted ubiquitylation, and might provide conformational flexibility for a multiplicity of functional states. Auxin-mediated recruitment of AUX/IAAs by the F-box protein TIR1 prompts rapid AUX/IAA ubiquitylation and degradation. By resolving auxin receptor topology, the authors show that intrinsically disordered regions near the degrons of two Aux/IAA proteins reinforce complex assembly and position Aux/IAAs for ubiquitylation.
Affiliation(s)
- Michael Niemeyer
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
| | - Elena Moreno Castillo
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
| | - Christian H Ihling
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Charles Tanford Protein Center, Kurt-Mothes-Straße 3a, 06120, Halle (Saale), Germany
| | - Claudio Iacobucci
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Charles Tanford Protein Center, Kurt-Mothes-Straße 3a, 06120, Halle (Saale), Germany
| | - Verona Wilde
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
| | - Antje Hellmuth
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
| | - Wolfgang Hoehenwarter
- Proteome Analytics, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany
| | - Sophia L Samodelov
- Institute of Synthetic Biology & Cluster of Excellence on Plant Science (CEPLAS), Heinrich-Heine University of Düsseldorf, Universitätsstrasse 1, 40225, Düsseldorf, Germany
| | - Matias D Zurbriggen
- Institute of Synthetic Biology & Cluster of Excellence on Plant Science (CEPLAS), Heinrich-Heine University of Düsseldorf, Universitätsstrasse 1, 40225, Düsseldorf, Germany
| | - Panagiotis L Kastritis
- ZIK HALOMEM & Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Biozentrum, Weinbergweg 22, 06120, Halle (Saale), Germany
| | - Andrea Sinz
- Department of Pharmaceutical Chemistry & Bioanalytics, Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Charles Tanford Protein Center, Kurt-Mothes-Straße 3a, 06120, Halle (Saale), Germany
| | - Luz Irina A Calderón Villalobos
- Molecular Signal Processing Department, Leibniz Institute of Plant Biochemistry (IPB), Weinberg 3, 06120, Halle (Saale), Germany.
|
71
|
Julien M, Miron S, Carreira A, Theillet FX, Zinn-Justin S. 1H, 13C and 15N backbone resonance assignment of the human BRCA2 N-terminal region. BIOMOLECULAR NMR ASSIGNMENTS 2020; 14:79-85. [PMID: 31900740 DOI: 10.1007/s12104-019-09924-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/21/2019] [Accepted: 12/20/2019] [Indexed: 06/10/2023]
Abstract
The Breast Cancer susceptibility protein 2 (BRCA2) is involved in mechanisms that maintain genome stability, including DNA repair, replication and cell division. These functions are ensured by the folded C-terminal DNA binding domain of BRCA2 but also by its large regions predicted to be disordered. Several studies have shown that disordered regions of BRCA2 are subjected to phosphorylation, thus regulating BRCA2 interactions through the cell cycle. The N-terminal region of BRCA2 contains two highly conserved clusters of phosphorylation sites between amino acids 75 and 210. Upon phosphorylation by CDK, cluster 1 is known to become a docking site for the kinase PLK1. Cluster 2 is phosphorylated by PLK1 at least at two positions. Both of these phosphorylation clusters are important for mitosis progression, in particular for chromosome segregation and cytokinesis. In order to identify the phosphorylated residues and to characterize the phosphorylation site preferences and their functional consequences within the BRCA2 N-terminus, we have produced and analyzed the BRCA2 fragment from amino acid 48 to amino acid 284 (BRCA2(48-284)). Here, we report the assignment of 1H, 15N, 13CO, 13Cα and 13Cβ NMR chemical shifts of this region. Analysis of these chemical shifts confirmed that BRCA2(48-284) shows no stable fold: it is intrinsically disordered, with only short, transient α-helices.
Affiliation(s)
- Manon Julien
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France
- Paris Sud University, Paris-Saclay University CNRS, UMR3348, 91405, Orsay, France
| | - Simona Miron
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France
| | - Aura Carreira
- Paris Sud University, Paris-Saclay University CNRS, UMR3348, 91405, Orsay, France
- Institut Curie, PSL Research University, UMR3348, 91405, Orsay, France
- CNRS, UMR3348, 91405, Orsay, France
| | - François-Xavier Theillet
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France
| | - Sophie Zinn-Justin
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette Cedex, France.
|
72
|
Lv X, Chen J, Lu Y, Chen Z, Xiao N, Yang Y. Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting. J Chem Inf Model 2020; 60:2388-2395. [PMID: 32203653 DOI: 10.1021/acs.jcim.0c00064] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/14/2022]
Abstract
Accurately predicting the impact of point mutations on protein stability plays a crucial role in protein design and engineering. In this study, we proposed a novel method (BoostDDG) to predict stability changes upon point mutations from protein sequences based on extreme gradient boosting. We extracted features comprehensively from evolutionary information and predicted structures and performed feature selection by a strategy of sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. Finally, we found that 14 features from six groups led to the highest Pearson correlation coefficient (PCC) of 0.535, which is consistent with the 0.540 on an independent test. Our method was shown to consistently outperform other sequence-based methods on three precompiled test sets, and on 7363 variants of two proteins (PTEN and TPMT). These results highlight that BoostDDG is a powerful tool for predicting stability changes upon point mutations from protein sequences.
Affiliation(s)
- Xuan Lv
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
| | - Jianwen Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Yutong Lu
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Zhiguang Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Nong Xiao
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China; School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, China; Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University, Ministry of Education, Guangzhou, Guangdong 510275, China
|
73
|
Hanson J, Paliwal KK, Litfin T, Zhou Y. SPOT-Disorder2: Improved Protein Intrinsic Disorder Prediction by Ensembled Deep Learning. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 17:645-656. [PMID: 32173600 PMCID: PMC7212484 DOI: 10.1016/j.gpb.2019.01.004] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 10/26/2018] [Revised: 01/18/2019] [Accepted: 02/15/2019] [Indexed: 01/13/2023]
Abstract
Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted in SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRFs prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
| | - Kuldip K Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane 4111, Australia
| | - Thomas Litfin
- School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia; Institute for Glycomics, Griffith University, Gold Coast 4222, Australia.
|
74
|
Liu Y, Wang X, Liu B. RFPR-IDP: reduce the false positive rates for intrinsically disordered protein and region prediction by incorporating both fully ordered proteins and disordered proteins. Brief Bioinform 2020; 22:2000-2011. [PMID: 32112084 PMCID: PMC7986600 DOI: 10.1093/bib/bbaa018] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/15/2022] Open
Abstract
As an important type of proteins, intrinsically disordered proteins/regions (IDPs/IDRs) are related to many crucial biological functions. Accurate prediction of IDPs/IDRs is beneficial to the prediction of protein structures and functions. Most of the existing methods ignore the fully ordered proteins without IDRs during training and test processes. As a result, the corresponding predictors prefer to predict the fully ordered proteins as disordered proteins. Unfortunately, these methods were only evaluated on datasets consisting of disordered proteins without or with only a few fully ordered proteins, and therefore, this problem escapes the attention of the researchers. However, most of the newly sequenced proteins are fully ordered proteins in nature. These predictors fail to accurately predict the ordered and disordered proteins in real-world applications. In this regard, we propose a new method called RFPR-IDP trained with both fully ordered proteins and disordered proteins, which is constructed based on the combination of convolution neural network (CNN) and bidirectional long short-term memory (BiLSTM). The experimental results show that although the existing predictors perform well for predicting the disordered proteins, they tend to predict the fully ordered proteins as disordered proteins. In contrast, the RFPR-IDP predictor can correctly predict the fully ordered proteins and outperform the other 10 state-of-the-art methods when evaluated on a test dataset with both fully ordered proteins and disordered proteins. The web server and datasets of RFPR-IDP are freely available at http://bliulab.net/RFPR-IDP/server.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
| | - Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| |
|
75
|
Zhu L, Zheng H. Biomedical event extraction with a novel combination strategy based on hybrid deep neural networks. BMC Bioinformatics 2020; 21:47. [PMID: 32028883 PMCID: PMC7006190 DOI: 10.1186/s12859-020-3376-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/13/2019] [Accepted: 01/20/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Biomedical event extraction is a fundamental and in-demand technology that has attracted substantial interest from many researchers. Previous works have relied heavily on manually designed features and external NLP packages, for which the feature engineering is large and complex. Additionally, most existing works use a pipeline process that breaks a task down into simple sub-tasks but ignores the interaction between them. To overcome these limitations, we propose a novel event combination strategy based on hybrid deep neural networks to settle the task in a joint end-to-end manner. Results: We adapted our method to several annotated corpora of biomedical event extraction tasks. Our method achieved state-of-the-art performance with a noticeable overall F1 score improvement compared to that of existing methods for all of these corpora. Conclusions: The experimental results demonstrated that our method is effective for biomedical event extraction. The combination strategy can reconstruct complex events from the output of deep neural networks, while the deep neural networks effectively capture the feature representation from the raw text. The biomedical event extraction implementation is available online at http://www.predictor.xin/event_extraction.
Affiliation(s)
- Lvxing Zhu
- School of Computer Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China
| | - Haoran Zheng
- School of Computer Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China; Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China; Anhui Province Key Lab. of Big Data Analysis and Application, University of Science and Technology of China, Huangshan Road, Hefei, 230026, People's Republic of China.
| |
|
76
|
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 132] [Impact Index Per Article: 26.4] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Ireland
| | | | - Quan Le
- Centre for Applied Data Analytics Research, University College Dublin, Ireland
| |
|
77
|
Katuwawala A, Oldfield CJ, Kurgan L. DISOselect: Disorder predictor selection at the protein level. Protein Sci 2020; 29:184-200. [PMID: 31642118 PMCID: PMC6933862 DOI: 10.1002/pro.3756] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/27/2022]
Abstract
The intense interest in the intrinsically disordered proteins in the life science community, together with the remarkable advancements in predictive technologies, have given rise to the development of a large number of computational predictors of intrinsic disorder from protein sequence. While the growing number of predictors is a positive trend, we have observed a considerable difference in predictive quality among predictors for individual proteins. Furthermore, variable predictor performance is often inconsistent between predictors for different proteins, and the predictor that shows the best predictive performance depends on the unique properties of each protein sequence. We propose a computational approach, DISOselect, to estimate the predictive performance of 12 selected predictors for individual proteins based on their unique sequence-derived properties. This estimation informs the users about the expected predictive quality for a selected disorder predictor and can be used to recommend methods that are likely to provide the best quality predictions. Our solution does not depend on the results of any disorder predictor; the estimations are made based solely on the protein sequence. Our solution significantly improves predictive performance, as judged with a test set of 1,000 proteins, when compared to other alternatives. We have empirically shown that by using the recommended methods the overall predictive performance for a given set of proteins can be improved by a statistically significant margin. DISOselect is freely available for non-commercial users through the webserver at http://biomine.cs.vcu.edu/servers/DISOselect/.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
| | | | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
| |
|
78
|
Khanh Le NQ, Nguyen QH, Chen X, Rahardja S, Nguyen BP. Classification of adaptor proteins using recurrent neural networks and PSSM profiles. BMC Genomics 2019; 20:966. [PMID: 31874633 PMCID: PMC6929330 DOI: 10.1186/s12864-019-6335-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/24/2019] [Accepted: 11/25/2019] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: Adaptor proteins are carrier proteins that play a crucial role in signal transduction. They commonly consist of several modular domains, each having its own binding activity and operating by forming complexes with other intracellular-signaling molecules. Many studies have determined that adaptor proteins are implicated in a variety of human diseases. Therefore, creating a precise model to predict the function of adaptor proteins is one of the vital tasks in bioinformatics and computational biology. Few computational biology studies have been conducted to predict protein functions, and in most of those studies, position-specific scoring matrix (PSSM) profiles were used as the features fed into the neural networks. However, the neural networks could not reach the optimal result because the sequential information in the PSSMs was lost. This study proposes an innovative approach that combines recurrent neural networks (RNNs) and PSSM profiles to resolve this problem. RESULTS: Compared to other state-of-the-art methods that have been applied successfully to other problems, our method achieves improvement in all of the common measurement metrics. The area under the receiver operating characteristic curve (AUC) for the prediction of adaptor proteins is 0.893 on the cross-validation dataset and 0.853 on the independent dataset. CONCLUSIONS: This study opens a research path that can promote the use of RNNs and PSSM profiles in bioinformatics and computational biology. Our approach is reproducible by scientists who aim to improve the performance of different protein function prediction problems. Our source code and datasets are available at https://github.com/ngphubinh/adaptors.
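For readers unfamiliar with the AUC figure quoted in this abstract, the following is a minimal sketch of how the metric can be computed from binary labels and predicted scores. It uses the pairwise (Mann-Whitney) definition and is not taken from the authors' repository; the function name `roc_auc` and the example data are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 class labels (1 = positive, e.g. adaptor protein).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three of the four positive/negative pairs are ranked correctly.
example_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

The quadratic pairwise loop is chosen for readability; a rank-based formulation gives the same value and scales better to large test sets.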
Affiliation(s)
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Keelung Road, Da'an District, Taipei City 106, Taiwan (R.O.C.)
| | - Quang H Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
| | - Xuan Chen
- Beijing Genomics Institute, 21 Hongan 3rd Street, Shenzhen 518083, China
| | - Susanto Rahardja
- School of Marine Science and Technology, Northwestern Polytechnical University, 127 West Youyi Road, Xi'an 710072, China.
| | - Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Gate 7, Kelburn Parade, Wellington 6140, New Zealand
| |
|
79
|
Chen S, Sun Z, Lin L, Liu Z, Liu X, Chong Y, Lu Y, Zhao H, Yang Y. To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map. J Chem Inf Model 2019; 60:391-399. [PMID: 31800243 DOI: 10.1021/acs.jcim.9b00438] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 11/30/2022]
Abstract
Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent three-dimensional (3D) structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning frame. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that the sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction. The online server and the source code are available at http://biomed.nscc-gz.cn and https://github.com/biomed-AI/SPROF, respectively.
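The two quantities this abstract relies on, the 2D map of pairwise residue distances and the sequence recovery rate, can be illustrated with a short sketch. This is not the SPROF implementation; it assumes one representative 3D coordinate per residue (for example the Cα atom, which is an assumption here), and the function names are hypothetical.

```python
import math


def distance_map(coords):
    """Return the symmetric L x L matrix of Euclidean distances between residues.

    coords: list of (x, y, z) tuples, one representative atom per residue
    (assumed here; the choice of atom is not specified in the abstract).
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def sequence_recovery(native, predicted):
    """Fraction of positions where the predicted residue matches the native one."""
    if len(native) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, predicted))
    return matches / len(native)


# Hypothetical three-residue backbone and a four-residue design with one mismatch.
toy_map = distance_map([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)])
toy_recovery = sequence_recovery("ACDE", "ACDF")
```

A map built this way is invariant to rotation and translation of the structure, which is one reason a 2D distance map is a convenient image-like input for a network.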
Affiliation(s)
- Sheng Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Zhe Sun
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Lihua Lin
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Zifeng Liu
- Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
| | - Xun Liu
- Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
| | - Yutian Chong
- Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
| | - Yutong Lu
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510000, China
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China; Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of the Ministry of Education, Guangzhou 510000, China
| |
|
80
|
Rodriguez G, Orris B, Majumdar A, Bhat S, Stivers JT. Macromolecular crowding induces compaction and DNA binding in the disordered N-terminal domain of hUNG2. DNA Repair (Amst) 2019; 86:102764. [PMID: 31855846 DOI: 10.1016/j.dnarep.2019.102764] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/21/2019] [Revised: 11/25/2019] [Accepted: 12/04/2019] [Indexed: 11/15/2022]
Abstract
Many human DNA repair proteins have disordered domains at their N- or C-termini with poorly defined biological functions. We recently reported that the partially structured N-terminal domain (NTD) of human uracil DNA glycosylase 2 (hUNG2) functions to enhance DNA translocation in crowded environments and also targets the enzyme to single-stranded/double-stranded DNA junctions. To understand the structural basis for these effects, we now report high-resolution heteronuclear NMR studies of the isolated NTD in the presence and absence of an inert macromolecular crowding agent (PEG8K). Compared to dilute buffer, we find that crowding reduces the degrees of freedom for the structural ensemble, increases the order of a PCNA binding motif, and dramatically promotes binding of the NTD to DNA through a conformational selection mechanism. These findings shed new light on the function of this disordered domain in the context of the crowded nuclear environment.
Affiliation(s)
- Gaddiel Rodriguez
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
| | - Benjamin Orris
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
| | - Ananya Majumdar
- Biomolecular NMR Center, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Shridhar Bhat
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States
| | - James T Stivers
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, United States.
| |
|
81
|
Chen Z, Liu X, Li F, Li C, Marquez-Lago T, Leier A, Akutsu T, Webb GI, Xu D, Smith AI, Li L, Chou KC, Song J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform 2019; 20:2267-2290. [PMID: 30285084 PMCID: PMC6954452 DOI: 10.1093/bib/bby089] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 06/14/2018] [Revised: 08/17/2018] [Accepted: 08/18/2018] [Indexed: 12/22/2022] Open
Abstract
Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate this comprehensive review and the application of deep learning will provide practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in the related fields.
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
| | - Xuhan Liu
- Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland
| | - Tatiana Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
| | - Dakang Xu
- Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia
| | - Alexander Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Lei Li
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
| |
|
82
|
Kandathil SM, Greener JG, Jones DT. Recent developments in deep learning applied to protein structure prediction. Proteins 2019; 87:1179-1189. [PMID: 31589782 PMCID: PMC6899861 DOI: 10.1002/prot.25824] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 05/15/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/29/2022]
Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK; Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - Joe G Greener
- Department of Computer Science, University College London, London, UK; Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - David T Jones
- Department of Computer Science, University College London, London, UK; Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| |
|
83
|
Liu JH, Yang JY, Hsu DW, Lai YH, Li YP, Tsai YR, Hou MH. Crystal Structure-Based Exploration of Arginine-Containing Peptide Binding in the ADP-Ribosyltransferase Domain of the Type III Effector XopAI Protein. Int J Mol Sci 2019; 20:ijms20205085. [PMID: 31615004 PMCID: PMC6829252 DOI: 10.3390/ijms20205085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/01/2019] [Revised: 10/11/2019] [Accepted: 10/12/2019] [Indexed: 02/07/2023] Open
Abstract
Plant pathogens secrete proteins called effectors into the cells of their host to modulate the host immune response against colonization. Effectors can either modify or arrest host target proteins to sabotage the signaling pathway, and therefore are considered potential drug targets for crop disease control. In earlier research, the Xanthomonas type III effector XopAI was predicted to be a member of the arginine-specific mono-ADP-ribosyltransferase family. However, the crystal structure of XopAI revealed an altered active site that is unsuitable to bind the cofactor NAD+, but with the capability to capture an arginine-containing peptide from XopAI itself. The arginine peptide consists of residues 60 through 69 of XopAI, and residue 62 (R62) is key to determining the protein–peptide interaction. The crystal structure and the molecular dynamics simulation results indicate that specific arginine recognition is mediated by hydrogen bonds provided by the backbone oxygen atoms from residues W154, T155, and T156, and a salt bridge provided by the E265 sidechain. In addition, a protruding loop of XopAI adopts dynamic conformations in response to arginine peptide binding and is probably involved in target protein recognition. These data suggest that XopAI binds to its target protein by the peptide-binding ability, and therefore, it promotes disease progression. Our findings reveal an unexpected and intriguing function of XopAI and pave the way for further investigation on the role of XopAI in pathogen invasion.
Affiliation(s)
- Jyung-Hurng Liu
- Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan.
- Department of Life Science, NCHU, Taichung 40227, Taiwan.
- Graduate Institute of Biotechnology, NCHU, Taichung 40227, Taiwan.
- PhD Program in Medical Biotechnology, NCHU, Taichung 40227, Taiwan.
| | - Jun-Yi Yang
- Graduate Institute of Biotechnology, NCHU, Taichung 40227, Taiwan.
- Graduate Institute of Biochemistry, NCHU, Taichung 40227, Taiwan.
| | - Duen-Wei Hsu
- Department of Biotechnology, National Kaohsiung Normal University, Kaohsiung 80201, Taiwan.
| | - Yi-Hua Lai
- Department of Life Science, NCHU, Taichung 40227, Taiwan.
| | - Yun-Pei Li
- Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan.
| | - Yi-Rung Tsai
- Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan.
| | - Ming-Hon Hou
- Institute of Genomics and Bioinformatics, National Chung Hsing University (NCHU), Taichung 40227, Taiwan.
- Department of Life Science, NCHU, Taichung 40227, Taiwan.
- Graduate Institute of Biotechnology, NCHU, Taichung 40227, Taiwan.
- PhD Program in Medical Biotechnology, NCHU, Taichung 40227, Taiwan.
| |
|
84
|
Tobias-Santos V, Guerra-Almeida D, Mury F, Ribeiro L, Berni M, Araujo H, Logullo C, Feitosa NM, de Souza-Menezes J, Pessoa Costa E, Nunes-da-Fonseca R. Multiple Roles of the Polycistronic Gene Tarsal-less/Mille-Pattes/Polished-Rice During Embryogenesis of the Kissing Bug Rhodnius prolixus. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
|
85
|
A Bi-LSTM Based Ensemble Algorithm for Prediction of Protein Secondary Structure. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9173538] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022]
Abstract
The prediction of protein secondary structure continues to be an active area of research in bioinformatics. In this paper, a Bi-LSTM based ensemble model is developed for the prediction of protein secondary structure. The ensemble model with dual loss function consists of five sub-models, which are finally joined by a Bi-LSTM layer. In contrast to existing ensemble methods, which generally train each sub-model and then join them as a whole, this ensemble model and sub-models can be trained simultaneously and the performance of each model can be observed and compared during the training process. Three independent test sets (e.g., data1199, 513 protein Cuff & Barton set (CB513) and 203 proteins from Critical Appraisals Skills Programme (CASP203)) are employed to test the method. On average, the ensemble model achieved 84.3% in Q3 accuracy and 81.9% in segment overlap measure (SOV) score by using 10-fold cross validation. There is an improvement of up to 1% over some state-of-the-art prediction methods of protein secondary structure.
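As a point of reference for the Q3 figure reported above, the sketch below computes three-state per-residue accuracy from a true and a predicted secondary-structure string. It is a minimal illustration rather than the authors' evaluation code; the three-letter alphabet H/E/C (helix, strand, coil) and the function name `q3_accuracy` are assumptions. The SOV score is segment-based and more involved, so it is not reproduced here.

```python
def q3_accuracy(true_ss, pred_ss):
    """Percentage of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have the same length")
    if not true_ss:
        raise ValueError("strings must be non-empty")

    allowed = set("HEC")  # assumed three-state alphabet: helix, strand, coil
    if not (set(true_ss) <= allowed and set(pred_ss) <= allowed):
        raise ValueError("labels must be drawn from H, E and C")

    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


# Hypothetical six-residue example with one strand residue mispredicted as coil.
toy_q3 = q3_accuracy("HHHEEC", "HHHECC")
```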
|
86
|
Hadley B, Litfin T, Day CJ, Haselhorst T, Zhou Y, Tiralongo J. Nucleotide Sugar Transporter SLC35 Family Structure and Function. Comput Struct Biotechnol J 2019; 17:1123-1134. [PMID: 31462968 PMCID: PMC6709370 DOI: 10.1016/j.csbj.2019.08.002] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 04/12/2019] [Revised: 08/05/2019] [Accepted: 08/05/2019] [Indexed: 12/22/2022] Open
Abstract
The covalent attachment of sugars to growing glycan chains is heavily reliant on a specific family of solute transporters (SLC35), the nucleotide sugar transporters (NSTs) that connect the synthesis of activated sugars in the nucleus or cytosol, to glycosyltransferases that reside in the lumen of the endoplasmic reticulum (ER) and/or Golgi apparatus. This review provides a timely update on recent progress in the NST field, specifically we explore several NSTs of the SLC35 family whose substrate specificity and function have been poorly understood, but where recent significant progress has been made. This includes SLC35 A4, A5 and D3, as well as progress made towards understanding the association of SLC35A2 with SLC35A3 and how this relates to their potential regulation, and how the disruption to the dilysine motif in SLC35B4 causes mislocalisation, calling into question multisubstrate NSTs and their subcellular localisation and function. We also report on the recently described first crystal structure of an NST, the SLC35D2 homolog Vrg-4 from yeast. Using this crystal structure, we have generated a new model of SLC35A1, (CMP-sialic acid transporter, CST), with structural and mechanistic predictions based on all known CST-related data, and includes an overview of reported mutations that alter transport and/or substrate recognition (both de novo and site-directed). We also present a model of the CST-del177 isoform that potentially explains why the human CST isoform remains active while the hamster CST isoform is inactive, and we provide a possible alternate access mechanism that accounts for the CST being functional as either a monomer or a homodimer. Finally we provide an update on two NST crystal structures that were published subsequent to the submission and during review of this report.
Affiliation(s)
- Barbara Hadley
- Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
| | - Thomas Litfin
- School of Information and Communication Technology, Griffith University, Gold Coast Campus, Queensland 4212, Australia
| | - Chris J. Day
- Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
| | - Thomas Haselhorst
- Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
| | - Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
- School of Information and Communication Technology, Griffith University, Gold Coast Campus, Queensland 4212, Australia
| | - Joe Tiralongo
- Institute for Glycomics, Griffith University, Gold Coast Campus, Queensland 4222, Australia
| |
|
87
|
Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. MOLECULAR THERAPY-NUCLEIC ACIDS 2019; 17:396-404. [PMID: 31307006 PMCID: PMC6626971 DOI: 10.1016/j.omtn.2019.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/25/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 01/24/2023]
Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance for predicting IDPs/IDRs. Motivated by these methods, we propose a predictor called IDP-FSP, which is an ensemble of three CRF-based predictors called IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are specially designed to predict long, short, and generic disordered regions, respectively, and they are constructed based on different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with IDRs of different lengths. Experimental results on two independent test datasets show that IDP-FSP achieves predictive performance better than or at least comparable to that of 26 existing state-of-the-art methods in this field, demonstrating the effectiveness of IDP-FSP.
|
88
|
Bai F, Hong D, Lu Y, Liu H, Xu C, Yao X. Prediction of the Antioxidant Response Elements' Response of Compound by Deep Learning. Front Chem 2019; 7:385. [PMID: 31214568 PMCID: PMC6554289 DOI: 10.3389/fchem.2019.00385] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/14/2018] [Accepted: 05/14/2019] [Indexed: 11/13/2022] Open
Abstract
The antioxidant response elements (AREs) play a significant role in the occurrence of oxidative stress and may cause a multitude of toxic effects in the pathogenesis of a variety of diseases. Determining whether a compound can activate AREs is crucial for assessing its potential risk. Here, a series of predictive models applying multiple deep learning algorithms, including deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and highway networks (HN), were constructed and validated on the Tox21 challenge dataset and applied to predict whether compounds are activators or inactivators of AREs. The models were evaluated using various statistical parameters, such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and the receiver operating characteristic (ROC) curve. The DNN prediction model based on fingerprint features has the best prediction ability, with accuracies of 0.992, 0.914, and 0.917 for the training set, test set, and validation set, respectively. Consequently, these robust models can be adopted to predict the ARE response of molecules quickly and accurately, which is of great significance for evaluating the safety of compounds during drug discovery and development.
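The evaluation metrics named in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `binary_metrics`, the label convention (1 = activator, 0 = inactivator), and the toy labels are assumptions made here for illustration only.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, accuracy and MCC for binary labels.

    Assumed convention (not from the source): 1 = activator, 0 = inactivator.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0

    # MCC is conventionally reported as 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a useful companion metric when activators are much rarer than inactivators.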
Affiliation(s)
- Fang Bai
  - School of Pharmacy, Lanzhou University, Lanzhou, China
- Ding Hong
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Yingying Lu
  - State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou, China
- Huanxiang Liu
  - School of Pharmacy, Lanzhou University, Lanzhou, China
- Cunlu Xu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
- Xiaojun Yao
  - State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou, China
|
89
|
Lee Y, Pei J, Baumhardt JM, Chook YM, Grishin NV. Structural prerequisites for CRM1-dependent nuclear export signaling peptides: accessibility, adapting conformation, and the stability at the binding site. Sci Rep 2019; 9:6627. [PMID: 31036839 PMCID: PMC6488578 DOI: 10.1038/s41598-019-43004-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/13/2019] [Accepted: 04/11/2019] [Indexed: 01/08/2023] Open
Abstract
Nuclear export signal (NES) motifs function as essential regulators of the subcellular location of proteins by interacting with the major nuclear exporter protein, CRM1. Prediction of NES motifs is of great interest in many areas of research, including cancer, but currently available methods, which are mostly sequence-based, have suffered from high false positive rates because NES consensus patterns are quite commonly observed in protein sequences. Therefore, finding a feature that can distinguish real NES motifs from false positives is desirable for improving prediction power, but this is quite challenging when using only the sequence. Here, we provide a comprehensive table of the validated cargo proteins, containing the location of the NES consensus patterns together with disorder propensity plots, known protein domain information, and the predicted secondary structures. It could be useful for determining the most plausible NES region in the context of the whole protein sequence, and it suggests possible explanations for some non-binders among the annotated regions. In addition, using the currently available crystal structures of CRM1 bound to various classes of NES peptides, we adopted, for the first time, structure-based prediction of NES motifs bound to the binding groove of CRM1. Combining sequence-based and structure-based predictions, we suggest a novel and more straightforward approach to identifying CRM1-binding NES sequences by analysis of their structural prerequisites and energetic evaluation of their stability at the CRM1 binding site.
Affiliation(s)
- Yoonji Lee
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Jimin Pei
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Jordan M Baumhardt
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Yuh Min Chook
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Nick V Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
|
90
|
Katuwawala A, Peng Z, Yang J, Kurgan L. Computational Prediction of MoRFs, Short Disorder-to-order Transitioning Protein Binding Regions. Comput Struct Biotechnol J 2019; 17:454-462. [PMID: 31007871 PMCID: PMC6453775 DOI: 10.1016/j.csbj.2019.03.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 02/06/2019] [Revised: 03/22/2019] [Accepted: 03/23/2019] [Indexed: 12/28/2022] Open
Abstract
Molecular recognition features (MoRFs) are short protein-binding regions that undergo disorder-to-order transitions (induced folding) upon binding protein partners. These regions are abundant in nature and can be predicted from protein sequences based on their distinctive sequence signatures. This first-of-its-kind survey covers 14 MoRF predictors and six related methods for the prediction of short protein-binding linear motifs, disordered protein-binding regions and semi-disordered regions. We show that the development of MoRF predictors has accelerated in recent years. These predictors depend on machine learning-derived models that were generated using training datasets where MoRFs are annotated using putative disorder. Our analysis reveals that they generate accurate predictions. We identified eight methods that offer area under the ROC curve (AUC) ≥ 0.7 on experimentally validated test datasets. We show that modern MoRF predictors accurately find experimentally annotated MoRFs even though they were trained using the putative disorder annotations. They are relatively highly cited, particularly the methods available as webservers, which on average secure three times more citations than methods without this option. MoRF predictions contribute to the experimental discovery of protein-protein interactions, annotation of protein functions and computational analysis of a variety of proteomes, protein families, and pathways. We outline future development and application directions for these tools, stressing the importance of developing novel tools that would target interactions of disordered regions with other types of partners.
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, USA
- Zhenling Peng
  - Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, USA
|
91
|
Nielsen JT, Mulder FAA. Quality and bias of protein disorder predictors. Sci Rep 2019; 9:5137. [PMID: 30914747 PMCID: PMC6435736 DOI: 10.1038/s41598-019-41644-w] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 12/04/2018] [Accepted: 03/13/2019] [Indexed: 02/03/2023] Open
Abstract
Disorder in proteins is vital for biological function, yet it is challenging to characterize. Therefore, methods for predicting protein disorder from sequence are fundamental. Currently, predictors are trained and evaluated using data from X-ray structures or from various biochemical or spectroscopic data. However, the prediction accuracy of disorder predictors is not calibrated, nor is it established whether predictors are intrinsically biased towards one of the extremes of the order-disorder axis. We therefore generated and validated a comprehensive experimental benchmarking set of site-specific and continuous disorder, using deposited NMR chemical shift data. This novel experimental data collection is fully appropriate and represents the full spectrum of disorder. We subsequently analyzed the performance of 26 widely used disorder prediction methods and found that these vary noticeably. At the same time, a distinct bias for over-predicting order was identified for some algorithms. Our analysis has important implications for the validity and the interpretation of protein disorder, as utilized, for example, in assessing the content of disorder in proteomes.
Affiliation(s)
- Jakob T Nielsen
  - Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
  - Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark
- Frans A A Mulder
  - Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Gustav Wieds Vej 14, 8000, Aarhus C, Denmark
  - Department of Chemistry, Aarhus University, Langelandsgade 140, 8000, Aarhus C, Denmark
|
92
|
Dean S, Moreira-Leite F, Gull K. Basalin is an evolutionarily unconstrained protein revealed via a conserved role in flagellum basal plate function. eLife 2019; 8:42282. [PMID: 30810527 PMCID: PMC6392502 DOI: 10.7554/elife.42282] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/24/2018] [Accepted: 02/11/2019] [Indexed: 01/15/2023] Open
Abstract
Most motile flagella have an axoneme that contains nine outer microtubule doublets and a central pair (CP) of microtubules. The CP coordinates the flagellar beat, and defects in CP projections are associated with motility defects and human disease. The CP nucleates near a ‘basal plate’ at the distal end of the transition zone (TZ). Here, we show that the trypanosome TZ protein ‘basalin’ is essential for building the basal plate, and its loss is associated with CP nucleation defects, inefficient recruitment of CP assembly factors to the TZ, and flagellum paralysis. Guided by synteny, we identified a highly divergent basalin ortholog in the related Leishmania species. Basalins are predicted to be highly unstructured, suggesting they may act as ‘hubs’ facilitating many protein-protein interactions. This raises the general concept that proteins that are involved in cytoskeletal functions and appear organism-specific may have highly divergent and cryptic orthologs in other species.
Affiliation(s)
- Samuel Dean
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
- Flavia Moreira-Leite
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
- Keith Gull
  - Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
|
93
|
Features of a novel protein, rusticalin, from the ascidian Styela rustica reveal ancestral horizontal gene transfer event. Mob DNA 2019; 10:4. [PMID: 30675192 PMCID: PMC6339383 DOI: 10.1186/s13100-019-0146-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/04/2018] [Accepted: 01/02/2019] [Indexed: 12/18/2022] Open
Abstract
Background: The transfer of genetic material from non-parent organisms is called horizontal gene transfer (HGT). One of the most conclusive cases of HGT in metazoans was previously described for the cellulose synthase gene in ascidians. Results: In this study we identified a new protein, rusticalin, from the ascidian Styela rustica and presented evidence for its likely origin by HGT. Discernible homologues of rusticalin were found in placozoans, coral, and basal chordates. Rusticalin was predicted to consist of two distinct regions, an N-terminal domain and a C-terminal domain. The N-terminal domain comprises two cysteine-rich repeats and shows remote similarity to the tick carboxypeptidase inhibitor. The C-terminal domain shares significant sequence similarity with bacterial MD peptidases and bacteriophage A500 L-alanyl-D-glutamate peptidase. A possible transfer of the C-terminal domain by bacteriophage was confirmed by an analysis of noncoding sequences of the C. intestinalis rusticalin-like gene, which was found to contain a sequence similar to the bacteriophage A500 recombination site. Moreover, a sequence similar to the bacteriophage recombination site was found adjacent to the cellulose synthase catalytic subunit gene in the genome of Streptomyces sp., the donor of ascidian cellulose synthase. Conclusions: The C-terminal domain of rusticalin and rusticalin-like proteins is likely to have been horizontally transferred by the bacteriophage A500. A common mechanism involving bacteriophage-mediated gene transfer can be proposed for at least two HGT events in ascidians.
|
94
|
Chen Z, He N, Huang Y, Qin WT, Liu X, Li L. Integration of A Deep Learning Classifier with A Random Forest Approach for Predicting Malonylation Sites. GENOMICS PROTEOMICS & BIOINFORMATICS 2019; 16:451-459. [PMID: 30639696 PMCID: PMC6411950 DOI: 10.1016/j.gpb.2018.08.004] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 01/26/2018] [Revised: 06/20/2018] [Accepted: 08/08/2018] [Indexed: 12/27/2022]
Abstract
As a newly-identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings or a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which includes LSTMWE and the random forest classifier with a novel encoding of enhanced amino acid content. LEMP performs not only better than the individual classifiers but also superior to the currently-available malonylation predictors. Additionally, it demonstrates a promising performance with a low false positive rate, which is highly useful in the prediction application. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.
Affiliation(s)
- Zhen Chen
  - School of Basic Medicine, Qingdao University, Qingdao 266021, China
- Ningning He
  - School of Basic Medicine, Qingdao University, Qingdao 266021, China
- Yu Huang
  - School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China
- Wen Tao Qin
  - Department of Biochemistry, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario N6A 5C1, Canada
- Xuhan Liu
  - Department of Information Technology, Beijing Oriental Yamei Gene Technology Institute Co. Ltd., Beijing 100078, China
- Lei Li
  - School of Basic Medicine, Qingdao University, Qingdao 266021, China
  - School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China
  - Qingdao Cancer Institute, Qingdao University, Qingdao 266021, China
|
95
|
Jung Y, El-Manzalawy Y, Dobbs D, Honavar VG. Partner-specific prediction of RNA-binding residues in proteins: A critical assessment. Proteins 2018; 87:198-211. [PMID: 30536635 PMCID: PMC6389706 DOI: 10.1002/prot.25639] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/01/2018] [Revised: 10/10/2018] [Accepted: 11/29/2018] [Indexed: 01/06/2023]
Abstract
RNA-protein interactions play essential roles in regulating gene expression. While some RNA-protein interactions are "specific", that is, the RNA-binding proteins preferentially bind to particular RNA sequence or structural motifs, others are "non-RNA specific." Deciphering the protein-RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein-RNA interfaces, there is a need for computational methods to identify RNA-binding residues in proteins. While most of the existing computational methods for predicting RNA-binding residues in RNA-binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner-specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner-specific protein-RNA interface prediction tools, PS-PRIP, and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA-specificity metric (RSM), for quantifying the RNA-specificity of the RNA binding residues predicted by such tools. Our results show that the RNA-binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner-agnostic metrics, RNA partner-specific methods are outperformed by the state-of-the-art partner-agnostic methods. We conjecture that either (a) the protein-RNA complexes in PDB are not representative of the protein-RNA interactions in nature, or (b) the current methods for partner-specific prediction of RNA-binding residues in proteins fail to account for the differences in RNA partner-specific versus partner-agnostic protein-RNA interactions, or both.
Affiliation(s)
- Yong Jung
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
- Yasser El-Manzalawy
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania
  - College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
- Drena Dobbs
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa
  - Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa
- Vasant G Honavar
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, Pennsylvania
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, Pennsylvania
  - Institute for Cyberscience, Pennsylvania State University, University Park, Pennsylvania
  - Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, Pennsylvania
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
  - College of Information Sciences and Technology, Pennsylvania State University, Pennsylvania
|
96
|
Mediated nuclear import and export of TAZ and the underlying molecular requirements. Nat Commun 2018; 9:4966. [PMID: 30470756 PMCID: PMC6251892 DOI: 10.1038/s41467-018-07450-0] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 02/22/2018] [Accepted: 10/26/2018] [Indexed: 12/14/2022] Open
Abstract
Nucleocytoplasmic distribution of Yap/TAZ is regulated by the Hippo pathway and the cytoskeleton. While interactions with cytosolic and nuclear “retention factors” (14–3–3 and TEAD) are known to control their localization, fundamental aspects of Yap/TAZ shuttling remain undefined. It is unclear if translocation occurs only by passive diffusion or via mediated transport, and neither the potential nuclear localization and efflux signals (NLS, NES) nor their putative regulation have been identified. Here we show that TAZ cycling is a mediated process and identify the underlying NLS and NES. The C-terminal NLS, representing a new class of import motifs, is necessary and sufficient for efficient nuclear uptake via a RAN-independent mechanism. RhoA activity directly stimulates this import. The NES lies within the TEAD-binding domain and can be masked by TEAD, thereby preventing efflux. Thus, we describe a RhoA-regulated NLS, a TEAD-regulated NES and propose an improved model of nucleocytoplasmic TAZ shuttling beyond "retention". The transcriptional co-factors Yap and TAZ are regulated by Hippo signalling and mechanical forces via their nucleocytoplasmic shuttling. Here the authors identify a RhoA-regulated C-terminal nuclear localization signal and a TEAD-regulated N-terminal nuclear export signal of TAZ in an epithelial cell line.
|
97
|
Abstract
To address the difficulty that existing deep learning and big data-based object detection algorithms have in balancing accuracy and speed for multi-object detection in complex, wide traffic scenes, we improve the object detection framework SSD (Single Shot Multi-box Detector) and propose a new detection framework, AP-SSD (Adaptive Perceive). We design a feature extraction convolution kernel library composed of multi-shape Gabor and color Gabor kernels, and then train and screen the optimal feature extraction kernels to replace the low-level convolution kernels of the original network, improving detection accuracy. We then combine the single-image detection framework with convolutional long short-term memory networks. By using a Bottleneck-LSTM memory layer to refine and propagate feature maps between frames, we realize the temporal association of frame-level information, reduce the computational cost, and succeed in tracking and identifying targets affected by strong interference in video; adding an adaptive threshold strategy reduces the missed alarm rate and false alarm rate. Moreover, we design a dynamic region amplification network framework to improve the detection and recognition accuracy of low-resolution small objects. Experiments on the improved AP-SSD show that the new algorithm achieves better detection results when small objects, multiple objects, cluttered backgrounds and large-area occlusion are involved, giving it good prospects for engineering applications.
|
98
|
Hanson J, Paliwal K, Zhou Y. Accurate Single-Sequence Prediction of Protein Intrinsic Disorder by an Ensemble of Deep Recurrent and Convolutional Architectures. J Chem Inf Model 2018; 58:2369-2376. [DOI: 10.1021/acs.jcim.8b00636] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 01/31/2023]
Affiliation(s)
- Jack Hanson
  - Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, Griffith University, Brisbane, Queensland 4122, Australia
- Yaoqi Zhou
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland 4222, Australia
|
99
|
Chakraborty C, Clayton C. Stress susceptibility in Trypanosoma brucei lacking the RNA-binding protein ZC3H30. PLoS Negl Trop Dis 2018; 12:e0006835. [PMID: 30273340 PMCID: PMC6181440 DOI: 10.1371/journal.pntd.0006835] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/10/2017] [Revised: 10/11/2018] [Accepted: 09/11/2018] [Indexed: 01/17/2023] Open
Abstract
Trypanosomes rely on post-transcriptional mechanisms and mRNA-binding proteins for control of gene expression. Trypanosoma brucei ZC3H30 is an mRNA-binding protein that is expressed in both the bloodstream form (which grows in mammals) and the procyclic form (which grows in the tsetse fly midgut). Attachment of ZC3H30 to an mRNA causes degradation of that mRNA. Cells lacking ZC3H30 showed no growth defect under normal culture conditions; but they were more susceptible than wild-type cells to heat shock, starvation, and treatment with DTT, arsenite or ethanol. Transcriptomes of procyclic-form trypanosomes lacking ZC3H30 were indistinguishable from those of cells in which ZC3H30 had been re-expressed, but un-stressed bloodstream forms lacking ZC3H30 had about 2-fold more HSP70 mRNA. Results from pull-downs suggested that ZC3H30 mRNA binding may not be very specific. ZC3H30 was found in stress-induced granules and co-purified with another stress granule protein, Tb927.8.3820; but RNAi targeting Tb927.8.3820 did not affect either ZC3H30 granule association or stress resistance. The conservation of the ZC3H30 gene in both monogenetic and digenetic kinetoplastids, combined with the increased stress susceptibility of cells lacking it, suggests that ZC3H30 confers a selective advantage in the wild, where the parasites are subject to temperature fluctuations and immune attack in both the insect and mammalian hosts.
Affiliation(s)
- Christine Clayton
  - Zentrum für Molekular Biologie, Universität Heidelberg, Heidelberg, Germany
|
100
|
Luo TJ, Zhou CL, Chao F. Exploring spatial-frequency-sequential relationships for motor imagery classification with recurrent neural network. BMC Bioinformatics 2018; 19:344. [PMID: 30268089 PMCID: PMC6162908 DOI: 10.1186/s12859-018-2365-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 04/09/2018] [Accepted: 09/10/2018] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Conventional methods for motor imagery brain computer interfaces (MI-BCIs) suffer from a limited number of samples and simplified features, and therefore perform poorly with spatial-frequency features and shallow classifiers. METHODS As an alternative, this paper applies a deep recurrent neural network (RNN) with a sliding window cropping strategy (SWCS) to signal classification for MI-BCIs. The spatial-frequency features are first extracted by the filter bank common spatial pattern (FB-CSP) algorithm, and these features are cropped by the SWCS into time slices. To extract spatial-frequency-sequential relationships, the cropped time slices are then fed into the RNN for classification. To overcome memory distractions, the commonly used gated recurrent unit (GRU) and long short-term memory (LSTM) unit are applied to the RNN architecture, and experimental results are used to determine which unit is more suitable for processing EEG signals. RESULTS Experimental results on common BCI benchmark datasets show that the spatial-frequency-sequential relationships outperform all other competing spatial-frequency methods. In particular, the proposed GRU-RNN architecture achieves the lowest misclassification rates on all BCI benchmark datasets. CONCLUSION By introducing spatial-frequency-sequential relationships with cropped time slice samples, the proposed method offers a novel way to construct and model highly accurate and robust MI-BCIs based on limited trials of EEG signals.
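The cropping step described in the METHODS part can be sketched as follows. This is a generic sliding-window crop, not the paper's implementation: the function name `sliding_window_crop`, the `window` and `stride` parameters, and the toy feature sequence are assumptions introduced here, since the abstract does not give the slice length or step size.

```python
def sliding_window_crop(features, window, stride):
    """Crop a time-indexed feature sequence into overlapping time slices.

    features: list of per-time-step feature vectors (assumed layout).
    window:   number of time steps in each slice (hypothetical parameter).
    stride:   step between the starts of consecutive slices (hypothetical).
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(features) - window
    # Sequences shorter than one window yield no slices.
    return [features[s:s + window] for s in range(0, last_start + 1, stride)]


# Toy sequence: 10 time steps with a single feature each, for illustration.
toy_features = [[float(t)] for t in range(10)]
slices = sliding_window_crop(toy_features, window=4, stride=2)
```

Each slice could then serve as one input sequence for a recurrent classifier, so a single trial yields several training samples, which is one way to ease the limited-sample problem noted in the BACKGROUND part.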
Affiliation(s)
- Tian-jian Luo
  - Department of Cognitive Science, School of Information Science and Engineering, Xiamen University, 422 Siming South Road, Siming District, Xiamen, 361005 China
- Chang-le Zhou
  - Department of Cognitive Science, School of Information Science and Engineering, Xiamen University, 422 Siming South Road, Siming District, Xiamen, 361005 China
- Fei Chao
  - Department of Cognitive Science, School of Information Science and Engineering, Xiamen University, 422 Siming South Road, Siming District, Xiamen, 361005 China
  - Department of Computer Science, Institute of Mathematics, Physics and Computer Science, Aberystwyth University, Aberystwyth, Wales, SY23 3DB UK
|