51
|
Non-synonymous variation and protein structure of candidate genes associated with selection in farm and wild populations of turbot (Scophthalmus maximus). Sci Rep 2023; 13:3019. [PMID: 36810752 PMCID: PMC9944912 DOI: 10.1038/s41598-023-29826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 02/10/2023] [Indexed: 02/24/2023] Open
Abstract
Non-synonymous variation (NSV) of protein coding genes represents raw material for selection to improve adaptation to the diverse environmental scenarios in wild and livestock populations. Many aquatic species face variations in temperature, salinity and biological factors throughout their distribution range, which is reflected by the presence of allelic clines or local adaptation. The turbot (Scophthalmus maximus) is a flatfish of great commercial value with a flourishing aquaculture which has promoted the development of genomic resources. In this study, we developed the first atlas of NSVs in the turbot genome by resequencing 10 individuals from the Northeast Atlantic Ocean. More than 50,000 NSVs were detected in the ~21,500 coding genes of the turbot genome, and we selected 18 NSVs to be genotyped using a single MassARRAY multiplex on 13 wild populations and three turbot farms. We detected signals of divergent selection on several genes related to growth, circadian rhythms, osmoregulation and oxygen binding in the different scenarios evaluated. Furthermore, we explored the impact of the NSVs identified on the 3D structure and functional relationship of the corresponding proteins. In summary, our study provides a strategy to identify NSVs in species with consistently annotated and assembled genomes to ascertain their role in adaptation.
|
52
|
Manavi F, Sharma A, Sharma R, Tsunoda T, Shatabda S, Dehzangi I. CNN-Pred: Prediction of single-stranded and double-stranded DNA-binding protein using convolutional neural networks. Gene X 2023; 853:147045. [PMID: 36503892 DOI: 10.1016/j.gene.2022.147045] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/21/2022] [Revised: 10/10/2022] [Accepted: 11/08/2022] [Indexed: 11/27/2022] Open
Abstract
DNA-binding proteins play a vital role in biological activities including DNA replication, DNA packing, and DNA repair. DNA-binding proteins can be classified into single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Determining whether a protein is a DSB or an SSB helps determine the protein's function. Therefore, many studies have been conducted in recent years to accurately identify DSBs and SSBs. Despite the efforts made so far, DSB and SSB prediction performance remains limited. In this study, we propose a new method called CNN-Pred to accurately predict DSBs and SSBs. To build CNN-Pred, we first extract evolutionary-based features in the form of mono-gram and bi-gram profiles using the position-specific scoring matrix (PSSM). We then use a 1D convolutional neural network (CNN) as the classifier for our extracted features. Our results demonstrate that CNN-Pred can enhance the DSB and SSB prediction accuracies by more than 4% on the independent test set compared to previous studies found in the literature. CNN-Pred as a standalone tool and all its source code are publicly available at: https://github.com/MLBC-lab/CNN-Pred.
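The mono-gram and bi-gram PSSM profiles mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common definition from the PSSM feature-extraction literature (column means for mono-grams, summed products of consecutive rows for bi-grams) and a PSSM already loaded as a list of rows. The function names and the toy matrix are hypothetical and are not taken from the CNN-Pred source code.

```python
from typing import List

Matrix = List[List[float]]


def monogram(pssm: Matrix) -> List[float]:
    """Column-wise mean of an L x A PSSM: one value per column."""
    length = len(pssm)
    width = len(pssm[0])
    return [sum(row[j] for row in pssm) / length for j in range(width)]


def bigram(pssm: Matrix) -> Matrix:
    """A x A matrix with B[m][n] = sum over i of P[i][m] * P[i+1][n]."""
    width = len(pssm[0])
    out = [[0.0] * width for _ in range(width)]
    # Walk over consecutive residue pairs (row i, row i+1).
    for cur, nxt in zip(pssm, pssm[1:]):
        for m in range(width):
            for n in range(width):
                out[m][n] += cur[m] * nxt[n]
    return out


def features(pssm: Matrix) -> List[float]:
    """Concatenate the mono-gram (A values) and flattened bi-gram (A*A values)."""
    flat = [v for row in bigram(pssm) for v in row]
    return monogram(pssm) + flat


# Hypothetical toy PSSM: 3 residues, 2 columns (a real PSSM has 20 columns).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vec = features(toy)
```

With a 20-column PSSM this yields a fixed-length vector of 20 + 400 values regardless of sequence length, which is what makes such profiles convenient as input to a fixed-size classifier.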
Affiliation(s)
- Farnoush Manavi
- Computer Science and Engineering and Information Technology Department, Shiraz University, Shiraz, Iran
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD 4111, Australia
| | - Ronesh Sharma
- School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
| | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo 113-0033, Japan; Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 113-0033, Japan
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, USA
| |
|
53
|
Gogoi CR, Rahman A, Saikia B, Baruah A. Protein Dihedral Angle Prediction: The State of the Art. ChemistrySelect 2023. [DOI: 10.1002/slct.202203427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Affiliation(s)
| | - Aziza Rahman
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
| | - Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
| | - Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
| |
|
54
|
Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS OMEGA 2023; 8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Drug discovery (DD) research is aimed at the discovery of new medications. Solubility is an important physicochemical property in drug development. Active pharmaceutical ingredients (APIs) are essential substances for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization. High-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial intelligence tools have been employed for solubility prediction using machine learning and deep learning techniques. This study aims to create different deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network architectures. Based on the performance of the models, the AttentiveFP-based network model was finally selected. The model was trained and tested on 9943 compounds. The model outperformed on 62 anticancer compounds, with Pearson correlation R² and root-mean-square error values of 0.52 and 0.61, respectively. AS prediction can be improved by improving the graph algorithm or by adding more molecular properties.
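The two metrics quoted in this abstract, squared Pearson correlation and root-mean-square error, can be computed as in the sketch below. It is a plain-Python illustration under the assumption that R² here means the square of the Pearson correlation coefficient; the solubility values are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square of the Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical log-solubility values: predictions are offset by a constant 0.5.
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.5, -3.5, -4.5]

r2 = pearson_r2(measured, predicted)   # 1.0: perfectly correlated
error = rmse(measured, predicted)      # 0.5: yet every prediction is off by 0.5
```

The toy data show why both numbers are usually reported together: correlation is insensitive to a systematic offset, while RMSE captures it.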
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
| | - Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
| |
|
55
|
Liu Y, Zhang R, Li T, Jiang J, Ma J, Wang P. MolRoPE-BERT: An enhanced molecular representation with Rotary Position Embedding for molecular property prediction. J Mol Graph Model 2023; 118:108344. [PMID: 36242862 DOI: 10.1016/j.jmgm.2022.108344] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/15/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/28/2022]
Abstract
Molecular property prediction is a significant task in drug discovery. Most deep learning-based computational methods either develop a unique chemical representation or combine complex models. However, researchers are less concerned with the possible advantages of enormous quantities of unlabeled molecular data. Because the amount of labeled data available is limited, this task becomes more difficult. In some senses, the SMILES of a drug molecule may be regarded as a language for chemistry, taking inspiration from natural language processing research and current advances in pretrained models. In this paper, we incorporated Rotary Position Embedding (RoPE) to efficiently encode the position information of SMILES sequences, ultimately enhancing the capability of the BERT pretrained model to extract potential molecular substructure information for molecular property prediction. We proposed the MolRoPE-BERT framework, a new end-to-end deep learning framework that integrates an efficient position coding approach for capturing sequence position information with a pretrained BERT model for molecular property prediction. To generate useful molecular substructure embeddings, we first exclusively train MolRoPE-BERT on four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). Then, we conduct a series of experiments to evaluate the performance of our proposed MolRoPE-BERT on four well-studied datasets. Compared with conventional and state-of-the-art baselines, our experiments demonstrated comparable or superior performance.
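Rotary Position Embedding, as named in this abstract, encodes a token's position by rotating pairs of embedding dimensions through position-dependent angles. The sketch below shows the general technique in plain Python, assuming the commonly used frequency base of 10000 and pairing of adjacent dimensions; it is not the MolRoPE-BERT implementation, and the function names and example vectors are hypothetical.

```python
import math
from typing import List


def rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out: List[float] = []
    for i in range(0, dim, 2):
        # Pair k = i / 2 uses frequency base ** (-2k / dim) = base ** (-i / dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query/key vectors for two SMILES tokens.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (2 positions) at different absolute positions.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
```

Because each pair is rotated, the score between a rotated query and a rotated key depends only on the difference of their positions, so `score_a` and `score_b` agree up to floating-point error.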
Affiliation(s)
- Yunwu Liu
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Jing Jiang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Jun Ma
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| | - Ping Wang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
| |
|
56
|
Shea A, Bartz J, Zhang L, Dong X. Predicting mutational function using machine learning. MUTATION RESEARCH. REVIEWS IN MUTATION RESEARCH 2023; 791:108457. [PMID: 36965820 PMCID: PMC10239318 DOI: 10.1016/j.mrrev.2023.108457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 03/11/2023] [Accepted: 03/20/2023] [Indexed: 03/27/2023]
Abstract
Genetic variations are one of the major causes of phenotypic variations between human individuals. Although beneficial as being the substrate of evolution, germline mutations may cause diseases, including Mendelian diseases and complex diseases such as diabetes and heart diseases. Mutations occurring in somatic cells are a main cause of cancer and likely cause age-related phenotypes and other age-related diseases. Because of the high abundance of genetic variations in the human genome, i.e., millions of germline variations per human subject and thousands of additional somatic mutations per cell, it is technically challenging to experimentally verify the function of every possible mutation and their interactions. Significant progress has been made to solve this problem using computational approaches, especially machine learning (ML). Here, we review the progress and achievements made in recent years in this field of research. We classify the computational models in two ways: one according to their prediction goals including protein structural alterations, gene expression changes, and disease risks, and the other according to their methodologies, including non-machine learning methods, classical machine learning methods, and deep neural network methods. For models in each category, we discuss their architecture, prediction accuracy, and potential limitations. This review provides new insights into the applications and future directions of computational approaches in understanding the role of mutations in aging and disease.
Affiliation(s)
- Anthony Shea
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA
| | - Josh Bartz
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA; Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN 55455, USA
| | - Lei Zhang
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Xiao Dong
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA.
| |
|
57
|
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered the executants of biological functions. Owing to the pivotal role played by protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to the large number of protein sequences in protein data banks and the complexity of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In recent years, there has been elevated interest in using deep learning to assist protein structure prediction, as protein structure prediction models can be used to screen a large number of novel sequences. In this regard, we propose a model employing an Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function, called the Energy Profile Legion-Class Bayes Protein Structure Identification model. Following this, we use a Thompson Optimized convolutional neural network to extract features between amino acids, and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method was tested on distinct protein data and compared to state-of-the-art methods: Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning, and a distance-based protein structure prediction using deep learning. Results were obtained with the Biopython tool and measured in terms of protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision. The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035 India
| | - Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu India
| |
|
58
|
Choi YN, Cho N, Lee K, Gwon DA, Lee JW, Lee J. Programmable Synthesis of Biobased Materials Using Cell-Free Systems. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2023; 35:e2203433. [PMID: 36108274 DOI: 10.1002/adma.202203433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 08/26/2022] [Indexed: 06/15/2023]
Abstract
Motivated by the intricate mechanisms underlying biomolecule syntheses in cells that chemistry is currently unable to mimic, researchers have harnessed biological systems for manufacturing novel materials. Cell-free systems (CFSs) utilizing the bioactivity of transcriptional and translational machineries in vitro are excellent tools that allow supplementation of exogenous materials for production of innovative materials beyond the capability of natural biological systems. Herein, recent studies that have advanced the ability to expand the scope of biobased materials using CFS are summarized and approaches enabling the production of high-value materials, prototyping of genetic parts and modules, and biofunctionalization are discussed. By extending the reach of chemical and enzymatic reactions complementary to cellular materials, CFSs provide new opportunities at the interface of materials science and synthetic biology.
Affiliation(s)
- Yun-Nam Choi
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Namjin Cho
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Kanghun Lee
- School of Interdisciplinary Bioscience and Bioengineering (I-Bio), Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Da-Ae Gwon
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Jeong Wook Lee
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- School of Interdisciplinary Bioscience and Bioengineering (I-Bio), Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Joongoo Lee
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- School of Interdisciplinary Bioscience and Bioengineering (I-Bio), Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| |
|
59
|
Gonzales Martinez R, van Dongen DM. Deep learning algorithms for the early detection of breast cancer: A comparative study with traditional machine learning. INFORMATICS IN MEDICINE UNLOCKED 2023; 41:101317. [DOI: 10.1016/j.imu.2023.101317] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/04/2025] Open
|
60
|
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, the sequence identity obtained came dangerously close to values that can be obtained by chance when aligning any random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide modeling methods assessments have brought some surprising outcomes, proving that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland.
| | - Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
| | - Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
| |
|
61
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
62
|
Kaushik R, Zhang KY. An Integrated Protein Structure Fitness Scoring Approach for Identifying Native-Like Model Structures. Comput Struct Biotechnol J 2022; 20:6467-6472. [DOI: 10.1016/j.csbj.2022.11.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/14/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022] Open
|
63
|
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:1246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein-ligand binding, including allosteric effects, protein-protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
Affiliation(s)
- Chris Avery
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - John Patterson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Tyler Grear
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Theodore Frater
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
|
64
|
Bongirwar V, Mokhade AS. Different methods, techniques and their limitations in protein structure prediction: A review. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2022; 173:72-82. [PMID: 35588858 DOI: 10.1016/j.pbiomolbio.2022.05.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/23/2021] [Revised: 04/16/2022] [Accepted: 05/11/2022] [Indexed: 11/17/2022]
Abstract
Because of the increase in different types of diseases in human habitats, demand for designing various types of drugs is also increasing. Proteins and their structures play a very important role in drug design. Therefore, researchers from different areas such as mathematics, medicine, and computer science are teaming up to find better solutions in this field. In this paper, we discuss different methods of secondary and tertiary protein structure prediction (PSP), along with the limitations of the different approaches. Different types of datasets used in PSP are also discussed. This paper also describes different performance measures for evaluating the prediction accuracy of PSP methods. Various software packages and servers, which find the protein structure for an input protein sequence, are available for download. These tools also help to compare the performance of any new algorithm with other available methods. Details of these tools are also given in this paper.
Affiliation(s)
| | - A S Mokhade
- Visvesvaraya National Institute of Technology, Nagpur, India
| |
|
65
|
Kekenes-Huskey PM, Burgess DE, Sun B, Bartos DC, Rozmus ER, Anderson CL, January CT, Eckhardt LL, Delisle BP. Mutation-Specific Differences in Kv7.1 (KCNQ1) and Kv11.1 (KCNH2) Channel Dysfunction and Long QT Syndrome Phenotypes. Int J Mol Sci 2022; 23:7389. [PMID: 35806392 PMCID: PMC9266926 DOI: 10.3390/ijms23137389] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/28/2022] [Revised: 06/22/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
The electrocardiogram (ECG) empowered clinician scientists to measure the electrical activity of the heart noninvasively to identify arrhythmias and heart disease. Shortly after the standardization of the 12-lead ECG for the diagnosis of heart disease, several families with autosomal recessive (Jervell and Lange-Nielsen Syndrome) and dominant (Romano-Ward Syndrome) forms of long QT syndrome (LQTS) were identified. An abnormally long heart rate-corrected QT-interval was established as a biomarker for the risk of sudden cardiac death. Since then, the International LQTS Registry was established; a phenotypic scoring system to identify LQTS patients was developed; the major genes that associate with typical forms of LQTS were identified; and guidelines for the successful management of patients advanced. In this review, we discuss the molecular and cellular mechanisms for LQTS associated with missense variants in KCNQ1 (LQT1) and KCNH2 (LQT2). We move beyond the "benign" to a "pathogenic" binary classification scheme for different KCNQ1 and KCNH2 missense variants and discuss gene- and mutation-specific differences in K+ channel dysfunction, which can predispose people to distinct clinical phenotypes (e.g., concealed, pleiotropic, severe, etc.). We conclude by discussing the emerging computational structural modeling strategies that will distinguish between dysfunctional subtypes of KCNQ1 and KCNH2 variants, with the goal of realizing a layered precision medicine approach focused on individuals.
Affiliation(s)
- Peter M. Kekenes-Huskey
- Department of Cell and Molecular Physiology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL 60153, USA
| | - Don E. Burgess
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
| | - Bin Sun
- Department of Pharmacology, Harbin Medical University, Harbin 150081, China
| | | | - Ezekiel R. Rozmus
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
| | - Corey L. Anderson
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
| | - Craig T. January
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
| | - Lee L. Eckhardt
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
| | - Brian P. Delisle
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
| |
|
66
|
|
67
|
Lewis‐Atwell T, Townsend PA, Grayson MN. Machine learning activation energies of chemical reactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1593] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
Affiliation(s)
- Toby Lewis‐Atwell
- Department of Computer Science, Faculty of Science, University of Bath, Bath, UK
| | - Piers A. Townsend
- Department of Chemistry, Faculty of Science, University of Bath, Bath, UK
| | | |
|
68
|
Bajaj P, Manjunath K, Varadarajan R. Structural and functional determinants inferred from deep mutational scans. Protein Sci 2022; 31:e4357. [PMID: 35762712 PMCID: PMC9202547 DOI: 10.1002/pro.4357] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2022] [Revised: 04/04/2022] [Accepted: 05/11/2022] [Indexed: 11/08/2022]
Abstract
Mutations that affect protein binding to a cognate partner primarily occur either at buried residues or at exposed residues directly involved in partner binding. Distinguishing between these two categories based solely on mutational phenotypes is challenging. The bacterial toxin CcdB kills cells by binding to DNA Gyrase. Cell death is prevented by binding to its cognate antitoxin CcdA, at an extended interface that partially overlaps with the GyrA binding site. Using the CcdAB toxin-antitoxin (TA) system as a model, a comprehensive site-saturation mutagenesis library of CcdB was generated in its native operonic context. The mutational sensitivity of each mutant was estimated by evaluating the relative abundance of each mutant in two strains, one resistant and the other sensitive to the toxic activity of the CcdB toxin, through deep sequencing. The ability to bind CcdA was inferred through a RelE reporter gene assay, since the CcdAB complex binds to its own promoter, repressing transcription. By analyzing mutant phenotypes in the CcdB-sensitive, CcdB-resistant, and RelE reporter strains, it was possible to assign residues to buried, CcdA interacting or GyrA interacting sites. A few mutants were individually constructed, expressed, and biophysically characterized to validate molecular mechanisms responsible for the observed phenotypes. Residues inferred to be important for antitoxin binding, are also likely to be important for rejuvenating CcdB from the CcdB-Gyrase complex. Therefore, even in the absence of structural information, when coupled to appropriate genetic screens, such high-throughput strategies can be deployed for predicting structural and functional determinants of proteins.
Affiliation(s)
- Priyanka Bajaj
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Kavyashree Manjunath
  - Centre for Chemical Biology and Therapeutics, Institute for Stem Cell Science and Regenerative Medicine, Bangalore, India
|
69
|
Application of Deep Learning Models and Network Method for Comprehensive Air-Quality Index Prediction. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Accurate pollutant prediction is essential in fields such as meteorology, meteorological disasters, and climate change studies. In this study, long short-term memory (LSTM) and deep neural network (DNN) models were applied to predict six pollutants and the comprehensive air-quality index (CAI) in Korea from 2015 to 2020. In addition, we used the network method to find the best data sources that provide factors affecting CAI behaviors. This study had two steps: (1) predicting the six pollutants, namely fine dust (PM10), fine particulate matter (PM2.5), ozone (O3), sulfur dioxide (SO2), nitrogen dioxide (NO2), and carbon monoxide (CO), using the LSTM model; (2) forecasting the CAI with DNNs, using the six pollutants predicted in the first step as predictors. The predictive ability of each model for the six pollutants and the CAI was evaluated by comparing its predictions with the observed air-quality data. This study showed that combining a DNN model with the network method provided high predictive power, and this combination could be a remarkable strength in CAI prediction. As the need for disaster management increases, it is anticipated that the LSTM and DNN models with the network method have ample potential to track the dynamics of air pollution behaviors.
|
70
|
Molina RS, Rix G, Mengiste AA, Alvarez B, Seo D, Chen H, Hurtado J, Zhang Q, Donato García-García J, Heins ZJ, Almhjell PJ, Arnold FH, Khalil AS, Hanson AD, Dueber JE, Schaffer DV, Chen F, Kim S, Ángel Fernández L, Shoulders MD, Liu CC. In vivo hypermutation and continuous evolution. NATURE REVIEWS. METHODS PRIMERS 2022; 2:37. [PMID: 37073402 PMCID: PMC10108624 DOI: 10.1038/s43586-022-00130-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/09/2022]
Affiliation(s)
- Rosana S. Molina
  - Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
- Gordon Rix
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
- Amanuella A. Mengiste
  - Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Beatriz Alvarez
  - Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Daeje Seo
  - Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Haiqi Chen
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Juan Hurtado
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Qiong Zhang
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Jorge Donato García-García
  - Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo Mexico, C.P. 45138, Zapopan, Jalisco, Mexico
- Zachary J. Heins
  - Biological Design Center, Boston University, Boston, Massachusetts, USA
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Patrick J. Almhjell
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Frances H. Arnold
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Ahmad S. Khalil
  - Biological Design Center, Boston University, Boston, Massachusetts, USA
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA
- Andrew D. Hanson
  - Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- John E. Dueber
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
  - Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- David V. Schaffer
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California Berkeley and San Francisco, Berkeley, CA, USA
  - Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, CA, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
- Fei Chen
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Seokhee Kim
  - Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Luis Ángel Fernández
  - Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-CSIC), Darwin 3, Campus UAM Cantoblanco, 28049 Madrid, Spain
- Matthew D. Shoulders
  - Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Chang C. Liu
  - Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
  - Department of Chemistry, University of California, Irvine, CA 92617, USA
|
71
|
Panapitiya G, Girard M, Hollas A, Sepulveda J, Murugesan V, Wang W, Saldanha E. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS OMEGA 2022; 7:15695-15710. [PMID: 35571767 PMCID: PMC9096921 DOI: 10.1021/acsomega.2c00642] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 01/31/2022] [Accepted: 04/11/2022] [Indexed: 05/17/2023]
Abstract
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goals of this study are to assess current deep learning methods for solubility prediction, to develop a general model capable of predicting the solubility of a broad range of organic molecules, and to understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from the molecular structure. We explore several molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional atomic coordinates, using four neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, feature analysis to understand which information about the molecular structure is most valuable for prediction, and a transfer learning and data size study to understand the impact of data availability on model performance.
Affiliation(s)
- Gihan Panapitiya
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Michael Girard
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Aaron Hollas
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jonathan Sepulveda
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Wei Wang
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Emily Saldanha
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
|
72
|
Eliasof M, Boesen T, Haber E, Keasar C, Treister E. Mimetic Neural Networks: A Unified Framework for Protein Design and Folding. FRONTIERS IN BIOINFORMATICS 2022; 2:715006. [PMID: 36304270 PMCID: PMC9580911 DOI: 10.3389/fbinf.2022.715006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/26/2021] [Accepted: 03/29/2022] [Indexed: 03/30/2025] Open
Abstract
Recent advancements in machine learning techniques for protein structure prediction motivate better results in its inverse problem, protein design. In this work we introduce a new graph mimetic neural network, MimNet, and show that it is possible to build a reversible architecture that solves the structure and design problems in tandem, so that protein backbone design improves when the structure is better estimated. We use the ProteinNet data set and show that state-of-the-art results in protein design can be met and even improved, given recent architectures for protein folding.
Affiliation(s)
- Moshe Eliasof
  - Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Tue Boesen
  - Department of EOAS, The University of British Columbia, Vancouver, BC, Canada
- Eldad Haber
  - Department of EOAS, The University of British Columbia, Vancouver, BC, Canada
- Chen Keasar
  - Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Eran Treister
  - Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel
|
73
|
Abstract
Three-dimensional protein structural data at the molecular level are pivotal for successful precision medicine. Such data are crucial not only for discovering drugs that act to block the active site of the target mutant protein but also for clarifying to the patient and the clinician how the mutations harbored by the patient work. The relative paucity of structural data reflects their cost, challenges in their interpretation, and lack of clinical guidelines for their utilization. Rapid technological advancements in experimental high-resolution structural determination increasingly generate structures. Computationally, modeling algorithms, including molecular dynamics simulations, are becoming more powerful, as are compute-intensive hardware, particularly graphics processing units, overlapping with the inception of the exascale era. Accessible, freely available, and detailed structural and dynamical data can be merged with big data to powerfully transform personalized pharmacology. Here we review protein and emerging genome high-resolution data, along with means, applications, and examples underscoring their usefulness in precision medicine. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Guy Nir
  - Department of Biochemistry and Molecular Biology, Department of Neuroscience, Cell Biology and Anatomy, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
- Chung-Jung Tsai
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
|
74
|
Nair S, Shrikumar A, Schreiber J, Kundaje A. fastISM: performant in silico saturation mutagenesis for convolutional neural networks. Bioinformatics 2022; 38:2397-2403. [PMID: 35238376 PMCID: PMC9048647 DOI: 10.1093/bioinformatics/btac135] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/26/2021] [Revised: 02/09/2022] [Accepted: 03/01/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Deep-learning models, such as convolutional neural networks, are able to accurately map biological sequences to associated functional readouts and properties by learning predictive de novo representations. In silico saturation mutagenesis (ISM) is a popular feature attribution technique for inferring contributions of all characters in an input sequence to the model's predicted output. The main drawback of ISM is its runtime, as it involves multiple forward propagations of all possible mutations of each character in the input sequence through the trained model to predict the effects on the output. RESULTS: We present fastISM, an algorithm that speeds up ISM by a factor of over 10× for commonly used convolutional neural network architectures. fastISM is based on the observations that the majority of computation in ISM is spent in convolutional layers, and a single mutation only disrupts a limited region of intermediate layers, rendering most computation redundant. fastISM reduces the gap between backpropagation-based feature attribution methods and ISM. It far surpasses the runtime of backpropagation-based methods on multi-output architectures, making it feasible to run ISM on a large number of sequences. AVAILABILITY AND IMPLEMENTATION: An easy-to-use Keras/TensorFlow 2 implementation of fastISM is available at https://github.com/kundajelab/fastISM. fastISM can be installed using pip install fastism. A hands-on tutorial can be found at https://colab.research.google.com/github/kundajelab/fastISM/blob/master/notebooks/colab/DeepSEA.ipynb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
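To make the runtime issue concrete, the following is a minimal sketch of plain (naive) ISM, the baseline that the abstract says fastISM accelerates. It is not the fastISM algorithm and does not use the fastISM package. The scoring function `toy_model` is a hypothetical stand-in for a trained network, and all names here are assumptions made for illustration.

```python
ALPHABET = "ACGT"


def toy_model(seq):
    """Hypothetical stand-in for a trained model: counts 'TATA' motif occurrences."""
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))


def naive_ism(seq, model, alphabet=ALPHABET):
    """Score every single-character substitution against the reference sequence.

    Returns {(position, new_base): model(mutant) - model(reference)}.
    The cost is one forward pass per mutant, i.e. len(seq) * (len(alphabet) - 1)
    passes plus one for the reference, which is why naive ISM is slow.
    """
    reference_score = model(seq)
    effects = {}
    for pos, ref_base in enumerate(seq):
        for base in alphabet:
            if base == ref_base:
                continue  # skip the unmutated character
            mutant = seq[:pos] + base + seq[pos + 1:]
            effects[(pos, base)] = model(mutant) - reference_score
    return effects


effects = naive_ism("GGTATAGG", toy_model)
```

For this 8-character sequence the loop evaluates 24 mutants. Substitutions inside the motif (for example position 2, `T` to `A`) lower the toy score, while substitutions outside it (position 0) leave it unchanged.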
Affiliation(s)
- Surag Nair
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Jacob Schreiber
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Anshul Kundaje
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
|
75
|
Gu J, Zhang T, Wu C, Liang Y, Shi X. Refined Contact Map Prediction of Peptides Based on GCN and ResNet. Front Genet 2022; 13:859626. [PMID: 35571037 PMCID: PMC9092020 DOI: 10.3389/fgene.2022.859626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2022] [Accepted: 03/23/2022] [Indexed: 11/13/2022] Open
Abstract
Predicting peptide inter-residue contact maps, which determine the topology of the peptide structure, plays an important role in computational biology. However, due to the limited number of known homologous structures, there is still much room for improvement in inter-residue contact map prediction. Current models do not capture the relationships between residues accurately enough, especially for residue pairs separated by a long-range distance. In this article, we developed a novel deep neural network framework to refine the rough contact map produced by existing methods. The rough contact map is used to construct the residue graph that is processed by the graph convolutional neural network (GCN). The GCN can better capture global information and is therefore used to grasp the long-range contact relationships. A residual convolutional neural network is also applied in the framework for learning local information. We conducted experiments on four different test datasets, and the inter-residue long-range contact map prediction accuracy demonstrates the effectiveness of our proposed method.
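The idea of building a residue graph from a rough contact map and passing it through a graph convolution can be sketched as follows. This is a minimal illustration, assuming a probability threshold of 0.5 for graph edges and the widely used symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The threshold, the function names, and the toy numbers are assumptions, and the framework described in the abstract may differ in all of these details.

```python
import math


def build_adjacency(contact_probs, threshold=0.5):
    """Residue graph with self-loops: edge i-j if the rough contact probability
    meets the threshold. Assumes a symmetric square matrix."""
    n = len(contact_probs)
    return [
        [1.0 if (i == j or contact_probs[i][j] >= threshold) else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: normalize the adjacency, aggregate neighbor
    features, apply the weight matrix, then ReLU."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    normalized = [
        [adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    aggregated = matmul(normalized, features)
    transformed = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Toy rough contact map for three residues: residues 0 and 1 are in contact.
rough_map = [[1.0, 0.9, 0.1],
             [0.9, 1.0, 0.2],
             [0.1, 0.2, 1.0]]
adjacency = build_adjacency(rough_map)
hidden = gcn_layer(adjacency, [[1.0], [3.0], [5.0]], [[1.0]])
```

In this toy case the two connected residues end up sharing an averaged feature, while the isolated residue keeps its own value. Stacking several such layers lets information travel along multi-edge paths, which is the intuition behind using a GCN for long-range contacts.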
Affiliation(s)
- Jiawei Gu
  - College of Computer Science and Technology, University of Jilin, Changchun, China
- Tianhao Zhang
  - College of Computer Science and Technology, University of Jilin, Changchun, China
- Chunguo Wu
  - College of Computer Science and Technology, University of Jilin, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- Yanchun Liang
  - College of Computer Science and Technology, University of Jilin, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
  - School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
- Xiaohu Shi
  - College of Computer Science and Technology, University of Jilin, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
  - School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
  - *Correspondence: Xiaohu Shi,
|
76
|
Vishwakarma P, Vattekatte AM, Shinada N, Diharce J, Martins C, Cadet F, Gardebien F, Etchebest C, Nadaradjane AA, de Brevern AG. VHH Structural Modelling Approaches: A Critical Review. Int J Mol Sci 2022; 23:3721. [PMID: 35409081 PMCID: PMC8998791 DOI: 10.3390/ijms23073721] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/20/2022] Open
Abstract
VHH, i.e., VH domains of camelid single-chain antibodies, are very promising therapeutic agents due to their significant physicochemical advantages compared to classical mammalian antibodies. The number of experimentally solved VHH structures has significantly improved recently, which is of great help, because it offers the ability to directly work on 3D structures to humanise or improve them. Unfortunately, most VHHs do not have 3D structures. Thus, it is essential to find alternative ways to get structural information. The methods of structure prediction from the primary amino acid sequence appear essential to bypass this limitation. This review presents the most extensive overview of structure prediction methods applied for the 3D modelling of a given VHH sequence (a total of 21). Besides the historical overview, it aims at showing how model software programs have been shaping the structural predictions of VHHs. A brief explanation of each methodology is supplied, and pertinent examples of their usage are provided. Finally, we present a structure prediction case study of a recently solved VHH structure. According to some recent studies and the present analysis, AlphaFold 2 and NanoNet appear to be the best tools to predict a structural model of VHH from its sequence.
Affiliation(s)
- Poonam Vishwakarma
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Akhila Melarkode Vattekatte
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Julien Diharce
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- Carla Martins
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Frédéric Cadet
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
  - PEACCEL, Artificial Intelligence Department, Square Albin Cachot, F-75013 Paris, France
- Fabrice Gardebien
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Catherine Etchebest
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- Aravindan Arun Nadaradjane
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Alexandre G. de Brevern
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
  - INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
|
77
|
Choudhury C, Arul Murugan N, Deva Priyakumar U. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov Today 2022; 27:1847-1861. [PMID: 35301148 PMCID: PMC8920090 DOI: 10.1016/j.drudis.2022.03.006] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 10/04/2021] [Revised: 02/16/2022] [Accepted: 03/10/2022] [Indexed: 02/08/2023]
Abstract
The current global health emergency in the form of the Coronavirus 2019 (COVID-19) pandemic has highlighted the need for fast, accurate, and efficient drug discovery pipelines. Traditional drug discovery projects relying on in vitro high-throughput screening (HTS) involve large investments and sophisticated experimental set-ups, affordable only to big biopharmaceutical companies. In this scenario, application of efficient state-of-the-art computational methods and modern artificial intelligence (AI)-based algorithms for rapid screening of repurposable chemical space [approved drugs and natural products (NPs) with proven pharmacokinetic profiles] to identify the initial leads is a powerful option to save resources and time. Structure-based drug repurposing is a popular in silico repurposing approach. In this review, we discuss traditional and modern AI-based computational methods and tools applied at various stages for structure-based drug discovery (SBDD) pipelines. Additionally, we highlight the role of generative models in generating molecules with scaffolds from repurposable chemical space. Teaser: This review highlights the importance of repurposable chemical space, and the contributions of conventional in silico approaches and modern machine-learning algorithms for rapid structure-based drug repurposing.
Affiliation(s)
- Chinmayee Choudhury
  - Department of Experimental Medicine and Biotechnology, Postgraduate Institute of Medical Education and Research, Sector-12, Chandigarh 160012, India
- N Arul Murugan
  - Department of Computer Science, School of Electrical Engineering and Computer Sciences, KTH Royal Institute of Technology, S-100 44, Stockholm, Sweden
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi 110020, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
|
78
|
Yu CH, Chen W, Chiang YH, Guo K, Martin Moldes Z, Kaplan DL, Buehler MJ. End-to-End Deep Learning Model to Predict and Design Secondary Structure Content of Structural Proteins. ACS Biomater Sci Eng 2022; 8:1156-1165. [PMID: 35129957 PMCID: PMC9347213 DOI: 10.1021/acsbiomaterials.1c01343] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/29/2022]
Abstract
Structural proteins are the basis of many biomaterials and key construction and functional components of all life. Further, it is well-known that the diversity of proteins' function relies on their local structures derived from their primary amino acid sequences. Here, we report a deep learning model to predict the secondary structure content of proteins directly from primary sequences, with high computational efficiency. Understanding the secondary structure content of proteins is crucial to designing proteins with targeted material functions, especially mechanical properties. Using convolutional and recurrent architectures and natural language models, our deep learning model predicts the content of two essential types of secondary structures, the α-helix and the β-sheet. The training data are collected from the Protein Data Bank and contain many existing protein geometries. We find that our model can learn the hidden features as patterns of input sequences that can then be directly related to secondary structure content. The α-helix and β-sheet content predictions show excellent agreement with training data and newly deposited protein structures that were recently identified and that were not included in the original training set. We further demonstrate the features of the model by a search for de novo protein sequences that optimize max/min α-helix/β-sheet content and compare the predictions with folded models of these sequences based on AlphaFold2. Excellent agreement is found, underscoring that our model has predictive potential for rapidly designing proteins with specific secondary structures and could be widely applied to biomedical industries, including protein biomaterial designs and regenerative medicine applications.
Affiliation(s)
- Chi-Hua Yu
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - Department of Engineering Science, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
- Wei Chen
  - Department of Engineering Science, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
- Yu-Hsuan Chiang
  - Department of Civil Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan
- Kai Guo
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Zaira Martin Moldes
  - Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
- David L Kaplan
  - Department of Biomedical Engineering, Tufts University, Medford, Massachusetts 02155, United States
- Markus J Buehler
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - Center for Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
|
79
|
Zhao B, Kurgan L. Deep learning in prediction of intrinsic disorder in proteins. Comput Struct Biotechnol J 2022; 20:1286-1294. [PMID: 35356546 PMCID: PMC8927795 DOI: 10.1016/j.csbj.2022.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/10/2022] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022] Open
Abstract
Intrinsic disorder prediction is an active area in which over 100 predictors have been developed. We identify and investigate a recent trend towards the development of deep neural network (DNN)-based methods. The first DNN-based method was released in 2013, and since 2019 deep learners account for the majority of the new disorder predictors. We find that the 13 currently available DNN-based predictors are diverse in their topologies, the sizes of their networks, and the inputs that they utilize. We empirically show that the deep learners are statistically more accurate than other types of disorder predictors, using the blind test dataset from the recent community assessment of intrinsic disorder predictions (CAID). We also identify several well-rounded DNN-based predictors that are accurate, fast and/or conveniently available. The popularity, favorable predictive performance and architectural flexibility suggest that deep networks are likely to fuel the development of future disorder predictors. Novel hybrid designs of deep networks could be used to adequately accommodate the diversity of types and flavors of intrinsic disorder. We also discuss the scarcity of DNN-based methods for the prediction of disordered binding regions and the need to develop more accurate methods for this prediction.
Affiliation(s)
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
80
|
Ray A. Machine learning in postgenomic biology and personalized medicine. WILEY INTERDISCIPLINARY REVIEWS. DATA MINING AND KNOWLEDGE DISCOVERY 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years, artificial intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology capabilities. Massive data generated in biological sciences by rapid and deep gene sequencing and protein or other molecular structure determination, on the one hand, requires data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for the solution of biological problems that until recently had relied on mechanistic model-based approaches that are computationally expensive. This review provides a bird's eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate, as far as possible, the areas of research that are poised to make further impacts in these areas, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
- Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
|
81
|
Saleh RO, Essia INA, Jasim SA. The Anticancer Effect of a Conjugated Antimicrobial Peptide Against Colorectal Cancer (CRC) Cells. J Gastrointest Cancer 2022; 54:165-170. [PMID: 35217999 DOI: 10.1007/s12029-021-00799-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 12/30/2021] [Indexed: 01/05/2023]
Abstract
PURPOSE Although antimicrobial peptides (AMPs) were initially known as compounds of the innate immune system that fight microbial pathogens, it has recently been proposed that differences between normal and cancer cell membranes cause the anticancer effect of these peptides. The aim of this study was to evaluate the anticancer effect of a MELITININ+BMAP27-conjugated peptide against colorectal cancer (CRC) cells.
METHODS The MELITININ+BMAP27-conjugated peptides were designed, and β-naphthylalanine residues were added to the termini to improve the anticancer effect. CRC cell lines including HT29, SW742, HCT-116, and WiDr were used. After preparing concentrations of 5, 10, 25, 50, 100, 150, 200, and 400 μg/mL of peptide solution, the rate of cell death after 12, 24, and 48 h was assessed using the MTT test. After confirming 30 µg/mL as an effective and nontoxic concentration, the cells were exposed to this concentration, and the total RNA was extracted. Quantitative real-time PCR (RT-qPCR) was performed to amplify the Bax, caspase3, atg5, and GAPDH (glyceraldehyde 3-phosphate dehydrogenase, the internal control) genes.
RESULTS Cytotoxicity testing of the peptide against normal cells showed IC50 values of 80 and 100 µg/mL at 24 and 4 h, respectively. After 24-72 h of treatment, a significant difference in the mean percentage of living CRC cells was observed at concentrations of 50-400 μg/mL of conjugated peptide (p < 0.05). The IC50 of the peptide at 24, 48, and 72 h of exposure was measured as 30, 20, and 10 μg/mL, respectively. The peptide resulted in a significant 2.35-fold increase in the mean expression of the Bax gene in CRC cells (p < 0.001). It also caused significant increases of 1.75-fold (p = 0.0112) in caspase 3 gene expression and 1.2-fold (p = 0.0217) in atg5 gene expression. There was no significant difference among cell lines regarding the expression of each gene.
CONCLUSION The conjugated peptide caused the death of CRC lines via induction of apoptosis and necrosis mechanisms. More studies are needed in this regard.
Affiliation(s)
- Raed Obaid Saleh
- Department of Pharmacy, Al-Maarif University College, Ramadi City, Al-Anbar, Iraq
|
82
|
Cheng J, Xu Y, Zhao Y. Prediction of protein secondary structure based on deep residual convolutional neural network. Biotechnol Biotec Eq 2022. [DOI: 10.1080/13102818.2022.2026815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
Affiliation(s)
- Jinyong Cheng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, PR China
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
- Ying Xu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
- Yunxiang Zhao
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
|
83
|
Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
84
|
Fischer S, Stegmann F, Gnanapragassam VS, Lepenies B. From structure to function – Ligand recognition by myeloid C-type lectin receptors. Comput Struct Biotechnol J 2022; 20:5790-5812. [DOI: 10.1016/j.csbj.2022.10.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/14/2022] [Accepted: 10/14/2022] [Indexed: 11/29/2022] Open
|
85
|
Abstract
INTRODUCTION The intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences and constructs and disseminates databases of these predictions. Over 40 years of research have resulted in the release of numerous resources. AREAS COVERED We identify and briefly summarize the most comprehensive collection to date of over 100 disorder predictors. We focus on their predictive models, availability and predictive performance. We categorize and study them from a historical point of view to highlight informative trends. EXPERT OPINION We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods shifted to meta-predictors in the early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue in the foreseeable future given the recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
|
86
|
Yin Z, Wong STC. Artificial intelligence unifies knowledge and actions in drug repositioning. Emerg Top Life Sci 2021; 5:803-813. [PMID: 34881780 PMCID: PMC8923082 DOI: 10.1042/etls20210223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/11/2021] [Revised: 11/08/2021] [Accepted: 11/09/2021] [Indexed: 11/17/2022]
Abstract
Drug repositioning aims to reuse existing drugs, shelved drugs, or drug candidates that failed clinical trials for other medical indications. Its appeal stems from the reduced risk associated with safety testing of new medications and the shorter time needed to get a known drug into the clinic. Artificial Intelligence (AI) has recently been pursued to speed up drug repositioning and discovery. The essence of AI in drug repositioning is to unify knowledge and actions, i.e. incorporating real-world and experimental data to map out the best way forward to identify effective therapeutics against a disease. In this review, we share positive expectations for the evolution of AI and drug repositioning and summarize the role of AI in several methods of drug repositioning.
Affiliation(s)
- Zheng Yin
- Department of Systems Medicine and Bioengineering, Houston Methodist Cancer Center and Ting Tsung & Wei Fong Chao Center for BRAIN, Houston Methodist Research Institute, Weill Cornell Medicine, Houston, TX 77030, U.S.A
- Stephen T C Wong
- Department of Systems Medicine and Bioengineering, Houston Methodist Cancer Center and Ting Tsung & Wei Fong Chao Center for BRAIN, Houston Methodist Research Institute, Weill Cornell Medicine, Houston, TX 77030, U.S.A
|
87
|
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Many studies have been devoted over the past decades to build new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
|
88
|
Gelman S, Fahlberg SA, Heinzelman P, Romero PA, Gitter A. Neural networks to learn protein sequence-function relationships from deep mutational scanning data. Proc Natl Acad Sci U S A 2021; 118:e2104878118. [PMID: 34815338 PMCID: PMC8640744 DOI: 10.1073/pnas.2104878118] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Accepted: 10/01/2021] [Indexed: 11/18/2022] Open
Abstract
The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706
- Morgridge Institute for Research, Madison, WI 53715
- Sarah A. Fahlberg
- Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Pete Heinzelman
- Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Philip A. Romero
- Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706
- Anthony Gitter
- Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706
- Morgridge Institute for Research, Madison, WI 53715
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53792
|
89
|
Timmons PB, Hewage CM. APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Brief Bioinform 2021; 22:bbab308. [PMID: 34396417 PMCID: PMC8575040 DOI: 10.1093/bib/bbab308] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/01/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/29/2023] Open
Abstract
Good knowledge of a peptide's tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods for the prediction of peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5-40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9 Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST performance with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that, on average, APPTEST produces more native-like structures than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
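As an illustration of the kind of backbone deviation quoted in this abstract, the sketch below computes a root-mean-square deviation (RMSD) between two coordinate sets. It is a minimal sketch, not the APPTEST evaluation code: the function name `rmsd`, the example coordinates, and the assumption that the two structures are already superimposed are all hypothetical choices made here.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Accumulate the squared distance between matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical three-atom backbone fragment, coordinates in angstroms.
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 3.0)]
print(round(rmsd(predicted, reference), 3))
```

In practice a superposition step (for example the Kabsch algorithm) would normally precede this calculation; it is omitted to keep the sketch short.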
Affiliation(s)
- Patrick Brendan Timmons
- UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
- Chandralal M Hewage
- UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
|
90
|
Defresne M, Barbe S, Schiex T. Protein Design with Deep Learning. Int J Mol Sci 2021; 22:11741. [PMID: 34769173 PMCID: PMC8584038 DOI: 10.3390/ijms222111741] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/30/2021] [Revised: 10/23/2021] [Accepted: 10/26/2021] [Indexed: 12/21/2022] Open
Abstract
Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.
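To make the idea of a 1D sequence representation concrete, here is a minimal sketch of one-hot encoding, one of the simplest ways to turn an amino acid sequence into a mathematical object. The alphabet ordering, the function name `one_hot`, and the choice to encode unknown residues as all-zero rows are assumptions made for illustration; they are not taken from the review.

```python
# The 20 standard amino acids in an arbitrary but fixed order (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(sequence):
    """Encode a protein sequence as a list of 20-element 0/1 rows."""
    rows = []
    for aa in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1  # unknown residues stay all-zero
        rows.append(row)
    return rows

encoded = one_hot("ACD")
print(len(encoded), len(encoded[0]))
```

Each row could then be fed to a sequence model; structure-aware representations (distance maps, graphs) need a different encoding.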
Affiliation(s)
- Marianne Defresne
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
- Sophie Barbe
- Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
- Thomas Schiex
- Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
|
91
|
Walker CC, Meek GA, Fobe TL, Shirts MR. Using a Coarse-Grained Modeling Framework to Identify Oligomeric Motifs with Tunable Secondary Structure. J Chem Theory Comput 2021; 17:6018-6035. [PMID: 34495659 DOI: 10.1021/acs.jctc.1c00528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Coarse-grained modeling can be used to explore general theories that are independent of specific chemical detail. In this paper, we present cg_openmm, a Python-based simulation framework for modeling coarse-grained hetero-oligomers and screening them for structural and thermodynamic characteristics of cooperative secondary structures. cg_openmm facilitates the building of coarse-grained topology and random starting configurations, setup of GPU-accelerated replica exchange molecular dynamics simulations with the OpenMM software package, and features a suite of postprocessing thermodynamic and structural analysis tools. In particular, native contact analysis, heat capacity calculations, and free energy of folding calculations are used to identify and characterize cooperative folding transitions and stable secondary structures. In this work, we demonstrate the capabilities of cg_openmm on a simple 1-1 Lennard-Jones coarse-grained model, in which each residue contains 1 backbone and 1 side-chain bead. By scanning both nonbonded and bonded force-field parameter spaces at the coarse-grained level, we identify and characterize sets of parameters which result in the formation of stable helices through cooperative folding transitions. Moreover, we show that the geometries and stabilities of these helices can be tuned by manipulating the force-field parameters.
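For readers unfamiliar with the nonbonded term scanned in such models, the sketch below evaluates a standard 12-6 Lennard-Jones pair energy. It is a minimal sketch under stated assumptions: the abstract names a "Lennard-Jones coarse-grained model" but does not give the functional form or parameters, so the 12-6 form, the reduced units, and the function name `lennard_jones` are illustrative and are not part of the cg_openmm API.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy at separation r (reduced units assumed)."""
    if r <= 0:
        raise ValueError("separation must be positive")
    sr6 = (sigma / r) ** 6
    # 4 * epsilon * [(sigma/r)^12 - (sigma/r)^6]
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The potential crosses zero at r = sigma and has its minimum,
# of depth -epsilon, at r = 2**(1/6) * sigma.
r_min = 2 ** (1 / 6)
print(lennard_jones(1.0), round(lennard_jones(r_min), 6))
```

Scanning `epsilon` and `sigma` over a grid is one simple way to picture the force-field parameter sweep the abstract describes.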
Affiliation(s)
- Christopher C Walker
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Garrett A Meek
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Theodore L Fobe
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- Michael R Shirts
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
|
92
|
AoP-LSE: Antioxidant Proteins Classification Using Deep Latent Space Encoding of Sequence Features. Curr Issues Mol Biol 2021; 43:1489-1501. [PMID: 34698113 PMCID: PMC8928959 DOI: 10.3390/cimb43030105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Revised: 09/28/2021] [Accepted: 09/29/2021] [Indexed: 11/16/2022] Open
Abstract
It is of utmost importance to develop a computational method for accurate prediction of antioxidants, as they play a vital role in the prevention of several diseases caused by oxidative stress. In this correspondence, we present an effective computational methodology based on the notion of deep latent space encoding. A deep neural network classifier fused with an auto-encoder learns class labels in a pruned latent space. This strategy eliminates the need to develop the classifier and the feature selection model separately, allowing the standalone model to effectively harness the discriminating feature space and make improved predictions. A thorough analytical study is presented along with PCA/tSNE visualization and PCA-GCNR scores to show the discriminating power of the proposed method. The proposed method showed a high MCC value of 0.43 and a balanced accuracy of 76.2%, which is superior to the existing models. The model was evaluated on an independent dataset, on which it outperformed the contemporary methods by correctly identifying the novel proteins with an accuracy of 95%.
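The two metrics quoted in this abstract can be computed from a binary confusion matrix as sketched below. This is a minimal sketch: the function names and the example counts are hypothetical and do not reproduce the paper's data.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0

# Hypothetical counts: 50 positives (40 found), 100 negatives (80 rejected).
print(round(mcc(tp=40, tn=80, fp=20, fn=10), 4))
print(balanced_accuracy(tp=40, tn=80, fp=20, fn=10))
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is why they are commonly reported for datasets where one class is rare.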
|
93
|
Robson B. Testing machine learning techniques for general application by using protein secondary structure prediction. A brief survey with studies of pitfalls and benefits using a simple progressive learning approach. Comput Biol Med 2021; 138:104883. [PMID: 34598067 DOI: 10.1016/j.compbiomed.2021.104883] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/02/2021] [Revised: 09/05/2021] [Accepted: 09/17/2021] [Indexed: 01/05/2023]
Abstract
Many researchers have recently used the prediction of protein secondary structure (local conformational states of amino acid residues) to test advances in predictive and machine learning technology such as Neural Net Deep Learning. Protein secondary structure prediction continues to be a helpful tool in research in biomedicine and the life sciences, but it is also extremely enticing for testing predictive methods such as neural nets that are intended for different or more general purposes. A complication is highlighted here for researchers testing their methods for other applications. Modern protein databases inevitably contain important clues to the answer, so-called "strong buried clues", though often obscurely; they are hard to avoid. This is because most proteins or parts of proteins in a modern protein database are related to others by biological evolution. For researchers developing machine learning and predictive methods, this can overstate the true quality of a predictive method and so confuse understanding of it. However, for researchers using the algorithms as tools, understanding strong buried clues is of great value, because they need to make maximum use of all information available. A simple method related to the GOR methods, but with some features of neural nets in the sense of progressive learning of large numbers of weights, is used to explore this. It can acquire tens of millions and hence gigabytes of weights, but they are learned stably by exhaustive sampling. The significance of the findings is discussed in the light of promising recent results from AlphaFold using Google's DeepMind.
Affiliation(s)
- Barry Robson
- Ingine Inc. Ohio, USA and the Dirac Foundation Oxfordshire, UK.
|
94
|
Hybrid Deep Learning Based on a Heterogeneous Network Profile for Functional Annotations of Plasmodium falciparum Genes. Int J Mol Sci 2021; 22:ijms221810019. [PMID: 34576183 PMCID: PMC8468833 DOI: 10.3390/ijms221810019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/27/2021] [Revised: 09/13/2021] [Accepted: 09/14/2021] [Indexed: 12/15/2022] Open
Abstract
Functional annotation of genes of unknown function reveals unidentified functions that can enhance our understanding of complex genome communications. A common approach for inferring gene function involves the ortholog-based method. However, genetic data alone are often not enough to provide information for function annotation. Thus, integrating other sources of data can potentially increase the possibility of retrieving annotations. Network-based methods are efficient techniques for exploring interactions among genes and can be used for functional inference. In this study, we present an analysis framework for inferring the functions of Plasmodium falciparum genes based on connection profiles in a heterogeneous network between human and Plasmodium falciparum proteins. These profiles were fed into a hybrid deep learning algorithm to predict the orthologs of genes of unknown function. The results show high performance of the model's predictions, with an AUC of 0.89. One hundred and twenty-one predicted pairs with high prediction scores were selected for inferring the functions using statistical enrichment analysis. Using this method, PF3D7_1248700 and PF3D7_0401800 were found to be involved with muscle contraction and striated muscle tissue development, while PF3D7_1303800 and PF3D7_1201000 were found to be related to protein dephosphorylation. In conclusion, combining a heterogeneous network and a hybrid deep learning technique can allow us to identify unknown gene functions of malaria parasites. This approach is general and can be applied to other diseases, enhancing the field of biomedical science.
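For context on the AUC figure reported in this abstract, the sketch below computes the area under the ROC curve as the fraction of positive-negative pairs that the model ranks correctly. It is a minimal sketch with hypothetical scores and a hypothetical function name; it is not the evaluation code used in the study.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win. Runs in O(len(pos) * len(neg)) time.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical prediction scores for true ortholog pairs and non-pairs.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3]
print(round(auc(positives, negatives), 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.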
|
95
|
Kell DB. The Transporter-Mediated Cellular Uptake and Efflux of Pharmaceutical Drugs and Biotechnology Products: How and Why Phospholipid Bilayer Transport Is Negligible in Real Biomembranes. Molecules 2021; 26:5629. [PMID: 34577099 PMCID: PMC8470029 DOI: 10.3390/molecules26185629] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 08/17/2021] [Revised: 09/03/2021] [Accepted: 09/14/2021] [Indexed: 12/12/2022] Open
Abstract
Over the years, my colleagues and I have come to realise that the likelihood of pharmaceutical drugs being able to diffuse through whatever unhindered phospholipid bilayer may exist in intact biological membranes in vivo is vanishingly low. This is because (i) most real biomembranes are mostly protein, not lipid, (ii) unlike purely lipid bilayers that can form transient aqueous channels, the high concentrations of proteins serve to stop such activity, (iii) natural evolution long ago selected against transport methods that just let any undesirable products enter a cell, (iv) transporters have now been identified for all kinds of molecules (even water) that were once thought not to require them, (v) many experiments show a massive variation in the uptake of drugs between different cells, tissues, and organisms, that cannot be explained if lipid bilayer transport is significant or if efflux were the only differentiator, and (vi) many experiments that manipulate the expression level of individual transporters as an independent variable demonstrate their role in drug and nutrient uptake (including in cytotoxicity or adverse drug reactions). This makes such transporters valuable both as a means of targeting drugs (not least anti-infectives) to selected cells or tissues and also as drug targets. The same considerations apply to the exploitation of substrate uptake and product efflux transporters in biotechnology. We are also beginning to recognise that transporters are more promiscuous, and antiporter activity is much more widespread, than had been realised, and that such processes are adaptive (i.e., were selected by natural evolution). The purpose of the present review is to summarise the above, and to rehearse and update readers on recent developments. These developments lead us to retain and indeed to strengthen our contention that for transmembrane pharmaceutical drug transport "phospholipid bilayer transport is negligible".
Affiliation(s)
- Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
- Mellizyme Biotechnology Ltd., IC1, Liverpool Science Park, Mount Pleasant, Liverpool L3 5TF, UK
|
96
|
Garagounis C, Delkis N, Papadopoulou KK. Unraveling the roles of plant specialized metabolites: using synthetic biology to design molecular biosensors. The New Phytologist 2021; 231:1338-1352. [PMID: 33997999 DOI: 10.1111/nph.17470] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 02/11/2021] [Accepted: 04/16/2021] [Indexed: 05/25/2023]
Abstract
Plants are a rich source of specialized metabolites with a broad range of bioactivities and many applications in human daily life. Over the past decades significant progress has been made in identifying many such metabolites in different plant species and in elucidating their biosynthetic pathways. However, the biological roles of plant specialized metabolites remain elusive and proposed functions lack an identified underlying molecular mechanism. Understanding the roles of specialized metabolites frequently is hampered by their dynamic production and their specific spatiotemporal accumulation within plant tissues and organs throughout a plant's life cycle. In this review, we propose the employment of strategies from the field of Synthetic Biology to construct and optimize genetically encoded biosensors that can detect individual specialized metabolites in a standardized and high-throughput manner. This will help determine the precise localization of specialized metabolites at the tissue and single-cell levels. Such information will be useful in developing complete system-level models of specialized plant metabolism, which ultimately will demonstrate how the biosynthesis of specialized metabolites is integrated with the core processes of plant growth and development.
Affiliation(s)
- Constantine Garagounis
- Department of Biochemistry and Biotechnology, Plant and Environmental Biotechnology Laboratory, University of Thessaly, Larissa, 41500, Greece
- Nikolaos Delkis
- Department of Biochemistry and Biotechnology, Plant and Environmental Biotechnology Laboratory, University of Thessaly, Larissa, 41500, Greece
- Kalliope K Papadopoulou
- Department of Biochemistry and Biotechnology, Plant and Environmental Biotechnology Laboratory, University of Thessaly, Larissa, 41500, Greece
|
97
|
Li B, Mendenhall J, Capra JA, Meiler J. A Multitask Deep-Learning Method for Predicting Membrane Associations and Secondary Structures of Proteins. J Proteome Res 2021; 20:4089-4100. [PMID: 34236204 PMCID: PMC8650144 DOI: 10.1021/acs.jproteome.1c00410] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Prediction of residue-level structural attributes and protein-level structural classes helps model protein tertiary structures and understand protein functions. Existing methods are either specialized on only one class of proteins or developed to predict only a specific type of residue-level attribute. In this work, we develop a new deep-learning method, named Membrane Association and Secondary Structure Predictor (MASSP), for accurately predicting both residue-level structural attributes (secondary structure, location, orientation, and topology) and protein-level structural classes (bitopic, α-helical, β-barrel, and soluble). MASSP integrates a multilayer two-dimensional convolutional neural network (2D-CNN) with a long short-term memory (LSTM) neural network into a multitasking framework. Our comparison shows that MASSP performs equally well or better than the state-of-the-art methods in predicting residue-level secondary structures, boundaries of transmembrane segments, and topology. Furthermore, it achieves outstanding accuracy in predicting protein-level structural classes. MASSP automatically distinguishes the structural classes of input sequences and identifies transmembrane segments and topologies if present, making it broadly applicable to different classes of proteins. In summary, MASSP's good performance and broad applicability make it well suited for annotating residue-level attributes and protein-level structural classes at the proteome scale.
Affiliation(s)
- Bian Li
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37203, United States; Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37203, United States
- Jeffrey Mendenhall
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37203, United States; Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37203, United States
- John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94143, United States
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37203, United States; Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37203, United States; Institute for Drug Discovery, University Leipzig Medical School, Leipzig 04109, Germany
|
98
|
Park J, Chang S. A Particulate Matter Concentration Prediction Model Based on Long Short-Term Memory and an Artificial Neural Network. Int J Environ Res Public Health 2021; 18:6801. [PMID: 34202834 PMCID: PMC8297184 DOI: 10.3390/ijerph18136801] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/03/2021] [Revised: 06/15/2021] [Accepted: 06/16/2021] [Indexed: 01/12/2023]
Abstract
Many countries are concerned about high particulate matter (PM) concentrations caused by rapid industrial development, which can harm both human health and the environment. To manage PM, the prediction of PM concentrations based on historical data is actively being conducted. Existing technologies for predicting PM mostly assess the model performance for the prediction of existing PM concentrations; however, PM must be forecast in advance, before it becomes highly concentrated and causes damage to the citizens living in the affected regions. Thus, it is necessary to conduct research on an index that can illustrate whether the PM concentration will increase or decrease. We developed a model that can predict whether the PM concentration might increase or decrease after a certain time, specifically for PM2.5 (fine PM) generated by anthropogenic volatile organic compounds. An algorithm that can select a model on an hourly basis, based on the long short-term memory (LSTM) and artificial neural network (ANN) models, was developed. The proposed algorithm exhibited a higher F1-score than the LSTM, ANN, or random forest models alone. The model developed in this study could be used to predict future regional PM concentration levels more effectively.
Affiliation(s)
- Seongju Chang
- Department of Civil and Environmental Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
|
99
|
Scherer M, Fleishman SJ, Jones PR, Dandekar T, Bencurova E. Computational Enzyme Engineering Pipelines for Optimized Production of Renewable Chemicals. Front Bioeng Biotechnol 2021; 9:673005. [PMID: 34211966 PMCID: PMC8239229 DOI: 10.3389/fbioe.2021.673005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/26/2021] [Accepted: 05/06/2021] [Indexed: 11/13/2022] Open
Abstract
To enable a sustainable supply of chemicals, novel biotechnological solutions are required that replace the reliance on fossil resources. One potential solution is to utilize tailored biosynthetic modules for the metabolic conversion of CO2 or organic waste to chemicals and fuel by microorganisms. Currently, it is challenging to commercialize biotechnological processes for renewable chemical biomanufacturing because of a lack of highly active and specific biocatalysts. As experimental methods to engineer biocatalysts are time- and cost-intensive, it is important to establish efficient and reliable computational tools that can speed up the identification or optimization of selective, highly active, and stable enzyme variants for utilization in the biotechnological industry. Here, we review and suggest combinations of effective state-of-the-art software and online tools available for computational enzyme engineering pipelines to optimize metabolic pathways for the biosynthesis of renewable chemicals. Using examples relevant for biotechnology, we explain the underlying principles of enzyme engineering and design and illuminate future directions for automated optimization of biocatalysts for the assembly of synthetic metabolic pathways.
Affiliation(s)
- Marc Scherer
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Patrik R Jones
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Thomas Dandekar
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
- Elena Bencurova
- Department of Bioinformatics, Julius-Maximilians University of Würzburg, Würzburg, Germany
|
100
|
Afify HM, Abdelhalim MB, Mabrouk MS, Sayed AY. Protein secondary structure prediction (PSSP) using different machine algorithms. Egypt J Med Hum Genet 2021. [DOI: 10.1186/s43042-021-00173-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
Background
The computational biology approach has advanced exponentially in protein secondary structure prediction (PSSP), which is vital for the pharmaceutical industry. Extracting protein structure from the laboratory has insufficient information for PSSP that is used in bioinformatics studies. In this paper, the support vector machine (SVM) model and decision tree are presented on the RS126 dataset to address the problem of PSSP. A decision tree is applied for the SVM outcome to obtain the relevant guidelines possible for PSSP. Furthermore, the number of produced rules was fairly small, and they show a greater degree of comprehensibility compared to other rules. Several of the proposed principles have compelling and relevant biological clarification.
Results
The results confirmed that the existence of a particular amino acid in a protein sequence increases the stability for the forecast of protein secondary structure. The suggested algorithm achieved 85% accuracy for the E|~E classifier.
Conclusions
The proposed rules can be very important in guiding wet-laboratory experiments aimed at determining protein secondary structure. Future work will focus mainly on applying the approach to large protein datasets without overfitting and on expanding the number of rules extracted for PSSP.
|